The storage capacities of laptop and desktop computers have been growing rapidly, but that growth may not be fast enough. According to IBM, we create 2.5 quintillion bytes of data every day. Quintillions of bytes may not mean much to most of us, but the growth rate is staggering: 90% of all the data in the world has been created in the last two years. Where does all this data come from? Everywhere. It comes from sensors that gather climate information, physiological readings taken 1,000 times per second from a patient, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS coordinates, to name just a few. Collectively, the phenomenon is called “big data” (see IBM, Big data and information integration for smarter computing).
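The following short Python sketch makes IBM's figure more concrete. It converts 2.5 quintillion bytes per day into exabytes per day and terabytes per second. It assumes the short-scale quintillion (10^18) and decimal SI prefixes (1 TB = 10^12 bytes). With binary units (TiB, EiB), the numbers come out somewhat smaller.

```python
# IBM's figure from the text: 2.5 quintillion bytes created per day.
# Assumption: short-scale quintillion (10**18) and decimal SI prefixes.
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

UNITS = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
    "YB": 10**24,  # yottabyte
}


def to_unit(n_bytes: float, unit: str) -> float:
    """Express a byte count in the given decimal unit."""
    return n_bytes / UNITS[unit]


per_day_eb = to_unit(BYTES_PER_DAY, "EB")
per_second_tb = to_unit(BYTES_PER_DAY / SECONDS_PER_DAY, "TB")

print(f"{per_day_eb:.1f} EB per day")
print(f"{per_second_tb:.1f} TB per second")
```

Run as a script, it prints 2.5 EB per day and roughly 28.9 TB per second. In other words, every second adds tens of terabytes worldwide.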
Note: Data is plural; the singular is datum. Should we say “data is” or “data are”? Opinions differ on which is correct.
IBM describes big data as spanning three dimensions: Variety, Velocity and Volume. Variety refers to the fact that big data extends beyond the structured data we might find in a spreadsheet. It includes unstructured data such as text documents, email, audio and video recordings, click streams from the web, log files that record financial and business transactions, and much more. Velocity refers to the fact that data can be time-sensitive. Examples include bid and ask data in a financial market, or physiological data that affect the lives of patients. In these cases, historical data is interesting, but real-time data is critical. The third dimension is Volume. IBM says that big data comes in one size: large. Organizations are flooded with data, measured in terabytes, petabytes, or even yottabytes.
Big data poses a range of technical challenges. More importantly, it is an opportunity to find insight in new and emerging types of data and to answer questions that could not be analyzed effectively in the past. Data that has been hidden can be surfaced and acted upon. The result can be a more agile organization or, in health care, better outcomes for patients.
Picture a hospital's neonatal unit where many medical monitors connected to babies alert hospital staff to potential health problems before patients develop clinical signs of infection or other issues. There are breakthroughs on the horizon for how this will be done. Today the instrumentation generates huge amounts of information, up to 1,000 readings per second, which is summarized into one reading every 30 to 60 minutes. This information is stored for up to 72 hours and then discarded. If the stream of data could be captured, stored and analyzed in real time, there could be a significant opportunity to improve the quality of care for special-care babies.
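The following Python sketch shows how much information the summarization in the neonatal example throws away. It uses the figures given in the text: 1,000 readings per second, one summary every 30 to 60 minutes, and a 72-hour retention window. Two things are assumptions not stated in the text. First, the 1,000-per-second rate is treated as belonging to a single monitor channel. Second, the storage cost of 8 bytes per reading is hypothetical.

```python
# Figures from the text: up to 1,000 readings per second, summarized into
# one reading every 30 to 60 minutes, kept for up to 72 hours.
# Assumption: the rate applies to a single monitor channel.
READINGS_PER_SECOND = 1_000
RETENTION_HOURS = 72
# Hypothetical: 8 bytes per stored reading (e.g. one 64-bit float);
# the text does not say how readings are encoded.
BYTES_PER_READING = 8


def raw_readings(seconds: int) -> int:
    """Number of raw readings one channel produces in `seconds`."""
    return seconds * READINGS_PER_SECOND


def collapse_ratio(summary_minutes: int) -> int:
    """Raw readings represented by a single summary reading."""
    return raw_readings(summary_minutes * 60)


def summaries_retained(summary_minutes: int) -> int:
    """Summary readings kept over the 72-hour retention window."""
    return RETENTION_HOURS * 60 // summary_minutes


# Raw readings the full stream would contain over the retention window.
raw_72h = raw_readings(RETENTION_HOURS * 3600)

for minutes in (30, 60):
    print(f"{minutes}-minute summary: 1 reading stands for "
          f"{collapse_ratio(minutes):,} raw readings; "
          f"{summaries_retained(minutes)} readings kept over 72 h")
print(f"Full stream over 72 h: {raw_72h:,} readings "
      f"(~{raw_72h * BYTES_PER_READING / 1e9:.2f} GB at 8 bytes each)")
```

Under these assumptions, each summary stands in for 1.8 to 3.6 million raw readings. Only 72 to 144 summaries survive the 72-hour window, compared with 259,200,000 raw readings (about 2.07 GB at the hypothetical 8 bytes each) for a single channel.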