Big Data

January 23, 2012

The storage capacities of laptop and desktop computers have been growing rapidly, but the growth may not be fast enough. According to IBM, we create 2.5 quintillion bytes of data every day. Perhaps quintillions of bytes are not meaningful to most of us, but it is the growth rate that is staggering: 90% of all the data in the world has been created in the last two years. Where does all the data come from? Data comes from everywhere: from sensors used to gather climate information, physiological readings taken 1,000 times per second from a patient, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS coordinates, to name just a few. Collectively, the phenomenon is called “big data”. (See IBM Big data and information integration for smarter computing.)

Note: Data is plural. The singular term is datum. Should we say “data is” or “data are”? There are many views on which is right.

IBM describes big data as spanning three dimensions: Variety, Velocity and Volume. Variety refers to the fact that big data extends beyond structured data of the kind we might find in a spreadsheet. It includes unstructured data such as text documents, email, audio and video recordings, click streams from the web, log files that record financial and business transactions, and much more. Velocity refers to the fact that data can be time-sensitive, such as bid and ask data in a financial market or physiological data that affect the lives of patients. In these cases, historical data is interesting but real-time data is critical. The third dimension is volume. IBM says that big data comes in one size: large. Organizations are flooded with data: terabytes, petabytes, or even yottabytes.

Big data is a challenge in various technical ways, but more importantly, it is an opportunity to find insight in new and emerging types of data and to answer questions that could not be analyzed effectively in the past. Data that has been hidden can be surfaced and acted upon. The result can be a more agile organization or, in the case of health care, better outcomes for patients. Picture a hospital neonatal environment where a plethora of medical monitors connected to babies are used to alert hospital staff to potential health problems before patients develop clinical signs of infection or other issues. There are breakthroughs on the horizon for how this will be done. Today the instrumentation generates huge amounts of information, up to 1,000 readings per second, which is summarized into one reading every 30 to 60 minutes. The information is stored for up to 72 hours and is then discarded. If the stream of data could be captured, stored and analyzed in real time, there could be a huge opportunity to improve the quality of care for special-care babies.
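
To get a feel for how much detail that summarization throws away, the figures quoted above can be run through some rough arithmetic. This is only a back-of-the-envelope sketch: it assumes a single monitor running at the peak rate of 1,000 readings per second, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
# Assumption: one monitor, running constantly at the peak quoted rate.
READINGS_PER_SECOND = 1_000   # "up to 1,000 readings per second"
RETENTION_HOURS = 72          # summaries are kept "for up to 72 hours"


def raw_readings_per_summary(interval_min: int,
                             rate_hz: int = READINGS_PER_SECOND) -> int:
    """Number of raw readings collapsed into one summary reading."""
    return rate_hz * 60 * interval_min


def summaries_retained(interval_min: int,
                       retention_hours: int = RETENTION_HOURS) -> int:
    """Number of summary readings on hand at the end of the retention window."""
    return retention_hours * 60 // interval_min


def raw_readings_in_window(rate_hz: int = READINGS_PER_SECOND,
                           retention_hours: int = RETENTION_HOURS) -> int:
    """Raw readings the monitor produced over the same retention window."""
    return rate_hz * 3600 * retention_hours


for minutes in (30, 60):
    print(f"{minutes}-minute summaries: "
          f"{raw_readings_per_summary(minutes):,} raw readings per summary, "
          f"{summaries_retained(minutes)} summaries kept")
print(f"Raw readings in {RETENTION_HOURS} hours: {raw_readings_in_window():,}")
```

Under these assumptions, each summary reading stands in for between 1.8 million and 3.6 million raw readings, and a 72-hour window that produced roughly 259 million raw readings is represented by at most 144 stored values.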

The Hospital for Sick Children in Ontario, Canada developed such a vision and is acting on it. Dr. Carolyn McGregor, Canada Research Chair in Health Informatics at the University of Ontario Institute of Technology, visited researchers at the IBM T. J. Watson Research Center who are working on a new stream-computing platform to support healthcare analytics. A three-way collaboration was established, with each group bringing a unique perspective: the hospital’s focus on patient care, the university’s ideas for using the data stream, and IBM’s advanced analysis software and information technology expertise needed to turn the vision into reality. The result of the collaboration was Project Artemis, which pairs IBM scientists with clinicians and researchers to explore how emerging technologies can solve real-world business problems. In this case, the goal is a highly flexible platform that aims to help physicians make better, faster decisions regarding patient care for a wide range of conditions. At the Hospital for Sick Children the focus is real-time detection of the onset of nosocomial infection (often called hospital-acquired infection). Regulatory, ethical, privacy, and safety issues were addressed, and then two infant beds were instrumented and connected to the system for data collection. The team then created an algorithm that deciphered the streaming data. Once the impact of events such as moving a baby or changing its diaper has been established, those effects can be filtered out to help spot the telltale signs of nosocomial infection.

Dr. Andrew James, staff neonatologist at the Hospital for Sick Children, is optimistic that as the team learns more, it will be able to account for variations in individual patients and eventually integrate data inputs such as lab results and observational notes. In the future, any condition that can be detected through subtle changes in the underlying data streams can be the target of the system’s early-warning capabilities. It is likely that sensors attached to or even implanted in the body will allow monitoring of important conditions from home or anywhere. Big data has the potential to improve the health of patients wherever they may be.
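
The idea of filtering out routine care events, mentioned above, can be illustrated with a small sketch. This is not the Project Artemis algorithm, which is not described here. It is a deliberately simple stand-in that assumes care events (such as a diaper change) are logged as start and end times, and that readings taken during an event, plus a short settling period afterwards, are simply set aside. The `Reading` type, the `drop_care_events` function and the `settle_s` parameter are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Reading:
    t: float      # seconds since monitoring began (hypothetical time base)
    value: float  # e.g. a heart-rate sample; units are illustrative only


# A logged care event as (start, end) in the same time base as Reading.t.
Window = Tuple[float, float]


def drop_care_events(readings: Iterable[Reading],
                     care_windows: Iterable[Window],
                     settle_s: float = 0.0) -> List[Reading]:
    """Keep only readings that fall outside every logged care event.

    A reading is dropped if start <= t <= end + settle_s for any window,
    where settle_s is an assumed recovery period after the event ends.
    """
    windows = list(care_windows)
    kept = []
    for r in readings:
        disturbed = any(start <= r.t <= end + settle_s
                        for start, end in windows)
        if not disturbed:
            kept.append(r)
    return kept


# Ten one-second samples, with a spike while the baby is being moved.
samples = [Reading(float(i), 160.0 if 3 <= i <= 5 else 120.0)
           for i in range(10)]
kept = drop_care_events(samples, care_windows=[(3.0, 5.0)], settle_s=1.0)
print([r.t for r in kept])
```

In this toy example the elevated values recorded while the baby is being handled never reach the later analysis, so they cannot be mistaken for an early sign of infection. A real system would presumably model the effect of each event rather than discard the data outright.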
