The better your Hadoop-based processing, the faster people want it. This is a case study of complementing MapReduce with stream-based processing of complex healthcare data. We start with raw input and end with rich, indexed content served in Solr. Along the way we look at how we use Hadoop, HBase, Crunch, and Twitter's Storm project to help make big data fast.
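To make the streaming half concrete, here is a minimal sketch of a Storm topology using the pre-Apache backtype.storm API: a spout feeding raw record lines to a parsing bolt on an in-process cluster. The RecordSpout and ParseBolt classes, the field layout, and the sample record are hypothetical placeholders, not the components of the system described here.

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class StreamSketch {

    // Hypothetical source of raw record lines; a real spout would pull
    // from a queue of incoming healthcare messages.
    public static class RecordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values("patient-123|lab-result|sample payload"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("raw"));
        }
    }

    // Hypothetical parse step: split the patient id and record type
    // out of each raw line.
    public static class ParseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String[] fields = input.getString(0).split("\\|");
            collector.emit(new Values(fields[0], fields[1]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("patient", "type"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("records", new RecordSpout(), 1);
        builder.setBolt("parse", new ParseBolt(), 2).shuffleGrouping("records");

        // Runs in-process; a production deployment would use StormSubmitter.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("stream-sketch", new Config(), builder.createTopology());
    }
}

On a real cluster, StormSubmitter.submitTopology replaces LocalCluster, and downstream bolts would hand parsed documents to HBase and Solr, while the batch side of the same data flows through Crunch-built MapReduce pipelines.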
Healthcare data is often fragmented across institutions or stored in formats that are not easily explored. Here we look at a system that securely brings together related pieces of health information and processes them into a variety of data models useful to clinicians. This talk walks through how that system is built.
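As one illustration of bringing related pieces together, consider the common HBase pattern of grouping records under a shared row-key prefix so that a single scan retrieves everything known about one patient. This is a minimal sketch under assumptions of our own; the health_records table, the d column family, and the row-key layout are hypothetical, not the talk's actual schema.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PatientRecords {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumes a pre-created table named "health_records" with family "d".
        HTable table = new HTable(conf, "health_records");

        // Row key = patient id | record type | record id, so all of a
        // patient's records sort next to each other.
        Put put = new Put(Bytes.toBytes("patient-123|lab|0042"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("raw record body"));
        table.put(put);

        // One prefix scan pulls back every record for the patient;
        // '~' sorts after '|', so it bounds the key range.
        Scan scan = new Scan(Bytes.toBytes("patient-123|"), Bytes.toBytes("patient-123~"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(Bytes.toString(result.getRow()));
        }
        scanner.close();
        table.close();
    }
}

Because HBase stores rows in sorted order, the prefix scan is a cheap, contiguous read, which is what makes this grouping pattern work.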
This talk draws from the Hadoop, HBase, and Healthcare talk given at the 2012 Hadoop World, but goes deeper into the technologies and techniques used. A basic understanding of Hadoop and MapReduce is assumed. Working knowledge of HBase and Solr may be helpful but is not required.