ApacheCon NA 2013

Wednesday 1:45 p.m.–2:45 p.m.

Near-Realtime Processing over HBase

Ryan Brush

Track: Big Data
Audience level: Intermediate

Description

The better your Hadoop-based processing, the faster people want it. This is a case study of complementing MapReduce with stream-based processing of complex healthcare data. We start with raw input and end with rich, indexed content served in Solr. Along the way we look at how we use Hadoop, HBase, Crunch, and Twitter's Storm project to help make big data fast.

Abstract

Healthcare data is often fragmented across institutions or stored in formats that are not easily explored. Here we look at a system that securely brings together related pieces of health information and processes them into a variety of data models useful to clinicians. This talk will include:

Low-latency data ingestion from multiple sources into HBase
A reliable, scalable change notification system over HBase
Processing those incremental changes using the Storm project
Bulk processing data using an incubator build of Apache Crunch
Building Solr indexes in MapReduce and incrementally updating them
Serving resulting data to clinical applications out of HBase and Solr

This talk draws from the "Hadoop, HBase, and Healthcare" talk given at Hadoop World 2012, but goes deeper into the technologies and techniques used. A basic understanding of Hadoop and MapReduce is assumed. Working knowledge of HBase and Solr may be helpful but is not required.

Portland, Oregon

February 26th – 28th, 2013

