ApacheCon NA 2013

Portland, Oregon

February 26th – 28th, 2013

Tuesday 4:15 p.m.–5:15 p.m.

Getting Hadoop, Hive and HBase up and running in less than 15 minutes

Mark Grover

Track:
Overture and Beginners
Audience level:
Beginner

Description

If you have ever wanted to dabble with Apache Hadoop, Hive, HBase or other projects in the Hadoop ecosystem but have been discouraged by the painful process of installation and configuration of these projects, this talk is for you.

We will learn how to install Hadoop, Hive and HBase on a cluster by making use of various packages from Apache Bigtop.

Abstract

This talk introduces the audience to Apache Bigtop, a project aimed at developing packaging and tests within the Hadoop ecosystem. Using packages available through Apache Bigtop, we will learn how to set up a cluster with Hadoop, Hive and HBase installed and configured in under 15 minutes. We will then run some example MapReduce jobs and Hive and HBase queries to validate the setup.

Bigtop packages various open source Big Data projects such as Hadoop, Hive and HBase, and makes them available as architecture-specific deb/rpm packages. These packages can then be installed with standard package managers such as apt-get, zypper or yum. One of the longer-term goals of Apache Bigtop is to serve as a reference for eventually getting Hadoop included in most Linux distributions.
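
As a rough illustration of the package-manager step, the sketch below builds (but does not run) an install command for each of the three installers mentioned above. The helper name build_install_command and the package names hadoop, hive and hbase are assumptions made for this example, not something the talk specifies; it also assumes the Bigtop package repository has already been added to the system.

```python
# Hypothetical helper: builds an install command line without executing it.
# The non-interactive flags are the standard ones for each package manager.
INSTALL_PREFIX = {
    "apt-get": ["apt-get", "install", "-y"],
    "yum": ["yum", "install", "-y"],
    "zypper": ["zypper", "--non-interactive", "install"],
}

# Assumed package names, used here for illustration only.
PACKAGES = ["hadoop", "hive", "hbase"]


def build_install_command(manager, packages):
    """Return the argv list that would install `packages` with `manager`."""
    if manager not in INSTALL_PREFIX:
        raise ValueError(f"unsupported package manager: {manager}")
    return INSTALL_PREFIX[manager] + list(packages)


if __name__ == "__main__":
    # Print one command per package manager so they can be compared.
    for manager in INSTALL_PREFIX:
        print(" ".join(build_install_command(manager, PACKAGES)))
```

The command is returned as a list rather than a single string so that it could be handed to subprocess.run without shell quoting issues, should one choose to execute it with root privileges.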

Time permitting, we will introduce another important function of Apache Bigtop: interoperability testing. Given the dependencies between the various projects and the sheer number of versions of each, testing the interoperability of the components is a daunting task. One of the goals of Bigtop is to address this problem, and we will learn how it does so.

The talk will end with a short Q&A session.

Biography

Mark Grover is involved with Apache Bigtop and is a contributor to the Apache Hive project. He is also a section author of O'Reilly's book Programming Hive. He is presently a software developer at Cloudera Inc. in Palo Alto, CA.