ApacheCon NA 2010 Session
Apache Mahout - making data analysis easy.
The amount of digital data available to businesses has been exploding in recent years: user interaction data is stored by online shops and used to improve search results. Shopping cart contents are archived to learn more about what users tend to buy in one session. User-generated content from blogs and micro-blogs can be stored and analysed. With such amounts of data at their fingertips, software developers are more than ever in need of a scalable, easy-to-use framework for extracting knowledge from the data. Apache Mahout offers scalable implementations of algorithms for data mining and machine learning. "Scalable" here means a scalable community that helps new users with their problem settings while still actively driving project development. Scalable also means a commercially friendly license that facilitates the implementation of various business models. Of course, scalable also means scalable in terms of the amount of data to process: Apache Mahout is easy to get started with, yet it scales to increasing data volumes thanks to its use of Apache Hadoop. After motivating the need for machine learning, the talk gives an overview of Apache Mahout. It shows the tremendous improvements implemented in the recent past, including the addition of several algorithms and performance improvements. Last but not least, Apache Mahout graduated to a top-level project this year.