ApacheCon NA 2010 Session

Open Source Enterprise Search & Retrieval Platform

This technical talk describes how Apache open source software (Apache Lucene/Solr, Apache Nutch, Apache Tika and Apache ServiceMix) is used to implement an enterprise search and retrieval platform. The platform is the result of years of experience with enterprise search technologies, combined with enterprise application integration and semantic (web) technologies, in both commercial and open source environments. The talk will dive into the conceptual architecture of a typical search solution, based upon a real-world use case, and will then present the accompanying framework that makes it possible to implement enterprise search solutions on this architecture easily and swiftly.

The architecture describes an innovative enterprise search solution and specifies all components needed for collecting and indexing content (the collection process, which consists of inbound, splitting, validating, filtering, enriching and indexing components) and for publishing the content (the publication process, which consists of inbound, validating, request enriching, searching, grouping, response enriching and presentation components). The framework can be seen as an orchestrator: it contains all the tools, components, default configurations and flow descriptions necessary to build enterprise solutions according to this architecture. It is built entirely on open source technologies, most of them Apache projects.
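
To make the collection process more concrete, the following is a minimal sketch of how its stages (inbound, splitting, validating, filtering, enriching, indexing) could be chained. It is an illustration only, not the framework's actual code: every function name, the document layout, the "DRAFT" filter rule and the in-memory dictionary standing in for the index are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical document shape: a flat mapping of field name to string value.
Document = Dict[str, str]


def inbound(raw_items: Iterable[str]) -> Iterator[Document]:
    """Inbound: wrap each raw input string in a document with an id."""
    for i, text in enumerate(raw_items):
        yield {"id": str(i), "body": text}


def splitting(docs: Iterable[Document]) -> Iterator[Document]:
    """Splitting: emit one document per blank-line-separated part."""
    for doc in docs:
        for n, part in enumerate(doc["body"].split("\n\n")):
            yield {"id": f"{doc['id']}-{n}", "body": part.strip()}


def validating(docs: Iterable[Document]) -> Iterator[Document]:
    """Validating: drop documents with an empty body."""
    return (doc for doc in docs if doc["body"])


def filtering(docs: Iterable[Document]) -> Iterator[Document]:
    """Filtering: drop documents marked as drafts (assumed rule)."""
    return (doc for doc in docs if not doc["body"].startswith("DRAFT"))


def enriching(docs: Iterable[Document]) -> Iterator[Document]:
    """Enriching: add a derived field, here a simple word count."""
    for doc in docs:
        yield {**doc, "words": str(len(doc["body"].split()))}


def run_collection(raw_items: Iterable[str],
                   index: Dict[str, Document]) -> Dict[str, Document]:
    """Run all stages in order, then index the surviving documents."""
    docs: Iterable[Document] = inbound(raw_items)
    for stage in (splitting, validating, filtering, enriching):
        docs = stage(docs)
    for doc in docs:  # Indexing: a dict stands in for a real search index.
        index[doc["id"]] = doc
    return index


if __name__ == "__main__":
    result = run_collection(
        ["Hello world\n\nDRAFT: skip me\n\n", "Second item"], {}
    )
    for doc_id, doc in sorted(result.items()):
        print(doc_id, doc)
```

Each stage takes a stream of documents and returns a stream, so stages can be reordered or replaced without touching the others. The publication process could be sketched the same way, with request and response objects flowing through its own list of stages.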