How do you index and search PDF documents, Excel spreadsheets or Keynote presentations? The Apache Tika toolkit extracts the text content from these and dozens of other document formats. This talk shows how to use Tika to feed your Lucene- or Solr-based full-text search index.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. To show how the toolkit can be used with a Lucene or Solr search index, this talk covers
This talk assumes basic knowledge of Lucene or Solr and of Java programming.
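As a rough illustration of the kind of integration described above, the following minimal sketch uses the Tika facade class to detect a file's media type and extract its text, then wraps the result in a Lucene document. It is not taken from the talk itself: the class name `TikaLuceneSketch`, the helper `toLuceneDocument`, and the field names `path`, `type` and `content` are assumptions chosen for this example, and it presumes the Tika parser modules and a recent Lucene release are on the classpath.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical class name; a sketch, not the talk's own code.
public class TikaLuceneSketch {

    // The Tika facade picks a parser based on the detected media type.
    private static final Tika TIKA = new Tika();

    // Hypothetical helper: turns one file into a Lucene document.
    static Document toLuceneDocument(Path file) throws IOException, TikaException {
        String mediaType = TIKA.detect(file.toFile());
        // parseToString caps the extracted text at a default maximum length.
        String text = TIKA.parseToString(file.toFile());

        Document doc = new Document();
        // Field names below are assumptions made for this example.
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new StringField("type", mediaType, Field.Store.YES));
        doc.add(new TextField("content", text, Field.Store.NO));
        return doc;
    }

    // Usage: java TikaLuceneSketch <index-dir> <file>...
    public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             IndexWriter writer = new IndexWriter(
                     dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 1; i < args.length; i++) {
                writer.addDocument(toLuceneDocument(Paths.get(args[i])));
            }
        }
    }
}
```

The sketch stores the path and media type so that search hits can be displayed, while the extracted text is indexed for searching but not stored, which keeps the index small.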