Inside your files are pieces of metadata and other information that normally remain hidden but can be extracted with the right tools. This talk shows how to use Apache Tika to detect and extract this hidden information, and how to use it to make your applications more perceptive.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. In this talk you'll learn to:
This talk assumes a basic understanding of common file formats and the Internet media type system. Some examples assume knowledge of Java programming, but it is not required to follow the overall presentation.