This talk will give an overview of some improvements for future versions of Apache Lucene, including major efforts underway in feature branches and work being done during 2012's Google Summer of Code.
The talk will summarize each feature: why it is important or useful, its current status, and how you can help by contributing or testing.
Major themes:
- Intblock Compression
  - Likely the new default index format for a future 4.x release
  - Better compression for structured data (e.g. database content)
  - Separate payloads/offsets from the prox stream
  - You don't "pay" for payloads except when you need them
- Positions Iterators
  - Fold Span*Query functionality into basic queries
  - Enable efficient proximity scoring
  - Faster, more relevant highlighting (result snippets)
- Docstore improvements
  - StoreableField API improvements
  - Efficient compressed stored fields
  - New possibilities for term vectors
- Pipe dreams (future-future)
  - Additional suggester implementations
  - Updatable documents in Lucene
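
As a rough illustration of the general idea behind the block compression theme, the sketch below delta-encodes a sorted block of document IDs and packs the gaps with a single bit width for the whole block. It is a minimal Python sketch, not Lucene's actual Java implementation or on-disk format; the function names `pack_block` and `unpack_block` are hypothetical, and the input is assumed to be a non-empty, strictly increasing list of non-negative integers.

```python
def pack_block(doc_ids):
    """Delta-encode a sorted block of doc IDs and bit-pack the gaps.

    Assumes doc_ids is non-empty and strictly increasing.
    Returns (bits_per_value, value_count, packed_integer).
    """
    # Store the first ID as-is, then the gap to each following ID.
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    # Use one bit width for the whole block: enough for the largest gap.
    bits = max(1, max(gaps).bit_length())
    packed = 0
    for i, gap in enumerate(gaps):
        # Place each gap in its own fixed-width slot.
        packed |= gap << (i * bits)
    return bits, len(gaps), packed


def unpack_block(bits, count, packed):
    """Reverse pack_block: extract the gaps and rebuild the doc IDs."""
    mask = (1 << bits) - 1
    doc_ids, current = [], 0
    for i in range(count):
        # Each gap is added to the running total to recover the ID.
        current += (packed >> (i * bits)) & mask
        doc_ids.append(current)
    return doc_ids


ids = [3, 7, 8, 15, 16, 23]
bits, count, packed = pack_block(ids)
restored = unpack_block(bits, count, packed)
```

Because every value in a block shares one bit width, a decoder can process the block in a tight loop without per-value branching, which is the usual motivation for block-based integer codecs.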