This talk describes important query processing techniques relevant to eCommerce sites.
eShop visitors tend to enter compound terms, such as brands or product families, without marking them as phrases. The common practice is to use DisMaxQParser and tokenize during indexing, but this leads to false-positive matches: for “calvin klein jeans dress” you should not show other brands' jeans or dresses, not even those by Anne Klein. I’d like to present a query parsing technique and/or a special type of query that provides outstanding precision. At the Stump the Chump session at Lucene Eurocon’11 it turned out that eCommerce has been asking for this for many years, but the community lacks a clear vision of the solution. In this session I also want to present the following two pearls:
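To make the false-positive problem concrete, here is a minimal sketch in Python. It is not DisMaxQParser and does not model its scoring or its `mm` parameter; `any_token_match` and the three-item catalog are hypothetical stand-ins for a field that is tokenized at index time and matched token by token.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for an index-time analyzer)."""
    return text.lower().split()


def any_token_match(query, doc_text):
    """Hypothetical per-token matcher: a document matches if it shares any token with the query."""
    doc_tokens = set(tokenize(doc_text))
    return any(token in doc_tokens for token in tokenize(query))


# Hypothetical catalog, for illustration only.
catalog = [
    "Calvin Klein Jeans dress",
    "Anne Klein dress",
    "Levi's jeans",
]

query = "calvin klein jeans dress"
matches = [doc for doc in catalog if any_token_match(query, doc)]

# All three items match, although only the first one is what the shopper asked for:
# "Anne Klein dress" matches on "klein" and "dress", "Levi's jeans" on "jeans".
print(matches)
```

Once the compound term is split into independent tokens, the matcher can no longer tell that “klein” belongs to “calvin klein” rather than to any other brand.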
Building on the technique above, we can recognize which particular entities the shopper has in mind, e.g. from “calvin klein jeans dress” we can conclude that a dress is wanted, not jeans, and can therefore propose related offers or provide a proper ranking. Technically, it is somewhat related to LUCENE-1999.
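One way such entity recognition could work is greedy longest-match segmentation of the query against a dictionary of known compound terms. The sketch below is an assumption for illustration, not the technique presented in the session: the `DICTIONARY` contents, the entity type names, and the choice to treat “calvin klein jeans” as a brand are all hypothetical.

```python
# Hypothetical dictionary mapping known compound terms to entity types.
DICTIONARY = {
    ("calvin", "klein", "jeans"): "brand",
    ("calvin", "klein"): "brand",
    ("anne", "klein"): "brand",
    ("jeans",): "product_type",
    ("dress",): "product_type",
}
MAX_TERM_LENGTH = max(len(term) for term in DICTIONARY)


def segment(query):
    """Split the query into (term, entity_type) pairs, preferring the longest known term."""
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for length in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + length])
            if span in DICTIONARY:
                segments.append((" ".join(span), DICTIONARY[span]))
                i += length
                break
        else:
            # No dictionary entry starts here; keep the token as unknown.
            segments.append((tokens[i], "unknown"))
            i += 1
    return segments


print(segment("calvin klein jeans dress"))
# [('calvin klein jeans', 'brand'), ('dress', 'product_type')]
```

Under these assumptions, “jeans” is consumed as part of the brand, so the only product type left in the query is “dress”, which can then drive related offers or ranking.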
Usually, when you tune relevance through query parsing, you almost always have to trade precision for recall and vice versa. We’ve found a trivial way to make both of them excellent, and quickly.