We’ve been to Lucene Solr Revolution 2017 in Viva Las Vegas, and expectations were pretty high to learn about learning to rank (L2R) and to hear how others measure search success. In addition to that, I was invited to present our take on “Context Driven Ranking and Faceting”.
In the last few months we’ve invested a lot of effort in finding out how to measure our search success so that, in the best case, we can evaluate new ranking combinations offline. At the last Berlin Buzzwords conference, Criteo presented their implementation of a new Word2Vec-driven metric. Being able to measure the exact value of a new feature sounds like a satisfying development process. At Lucene Solr Revolution, a nice talk that corroborated our thoughts was the one from REI Co-op. Chris and Dale described how they had to stop relying on hope in their development process and rely on real metrics instead.
We’re currently trying to find out which metric is the best one, ideally a metric that correlates with our company goals (this cannot be taken for granted and has to be tested). Some talks mentioned NDCG or Mean Reciprocal Rank (MRR) as a search metric. Others, like the Modified Damerau-Levenshtein Distance, will help you a lot if you have a perfect result set and want to measure how far another result set deviates from it. You can use this metric in an offline scenario, in an A/B test or, as mentioned in the talk, as a shadow query metric (send a second, asynchronous request to your test system and gather the metrics there).
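To make these offline metrics a bit more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not code from any of the talks: the function names are ours, relevance grades are assumed to be plain numbers per result position, and the distance is the standard optimal string alignment variant of Damerau-Levenshtein applied to lists of document ids. The "modified" variant mentioned in the talk may well differ.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for graded relevances in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def osa_distance(expected, actual):
    """Optimal string alignment distance between two ranked lists of ids.

    Counts insertions, deletions, substitutions and swaps of adjacent
    results needed to turn one list into the other (assumed variant).
    """
    rows, cols = len(expected) + 1, len(actual) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if expected[i - 1] == actual[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and expected[i - 1] == actual[j - 2]
                    and expected[i - 2] == actual[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]
```

In a shadow query setup, `expected` could be the result list of the production system and `actual` the list returned by the test system for the same query.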
In some presentations and in the conversations between them, we heard that most people use online metrics like the average click position, the time to first click or the conversion rate to measure the relevance of their search results. Reddit mentioned in their talk that they used the average click position to measure the success of the migration from their old search engine to Lucidworks Fusion. Salesforce uses an extended MRR and defined the reciprocal MRR (RMRR) as the inverse of the MRR. This inversion makes it easier for managers to understand what the metric means.
We’re really looking forward to more advanced search metrics and a more metric-driven development. Current papers on simulating user behaviour in other markets look very promising for creating these online metrics in offline environments.
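As a rough illustration of the click-based metrics above, the following Python sketch computes a plain MRR, its inverse and the average click position from logged click ranks. The function names and the input format (one 1-based rank of the first click per query, `None` for queries without a click) are our assumptions, and the sketch uses the textbook MRR rather than the extended version Salesforce described.

```python
def mean_reciprocal_rank(first_click_ranks):
    """Mean of 1/rank over all queries; queries without a click count as 0."""
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_click_ranks if rank)
    return total / len(first_click_ranks)


def reciprocal_mrr(first_click_ranks):
    """Inverse of the MRR, which reads like a typical result position."""
    mrr = mean_reciprocal_rank(first_click_ranks)
    return 1.0 / mrr if mrr > 0 else float("inf")


def average_click_position(first_click_ranks):
    """Arithmetic mean of the clicked ranks, ignoring queries without a click."""
    clicks = [rank for rank in first_click_ranks if rank]
    return sum(clicks) / len(clicks) if clicks else None
```

When every query has a click, the inverse of the MRR is the harmonic mean of the clicked ranks, so a value of 2.0 can be read as "users typically find what they want around position two". That is our guess at why the inverted form is easier to explain.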
CONTEXT DRIVEN RANKING & FACETING
It was my great pleasure to give a sneak preview of a feature we built for Solr to adjust ranking and faceting during the document collection phase. I described the steps to reproduce such a category context detection using a custom post filter and collector.
RELEVANCE & LEARN TO RANK
“Learn to Rank” was maybe the most covered topic of the conference. Christine Poerschke gave a great introduction to the L2R plugin Bloomberg contributed to Solr. It covers (almost) your full L2R lifecycle, from feature extraction using Solr queries to the application of a trained model to your query results. The model training has to be done using external tools, though. The source code is a great starting point and is available on GitHub.
Some talks covered the application of trained models to search relevance. The Home Depot topped this by applying machine learning to their typeahead. They used multiple layers of models covering personalization, seasonality and geolocation to optimize the typeahead and the search result pages.
But maybe the most important takeaways from the conference were the talks by Simon Hughes (dice.com), Jake Mannix (Lucidworks), Ted Sullivan (Lucidworks) and Doug Turnbull (Open Source Connections), who all pointed out that machine learning has its advantages but that you should first fix and improve your relevancy using the data present in your index. They presented nifty tricks for applying pseudo relevance feedback, doing query expansion and using facets and tf/idf to your advantage. Instead of “learning” things out of your data, you should focus on the user-machine impedance mismatch to satisfy your users’ needs.
For us, those talks were a game changer. We will certainly apply machine learning algorithms to our product database in order to tackle some of the self-fulfilling prophecies that appear when you start feeding user interactions into your search rankings. But the main part of relevance optimization has to be taken care of in your index. In real time.
THE MOMENTS BETWEEN
You know what Las Vegas is famous for, right? The incredible state and national parks that surround it! We snuck in a day of hiking in the gorgeous Valley of Fire State Park and a full weekend of hiking in the breathtaking Zion and Bryce Canyon National Parks. We climbed Angels Landing for sunrise and waded up the Virgin River through the Narrows in Zion National Park. Gorgeous!
Lucene Solr Revolution was a great experience, not only as a speaker but also as an attendee. Most of the presentations were think pieces and got us started on an intense discussion about how to continue and improve our search experience. We met incredibly talented people, and the time in between talks was perhaps the most valuable. Even during our hikes before and after the conference, the discussion topics were mostly search (or heat) related.
If we were asked for improvements, we’d suggest replacing the Lightning Talks with a format named “Confessions of a Search Developer”, where people tell their worst story about using or extending Solr and Lucene. Honest and merciless insights into failures would help everyone improve their knowledge of Solr. An additional social event (e.g. soccer, hiking) the day before the conference would also be a nice way to connect with some interesting people.
We’ll update this post with the slides of the talks mentioned above as soon as they become available. Also, all talks of the conference have been recorded!