So let's run through this process with a practice example and see how it feels. Say I have these four documents with soccer teams in them, each with its unique _id field. If I search those documents for "Manchester United," the standard analyzer in Lucene lowercases everything and removes all the punctuation, and I'm left with two different tokens: manchester and united. Those are my two tokens.
So when I look through these documents for those tokens, my tokens, or my terms, map to documents: one and two for manchester, and one and three for united. So my inverted index holds my tokens or terms, which documents they appear in, and other helpful metadata: frequency, position, and so on. Now, having the right tokens or the right terms can make or break a good search experience for you. So it's important to use the right analyzer to get the right terms.
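The walkthrough above can be sketched in a few lines of Python. This is only a toy stand-in for what Lucene does, not its actual implementation: the `standard_like_analyze` function and the four team names are assumptions chosen so that the postings match the ones described in the talk (documents 1 and 2 for manchester, documents 1 and 3 for united).

```python
import re
from collections import defaultdict


def standard_like_analyze(text):
    """Rough approximation of a standard analyzer:
    lowercase the text and keep only alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical documents keyed by _id; the talk does not list all four teams.
docs = {
    1: "Manchester United",
    2: "Manchester City",
    3: "West Ham United",
    4: "Real Madrid",
}


def build_inverted_index(documents):
    """Map each token to the documents it appears in, with token positions."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, token in enumerate(standard_like_analyze(text)):
            index[token].setdefault(doc_id, []).append(position)
    return index


index = build_inverted_index(docs)

# Analyze the query the same way, then look up each token's document list.
query_tokens = standard_like_analyze("Manchester United")
postings = {token: sorted(index[token]) for token in query_tokens}
print(postings)
```

Each entry in `index` also keeps the token positions per document, which is the kind of extra metadata (frequency, position) an inverted index carries alongside the document IDs.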
Now I'm going to show you what I mean in this example. This is an app that I wrote called Atlas Search Soccer. It uses Atlas Search. In it, I used the FIFA Player Database, so you get lots of different search options to find your FIFA dream team, and you can put your own players on there. It also shows you the code for the queries so you can see how to do that. Now, I know it's called football everywhere else in the entire world except the United States, but I already bought the domain name, so we're just going to stick with that. So in this one I am looking for players from Manchester United. I'm using the standard analyzer. As you remember, that gives me two tokens, manchester and united. So I get 697 players when I look for Manchester United, because it's giving me Manchester United and West Ham United and Manchester City and anything else with either Manchester or United. If, however, I change to the keyword analyzer, as I'm doing here, I find 33 matching players, and they are all indeed from Manchester United. That's because the keyword analyzer takes everything as is: it keeps the punctuation, it keeps the capital letters, all the casing, and it gives me that one token. So keyword analyzers are fantastic if you're using check boxes.
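To make the difference concrete, here is a small Python sketch, assuming two simplified analyzers and a made-up list of clubs. It is not the app's code or Lucene's implementation; `standard_like`, `keyword_like`, and `search` are illustrative names only.

```python
import re


def standard_like(text):
    """Approximate a standard analyzer: lowercase, drop punctuation, split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_like(text):
    """Approximate a keyword analyzer: the whole string, unchanged, is one token."""
    return [text]


# Hypothetical club values, standing in for the club field of the player documents.
clubs = [
    "Manchester United",
    "Manchester City",
    "West Ham United",
    "Newcastle United",
    "Real Madrid",
]


def search(query, analyzer):
    """Return every club that shares at least one token with the query."""
    query_tokens = set(analyzer(query))
    return [club for club in clubs if query_tokens & set(analyzer(club))]


standard_hits = search("Manchester United", standard_like)
keyword_hits = search("Manchester United", keyword_like)
print(standard_hits)
print(keyword_hits)
```

With the standard-style analyzer, any club containing either word matches. With the keyword-style analyzer, only the exact string matches, casing included, which is why it suits values that come from check boxes rather than free text.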
So your tokens matter, which means your analyzers matter. The next thing you need to consider is your query operators, whether it's regex, phrase, or text, and whether you're using facets or autocomplete. This is a way to let your users take their best shot. Every user is different. Every user has a different preference for how they go through things, so in your application you want to give them as many options as possible. And of course I can't talk about giving your users their best shot at finding your data without talking about scoring. Relevance scoring is so important in search. All search engines grade your documents based on how well they match the search query, and that is called relevance. The engine returns your documents to you ordered by score, descending. So it gives you what it thinks the best matches are, the most relevant matches, first.
In this example, for instance, I'm looking for Cristiano Ronaldo for my FIFA dream team. I just search for Ronaldo, and I get this lovely gentleman first, but that's not the Ronaldo I want, because it's ranking on relevance alone. I want the overall FIFA score to count heavily as well. So in this query, I'm going to take the overall FIFA score, which is a field in every one of my documents, and factor it into my relevance score, and now I get Cristiano Ronaldo first. He's very difficult personality-wise, but he's very good and I want him on my dream team. So with that: scoring matters, and custom scoring matters, because you want everything right. Think about your data, think about your user interface, think about your tokens, and I'm not clicking through, I had such a nice interactive, oh, there it goes. Think about your tokens and pick your analyzer accordingly. That goes into your index and into your queries, and all of that is served up to your users so they have their best shot at finding your data before they find it at your competitors.
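As a rough sketch of what factoring a field into the relevance score can look like, here is an aggregation pipeline built as plain Python data. It uses the Atlas Search `function` score option to multiply the text relevance score by a numeric field. The index name `default`, the field names `name` and `overall`, and the helper `build_player_pipeline` are assumptions for illustration; the app's actual query is not shown in the talk.

```python
def build_player_pipeline(name_query, limit=5):
    """Build a $search pipeline that boosts text relevance by a numeric rating field.

    Field and index names here are hypothetical, not taken from the app.
    """
    return [
        {
            "$search": {
                "index": "default",  # assumed index name
                "text": {
                    "query": name_query,
                    "path": "name",  # assumed field holding the player's name
                    "score": {
                        "function": {
                            "multiply": [
                                # the relevance score the text operator computed
                                {"score": "relevance"},
                                # assumed numeric rating field; use 1 if it is missing
                                {"path": {"value": "overall", "undefined": 1}},
                            ]
                        }
                    },
                },
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "_id": 0,
                "name": 1,
                "overall": 1,
                "score": {"$meta": "searchScore"},
            }
        },
    ]


pipeline = build_player_pipeline("Ronaldo")
print(pipeline)
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. Multiplying is only one choice; how strongly the rating should outweigh the text match is a tuning decision for your own data.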