Apologies to designers and UX researchers, to whom this is probably pretty boring. Like most content on this blog, this post is aimed at data scientists.
The following question caught a couple of my colleagues off guard: If you type “lisb” into Google, how are the predicted searches rendered?
Option 1:
lisbon
lisbon portugal
lisbon weather
lisbon by night
lisbon to porto
Option 2:
lisbon
lisbon portugal
lisbon weather
lisbon by night
lisbon to porto
Option 3:
lisbon
lisbon portugal
lisbon weather
lisbon by night
lisbon to porto
Option 4:
lisbon
lisbon portugal
lisbon weather
lisbon by night
lisbon to porto
Try it out. If you were surprised by the answer, you’ll be even more surprised to learn it’s nearly ubiquitous. Amazon, Facebook, iTunes and YouTube all follow the same convention. There’s an aphorism that the best design goes unnoticed, and I think this is a nice example.
Why is this pattern effective? User DaveAlger on UX Stack Exchange puts it concisely:
Highlighting what I typed over and over again doesn’t really add any value while highlighting the differences helps a user quickly find what they are looking for.
(Taken from What to highlight in autosuggest options?)
It’s interesting to consider some special cases, and the options for handling them. For example, suppose one word is a fuzzy match. Here Google’s behaviour is to highlight the entire word. If you enter “best restaurant lisbom”, the top predicted search is:
best restaurant lisbon
Or consider the case in which our query matches the middle of the suggestion, rather than the beginning. Google appears to treat words individually, highlighting those that are missing from our query and highlighting only the completion of those that appear partially. Suggestions for the query “lisbon venu” include:
lisbon music venues (with “music” and the trailing “es” highlighted)
Given a user input string and a suggestion, here’s a first stab at how we might implement this behaviour:
- Loop through the words in the suggestion
- For each word, find the longest word in the input query that is an exact prefix of it
- Highlight the remainder of the word
- If there is no exact prefix, highlight the entire word
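The steps above can be sketched in a few lines of Python. This is only my reading of the first-stab algorithm, not anything Google has published: the function names are made up for illustration, I treat "longest exact prefix" as "the longest query word that the suggestion word starts with", and square brackets stand in for the highlighting.

```python
def highlight_word(word, query_words):
    """Split one suggestion word into (plain, highlighted) parts."""
    # Longest query word that is an exact prefix of this suggestion word.
    # If nothing matches, the prefix is empty and the whole word is highlighted.
    best = max(
        (q for q in query_words if word.startswith(q)),
        key=len,
        default="",
    )
    return word[:len(best)], word[len(best):]


def highlight_by_word(query, suggestion):
    """Apply the word-by-word rule; brackets mark the highlighted text."""
    query_words = query.lower().split()
    rendered = []
    for word in suggestion.lower().split():
        plain, bold = highlight_word(word, query_words)
        rendered.append(plain + (f"[{bold}]" if bold else ""))
    return " ".join(rendered)


print(highlight_by_word("lisb", "lisbon weather"))
print(highlight_by_word("best restaurant lisbom", "best restaurant lisbon"))
print(highlight_by_word("lisbon venu", "lisbon music venues"))
```

The fuzzy match falls out for free: "lisbom" is not a prefix of "lisbon", so the whole word ends up highlighted.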
This works on the cases we’ve seen so far, but it turns out this isn’t exactly what Google is doing. Suppose you type in “lisbon to lisb”, and the top suggestion is “lisbon to lisbon airport”. Our algorithm will highlight it as:
lisbon to lisbon airport (with only “airport” highlighted)
By contrast, Google produces:
lisbon to lisbon airport (with everything after the typed “lisbon to lisb” highlighted, that is, “on airport”)
We might conclude that this means we are only allowed to use a given prefix once, and then we have to remove it from our list. We would be wrong, however, as shown by the example “distance lisbon to lisb”, which Google completes to:
distance from lisbon airport to lisbon city center
(In the suggested query, neither instance of “lisbon” contains highlights, even though “lisbon” only occurs once in the user input.)
If you want to know exactly what Google is doing, you probably have to convince them to hire you, but here’s my guess:
- If the input is an exact prefix of the suggestion, then highlight the completion
- Otherwise, back off to a word-level highlighting algorithm
The word-level algorithm is probably similar to the one I’ve described, at least in terms of behaviour. (The actual implementation likely happens at a lower level, as the suggestions are being searched over.)
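Here is a minimal sketch of that guess, again in Python. It is an assumption about the behaviour, not a description of Google's implementation: the names `highlight`, `word_level` and `mark` are hypothetical, and square brackets stand in for the highlighting.

```python
def mark(text):
    """Wrap highlighted text in brackets; return nothing for empty text."""
    return f"[{text}]" if text else ""


def word_level(query, suggestion):
    """Fallback: highlight whatever each word adds beyond a matching query word."""
    query_words = query.split()
    rendered = []
    for word in suggestion.split():
        # Longest query word that is an exact prefix of this suggestion word.
        best = max(
            (q for q in query_words if word.startswith(q)),
            key=len,
            default="",
        )
        rendered.append(best + mark(word[len(best):]))
    return " ".join(rendered)


def highlight(query, suggestion):
    """Highlight the completion if the query is a prefix, else go word by word."""
    query = query.lower().strip()
    suggestion = suggestion.lower().strip()
    if query and suggestion.startswith(query):
        return query + mark(suggestion[len(query):])
    return word_level(query, suggestion)


print(highlight("lisbon to lisb", "lisbon to lisbon airport"))
print(highlight(
    "distance lisbon to lisb",
    "distance from lisbon airport to lisbon city center",
))
```

The first call takes the prefix branch, so everything after the typed text is highlighted. The second falls back to the word-level rule, which leaves both instances of "lisbon" unhighlighted, in line with the two examples above.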