If your org is on Cerro, Smart Search is automatically enabled.
These are some of the benefits of implementing Smart Search:
- Stemming - results also include words that share the same stem as the search term
- More cohesive UX - results in Autocomplete will more closely match the actual search results
- Auto correction - if your search term returns no results but a similar term does, your search term will be auto-corrected
- Suggestion - for typos, a ‘Did you mean’ suggestion will appear in the results
- Faster results - results will be returned faster
How does Smart Search work?
There are various ways to weight the search. The main types we use are relevancy, recency, popularity and personalization, which we discuss further below.
These can be combined cumulatively or work independently. When several methods are combined, the overall outcome is often hard to predict.
| Search type | Weighting used |
|---|---|
| Content | Relevancy > Recency |
| People | Relevancy > Recency |
| Sites | Relevancy > Recency |
| Files | Relevancy > Recency |
Relevancy is the starting point for all searches. It is the process of matching the query to results.
Some analyzers are applied both when documents are indexed and when a query is made:
- Words are converted to lowercase to increase the chance of a match
- Special characters are removed
  - E.g. ‘Wi-fi’ matches ‘wifi’ and ‘wi fi’
- Stemming
  - E.g. ‘runs’ matches ‘runs’, ‘running’ and ‘runners’
- Stop words are removed
  - Common words such as ‘a’, ‘to’ and ‘be’ are removed to retrieve more accurate results
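To make the index-time and query-time steps more concrete, here is a minimal sketch of an analyzer that lowercases text, splits on special characters, drops stop words and stems what is left. It is an illustration only, not Smart Search's actual configuration: the stop-word list is a tiny hypothetical subset, and the suffix-stripping `stem` function is a deliberately crude stand-in for a real stemmer.

```python
import re

# Hypothetical, illustrative subset; real stop-word lists are much longer.
STOP_WORDS = {"a", "an", "and", "be", "is", "of", "the", "to"}


def stem(token: str) -> str:
    """Crude suffix stripping, e.g. 'runs', 'running', 'runners' -> 'run'."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            base = token[: -len(suffix)]
            # Collapse a doubled final consonant: 'runn' -> 'run'.
            if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in "aeiou":
                base = base[:-1]
            return base
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, split on special characters, drop stop words, then stem."""
    tokens = []
    for raw in text.lower().split():
        # Split on anything that is not a letter or digit: 'wi-fi' -> ['wi', 'fi'].
        parts = [p for p in re.split(r"[^a-z0-9]+", raw) if p]
        if len(parts) > 1:
            # Also keep the joined form so that 'wifi' can match 'Wi-fi'.
            parts.append("".join(parts))
        for part in parts:
            if part not in STOP_WORDS:
                tokens.append(stem(part))
    return tokens


def matches(query: str, document: str) -> bool:
    """True if every analyzed query term also appears in the analyzed document."""
    return set(analyze(query)) <= set(analyze(document))
```

Because the same `analyze` function runs on both the documents and the query, ‘Wi-fi’ in a title and ‘wifi’ or ‘wi fi’ typed by a user reduce to shared tokens, so `matches("wi fi", "Wi-fi")` is `True`.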
Some processing happens only at query time:
- TF-IDF (Term Frequency-Inverse Document Frequency)
  - A word that appears in many documents across the index is given much less weight, while a word that appears often within a single document counts for more. This helps keywords stand out in the results.
- Mismatched spellings
  - Fuzzy matching - matches data even when there are several differences between the search term and the indexed word
    - The longer the search term, the more mistakes are tolerated
    - Mistakes can include letters that have been incorrectly added
    - We use the standard fuzzy matching rules of Elasticsearch
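As a rough illustration of the "standard fuzzy matching rules", here is a sketch assuming Elasticsearch's default `AUTO` fuzziness (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The edit distance below counts insertions, deletions, substitutions and swaps of two adjacent letters, which approximates how Elasticsearch counts edits when transpositions are enabled. The function names are our own and are not part of any Elasticsearch API.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and swaps of two adjacent letters each count as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def allowed_edits(term: str) -> int:
    """Assumed to mirror Elasticsearch's default AUTO fuzziness (AUTO:3,6)."""
    if len(term) < 3:
        return 0  # 0-2 characters: must match exactly
    if len(term) < 6:
        return 1  # 3-5 characters: one edit allowed
    return 2      # 6+ characters: two edits allowed


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    """True if the terms are within the edit budget for the query term's length."""
    distance = edit_distance(query_term.lower(), indexed_term.lower())
    return distance <= allowed_edits(query_term)
```

With these rules, ‘benfits’ and even ‘bennefitts’ still find ‘benefits’, while a two-letter term such as ‘jo’ has to match exactly, which is why longer search terms tolerate more mistakes.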
If a search returns only one result, Relevancy is all that is needed. When there are multiple results, other factors are also considered.
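When several results match, their relevance scores decide the order. The sketch below scores documents with the classic TF-IDF formula (term frequency in the document multiplied by the log of total documents over documents containing the term). It is an illustration, not Smart Search's scoring code: Elasticsearch's default similarity in current versions is BM25, a refinement of the same idea, so real scores will differ, and the sample documents are made up.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms: list[str], documents: list[str]) -> list[float]:
    """Score each document: sum over query terms of tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document

    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if doc_freq[term]:
                score += term_freq[term] * math.log(n_docs / doc_freq[term])
        scores.append(score)
    return scores


# Hypothetical sample documents.
docs = [
    "company update benefits policy",
    "company update",
    "company picnic",
]
# 'company' appears in every document, so it adds nothing to any score;
# only the first document, which contains 'benefits', scores above zero.
print(tf_idf_scores(["company", "benefits"], docs))
```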
Prefix matching is used to complete a search term by predicting the ending based on the prefix you’ve typed.
Although this is a powerful search feature, it greatly increases the index size:
- E.g. to use Prefix matching for the term Adam, the index needs to store A, Ad, Ada and Adam.
If Prefix matching were used across the whole index, it would slow search down considerably and could return very confusing results. Because of this, Prefix matching is used sparingly.
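The index-size cost is easy to see in a small sketch that stores every leading prefix of each name, similar in spirit to an edge n-gram approach. This is an illustrative in-memory structure, not how Smart Search actually stores its index; `edge_ngrams` and `build_prefix_index` are hypothetical helper names.

```python
from collections import defaultdict


def edge_ngrams(term: str, min_gram: int = 1) -> list[str]:
    """All leading prefixes of a term, e.g. 'Adam' -> ['a', 'ad', 'ada', 'adam']."""
    term = term.lower()
    return [term[:i] for i in range(min_gram, len(term) + 1)]


def build_prefix_index(names: list[str]) -> dict[str, set[str]]:
    """Map every prefix to the set of names that start with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name in names:
        for prefix in edge_ngrams(name):
            index[prefix].add(name)
    return index


index = build_prefix_index(["Joe", "Jonathan", "Jovita"])
print(sorted(index["jo"]))  # ['Joe', 'Jonathan', 'Jovita']
print(len(index))           # 13 prefix keys stored for just 3 names
```

Three names already need 13 prefix entries instead of 3 whole words, and the growth is roughly proportional to the total length of every indexed term, which is why applying this to all Content would be expensive.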
The Autocomplete function uses Prefix matching for all titles:
- Site names
- People names
- Content titles
The global search only uses Prefix matching on:
- People names
We added Prefix matching on People names in the global search because without it the results sometimes seemed odd:
- E.g. typing ‘Jo’ into Autocomplete would return ‘Joe’, ‘Jonathan’ and ‘Jovita’, but a global search for ‘Jo’ would return no results.
- Now that the global search uses Prefix matching on People names, this is no longer a problem.
- However, for the sake of index size and search speed, we decided not to use it for Site names and Content titles in the global search.
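The difference between the two search modes can be sketched as a simple lookup of which fields get Prefix matching. The field names (`person_name`, `site_name`, `content_title`) and the `search` function are hypothetical; the sketch only shows the behaviour described above, where other fields fall back to whole-word matching.

```python
# Hypothetical field names: which fields use Prefix matching in each mode.
PREFIX_FIELDS = {
    "autocomplete": {"site_name", "person_name", "content_title"},
    "global": {"person_name"},
}


def word_matches(query: str, value: str, use_prefix: bool) -> bool:
    """Prefix match on any word of the value, otherwise whole-word match."""
    q = query.lower()
    words = value.lower().split()
    if use_prefix:
        return any(w.startswith(q) for w in words)
    return q in words


def search(mode: str, query: str, records: list[tuple[str, str]]) -> list[str]:
    """Return the values of (field, value) records that match in the given mode."""
    return [
        value
        for field, value in records
        if word_matches(query, value, field in PREFIX_FIELDS[mode])
    ]


records = [
    ("person_name", "Joe Bloggs"),
    ("person_name", "Jovita Smith"),
    ("site_name", "Jobs Board"),
    ("content_title", "Joining Guide"),
]
print(search("autocomplete", "Jo", records))  # all four records
print(search("global", "Jo", records))        # ['Joe Bloggs', 'Jovita Smith']
```

In global search, the site ‘Jobs Board’ is still found by the whole word ‘jobs’, just not by the prefix ‘Jo’.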
After experimenting with different combinations of search functions, we concluded that within an intranet, the most important factor for weighting the vast majority of Content (apart from Relevancy) is Recency.
- If I search for ‘Company Update’, there may be hundreds of results, but the one I most likely want to view is the most recent.
- If I search for ‘Benefits’, I want the most recent version of the Benefits policy at the top of the results.
Although there may be occasions where the Recency of a piece of Content matters less than its Popularity, we expect these to be the exception.
As a result, when a search returns Content, the results should appear in reverse chronological order (newest first).
- There may be some exceptions when one piece of Content is more relevant to the search term than the others, which gives it a higher weighting.
  - E.g. a piece of Content contains the search term multiple times in its title and summary.
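One common way to blend Relevancy and Recency is to multiply the relevance score by a factor that decays with age, similar in spirit to the decay functions in Elasticsearch's `function_score` query. The sketch below is not Smart Search's actual formula, which is not documented here: the exponential decay, the 30-day `HALF_LIFE_DAYS` value and the relevance numbers are all hypothetical. It shows how results end up mostly newest-first, while an older but much more relevant item can still rise to the top.

```python
from datetime import date

# Hypothetical tuning value: a result's recency weight halves every 30 days.
HALF_LIFE_DAYS = 30


def recency_factor(published: date, today: date) -> float:
    """Exponential decay: 1.0 today, 0.5 after one half-life, 0.25 after two."""
    age_days = (today - published).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def rank(items: list[tuple[str, float, date]], today: date) -> list[str]:
    """Sort (title, relevance, published) tuples by relevance x recency, best first."""
    scored = sorted(
        items,
        key=lambda item: item[1] * recency_factor(item[2], today),
        reverse=True,
    )
    return [title for title, _, _ in scored]


today = date(2024, 6, 1)
items = [
    ("April update", 1.0, date(2024, 4, 30)),
    ("May update", 1.0, date(2024, 5, 31)),
    # Older, but the search term appears several times in its title and summary.
    ("Mid-April update", 4.0, date(2024, 4, 15)),
]
print(rank(items, today))  # ['Mid-April update', 'May update', 'April update']
```

The two equally relevant updates come out newest first, while the older item with a much higher relevance score outranks both.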
How do you enable Smart Search?
Go to Manage App > Integrations > Search, select Smart Search and click Save. Then click Connect and, in the modal that appears, click Allow.