Simpplr Search

Overview

Simpplr Search (also known as global search) is one of the most powerful features in Simpplr. It's prominently displayed at the top left of every page because data shows that intranet users want to get in, complete their task, and get out. That's why we put Search front and center, and why we've invested heavily in making it best in class. Search is powerful for several reasons: it's smart, federated, and curated. Search results are also faceted. More on this below.

 

Search features

Smart

Simpplr Search is smart, meaning it's powered by artificial intelligence (AI) and adaptive machine learning (ML). It takes into account your Simpplr profile data, such as geographic location and department, to serve personalized search results. As a result, two users who search for the same term may see different results.

Because the machine learning is adaptive, your search results get better over time: the search algorithms keep learning the more you use Simpplr.

Each time you go to search, Simpplr Search also shows your recent searches, making it easy to find information you've used recently.

Finally, our auto-suggested results feature suggests results based on the first characters you type into the search box. This saves time and helps users work more efficiently.

 

Federated

Simpplr Search is federated: in addition to searching content across the entire intranet, it also searches any integrations plugged into Simpplr, such as file repositories like Dropbox, Google Drive, or SharePoint. This lets your team members find any content from one central location.
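To illustrate the federated idea, here is a minimal sketch that sends one query to several sources in parallel and groups the results by source. The connector functions and source names are hypothetical stand-ins, not Simpplr's integration API, and each source's own result order is kept as-is rather than re-ranked.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical connector type: takes a query and returns result titles
# in whatever order the source itself ranks them.
Connector = Callable[[str], List[str]]


def federated_search(query: str, sources: Dict[str, Connector]) -> Dict[str, List[str]]:
    """Send one query to every source in parallel and group results by source."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        # Results are grouped per source, not merged or re-ranked.
        return {name: future.result() for name, future in futures.items()}


# Stand-in sources for illustration only.
def intranet(q: str) -> List[str]:
    return [t for t in ["Benefits policy", "Company update"] if q.lower() in t.lower()]


def file_store(q: str) -> List[str]:
    return [t for t in ["benefits-2024.pdf", "logo.png"] if q.lower() in t.lower()]


results = federated_search("benefits", {"Intranet": intranet, "Files": file_store})
print(results)  # {'Intranet': ['Benefits policy'], 'Files': ['benefits-2024.pdf']}
```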

 

How does Simpplr search work?

There are various ways to weight search results. The main factors we use are relevancy, recency, popularity, and personalization, which are discussed below.

These factors can work independently or be combined. When they are combined, the final ranking is often hard to predict.

Search type    Weighting used
Content        Relevancy > Recency
People         Relevancy > Recency
Sites          Relevancy > Recency
Files          Relevancy > Recency

A keyword match in each of the fields below carries a different weight, moving a result higher or lower in the search results:

  • Page title 
  • Body HTML 
  • Site name
  • Site category 
  • Page category name 
  • Topic names 
  • Question titles 
  • Expertise 

Once Search has calculated relevancy scores, it further adjusts them using the following boosts (multipliers):

  • People in the same department as you - boosted 1.1x
  • Content first published more than 18 months ago - boosted 0.1x (ranked lower)
  • Page published in the last 12 months - boosted 2.5x
  • User location in the title or body of content published in the last 12 months - boosted 2x
  • Matching topics on content published in the last 12 months - boosted 2x
  • Matching page category on content published in the last 12 months - boosted 2x
  • Event name (events in the last 3 months or next 3 months) - boosted 1.1x
  • Album (edited in the last 12 months) - boosted 1.1x
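The sketch below assumes the boosts are multipliers applied to the relevancy score. That reading is suggested by the "1.1x" and "0.1x" notation but is not confirmed here. The key names, the 30.44-day month approximation, and the `age_boosts` / `apply_boosts` helpers are all hypothetical. Only the initial publish date is used, which matches the answer in the comments below.

```python
import math
from datetime import date
from typing import Set

# Multipliers taken from the boost list above; the key names are hypothetical.
BOOSTS = {
    "same_department": 1.1,
    "published_over_18_months_ago": 0.1,
    "page_published_last_12_months": 2.5,
    "user_location_in_recent_content": 2.0,
    "topic_match_in_recent_content": 2.0,
    "page_category_match_in_recent_content": 2.0,
    "event_within_3_months": 1.1,
    "album_edited_last_12_months": 1.1,
}

DAYS_PER_MONTH = 30.44  # assumption: a month is approximated as 30.44 days


def age_boosts(first_published: date, today: date) -> Set[str]:
    """Pick the age-based boost from the initial publish date."""
    age_days = (today - first_published).days
    if age_days > 18 * DAYS_PER_MONTH:
        return {"published_over_18_months_ago"}
    if age_days <= 12 * DAYS_PER_MONTH:
        return {"page_published_last_12_months"}
    return set()


def apply_boosts(relevancy: float, matched: Set[str]) -> float:
    """Multiply a relevancy score by every boost whose condition matched."""
    return relevancy * math.prod(BOOSTS[name] for name in matched)


today = date(2024, 6, 1)
recent = apply_boosts(4.0, age_boosts(date(2024, 3, 1), today) | {"topic_match_in_recent_content"})
old = apply_boosts(4.0, age_boosts(date(2021, 1, 1), today))
print(recent, old)
```

With these assumptions, a recent page whose topics match ends up about 50 times higher than an otherwise equal page published more than 18 months ago.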

Note:

Currently Simpplr Search does not support Boolean operators.

 

Relevancy 

Relevancy is the starting point for all searches: it is the process of matching the query to results.

The following analyzers are applied both when documents are indexed and when a query is run:

  • Lowercase
    • Words are converted to lowercase to increase the chance of a match
  • Remove special characters
    • E.g. ‘Wi-fi’ matches ‘wifi’ and ‘wi fi’
  • Stemming
    • E.g. ‘runs’ matches ‘runs’, ‘running’, and ‘runners’
  • Stop words (very common words) are removed to retrieve more accurate results:
    • a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
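Here is a minimal sketch of that analysis pipeline, run the same way on documents and queries. The stop-word list is copied from above. `toy_stem` is a deliberately crude, hypothetical suffix stripper, not the stemmer Simpplr uses. It only illustrates how ‘runs’ and ‘running’ reduce to a shared form. The sketch shows ‘Wi-Fi’ becoming ‘wifi’ but does not attempt the ‘wi fi’ case.

```python
import re
from typing import List

# Stop words listed in the article.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def toy_stem(word: str) -> str:
    """Crude suffix stripper for illustration; real stemmers are far more careful."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            # 'runn' -> 'run': undo the doubled consonant left behind by 'ing'.
            if suffix == "ing" and len(word) > 2 and word[-1] == word[-2]:
                word = word[:-1]
            break
    return word


def analyze(text: str) -> List[str]:
    """Lowercase, strip special characters, drop stop words, then stem."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # 'wi-fi' -> 'wifi'
    return [toy_stem(token) for token in text.split() if token not in STOP_WORDS]


print(analyze("The Wi-Fi is running"))  # ['wifi', 'run']
```

Because the same function runs at index time and at query time, a search for "runs" produces the same token as a document containing "running".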

Some analyzers are applied only at query time:

  • TF-IDF (Term Frequency-Inverse Document Frequency)
    • A word that appears in many documents is given much less weight, which gives key words more prominence in the results
  • Misspellings
    • Fuzzy matching matches terms even when they differ from the query by one or more characters
    • The longer the query term, the more mistakes are tolerated
    • Mistakes can be letters that are added, missing, changed, or swapped
    • We use the standard fuzzy matching rules of Elasticsearch
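The sketch below shows both query-time ideas. For the weighting, it uses one common smoothed IDF formula. This is a choice made for illustration, and Elasticsearch's own default scoring (BM25) differs in detail. For the fuzzy part, it uses Elasticsearch's default `AUTO` fuzziness: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. Edits are counted with an optimal-string-alignment distance, so an insertion, deletion, substitution, or swap of adjacent letters each counts as one edit. The sample documents are made up.

```python
import math
from typing import List, Set


def idf(term: str, docs: List[Set[str]]) -> float:
    """Smoothed inverse document frequency: terms found in many documents weigh less."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + containing)) + 1


def tf_idf(term: str, doc_tokens: List[str], docs: List[Set[str]]) -> float:
    """Term frequency in one document times the term's inverse document frequency."""
    return doc_tokens.count(term) * idf(term, docs)


def auto_fuzziness(term: str) -> int:
    """Elasticsearch AUTO default: 0-2 chars exact, 3-5 one edit, 6+ two edits."""
    n = len(term)
    return 0 if n < 3 else 1 if n < 6 else 2


def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment: insert, delete, substitute, or swap adjacent letters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    """Accept an indexed term within the edit budget allowed for the query term's length."""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


DOCS = [{"policy", "benefits"}, {"policy", "update"}, {"policy", "travel"}]
print(idf("policy", DOCS), idf("benefits", DOCS))  # 'policy' is in every doc, so it weighs least
print(fuzzy_match("benfits", "benefits"), fuzzy_match("hr", "hs"))
```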

If a search returns only one result, relevancy is all that's needed. When there are multiple results, Search considers other factors as well.

As of the 23.06 release, we have improved the relevancy of Search results:

  • Relevancy ranking is more accurate, so you are more likely to find what you need without paging through multiple pages of results.
  • User location is factored into relevancy ranking, so content with your location in the title is easier to find.
  • When you search for people, those in the same department as you rank higher.

Prefix matching

Prefix matching is used to complete a search term by predicting the ending based on the prefix you’ve typed. 

Although prefix matching is a powerful search feature, it greatly increases the size of the index:

  • E.g. To use prefix matching for the term 'Adam', the index needs to store A, Ad, Ada, and Adam.

If prefix matching were used on the whole index, it would slow search down considerably and could return very confusing results. Because of this limitation, prefix matching is used sparingly.
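The 'Adam' example corresponds to storing edge n-grams, meaning every leading slice of a term. The sketch below generates them and counts how much a small, made-up word list grows once every prefix is indexed. It is an illustration only, not Simpplr's index configuration.

```python
from typing import List


def edge_ngrams(term: str) -> List[str]:
    """Every leading slice of a term: what an index stores to support prefix matching."""
    return [term[:i] for i in range(1, len(term) + 1)]


def index_entries(terms: List[str], prefix: bool) -> int:
    """Count index entries with and without prefix matching."""
    return sum(len(edge_ngrams(t)) if prefix else 1 for t in terms)


print(edge_ngrams("Adam"))  # ['A', 'Ad', 'Ada', 'Adam']
words = ["Adam", "Benefits", "Company", "update"]
print(index_entries(words, prefix=False), index_entries(words, prefix=True))  # 4 25
```

Four words become 25 index entries, and the growth is proportional to word length. This is why applying prefix matching across all content would inflate the index so much.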

The autocomplete function uses prefix matching for all titles: 

  • Site names 
  • People names 
  • Content titles 

The global search only uses prefix matching on:

  • People names

We added prefix matching on people names in the global search because, without it, the results sometimes seemed odd:

  • E.g. Typing ‘Jo’ into autocomplete would return ‘Joe’, ‘Jonathan’, and ‘Jovita’, but a global search for ‘Jo’ returned no results.
  • Now that the global search uses prefix matching on people names, this isn't a problem.
  • However, for the sake of index size and search speed, we decided not to use it for site names and content titles in the global search.
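A minimal sketch of scoping prefix matching by search mode, following the two lists above. The field names and the `matches` helper are hypothetical. Fields without prefix matching fall back to whole-word matching in this sketch.

```python
# Fields that use prefix matching in each search mode, per the lists above.
# Field names are hypothetical.
PREFIX_FIELDS = {
    "autocomplete": {"site_name", "person_name", "content_title"},
    "global": {"person_name"},
}


def matches(query: str, value: str, field: str, mode: str) -> bool:
    """Prefix match on fields enabled for this mode; whole-word match everywhere else."""
    q = query.lower()
    words = value.lower().split()
    if field in PREFIX_FIELDS[mode]:
        return any(word.startswith(q) for word in words)
    return q in words


records = [
    ("person_name", "Joe Smith"),
    ("person_name", "Jovita Ruiz"),
    ("content_title", "Jobs board"),
]
for mode in ("autocomplete", "global"):
    print(mode, [value for field, value in records if matches("Jo", value, field, mode)])
```

Typing ‘Jo’ finds all three records in autocomplete. The global search still finds both people but skips the ‘Jobs board’ content title.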

Recency

Having spent some time experimenting with different combinations of search functions, Simpplr decided that within the confines of an intranet, for the vast majority of content, the most important factor for weighting results (apart from relevancy) is recency.

  • Examples:
    • If you search for a ‘Company Update’ there may be hundreds of results in the search ,but the one that you most likely to want to view is the most recent
    • If you search for ‘Benefits’, you want to know it’s the most recent version of the Benefits policy that appears at the top of the results

Although there may be some occasions where the recency of a piece of content is not as important as its popularity, we feel this will be an exception. 

The result of this is that when users are searching, the results should be in chronological order.

  • There may be some discrepancies due to one piece of content being more relevant to the initial term than others, so it ends up with a higher weighting. 
    • E.g. A piece of content contains the initial search term multiple times in the title and summary 
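One simplified way to read "Relevancy > Recency" is as a two-level sort. Results are ordered by relevancy, and recency decides the order among equally relevant results. In practice recency is also folded into the score through the boosts listed earlier, so treat this as an illustration rather than the actual ranking code. The `Hit` type and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Hit:
    title: str
    relevancy: float  # score after analyzers and boosts
    published: date


def rank(hits: List[Hit]) -> List[Hit]:
    """Order by relevancy first, then newest first among equally relevant hits."""
    return sorted(hits, key=lambda h: (h.relevancy, h.published), reverse=True)


hits = [
    Hit("Company Update - March", 5.0, date(2024, 3, 1)),
    Hit("Company Update - May", 5.0, date(2024, 5, 1)),
    Hit("Company Update: company update recap", 7.5, date(2023, 11, 1)),
]
print([h.title for h in rank(hits)])
```

The two equally relevant updates come out newest first. The older item that repeats the search term outranks both, which is the exception described above.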

Numeric matching

Search can find relevant content based on numeric strings you enter. For example, if you search for 8765, the top results include content with numbers containing that string of digits. This helps when, for instance, you only remember the first few digits of a policy number. Numeric matching also covers text in files uploaded to the intranet, such as PDFs.
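The following sketch shows numeric matching as a substring test against the digit runs in each document. The document names and text are made up, and the helper is hypothetical, not Simpplr's implementation.

```python
import re
from typing import Dict, List


def numeric_matches(query: str, documents: Dict[str, str]) -> List[str]:
    """Return documents containing a run of digits that includes the queried digits."""
    if not re.fullmatch(r"\d+", query):
        return []
    return [
        name
        for name, text in documents.items()
        if any(query in number for number in re.findall(r"\d+", text))
    ]


docs = {
    "Travel policy": "Policy number 87654321 covers travel.",
    "Expenses": "Use form 1234 for claims.",
    "Archived policy": "Reference 99876500.",
}
print(numeric_matches("8765", docs))  # ['Travel policy', 'Archived policy']
```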

 

‘Did you mean’ suggestions

When no results are found, Search makes suggestions based on your query. Simpplr uses the phrase suggester feature of Elasticsearch to make these suggestions more contextual and relevant.
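For intuition, here is a much simpler "did you mean" sketch built on Python's `difflib.get_close_matches`. It replaces each query word with the closest indexed term. This is a swapped-in technique: the Elasticsearch phrase suggester also scores whole phrases with n-gram statistics, which this sketch does not attempt. The vocabulary and the 0.75 cutoff are assumptions.

```python
import difflib
from typing import List, Optional


def did_you_mean(query: str, vocabulary: List[str]) -> Optional[str]:
    """Replace each query word with its closest indexed term; None if nothing changes."""
    corrected = []
    for word in query.lower().split():
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
        corrected.append(close[0] if close else word)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != query.lower() else None


VOCAB = ["benefits", "policy", "holiday", "calendar"]
print(did_you_mean("benifits polcy", VOCAB))  # benefits policy
```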

What's included in Global Search and Auto-complete suggestions?

Global search and auto-complete suggestions include results from the following:

  • Links tile
  • Rich text tile
  • HTML tile
  • Site information tile
  • Apps tab

Coveo Search integration

Simpplr Search is integrated with Coveo. Coveo searches for content not only in Simpplr data but across multiple sources integrated with Coveo. For more information on our Coveo integration, check out this article.

Searching for files

You can search for files using the search bar in the top left of the main navigation bar. Search results will include files contained in pages, files uploaded to a Simpplr site, and files linked to a Simpplr site from file storage. 

When searching for internally stored files (i.e., files uploaded and stored directly in Simpplr), Simpplr Search looks at the file name and the text within the file to determine keyword matches. All text within the file is searched, regardless of its length. A match in the file name is weighted higher than a match in the body of the file. This applies to all file types.
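A minimal sketch of that ordering rule. The article only says a file-name match outweighs a body match, so the weights `NAME_WEIGHT = 3.0` and `BODY_WEIGHT = 1.0` are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical weights: the article only says a name match outranks a body match.
NAME_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def file_score(query: str, file_name: str, body_text: str) -> float:
    """Score a file, counting a file-name match more than a body match."""
    q = query.lower()
    score = 0.0
    if q in file_name.lower():
        score += NAME_WEIGHT
    if q in body_text.lower():  # the whole body is searched, however long
        score += BODY_WEIGHT
    return score


files: List[Tuple[str, str]] = [
    ("handbook.pdf", "Section 4 describes employee benefits."),
    ("benefits-guide.docx", "Overview of perks."),
]
ranked = sorted(files, key=lambda f: file_score("benefits", *f), reverse=True)
print([name for name, _ in ranked])  # ['benefits-guide.docx', 'handbook.pdf']
```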

File search results appear in a separate tab from the main list of content search results. The following file types are not searched: .jpg, .jpeg, .gif, and .png (in either upper or lower case).
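The exclusion above can be expressed as a case-insensitive extension check. This is a sketch, and `is_searchable` is a hypothetical helper.

```python
from pathlib import PurePath

# Image types the article lists as not searched (in either case).
EXCLUDED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}


def is_searchable(file_name: str) -> bool:
    """Skip image files, comparing extensions case-insensitively."""
    return PurePath(file_name).suffix.lower() not in EXCLUDED_EXTENSIONS


print(is_searchable("Policy.PDF"), is_searchable("team.JPG"))  # True False
```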

For third-party file storage integrations, Simpplr sends the vendor the keyword or phrase you're searching for, and the vendor returns the results in whatever order it determines. We don't know the logic the vendor uses (i.e., relevancy, weighting, etc.). The only exception is the Confluence integration: Confluence documents whose title matches the search are sorted first within the top 10 results.
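The Confluence rule can be sketched as a stable re-sort of the vendor's top 10 results: title matches move first, and everything else keeps the vendor's order. The result shape (a dict with a `"title"` key) and the function name are assumptions.

```python
from typing import Dict, List


def confluence_order(results: List[Dict[str, str]], query: str, top_n: int = 10) -> List[Dict[str, str]]:
    """Within the vendor's top N, move title matches first; otherwise keep vendor order."""
    q = query.lower()
    head, tail = results[:top_n], results[top_n:]
    # sorted() is stable, so ties keep the order the vendor returned.
    head = sorted(head, key=lambda r: q not in r["title"].lower())
    return head + tail


vendor_results = [{"title": "Onboarding"}, {"title": "VPN setup"}, {"title": "vpn FAQ"}]
print([r["title"] for r in confluence_order(vendor_results, "VPN")])
# ['VPN setup', 'vpn FAQ', 'Onboarding']
```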

Best practices

Keep Search clean and functional by being specific with your topics. As a general practice, add no more than six topics to any one piece of content. While App managers can always manage topics to keep them from getting overcrowded, you can make this job easier by encouraging users to limit the topics they add.

We recommend this because once your intranet has 100+ topics, Simpplr Search can become muddy and less efficient, since it has to search too many topics for similar content.


Comments

4 comments
  • Hi KB Team!
    Could you please advise if there are any special characters that allow for different ways of searching using the global search?
    i.e. using quotations to define a results list of exact matches for the "keyword"
    or % to search for part of a word to see what results it will pull (i.e. %doc - to see if the page you're looking for had a topic document or documents, etc...)
    those are the two that I know of, but since the Hotaka/Ida release, I'm not sure if these search features have been abandoned or if there are new ones, etc...

  • Hi Aileen. We currently do not support these functionalities. However I've filed an Idea with our Engineering team to take this into consideration.

  • Hey folks, In the section, " How does Simpplr search work?" where there's a list of factors that boost search, is "publish date" referencing initial publish date or does it include published edits?

    Ex. "Content publish date before 18 months - boosted 0.1x" - Is only content that was originally published <18 months ago boosted, or is content that was edited within 18 months also boosted?

    Thank you!

     

  • Hi Maeve. For now, only the initial publish date is considered in the algorithm. 

