Solr and autocomplete (part 3)

In the previous parts (part 1, part 2) of the cycle, we learned how to configure and query Solr to get autocomplete functionality. In today’s entry I will show you how to add a dictionary to the Suggester and thus influence the generated suggestions.

Component configuration

To the component configuration presented in the previous part of the cycle, add the following parameter:

<str name="sourceLocation">dict.txt</str>

Thus our configuration should look like this:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">name_autocomplete</str>
  <str name="sourceLocation">dict.txt</str>
 </lst>
</searchComponent>

With this parameter we tell the component to use the dictionary named dict.txt, which should be placed in the Solr configuration directory.

Handler configuration

The handler configuration also gets one additional parameter:

<str name="spellcheck.onlyMorePopular">true</str>

So our configuration should be as follows:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggest</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.onlyMorePopular">true</str>
 </lst>
 <arr name="components">
  <str>suggest</str>
 </arr>
</requestHandler>

This parameter tells Solr to return only those suggestions for which the number of results is greater than the number of results for the current query.
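The rule described above can be expressed as a simple filter. This is a toy illustration of the rule as stated here, not Solr's implementation; the function name only_more_popular and the hit counts are hypothetical.

```python
def only_more_popular(candidates, hits, query_hits):
    """Keep only candidates that match more documents than the current query.

    candidates -- suggested phrases, in their original order
    hits       -- hypothetical mapping: phrase -> number of matching documents
    query_hits -- number of documents matched by the current query
    """
    return [c for c in candidates if hits.get(c, 0) > query_hits]


# Hypothetical hit counts, for illustration only.
hits = {"hard disk": 120, "hard drive": 40, "hardware": 5}
print(only_more_popular(["hard disk", "hard drive", "hardware"], hits, query_hits=10))
# prints ['hard disk', 'hard drive']
```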


We told Solr to use the dictionary, but what should the dictionary look like? For the purpose of this post I defined the following dictionary:

# sample dict
Hard disk hitachi
Hard disk wd    2.0
Hard disk jjdd    3.0

How is a dictionary built? Each phrase (or single word) goes on a separate line. A line may end with the weight of the phrase, separated from the phrase by a TAB character. The weight is used together with the spellcheck.onlyMorePopular=true parameter: the higher the weight, the higher the suggestion appears. If no weight is given, the default value of 1.0 is used. The dictionary should be saved in UTF-8 encoding. Lines beginning with the # character are skipped.
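To make the format concrete, here is a minimal Python sketch that reads such a file according to the rules above. It mirrors the rules stated in this post, not Solr's own parser; the function names are hypothetical, and skipping blank lines is an assumption of the sketch.

```python
from typing import List, Tuple

DEFAULT_WEIGHT = 1.0


def parse_dictionary(text: str) -> List[Tuple[str, float]]:
    """Parse dictionary text: one phrase per line, optionally followed by TAB + weight."""
    entries = []
    for line in text.splitlines():
        # Lines starting with '#' are comments; blank lines are skipped too
        # (the blank-line handling is an assumption of this sketch).
        if not line.strip() or line.startswith("#"):
            continue
        # The weight, if present, follows the phrase after a TAB character.
        phrase, sep, weight = line.partition("\t")
        entries.append((phrase, float(weight) if sep else DEFAULT_WEIGHT))
    return entries


def load_dictionary(path: str) -> List[Tuple[str, float]]:
    # The dictionary file should be UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return parse_dictionary(f.read())


sample = "# sample dict\nHard disk hitachi\nHard disk wd\t2.0\nHard disk jjdd\t3.0\n"
for phrase, weight in parse_dictionary(sample):
    print(f"{phrase!r}: {weight}")
```

For the sample dictionary, the line without a weight (Hard disk hitachi) gets the default 1.0.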


In this case we don’t need any indexed data: we will only use the defined dictionary.

Let’s check how it works

To check how our mechanism behaves, I sent the following query to Solr (after rebuilding the Suggester index, of course):
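A request of this kind could look as follows. The sketch assumes a single-core Solr running on localhost:8983 (the host, the port, and the helper names suggest_url and fetch are assumptions) and builds a URL for the /suggest handler configured above. It uses q=Har as the prefix to complete and, optionally, spellcheck.build=true to rebuild the Suggester data from dict.txt.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a single-core Solr on the default port; adjust to your setup.
SOLR_SUGGEST_URL = "http://localhost:8983/solr/suggest"


def suggest_url(prefix: str, build: bool = False) -> str:
    """Build a request URL for the /suggest handler."""
    params = {"q": prefix}
    if build:
        # spellcheck.build=true asks Solr to rebuild the Suggester data first.
        params["spellcheck.build"] = "true"
    return SOLR_SUGGEST_URL + "?" + urlencode(params)


def fetch(url: str) -> str:
    # Needs a running Solr instance; returns the raw XML response.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


print(suggest_url("Har", build=True))
# prints http://localhost:8983/solr/suggest?q=Har&spellcheck.build=true
```

Rebuilding on every request is unnecessary; once the dictionary has been loaded, suggest_url("Har") without the build flag is enough.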


As a result we get the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="Har">
      <int name="numFound">3</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>Hard disk jjdd</str>
        <str>Hard disk hitachi</str>
        <str>Hard disk wd</str>
      </arr>
    </lst>
  </lst>
</lst>
</response>
A few words at the end

As you can see, the weights influence the order of the suggestions: the phrase with the highest weight, Hard disk jjdd (3.0), comes first, as expected. It is worth noting that the query started with a capital letter, which matters: a lowercase query will return an empty suggestion list.
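The capital-letter behavior is what a plain case-sensitive prefix match produces. The toy model below (plain Python, not Solr's lookup code) shows why Har matches the dictionary entries while har does not. It models only the matching step and keeps dictionary order, without any weight-based ordering. The second function is a hypothetical workaround that compares lowercased forms; applying the same idea in Solr would mean lowercasing the dictionary and the incoming queries yourself, which is an assumption not tested in this post.

```python
def prefix_matches(phrases, query):
    """Case-sensitive prefix match: 'Har' and 'har' are different prefixes."""
    return [p for p in phrases if p.startswith(query)]


def prefix_matches_ignore_case(phrases, query):
    """Hypothetical alternative: compare lowercased phrase and query."""
    q = query.lower()
    return [p for p in phrases if p.lower().startswith(q)]


phrases = ["Hard disk hitachi", "Hard disk wd", "Hard disk jjdd"]
print(prefix_matches(phrases, "Har"))              # all three phrases
print(prefix_matches(phrases, "har"))              # []
print(prefix_matches_ignore_case(phrases, "har"))  # all three phrases
```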

What can be said about this method? If you have very good dictionaries, with weights generated from data such as customer behavior, this is the method for you, and your customers will love it. I would not recommend it if you don’t have good dictionaries: there is a very high chance that your suggestions will be of poor quality.
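Where can such weights come from? One option, in line with the customer-behavior idea above, is to count how often each phrase is searched. The sketch below assumes you have a plain list of past search queries (a hypothetical input) and uses raw counts as weights; that weighting scheme, and the names build_dictionary and write_dictionary, are assumptions rather than something this post prescribes.

```python
from collections import Counter
from typing import Iterable


def build_dictionary(queries: Iterable[str], min_count: int = 1) -> str:
    """Turn raw search queries into dictionary text: one 'phrase<TAB>weight' per line."""
    # Collapse runs of whitespace (including TABs) so a phrase never contains
    # the TAB that separates it from its weight.
    normalized = (" ".join(q.split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    lines = ["# generated from search logs"]
    # most_common() lists the most frequent queries first.
    for phrase, count in counts.most_common():
        if count >= min_count:
            lines.append(f"{phrase}\t{float(count)}")
    return "\n".join(lines) + "\n"


def write_dictionary(path: str, text: str) -> None:
    # The dictionary should be saved in UTF-8 encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


log = ["Hard disk jjdd", "Hard disk wd", "Hard disk jjdd",
       "Hard disk hitachi", "Hard disk wd", "Hard disk jjdd"]
print(build_dictionary(log), end="")
```

The min_count threshold is one simple way to keep rare (and often misspelled) queries out of the dictionary.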

What will be next?

The number of tasks I had this week didn’t let me finish the performance tests, so in the next part of the cycle I’ll try to show you how each method behaves with different index structures and sizes.

This entry was posted on Monday, November 29th, 2010 at 08:43 and is filed under About Solr, Autocomplete.

3 Responses to “Solr and autocomplete (part 3)”

  1. varun Says:

    I was looking at the performance metrics.
    Were you able to work on that? Can you please provide the link?

  2. Pablo Viquez Says:


    How do you maintain the dictionary? Are you manually adding terms/phrases constantly, based on what you think are the popular searches?

    Also, can you use a schema field as the input for the dictionary?



  3. gr0 Says:

    You usually maintain the dictionary manually from some external application. You can either look at search popularity yourself or use one of the commercial tools, e.g. the one provided by Sematext, the company I work for.

    As for the schema field, there is currently no way to do that, but it could be done. For example, you could build an index with popular queries and use DirectSolrSpellChecker on that index, and that should work without problems.