Solr and autocomplete (part 1)

Almost everyone has seen what an autocomplete feature looks like. No wonder, then, that Solr provides mechanisms for building such functionality. In today’s entry I will show you how to add an autocomplete mechanism using faceting.

Index

Suppose you want to show hints to the users of an online store, for example product names. Suppose that our index is composed of the following fields:

A text type is defined as follows:

Configuration

To start, consider what you want to achieve: do you want to suggest only the individual words that make up a name, or full names that begin with the letters typed by the user? Depending on that choice, we have to prepare the appropriate field on which the hints will be built.

Suggesting individual words that make up the name

In the case of single words, we should use a field that is tokenized. In our case, the field named name will be sufficient. However, note that if that field uses, for example, stemming, you should define another type that does not use stemming. Faceting returns the indexed terms, so with stemming the user would be shown stems instead of whole words.
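To make the word-level variant more concrete, here is a minimal Python sketch that imitates what faceting on a tokenized field gives back. It is a simulation, not Solr code: splitting on whitespace and lowercasing only stand in for the real analysis chain of the name field, the function name word_hints is made up, and so are the product names.

```python
from collections import Counter


def word_hints(names, prefix):
    """Imitate facet.prefix on a tokenized field: count documents per term."""
    counts = Counter()
    for name in names:
        # A set, because a facet counts each document once per term.
        counts.update(set(name.lower().split()))
    wanted = prefix.lower()
    # Highest count first; ties are ordered alphabetically (a choice of this sketch).
    return sorted(
        ((term, n) for term, n in counts.items() if term.startswith(wanted)),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical product names, used only for illustration.
names = ["Hard Disk 500GB", "Hard Disk Enclosure", "Hardcover Notebook"]
print(word_hints(names, "ha"))
```

Every hint is a single word here, which is exactly the behaviour of this variant: the user typing "ha" is offered "hard" and "hardcover", never a whole product name.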

Suggesting full names

To suggest full product names we need a different field configuration, and an untokenized field is the best fit. We cannot use a string-based field for that, though, because it would not let us lowercase the values. For this purpose, we define a field named name_auto, of type text_auto, that is indexed but not stored.

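The text_auto type is based on solr.TextField. Its analyzer uses solr.KeywordTokenizerFactory, which keeps the whole value as a single token, followed by solr.LowerCaseFilterFactory.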

So that the format of the data sent for indexing does not have to change, we also add a copy field definition that copies the contents of name to name_auto.
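The effect of this configuration can be imitated in a few lines of Python. The sketch below is only a simulation under stated assumptions: analyze_keyword_lowercase stands in for a keyword tokenizer followed by a lowercase filter (the chain listed for text_auto in the author's reply in the comments), both function names are made up, and the product names are invented.

```python
from collections import Counter


def analyze_keyword_lowercase(value):
    """Stand-in for keyword tokenizer + lowercase filter: one token per value."""
    return value.lower()


def full_name_hints(names, prefix):
    """Imitate facet.prefix on name_auto: whole names starting with the prefix."""
    # name_auto receives a copy of name, so the original field stays untouched.
    counts = Counter(analyze_keyword_lowercase(name) for name in names)
    return sorted(
        (term, n) for term, n in counts.items() if term.startswith(prefix)
    )


# Hypothetical product names, used only for illustration.
names = ["Hard Disk 500GB", "Hard Disk 500GB", "External Hard Disk"]
print(full_name_hints(names, "hard"))  # only names that begin with "hard"
print(full_name_hints(names, "Hard"))  # the prefix itself is not lowercased
```

The second call comes back empty on purpose: the indexed values are lowercased, but the prefix is compared as typed, so the client has to lowercase it before sending.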

How do we use it?

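To use the data we prepared, we send a fairly simple query. It matches all documents (q=*:*), turns faceting on (facet=true), facets on the chosen field (facet.field=FIELD), keeps only the values that start with what the user typed (facet.prefix=USER_QUERY) and sets rows=0.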

Where:

  • FIELD – the field on which the suggestions are based. In our case it is the field named name_auto.
  • USER_QUERY – the letters entered by the user, lowercased before sending because facet.prefix is not analyzed.

Note that the rows=0 parameter is added so that the response contains only the faceting results, without the matching documents. Of course, it is not required.
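For those who build the request in code, the parameters described above could be put together roughly like this. Treat it as a sketch: the endpoint in SOLR_SELECT_URL and the function names are assumptions made for the example, so adjust the host, port and handler path to your own installation.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust host, port and core/handler path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def autocomplete_params(field, user_query):
    """Return the request parameters for facet-based autocomplete."""
    return [
        ("q", "*:*"),          # match every document
        ("rows", "0"),         # facets only, no documents
        ("facet", "true"),
        ("facet.field", field),
        # facet.prefix is not analyzed, so lowercase it on the client side.
        ("facet.prefix", user_query.lower()),
    ]


def autocomplete_url(field, user_query):
    """Build the full request URL with properly escaped parameters."""
    return SOLR_SELECT_URL + "?" + urlencode(autocomplete_params(field, user_query))


print(autocomplete_url("name_auto", "Har"))
```

Using urlencode instead of gluing strings together matters as soon as the user types a space or a special character.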

An example query would look like this:

The result of this query might look like this:
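If you ask Solr for a JSON response (the wt=json parameter, which is an addition to the query discussed in this entry), the values of a facet field come back by default as one flat list that alternates value and count. A small sketch of turning that list into hints follows. The sample payload is hand-written in that shape and its values are made up; parse_hints is a name invented for the example.

```python
import json


def parse_hints(response_text, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape Solr uses for wt=json; the values are made up.
sample = """
{
  "response": {"numFound": 12, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {"name_auto": ["hard disk 500gb", 7, "hard disk enclosure", 2]}
  }
}
"""

for hint, count in parse_hints(sample, "name_auto"):
    print(f"{hint} ({count})")
```

The count next to each hint is the number of documents the user would get after choosing it.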

Additional features

It is worth mentioning the additional possibilities that come with this method.

The first is showing the user additional information, such as the number of results they will get after selecting a given hint. The counts are already part of the faceting response, so no extra query is needed. If you want to show such information, it will certainly be an interesting option.

The next thing is sorting with the facet.sort parameter. Depending on your needs, you can sort the hints by the number of documents (the default behavior, facet.sort=count, written as true before Solr 1.4) or alphabetically (facet.sort=index, written as false before Solr 1.4).

We may also limit the suggestions to those that have at least a specified number of results. To do that, pass the facet.mincount parameter with the appropriate number.
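Both options are just extra request parameters, so they are easy to bolt onto an existing query. A minimal sketch, assuming the query is kept as a list of pairs; hint_options is a hypothetical helper, and it uses the count and index values that Solr 1.4 and later accept for facet.sort in place of true and false.

```python
from urllib.parse import urlencode


def hint_options(sort_by_count=True, min_count=1):
    """Extra facet parameters controlling order and minimum document count."""
    return [
        # "count" and "index" replace the older true/false values.
        ("facet.sort", "count" if sort_by_count else "index"),
        ("facet.mincount", str(min_count)),
    ]


# Base query for the name_auto field and the prefix "har".
base = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
        ("facet.field", "name_auto"), ("facet.prefix", "har")]

# Alphabetical hints, each backed by at least two documents.
print(urlencode(base + hint_options(sort_by_count=False, min_count=2)))
```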

For me, the biggest advantage of this method is the possibility of getting only those suggestions that match not just the letters the user typed but also other criteria, such as a category. For example, suppose we want to show hints to a user who is browsing the household section of our store. We suspect that at this moment the user is not interested in DVD-type products, so we add the parameter fq=department:homeAppliances (assuming that we have such a department). With the query modified this way, the hints are no longer generated from the entire index; we only get those from the selected department.
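In code, narrowing the hints comes down to appending one more parameter. The sketch below assumes the name_auto field and the department field from the example above; department_hint_params is a made-up name and the prefix "Was" is only an illustration.

```python
from urllib.parse import urlencode


def department_hint_params(prefix, department=None):
    """Facet-based autocomplete parameters, optionally limited to a department."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.field", "name_auto"), ("facet.prefix", prefix.lower())]
    if department is not None:
        # The filter narrows the document set, and the facet counts follow it.
        params.append(("fq", "department:" + department))
    return params


print(urlencode(department_hint_params("Was", "homeAppliances")))
```

Without a department the function returns the plain query, so the same code path serves both the whole store and a single section.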

A few words at the end

Like any other method, this one has its advantages and disadvantages. The advantages are its ease of use, no need for additional components, and the fact that the hints can easily be narrowed down so that they are generated only from documents that match the query entered by the user. Another big plus is that each hint comes with the number of results that will be shown after selecting it (with the same search parameters, of course). The downsides are the need for additional types and fields, fairly limited capabilities, and the load caused by the faceting mechanism.

The next entry about autocomplete will expand on the topic and show further methods of generating hints with Solr.

18 thoughts on “Solr and autocomplete (part 1)”

  • 24 October 2010 at 01:38

    While I did faceting I never played with autocomplete in Solr. Definitely using it in the next project.

    (Bookmarked)

    • 27 October 2010 at 19:15

      Hheheheh first “spam” that matches the topic 😉 But seriously – if you are reading this and you need a good auto complete solution you should look at Sematext AutoComplete. On the other hand we are using some additions to the standard Solr distribution (which we made) like stemming components or “did you mean” functionalities 😉

  • 7 April 2011 at 00:47

    Thanks for this write-up. I’ve been trying to reproduce your results, but when I use the q=*.* in my search, I get a match for everything. Would you mind posting your schema.xml file as well as the process to load data?

    Thanks!

  • 7 April 2011 at 09:14

    Hi!

    You are doing almost everything well, but faceting-based autocomplete has some disadvantages. One of them is that you have to lowercase the string passed to the facet.prefix parameter yourself. Remember that you are interested in the facet results, not the search results; that is why you should set the rows parameter to 0. The number of documents is right in your example. To be perfectly clear, I’ve made the following changes in the schema.xml:

    <fieldType name="text_auto" class="solr.TextField">
    <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    </fieldType>

    <field name="name_auto" type="text_auto" indexed="true" stored="false"/>

    And also remember to copy name to name_auto, or add it to your data yourself:

    <copyField source="name" dest="name_auto"/>

    That should do the trick.

  • 27 July 2012 at 11:21

    Can anyone plz tell me how autocomplete works in solr, in textbox.

  • 28 July 2012 at 10:10

    What do you mean by ‘how autocomplete works’ ? Do you mean how to configure Solr to return autocomplete suggestions or how to build the complete solution ?

  • 4 September 2012 at 03:52

    Can any one here upload the complete xml file, please? I follow the tutorial but still can’t get the result.

    • 4 September 2012 at 07:56

      It’s been a long time now and I don’t have the whole schema, although all the relevant parts that are needed are in the tutorial. Maybe you can say what problems you have ?

  • 4 October 2012 at 13:05

    Hello,
    I have followed your instructions and got this schema.xml

    id

    But doing the query:

    q=*:*&fl=id&facet=true&facet.field=autocomplete&facet.mincount=1&facet.prefix=har&rows=0

    I’m getting this response:

    0385trueid1*:*harautocomplete0

    Solr version is 3.6

    Please, what is wrong?

  • 4 October 2012 at 13:07

    Sorry, here’s the schema.xml:

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="999" version="1.5">
    <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
    </analyzer>
    </fieldType>
    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
    </fieldType>
    <fieldType name="text_auto" class="solr.TextField">
    <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    </fieldType>
    </types>
    <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="date" type="int" indexed="true" stored="false" required="true"/>
    <field name="category_id" type="int" indexed="true" stored="false" required="true"/>
    <field name="user_id" type="string" indexed="true" stored="false" required="true"/>
    <field name="t_ro" type="text_ro" indexed="true" stored="false" required="false"/>
    <field name="t_ru" type="text_ru" indexed="true" stored="false" required="false"/>
    <field name="autocomplete" type="text_auto" indexed="true" stored="false" multiValued="true" />
    <dynamicField name="i_*" type="int" indexed="true" stored="false"/>
    <dynamicField name="f_*" type="float" indexed="true" stored="false"/>
    <dynamicField name="d_*" type="string" indexed="false" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <copyField source="t_ro" dest="autocomplete" />
    <copyField source="t_ru" dest="autocomplete" />
    </schema>

    and here’s solr response:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">385</int><lst name="params"><str name="facet">true</str><str name="fl">id</str><str name="facet.mincount">1</str><str name="q">*:*</str><str name="facet.prefix">har</str><str name="facet.field">autocomplete</str><str name="rows">0</str></lst></lst><result name="response" numFound="9613" start="0"/><lst name="facet_counts"><lst name="facet_queries"/><lst name="facet_fields"><lst name="autocomplete"/></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst>
    </response>

    • 18 October 2012 at 09:35

      Can you provide an example document ?

  • 10 September 2013 at 09:28

    Could you please provide an example document for solr 4.4?

    • 10 September 2013 at 11:21

      We will try to provide an updated post this week, but I can’t promise that we will make it.

  • 12 February 2014 at 09:04

    could you please tell me how to get the result and put into the text box?

    Thank you very much

  • 5 March 2018 at 10:58

    Hi, Is there a running example that you can provide for a better understanding.
