Solr and autocomplete (part 1)

Almost everyone has seen what an autocomplete feature looks like, so it is no wonder that Solr provides mechanisms for building such functionality. In today’s entry I will show you how to add an autocomplete mechanism using faceting.

Index

Suppose you want to show hints to the users of an online store, for example product names. Suppose that our index is composed of the following fields:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />

The text type is defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Configuration

To start, consider what you want to achieve: do you want to suggest only the individual words that make up a name, or full names that begin with the letters typed by the user? Depending on that choice, we have to prepare the appropriate field on which the hints will be built.

Suggesting individual words that make up the name

In the case of single words, we should use a field that is tokenized. In our case, the field named name will be sufficient. However, note that if your field uses stemming, for example, you should define another type without stemming for the hints. Facets are built from the indexed terms, so with stemming the user would be shown word stems instead of the original words.

Suggesting the full name

For suggesting full product names we need a different field configuration: the best choice is an untokenized field. However, we cannot use a string-based field for that, since a string field is indexed verbatim and its values could not be lowercased. For this purpose, we define the field as follows:

<field name="name_auto" type="text_auto" indexed="true" stored="true" multiValued="false" />

This type is defined as follows:

<fieldType name="text_auto" class="solr.TextField">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

To avoid changing the format of the indexed data, we also add a copy field definition that fills name_auto from name:

<copyField source="name" dest="name_auto" />
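To make the difference between the two fields concrete, here is a rough Python emulation of what ends up in the index. It is only a sketch: the helper names are made up for illustration, and the tokenized variant deliberately ignores the WordDelimiterFilterFactory step of the text type, so it is not a faithful reproduction of Solr's analysis.

```python
def keyword_lowercase(value: str) -> list[str]:
    """Approximates KeywordTokenizerFactory + LowerCaseFilterFactory:
    the whole field value becomes a single lowercased term."""
    return [value.lower()]


def whitespace_lowercase(value: str) -> list[str]:
    """Approximates WhitespaceTokenizerFactory + LowerCaseFilterFactory.
    The WordDelimiterFilterFactory step is intentionally left out."""
    return [token.lower() for token in value.split()]


if __name__ == "__main__":
    name = "Hard Disk Samsung"
    # name_auto: one term, so a prefix matches the start of the full name
    print(keyword_lowercase(name))
    # name: one term per word, so a prefix matches the start of any word
    print(whitespace_lowercase(name))
```

With the single-term variant, a prefix such as har can only match names that start with it, which is the behavior we want for full-name hints.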

How do we use it?

To use the data we prepared, we send a fairly simple query:

q=*:*&facet=true&facet.field=FIELD&facet.mincount=1&facet.prefix=USER_QUERY

Where:

  • FIELD – field on the basis of which we intend to make suggestions. In our case the field named name_auto.
  • USER_QUERY – the letters entered by the user.

Note that the example query below also adds the rows=0 parameter, so that only the faceting results are returned, without the matching documents. Of course, this is not required.

An example query would look like this:
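The parameters described above can be assembled in code. The following is a minimal sketch using only the Python standard library; the function name and the default field are assumptions made for illustration. It lowercases the user's input because facet.prefix is compared against the indexed terms, which the text_auto type has already lowercased.

```python
from urllib.parse import urlencode


def build_suggest_query(user_input: str, field: str = "name_auto") -> str:
    """Build the query string for a facet-based suggestion request.

    The function name and default field are illustrative assumptions.
    """
    params = [
        ("q", "*:*"),                          # match every document
        ("rows", "0"),                         # we only need the facets
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),               # skip terms with no documents
        ("facet.prefix", user_input.lower()),  # the prefix is not analyzed by Solr
    ]
    return urlencode(params)


if __name__ == "__main__":
    print(build_suggest_query("Har"))
```

The resulting string is appended to the URL of your Solr select handler; urlencode takes care of escaping characters such as the colon in q.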

fl=id,name&rows=0&q=*:*&facet=true&facet.field=name_auto&facet.mincount=1&facet.prefix=har

The result of this query might look like this:

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
 </lst>
 <result name="response" numFound="4" start="0"/>
 <lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   <lst name="name_auto">
    <int name="hard disk">1</int>
    <int name="hard disk samsung">1</int>
    <int name="hard disk seagate">1</int>
    <int name="hard disk toshiba">1</int>
   </lst>
  </lst>
  <lst name="facet_dates"/>
 </lst>
</response>
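On the client side, the hints and their counts have to be pulled out of this response. Below is a small sketch using Python's standard xml.etree.ElementTree module; the function name and the shortened sample response are assumptions for illustration, while the element layout follows the response shown above.

```python
import xml.etree.ElementTree as ET


def extract_suggestions(response_xml: str, field: str) -> list[tuple[str, int]]:
    """Return (hint, document count) pairs for one facet field."""
    root = ET.fromstring(response_xml)
    path = (
        "./lst[@name='facet_counts']"
        "/lst[@name='facet_fields']"
        f"/lst[@name='{field}']/int"
    )
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


# Shortened sample in the same shape as the response above.
SAMPLE = """<response>
 <lst name="facet_counts">
  <lst name="facet_fields">
   <lst name="name_auto">
    <int name="hard disk">1</int>
    <int name="hard disk samsung">1</int>
   </lst>
  </lst>
 </lst>
</response>"""

if __name__ == "__main__":
    for hint, count in extract_suggestions(SAMPLE, "name_auto"):
        print(f"{hint} ({count})")
```

If the facet field is missing or empty, the function simply returns an empty list.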

Additional features

It is worth mentioning the additional possibilities that this method offers.

The first is the ability to show the user additional information, such as the number of results they will get after selecting a given hint. Faceting returns that count together with every hint, so no extra query is needed.

The next thing is sorting with the facet.sort parameter. Depending on your needs, the hints can be sorted by the number of documents (value count, or true in older Solr versions) or alphabetically (value index, or false in older versions). Sorting by count is the default as long as facet.limit is greater than 0.

We can also limit the suggestions to those that have at least a specified number of results. To do that, pass the facet.mincount parameter with the appropriate number.

For me, the biggest advantage of this method is the possibility of getting only those suggestions that match not only the letters the user typed but also some other criteria, such as a category. For example, suppose we want to show hints to a user who is browsing the home appliances section of our store. We suspect that at this moment the user will not be interested in DVD-type products, so we add the parameter fq=department:homeAppliances (assuming that we have such a department). With the query modified this way, the hints are no longer generated from the entire index; we only get those narrowed to the selected department.
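A narrowed request of this kind could be built as in the sketch below. The function name and its arguments are assumptions made for illustration, and the value passed as facet.sort (index here, for alphabetical order) depends on your Solr version, so treat it as an assumption as well.

```python
from urllib.parse import urlencode


def build_filtered_suggest_query(user_input, field="name_auto",
                                 filter_query=None, min_count=1, sort=None):
    """Build a facet-based suggestion query with optional narrowing.

    All names here are illustrative; only the Solr parameter names
    (fq, facet.prefix, facet.mincount, facet.sort) come from Solr itself.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", str(min_count)),
        ("facet.prefix", user_input.lower()),
    ]
    if filter_query:
        params.append(("fq", filter_query))   # e.g. department:homeAppliances
    if sort:
        params.append(("facet.sort", sort))   # value depends on the Solr version
    return urlencode(params)


if __name__ == "__main__":
    print(build_filtered_suggest_query(
        "har", filter_query="department:homeAppliances", sort="index"))
```

Because fq restricts the set of documents before the facets are counted, both the list of hints and the counts next to them reflect the selected department only.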

A few words at the end

Like any other method, this one has its advantages and disadvantages. The advantages are its ease of use, the lack of any additional components, and the fact that the hints can easily be narrowed so that they are generated only from the documents that match the user's query. A big plus is that each hint comes with the number of results that will be shown after selecting it (with the same search parameters, of course). The downsides are the need for additional types and fields, the fairly limited capabilities, and the load caused by the faceting mechanism.

The next entry about autocomplete will expand on this topic and show further methods of generating hints using Solr.

This entry was posted on Monday, October 18th, 2010 at 07:23 and is filed under About Solr, Autocomplete.

16 Responses to “Solr and autocomplete (part 1)”

  1. mhitza Says:

    While I did faceting I never played with autocomplete in Solr. Definitely using it in the next project.

    (Bookmarked)

  2. Otis Gospodnetic Says:

    If you need AutoComplete for Solr, here is something that does it. It uses a different approach than the one described here, though.

    http://sematext.com/products/autocomplete/index.html

    You can see how (well) it works on http://search-lucene.com/ . Enjoy!

  3. gr0 Says:

    Hheheheh first “spam” that matches the topic ;-) But seriously – if you are reading this and you need a good auto complete solution you should look at Sematext AutoComplete. On the other hand we are using some additions to the standard Solr distribution (which we made) like stemming components or “did you mean” functionalities ;)

  4. Andrew Says:

    Thanks for this write-up. I’ve been trying to reproduce your results, but when I use the q=*.* in my search, I get a match for everything. Would you mind posting your schema.xml file as well as the process to load data?

    Thanks!

  5. gr0 Says:

    Hi!

    You are doing almost everything well, but faceting-based autocomplete has some disadvantages. One of the disadvantages is that you have to lowercase the string passed to the facet.prefix parameter yourself. Remember that you are interested in the facet results, not the search results – that’s why you should set the rows parameter to 0. The number of documents is right in your example. To be perfectly clear, I’ve made the following changes in the schema.xml:

    <fieldType name="text_auto" class="solr.TextField">
    <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    </fieldType>

    <field name="name_auto" type="text_auto" indexed="true" stored="false"/>

    And also remember to copy the name to name_auto, or add it to your data:

    <copyField source="name" dest="name_auto"/>

    That should do the trick.

  6. razz Says:

    Can anyone plz tell me how autocomplete works in solr, in textbox.

  7. gr0 Says:

    What do you mean by ‘how autocomplete works’ ? Do you mean how to configure Solr to return autocomplete suggestions or how to build the complete solution ?

  8. Allan Says:

    Can any one here upload the complete xml file, please? I follow the tutorial but still can’t get the result.

  9. gr0 Says:

    It’s been a long time now and I don’t have the whole schema, although all the relevant parts that are needed are in the tutorial. Maybe you can say what problems you have ?

  10. Alexander Says:

    Hello,
    I have followed your instructions and got this schema.xml

    id

    But doing the query:

    q=*:*&fl=id&facet=true&facet.field=autocomplete&facet.mincount=1&facet.prefix=har&rows=0

    I’m getting this response:

    0385trueid1*:*harautocomplete0

    Solr version is 3.6

    Please, what is wrong?

  11. Alexander Says:

    Sorry, here’s the schema.xml:

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="999" version="1.5">
    <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
    </analyzer>
    </fieldType>
    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
    </fieldType>
    <fieldType name="text_auto" class="solr.TextField">
    <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    </fieldType>
    </types>
    <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="date" type="int" indexed="true" stored="false" required="true"/>
    <field name="category_id" type="int" indexed="true" stored="false" required="true"/>
    <field name="user_id" type="string" indexed="true" stored="false" required="true"/>
    <field name="t_ro" type="text_ro" indexed="true" stored="false" required="false"/>
    <field name="t_ru" type="text_ru" indexed="true" stored="false" required="false"/>
    <field name="autocomplete" type="text_auto" indexed="true" stored="false" multiValued="true" />
    <dynamicField name="i_*" type="int" indexed="true" stored="false"/>
    <dynamicField name="f_*" type="float" indexed="true" stored="false"/>
    <dynamicField name="d_*" type="string" indexed="false" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <copyField source="t_ro" dest="autocomplete" />
    <copyField source="t_ru" dest="autocomplete" />
    </schema>

    and here’s solr response:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">385</int>
     <lst name="params">
      <str name="facet">true</str>
      <str name="fl">id</str>
      <str name="facet.mincount">1</str>
      <str name="q">*:*</str>
      <str name="facet.prefix">har</str>
      <str name="facet.field">autocomplete</str>
      <str name="rows">0</str>
     </lst>
    </lst>
    <result name="response" numFound="9613" start="0"/>
    <lst name="facet_counts">
     <lst name="facet_queries"/>
     <lst name="facet_fields">
      <lst name="autocomplete"/>
     </lst>
     <lst name="facet_dates"/>
     <lst name="facet_ranges"/>
    </lst>
    </response>

  12. gr0 Says:

    Can you provide an example document ?

  13. atodaily Says:

    Could you please provide an example document for solr 4.4?

  14. gr0 Says:

    We will try to provide an updated post this week, but I can’t promise that we will make it.

  15. Rac Says:

    could you please tell me how to get the result and put into the text box?

    Thank you very much

  16. gr0 Says:

    If you are looking for UI related things, you can look at jQuery UI – http://jqueryui.com/autocomplete/