Solr and autocomplete (part 1)

Almost everyone has seen how the autocomplete feature looks like. No wonder, then, Solr provides mechanisms by which we can build such functionality. In today’s entry I will show you how you can add autocomplete mechanism using faceting.

Indeks

Suppose you want to show some hints to the user in the on-line store, for example you want to show products name. Suppose that our index is composed of the following fields:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />

A text type is defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Configuration

To start, consider what you want to achieve – do we want to suggest only individual words that make up a name, or maybe full names that begin with the letters specified by the user. Depending on our choices we have to prepare the appropriate field on which we will build hints.

Prompting individual words that make up the name

In the case of single words, we should use a field that is tokenized. In our case, the field named name will be sufficient. However, note that if you want to use for example stemming you should define another type, which do not use stemming because of how this analysis operates on the contents of the field.

Prompting full name

For full names of the products suggestions we need a different field configuration – the best for this will be a untokenized field. But we can not use string based field for that. For this purpose, we define the field as follows:

<field name="name_auto" type="text_auto" indexed="true" stored="true" multiValued="false" />

This type is defined as follows:

<fieldType name="text_auto" class="solr.TextField">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

To not modify the format of the data we also add the appropriate definition of the copy information:

<copyField source="name" dest="name_auto" />

How do we use it ?

To use the data we prepared we use a fairly simple query:

q=*:*&facet=true&facet.field=FIELD&facet.mincount=1&facet.prefix=USER_QUERY

Where:

FIELD – field on the basis of which we intend to make suggestions. In our case the field named name_auto.
USER_QUERY – letters entered by the user.

It is worth noting rows=0 parameter is added here to only show the faceting result without the query results. Of course, this is not a necessity.

An example query would look like that:

fl=id,name&rows=0&q=*:*&facet=true&facet.field=name_auto&facet.mincount=1&facet.prefix=har

The result of this query might look like this::

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
 </lst>
 <result name="response" numFound="4" start="0"/>
 <lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   <lst name="name_auto">
    <int name="hard disk">1</int>
    <int name="hard disk samsung">1</int>
    <int name="hard disk seagate">1</int>
    <int name="hard disk toshiba">1</int>
   </lst>
  </lst>
  <lst name="facet_dates"/></lst>
</response>

Additional features

It is worth to mention the additional opportunities which are inherent to this method.

The first possibility is to show the user additional information such as number of results that you get when you select an appropriate hint. If you want to show such information it will certainly be an interesting option.

The next thing is sorting with the use of facet.sort parameter. Depending on your needs, we can sort the results by the number of documents (the default behavior, parameter set to true) or alphabetically (value set to false).

We may limit the suggestions to those which have more results than a specified number. To take advantage of this opportunity pass in a parameter facet.mincount with the appropriate number.

And as for me the biggest advantage of this method is the possibility of getting only those suggestions that not only match the letters that the user typed but also some other parameters, like category for example. For example, we want to show hints for the user who is in the household section of our store. We suspect that at this moment the user will not be interested in DVD-type products, and therefore we add a parameter fq=department:homeAppliances (assuming that we have such a department). After such a modified query, you do not get hints generated from the entire index, we only get those narrowed to the selected department.

A few words at the end

As other method, this one too, have its advantages and disadvantages. The advantage of this solution is its ease of use, no additional components requirement, and that the result hints can be easily narrowed to be generated only from those documents that match the query entered by the user. As a big plus is that the method includes number of result that will be shown after selecting the hint (of course with the same search parameters). For the downside is definitely need to have additional types and fields, quite limited abilities and the load caused by the use of faceting mechanism.

The next entry about the autocomplete will try to expand on and show a further methods of generating hints using Solr.

Solr.pl