Solr and autocomplete (part 2)

In the previous part I showed how the faceting mechanism can be used to implement autocomplete. Today I’ll show you how to do the same with a component called Suggester.

The beginning


There is one thing you must know: the Suggest component is not available in Solr 1.4.1 and below. To start using it you need to download the 3_x or trunk version from the Lucene/Solr SVN repository.

Configuration

Before we get to the index configuration we need to define a search component. So let’s do it:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">name_autocomplete</str>
 </lst>
</searchComponent>

It is worth mentioning that the Suggest component is based on solr.SpellCheckComponent, which is why we can use the above configuration. The configuration has three important properties:

  • name – name of the component.
  • lookupImpl – the class that handles the lookup. At this point there are two options: JaspellLookup and TSTLookup. The latter is the more efficient of the two.
  • field – the field on the basis of which suggestions are generated.
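As the name suggests, TSTLookup is built around a ternary search tree. The following is only a minimal sketch of that data structure in Python, to show why prefix lookups are cheap. It is not Solr's implementation, and the names Node, insert and lookup are made up for this illustration. The count argument plays a role similar to a limit on the number of returned suggestions.

```python
class Node:
    """One node of a ternary search tree: a character plus three links."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False  # True if a stored key ends at this node


def insert(node, key, i=0):
    """Insert a non-empty key below node and return the (possibly new) node."""
    ch = key[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, key, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, key, i)
    elif i + 1 < len(key):
        node.eq = insert(node.eq, key, i + 1)
    else:
        node.is_end = True
    return node


def _collect(node, prefix, out):
    """In-order walk that appends every key below node to out, in sorted order."""
    if node is None:
        return
    _collect(node.lo, prefix, out)
    if node.is_end:
        out.append(prefix + node.ch)
    _collect(node.eq, prefix + node.ch, out)
    _collect(node.hi, prefix, out)


def lookup(root, prefix, count=10):
    """Return up to count stored keys that start with the non-empty prefix."""
    node, i = root, 0
    while node is not None:
        ch = prefix[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(prefix):
            node, i = node.eq, i + 1
        else:
            break  # node now matches the last character of the prefix
    if node is None:
        return []
    out = [prefix] if node.is_end else []
    _collect(node.eq, prefix, out)
    return out[:count]
```

Only the nodes along the prefix and the subtree below it are visited, so the cost of a lookup depends on the prefix and the number of matches rather than on the size of the whole dictionary.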

Now let’s add the appropriate handler:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggest</str>
  <str name="spellcheck.count">10</str>
 </lst>
 <arr name="components">
  <str>suggest</str>
 </arr>
</requestHandler>

This is a fairly simple configuration. It defines a handler that uses our search component and tells Solr to return at most 10 suggestions and to use the dictionary named suggest, which is the one we defined in the Suggest component above.

Index

Let us assume that our document consists of three fields: id, name and description. We want to generate suggestions from the field that holds the name of the product. Our index could look like this:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />

In addition, there is the following copy field definition:

<copyField source="name" dest="name_autocomplete" />

Suggesting single words

To suggest individual words, the text_auto type should be defined as follows:

<fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Suggesting phrases

To suggest entire phrases, the text_auto type should be defined as follows:

<fieldType class="solr.TextField" name="text_auto">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

If you want to suggest phrases you may want to define your own query converter, because the default one splits the query into separate tokens.
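To make the difference between the two field types more concrete, here is a rough Python emulation of what ends up in the suggestion dictionary in each case. It is only a sketch: it ignores WordDelimiterFilterFactory entirely, and the function names word_terms, phrase_terms and suggest are made up for this example.

```python
def word_terms(name):
    """Rough stand-in for whitespace tokenizing plus lowercasing."""
    return [token.lower() for token in name.split()]


def phrase_terms(name):
    """Rough stand-in for KeywordTokenizerFactory plus lowercasing: one term."""
    return [name.lower()]


def suggest(names, prefix, to_terms):
    """Collect the distinct terms that start with prefix, in sorted order."""
    terms = {term for name in names for term in to_terms(name)}
    return sorted(term for term in terms if term.startswith(prefix.lower()))
```

With the word-based variant a prefix can only ever complete to a single word, while the phrase-based variant completes to the whole value of the name field.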

Dictionary building

Before we start using the component, we need to build its index. To do this, send the following request to Solr:

/suggest?spellcheck.build=true
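If you prefer to trigger this from a script, the request URL can be assembled with a small Python helper. This is just a sketch: the base address http://localhost:8983/solr is an assumption about a local default installation, and suggest_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on the default port; adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr"


def suggest_url(query=None, build=False, base=SOLR_BASE):
    """Build a URL for the /suggest handler, optionally asking for a rebuild."""
    params = {}
    if query is not None:
        params["q"] = query
    if build:
        params["spellcheck.build"] = "true"
    return base + "/suggest?" + urlencode(params)
```

Passing the result of suggest_url(build=True) to urllib.request.urlopen would then send the build request, and the same helper can be reused later for the suggestion queries themselves.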

Queries

Now let’s use the component. To show how it works, I decided to suggest whole phrases. An example query could look like this:

/suggest?q=har

After running that query I got the following suggestions:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
 </lst>
 <lst name="spellcheck">
  <lst name="suggestions">
   <lst name="har">
    <int name="numFound">4</int>
    <int name="startOffset">0</int>
    <int name="endOffset">3</int>
    <arr name="suggestion">
     <str>hard drive</str>
     <str>hard drive samsung</str>
     <str>hard drive seagate</str>
     <str>hard drive toshiba</str>
    </arr>
   </lst>
  </lst>
 </lst>
</response>
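On the client side the suggestions have to be pulled out of that XML. A minimal sketch using only the Python standard library could look like this. The function name parse_suggestions is made up, and the code assumes the response has the same shape as the one shown above.

```python
import xml.etree.ElementTree as ET


def parse_suggestions(xml_text):
    """Map each queried token to the list of strings suggested for it."""
    root = ET.fromstring(xml_text)
    result = {}
    # Each child <lst> of "suggestions" describes one token from the query.
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for token in root.findall(path):
        result[token.get("name")] = [
            item.text for item in token.findall("./arr[@name='suggestion']/str")
        ]
    return result
```

The result is a plain dictionary, so an autocomplete widget can simply take the list stored under the token the user typed.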

The end

In the next part of the autocomplete series I’ll show how to modify the configuration so that the mechanism uses a static dictionary, and how this can help you get better suggestions. The last part of the series will be a performance comparison of the methods, in which I’ll try to determine which one is the fastest in various situations.
