Solr and autocomplete (part 2)

In the previous part I showed how the faceting mechanism can be used to implement autocomplete. Today I’ll show you how to implement the same functionality with a component called Suggester.

The beginning


There is one thing you must know: the Suggester component is not available in Solr 1.4.1 and earlier. To use it, you need to download the 3_x branch or the trunk version from the Lucene/Solr SVN repository.

Configuration

Before we get to the index configuration, we need to define a search component. So let’s do it:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">name_autocomplete</str>
 </lst>
</searchComponent>

It is worth mentioning that Suggester is built on top of solr.SpellCheckComponent, which is why we can use the configuration above. The configuration has three important attributes:

  • name – the name of the dictionary; the request handler refers to it through the spellcheck.dictionary parameter.
  • lookupImpl – the class that performs the lookup. At the moment there are two options: JaspellLookup or TSTLookup, and the latter is more efficient.
  • field – the field from which suggestions are generated.
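TSTLookup is named after the ternary search tree, the data structure it uses for prefix lookups. To give an idea of what kind of lookup that is, here is a minimal, illustrative ternary search tree with prefix completion in Python. It is only a sketch of the data structure and not Solr’s implementation. The class and method names (TernarySearchTree, complete) are hypothetical, and the sample phrases come from the response shown later in this post, plus one made-up entry.

```python
class _Node:
    """One character of a ternary search tree (TST)."""

    __slots__ = ("char", "lo", "eq", "hi", "is_key")

    def __init__(self, char):
        self.char = char
        self.lo = None        # keys whose character here is smaller than char
        self.eq = None        # continuation of keys that have char at this position
        self.hi = None        # keys whose character here is larger than char
        self.is_key = False   # True if a stored key ends at this node


class TernarySearchTree:
    """Hypothetical minimal TST supporting insertion and prefix completion."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        if key:
            self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = _Node(ch)
        if ch < node.char:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.char:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.is_key = True
        return node

    def _find(self, prefix):
        """Walk the tree and return the node for the last character of prefix."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.char:
                node = node.lo
            elif ch > node.char:
                node = node.hi
            elif i + 1 == len(prefix):
                return node
            else:
                node, i = node.eq, i + 1
        return None

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored keys starting with prefix, alphabetically."""
        node = self._find(prefix) if prefix else None
        if node is None:
            return []
        results = [prefix] if node.is_key else []
        self._collect(node.eq, prefix, results, limit)
        return results[:limit]

    def _collect(self, node, path, results, limit):
        # In-order walk (lo, node, eq, hi) yields keys in sorted order.
        if node is None or len(results) >= limit:
            return
        self._collect(node.lo, path, results, limit)
        if node.is_key and len(results) < limit:
            results.append(path + node.char)
        self._collect(node.eq, path + node.char, results, limit)
        self._collect(node.hi, path, results, limit)


tree = TernarySearchTree()
for phrase in ["hard drive", "hard drive samsung", "hard drive seagate",
               "hard drive toshiba", "harmonica"]:
    tree.insert(phrase)
print(tree.complete("har", limit=4))
```

The prefix walk only touches the nodes along the typed characters, and the collection step stops once `limit` keys are found. That is roughly the role spellcheck.count plays in the handler.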

Now let’s add the appropriate handler:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggest</str>
  <str name="spellcheck.count">10</str>
 </lst>
 <arr name="components">
  <str>suggest</str>
 </arr>
</requestHandler>

This is quite a simple configuration. It defines a handler with an additional search component and tells Solr to return at most 10 suggestions and to use the dictionary named suggest, which is exactly the dictionary defined in our component.
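Because spellcheck, spellcheck.dictionary and spellcheck.count are set as defaults in the handler, a client only has to send q. A request parameter with the same name overrides a default for that single request. Below is a small Python sketch that builds such request URLs. The base URL `http://localhost:8983/solr` and the helper name `suggest_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust it to your installation.
SOLR_BASE = "http://localhost:8983/solr"


def suggest_url(prefix, count=None, base=SOLR_BASE):
    """Build a request URL for the /suggest handler (hypothetical helper).

    Only q is required because the handler supplies the spellcheck defaults;
    passing count overrides spellcheck.count for this request only.
    """
    params = {"q": prefix}
    if count is not None:
        params["spellcheck.count"] = count
    return f"{base}/suggest?{urlencode(params)}"


print(suggest_url("har"))
print(suggest_url("har", count=5))
```

`urlencode` also takes care of escaping, so a multi-word prefix such as `hard dr` is sent as `q=hard+dr`.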

Index

Let us assume that our document consists of three fields: id, name and description. We want to generate suggestions from the field that holds the product name, so we also add a helper field, name_autocomplete, used only for suggestions. The field definitions could look like this:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />

In addition, there is the following copy field definition:

<copyField source="name" dest="name_autocomplete" />

Suggesting single words

To get suggestions for individual words, define the text_auto type as follows:

<fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>
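To see why this type produces single-word suggestions, the following rough Python model mimics the analysis chain. It splits on whitespace and lowercases, and every resulting token becomes its own dictionary entry. This is a simplification: WordDelimiterFilterFactory is deliberately left out, the alphabetical ordering is a choice made for this sketch, and the function names and sample product names are hypothetical.

```python
def analyze_single_words(text):
    """Approximate the single-word text_auto chain: split on whitespace, lowercase.

    WordDelimiterFilterFactory is deliberately left out, so this is only a
    rough model of what ends up in the index, not Solr's analyzer.
    """
    return [token.lower() for token in text.split()]


def suggest_words(names, prefix, count=10):
    """Collect every token into a sorted dictionary and prefix-match against it."""
    dictionary = sorted({tok for name in names for tok in analyze_single_words(name)})
    return [term for term in dictionary if term.startswith(prefix.lower())][:count]


names = ["Hard drive Samsung", "Hard drive Seagate", "Hard drive Toshiba"]
print(suggest_words(names, "s"))
```

With this type, typing `s` can only ever complete to a single word such as `samsung` or `seagate`, never to a whole product name.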

Suggesting phrases

To get suggestions for whole phrases, define the text_auto type as follows:

<fieldType class="solr.TextField" name="text_auto">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>
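The same kind of rough Python model shows the difference for this type. KeywordTokenizerFactory emits the whole field value as a single token and LowerCaseFilterFactory lowercases it, so each product name becomes one dictionary entry. As before, this is an illustrative approximation with hypothetical function names, not Solr code.

```python
def analyze_phrase(text):
    """Approximate the phrase text_auto chain: the KeywordTokenizer emits the
    whole value as a single token, and the LowerCaseFilter lowercases it."""
    return [text.lower()]


def suggest_phrases(names, prefix, count=10):
    """Collect every whole-value token into a sorted dictionary and prefix-match."""
    dictionary = sorted({tok for name in names for tok in analyze_phrase(name)})
    return [term for term in dictionary if term.startswith(prefix.lower())][:count]


names = ["Hard drive", "Hard drive Samsung", "Hard drive Seagate", "Hard drive Toshiba"]
print(suggest_phrases(names, "har"))
```

Because the lowercasing happens at index time, the suggestions come back lowercased even if the stored value is not.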

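If you want to use phrases, you may want to define your own query converter: by default the query is split into separate words on whitespace, and each word is looked up on its own.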
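The effect of the query converter can be modelled in a few lines of Python. This is a simplified model, not Solr’s SpellingQueryConverter, and every name in it (PHRASES, split_on_whitespace, whole_query, suggest) is hypothetical. A converter that splits the input into words looks up `hard` and `dr` separately against a phrase dictionary. A converter that keeps the whole input as one token looks up `hard dr`, which is what phrase suggestions need.

```python
# Hypothetical phrase dictionary, as the KeywordTokenizer-based text_auto type would build it.
PHRASES = ["hard drive", "hard drive samsung", "hard drive seagate", "hard drive toshiba"]


def lookup(prefix, dictionary=PHRASES):
    """Prefix match against the dictionary."""
    return [p for p in dictionary if p.startswith(prefix)]


def split_on_whitespace(query):
    """Simplified model of the default behaviour: one token per word."""
    return query.lower().split()


def whole_query(query):
    """What a custom converter for phrases might do: keep the input as one token."""
    return [query.lower().strip()] if query.strip() else []


def suggest(query, converter):
    """Return {token: suggestions} for every token the converter produces."""
    return {token: lookup(token) for token in converter(query)}


print(suggest("hard dr", split_on_whitespace))
print(suggest("hard dr", whole_query))
```

In the first case the user gets one set of suggestions per word, and `dr` matches nothing because no phrase starts with it. In the second case the whole input is completed as one phrase.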

Dictionary building

Before we start using the component, we need to build its dictionary. To do this, send the following request to Solr:

/suggest?spellcheck.build=true
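If you build the dictionary from a script, for example after reindexing the data, a small Python helper along these lines could send the same request. The base URL and the helper names are assumptions. `build_dictionary` needs a running Solr instance, while `build_dictionary_url` only constructs the URL.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: a local Solr instance


def build_dictionary_url(base=SOLR_BASE):
    """URL that asks the /suggest handler to (re)build its dictionary."""
    return f"{base}/suggest?{urlencode({'spellcheck.build': 'true'})}"


def build_dictionary(base=SOLR_BASE, timeout=60):
    """Send the build request and return the HTTP status code (needs a running Solr)."""
    with urlopen(build_dictionary_url(base), timeout=timeout) as response:
        return response.status
```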

Queries

Now we can finally use the component. To show how it works, I decided to suggest whole phrases. An example query could look like this:

/suggest?q=har

After running that query I got the following suggestions:

<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
 </lst>
 <lst name="spellcheck">
  <lst name="suggestions">
   <lst name="har">
    <int name="numFound">4</int>
    <int name="startOffset">0</int>
    <int name="endOffset">3</int>
    <arr name="suggestion">
     <str>hard drive</str>
     <str>hard drive samsung</str>
     <str>hard drive seagate</str>
     <str>hard drive toshiba</str>
    </arr>
   </lst>
  </lst>
 </lst>
</response>
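On the client side, the suggestions can be read from this XML with Python’s standard `xml.etree.ElementTree`. The sketch below follows the structure shown above: spellcheck, then suggestions, then one `lst` per query token with an `arr name="suggestion"` inside. The function name is hypothetical. It ignores the `numFound`, `startOffset` and `endOffset` values.

```python
import xml.etree.ElementTree as ET


def parse_suggestions(xml_text):
    """Map each query token in a /suggest XML response to its list of suggestions."""
    if isinstance(xml_text, str):
        # Parse bytes so the XML declaration's encoding is honoured.
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    result = {}
    # response -> lst[spellcheck] -> lst[suggestions] -> one lst per query token
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for token in root.findall(path):
        result[token.get("name")] = [
            s.text for s in token.findall("./arr[@name='suggestion']/str")
        ]
    return result
```

Applied to the response above, it would return a dictionary with a single key holding the four hard drive suggestions.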

The end

In the next part of the autocomplete series I’ll show how to modify the configuration to use a static dictionary and how this can help you get better suggestions. The last part of the series will be a performance comparison of all the methods, in which I’ll try to determine which method is the fastest in various situations.


This entry was posted on Monday, November 15th, 2010 at 08:31 and is filed under About Solr, Autocomplete, Solr.

28 Responses to “Solr and autocomplete (part 2)”

  1. Jean-Claude Dauphin Says:

    Hello,

    Thank you for these articles. They are very useful.

    I implemented autocompletion with the TermsComponent and the Shingle Filter which works quite well.
    **What are the advantages of the Suggester component over the TermsComponent (except that you can get the suggestions from a static file)?
    **Is it possible to mix suggestions from the index with suggestions from a static file?

    Thank you for yr time.

    JCD

  2. Le Hoan Says:

    Hi,
    I have a question for you. I have implemented autocompletion, but when I use queries with more than one term, the results returned are not right (i.e. suggest?q=hard drive). I don’t know why. Can you explain it to me?

  3. gr0 Says:

    I’ll try to take a look at it today and see what is your problem.

  4. Le Hoan Says:

    Can you explain it to me?

  5. gr0 Says:

    What you are seeing is the way Suggester works: it splits the words you send on whitespace characters. You can either develop your own query converter or use faceting with the facet.prefix parameter to get the autocomplete functionality.

  6. Diego Says:

    Hi, can you publish a complete schema.xml? Thanks from Argentina, Diego

  7. gr0 Says:

    That can be hard, because I don’t have it right now :( But you shouldn’t have much trouble with adding those fields to your own schema.

  8. Praveen Says:

    I have tried this approach and it worked wonderfully.

    Thank you for the enlightening article.

    But I need to retrieve the document ‘id’ and ‘title’ along with the suggestion. How can that be done?

  9. gr0 Says:

    I’m afraid you won’t be able to retrieve that information using Suggester and its out-of-the-box functionality.

  10. JC Says:

    Hi,

    Thank you very much for this article, I had been looking for a solr phrase suggester with no luck.

    I tried yours, but it still doesn’t work for me: when I query for a word, it suggests single words instead of phrases.

    Something weird I found in your example is that your query says /suggest?q=har, but the response seems to be for some other word, “dys”, though the suggestions in the same response seem to be for “har”. Anyway, I can’t make it work.

    Another thing: is it possible that I need something else? I don’t get what you mean when you say “If you want to use phrases you may want to define your own query converter.”

    status: 0, QTime: 0
    numFound: 4, startOffset: 0, endOffset: 3
    suggestions: hard drive, hard drive samsung, hard drive seagate, hard drive toshiba

  11. nan Says:

    Thank you for providing this tutorial. I used “Suggesting phrases” in my schema, but the returned suggestions are not what I really wanted. For example, “bamboo garden” is split into 2 keywords and two sets of suggestions are returned, one for bamboo and one for garden. I expected results for “bamboo garden” and “bamboo garden blah blah blah”.

    Is there anything I can do to get it fixed?

  12. gr0 Says:

    What type of field did you use for your suggestions ?

  13. mark Says:

    How do you pop up (show) the suggested terms using jQuery? Are there any examples?

  14. gr0 Says:

    Take a look at the UI provided with Solr (I think the address was http://localhost:8983/solr/browse). There is a jQuery example there.

  15. radag Says:

    Hi! I’ve tried to make phrases suggestions working, but no luck with this. I use text_auto type with KeywordTokenizerFactory from your example, however it still does not work. Any suggestions?

  16. Johan Says:

    Hi, Thanks for a great post.

    I have one question though because there is something that I do not understand.

    The suggestions that are returned are always in lowercase.
    why is that? Is it not the stored value that is returned?

    Kind regards /Johan

  17. gr0 Says:

    Thanks :) We’re glad you’ve liked it. As for the suggestions – they are lowercased, because our field types are defined as lowercased. You can experiment with removing the lowercase filter, reindexing the data and trying to get suggestions again.

  18. Johan Says:

    Thanks,
    So correct me if I am wrong: it is not the stored value that is returned as suggestions, but the indexed value?

  19. gr0 Says:

    Right, it’s the indexed value.

  20. babu Says:

    Thanks. But I want to know how to use this suggest query to display the suggestions in the search box. Do I need to use an AJAX query?

  21. rahul Says:

    Thanks for the nice article on configuring the Suggester component. It helped me a lot.

  22. gr0 Says:

    Yes, you’ll need the UI part developed.

  23. Selvam Says:

    Thanks for the article. Simple and elegant. One query I have is,

    Assuming I index the phrase “Hard drive”, typing “har” returns “hard drive” (the lowercased form). Is there a way to make it return the original form, i.e. “Hard drive”?

  24. Keerthana Says:

    I tried to make the changes in the solrconfig.xml file of Solr 3.1, but we are unable to get any results when we run Solr. Do we have to make changes in the schema.xml file or pass a text file to the Suggester? Please help! Thank you!

  25. gr0 Says:

    We only made the changes that are present in the blog post. What problems are you facing?

  26. SD Says:

    Thank you for a great article. But I need suggestions from the data in 5 fields, and this guide only suggests from one field (name).

    How can I get suggestions from 5 or more fields?

  27. gr0 Says:

    We will try to add a new article on how to do auto complete on more than a single field.

  28. anonymous Says:

    Hi,
    Thanks for this wonderful article. It helped me a lot.
    I have one question to ask.
    When I implement the same code, I get the suggestions in lower case. So I removed the LowerCaseFilterFactory. Then if I search for “B” it returns the suggestions in proper case, like “Biology”, but when I search for “b”, it doesn’t return anything.

    I want the suggestions to be in the original case, as indexed, whether I search using lowercase or capital letters.

    Could you please suggest what could be done?