Autocomplete on multivalued fields using faceting

In the previous blog post about autocomplete on multi-valued fields, we discussed how highlighting can help us get the information we are interested in. We also promised to get back to the topic and show how to achieve similar functionality with Solr’s faceting capabilities. So, let’s do it.

Before we start

Because this post is more or less a continuation of what we’ve written earlier about autocomplete on multi-valued fields, we recommend reading “Autocomplete on multivalued field using highlighting” before the rest of this entry. We would also like to note that the method shown in this entry is very similar to the one shown in the “Solr and autocomplete (part 1)” post, but we wanted to refresh that topic and show an example using multi-valued fields.

Configuration

As in the previous post, we will start with the Solr configuration.

Index structure

The structure of our index is exactly the same as the one shown previously, but let’s recall it. Remember that we want autocomplete to work on a multi-valued field. This field is called features, and the whole fields section of our schema.xml looks like this:

<fields>
 <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
 <field name="features" type="string" indexed="true" stored="true" multiValued="true"/>
 <field name="features_autocomplete" type="text_autocomplete" indexed="true" stored="true" multiValued="true"/>

 <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

To get the values for autocomplete, we will use the features_autocomplete field.

Copy field

Of course, we don’t want to change our indexer; we want Solr to automatically copy the data from the features field to the features_autocomplete one. Because of that, we add the following copyField definition to the schema.xml file:

<copyField source="features" dest="features_autocomplete"/>

Our text_autocomplete field type

And here we come to the first difference: the text_autocomplete field type. This time it looks like this:

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

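Because we will use faceting, we combine solr.KeywordTokenizerFactory with solr.LowerCaseFilterFactory, so each value of our field is indexed as a single, lowercased token. Faceting returns indexed terms, so this way every term is a whole phrase, such as “single door”, instead of separate words.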
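To make the effect of this analysis chain more concrete, here is a minimal Python sketch that imitates it. It is only an approximation, assuming that KeywordTokenizerFactory emits the whole input as one token and LowerCaseFilterFactory lowercases it; the function names are hypothetical and not part of Solr. For contrast, it also shows a naive split into words, roughly what a word-level tokenizer would hand to faceting.

```python
def analyze_autocomplete(value: str) -> list[str]:
    """Approximate the text_autocomplete type: KeywordTokenizer + LowerCaseFilter."""
    # KeywordTokenizer: the whole input becomes a single token.
    tokens = [value]
    # LowerCaseFilter: lowercase every token.
    return [token.lower() for token in tokens]


def analyze_by_words(value: str) -> list[str]:
    """Rough contrast only (not the schema above): split into words, then lowercase."""
    return [token.lower() for token in value.split()]


if __name__ == "__main__":
    print(analyze_autocomplete("Multiple windows"))  # ['multiple windows']
    print(analyze_by_words("Multiple windows"))      # ['multiple', 'windows']
```

With the word-level variant, facets would suggest “multiple” and “windows” separately, which is not what we want from autocomplete on phrases.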

Example data

Our example data is identical to what we had before, but let’s recall it for clarity:

<add>
 <doc>
  <field name="id">1</field>
  <field name="features">Multiple windows</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="features">Single window</field>
  <field name="features">Single door</field>
 </doc>
 <doc>
  <field name="id">3</field>
  <field name="features">Multiple windows</field>
  <field name="features">Multiple doors</field>
 </doc>
</add>

Query with faceting

Let’s look at how our query looks when we use faceting instead of highlighting.

Full query

When using faceting, our query should look more or less like the following:

q=*:*&rows=0&facet=true&facet.field=features_autocomplete&facet.prefix=sing

A few words about the parameters:

  • q=*:* – we match all the documents, so the facet counts are calculated over the whole index,
  • rows=0 – we tell Solr that we don’t want the documents that matched the query in the results,
  • facet=true – we inform Solr that we want to use faceting,
  • facet.field=features_autocomplete – we specify which field is used to calculate the facets,
  • facet.prefix=sing – with this parameter we pass the text typed by the user; only terms beginning with this prefix are returned.
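The parameters above can be assembled on the application side. Below is a minimal Python sketch using only the standard library’s urllib.parse.urlencode; the Solr URL and the function name are hypothetical, so adjust them to your setup. It lowercases the typed text because facet.prefix is not analyzed, while the indexed terms are lowercase.

```python
from urllib.parse import urlencode

# Hypothetical Solr address; change host, port and core to match your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_autocomplete_query(user_input: str) -> str:
    """Build the faceting-based autocomplete query string for the typed text."""
    params = {
        "q": "*:*",                             # match all documents
        "rows": 0,                              # we only need facets, not documents
        "facet": "true",                        # turn faceting on
        "facet.field": "features_autocomplete", # field used to calculate facets
        "facet.prefix": user_input.lower(),     # not analyzed by Solr, so lowercase here
    }
    # Keep '*' and ':' unescaped so the output matches the query shown in the post.
    return urlencode(params, safe="*:")


if __name__ == "__main__":
    print(SOLR_SELECT_URL + "?" + build_autocomplete_query("Sing"))
```

Spaces in the typed text are encoded as +, so a prefix such as “single d” is sent as facet.prefix=single+d.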

Query results

The results returned by Solr for the above query are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="q">*:*</str>
    <str name="facet.prefix">sing</str>
    <str name="facet.field">features_autocomplete</str>
    <str name="rows">0</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="features_autocomplete">
      <int name="single door">2</int>
      <int name="single window">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

As you can see, in the field faceting section we got the phrases we were interested in, along with the number of documents each of them appears in.
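On the client side, the suggestions have to be pulled out of that facet_fields section. The following is a minimal Python sketch using the standard library’s xml.etree.ElementTree; the function name is hypothetical, and it assumes the XML response layout shown above.

```python
import xml.etree.ElementTree as ET


def extract_suggestions(response_xml: str, field: str) -> list[tuple[str, int]]:
    """Return (phrase, document count) pairs for one field from a Solr XML response."""
    root = ET.fromstring(response_xml)
    # Path: <response> / <lst name="facet_counts"> / <lst name="facet_fields"> / <lst name=field>
    field_list = root.find(
        f"./lst[@name='facet_counts']/lst[@name='facet_fields']/lst[@name='{field}']"
    )
    if field_list is None:
        return []
    # Each <int name="phrase">count</int> is one suggestion, kept in the order Solr returned.
    return [(item.get("name"), int(item.text)) for item in field_list.findall("int")]


# A trimmed copy of the response shown above.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<result name="response" numFound="3" start="0"/>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="features_autocomplete">
      <int name="single door">2</int>
      <int name="single window">1</int>
    </lst>
  </lst>
</lst>
</response>"""

if __name__ == "__main__":
    print(extract_suggestions(SAMPLE, "features_autocomplete"))
```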

What to remember about

The crucial thing to remember is that the value provided to the facet.prefix parameter is not analyzed. Because of that, if we provided Sing instead of sing, we wouldn’t get any results, since the indexed terms are lowercased. The application should therefore lowercase the user’s input before sending it to Solr.
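The following Python sketch models, under simplifying assumptions, what happens to our example documents: copyField, one lowercased term per value, and prefix matching with a raw, unanalyzed prefix. It is an illustration only, not how Solr computes facets internally, and the names DOCS and facet_with_prefix are hypothetical.

```python
# The three example documents; only the multi-valued "features" field matters here.
DOCS = [
    {"id": "1", "features": ["Multiple windows", "Single door"]},
    {"id": "2", "features": ["Single window", "Single door"]},
    {"id": "3", "features": ["Multiple windows", "Multiple doors"]},
]


def facet_with_prefix(docs, prefix):
    """Approximate field faceting on features_autocomplete with facet.prefix."""
    counts = {}
    for doc in docs:
        # copyField + KeywordTokenizer + LowerCaseFilter: one lowercased term per value.
        # A set, because facet counts are numbers of documents, not occurrences.
        terms = {value.lower() for value in doc["features"]}
        for term in terms:
            # The prefix is compared as-is: like facet.prefix, it is not lowercased.
            if term.startswith(prefix):
                counts[term] = counts.get(term, 0) + 1
    # Highest count first, ties alphabetically, approximating Solr's count ordering.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(facet_with_prefix(DOCS, "sing"))          # [('single door', 2), ('single window', 1)]
    print(facet_with_prefix(DOCS, "Sing"))          # []
    print(facet_with_prefix(DOCS, "Sing".lower()))  # same as for "sing"
```

Lowercasing the typed text before it is used as the prefix is the simplest way to avoid the empty result.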

A short summary

The above entry showed the second method of implementing autocomplete on multi-valued fields. Of course, we haven’t covered everything about the topic and we will get back to it someday, but for now that is all. We hope someone finds it useful :)


This entry was posted on Monday, March 25th, 2013 at 08:59 and is filed under About Solr, Autocomplete, Solr.

One Response to “Autocomplete on multivalued fields using faceting”

  1. Artem Lukanin Says:

    Which method is more effective, using highlights (plus termVectors) or facets?