Autocomplete on multivalued fields using highlighting

One of the topics I came across recently was an autocomplete feature based on Solr multi-valued fields (for example, this question was asked on Stack Overflow). Let’s look at the possibilities we have.

Multiple cores vs single core

One of the first things to consider is whether we can use a dedicated core or collection for autocomplete. If we can, we should go that way. There are several reasons in favor of this approach. For example, such a collection will be smaller than the one holding the data that needs to be searchable, and it will contain fewer terms, so queries against it will be faster. Of course, we have to take care of the additional configuration and indexing, but that’s not much of a problem, right? In this entry we will look at situations where a separate core is not an option, for example because of filtering that needs to be done.

Please also note that in this entry we assume we want whole phrases to be shown to the user.

Configuration

Let’s start with the configuration.

Index structure

Let’s assume that we want to suggest phrases from a multi-valued field. Let’s call that field features. The configuration of all the fields in the index is as follows:

As you can see, for the autocomplete feature we will use the field named features_autocomplete. The _version_ field is required by some Solr 4.0 (and newer) features, which is why it is present in our index.

Field values copying

In addition to the above configuration, we also want to copy the data from the features field to the features_autocomplete field. To do that, we will use Solr’s copy field feature and add the following section to the schema.xml file:

Field type – text_autocomplete

Let’s have a look at the last piece of the configuration, the definition of the text_autocomplete type:

As you can see, during indexing Solr will create n-grams from each phrase indexed in the features_autocomplete field, starting with a minimum length of 2 and ending with a maximum length of 50.

At query time we only lowercase the query phrase; nothing else is needed in our case.
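To make the analysis chain more concrete, here is a small Python model of what we assume text_autocomplete does. Because the type definition is not reproduced above, two details are assumptions: that the whole value is kept as a single token (which fits our goal of suggesting whole phrases) and that the n-grams are edge n-grams, that is, prefixes. The function names are hypothetical, and the model deliberately ignores how the query parser treats the q parameter.

```python
def index_terms(value: str, min_gram: int = 2, max_gram: int = 50) -> set[str]:
    """Assumed index-time analysis: keep the whole value as one token,
    lowercase it, then emit its prefixes of length min_gram..max_gram."""
    token = value.lower()
    upper = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, upper + 1)}


def query_term(phrase: str) -> str:
    """Query-time analysis: the phrase is only lowercased."""
    return phrase.lower()


def value_matches(value: str, phrase: str) -> bool:
    """A single field value matches when the analyzed query phrase is
    exactly one of the terms indexed for that value."""
    return query_term(phrase) in index_terms(value)


if __name__ == "__main__":
    print(sorted(index_terms("Single Window"))[:3])  # ['si', 'sin', 'sing']
    print(value_matches("Single Window", "Single W"))  # True
    print(value_matches("Single Window", "window"))    # False: not a prefix
```

With this chain, a lowercased query of 2 to 50 characters matches a value exactly when it is a prefix of that value, which is the behavior we want from autocomplete.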

Sample data

Our sample data looks like this:

Initial query

Let’s look at the queries now.

In the beginning

Let’s start with a simple query that would return the data we need if we were using a single-valued field. The query looks as follows:

Query results

For our example data, the results of such a query should look like this:

A short comment

As we can see, the results are not satisfactory: in addition to the value we are querying for, we got all the values stored in the multi-valued field. We would like to get only the one we queried for. Is this possible? Yes, with a little trick: let’s modify our query to use highlighting.
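The root cause is that a query matches documents, not individual values: if any value of the multi-valued field matches, the whole document is returned, and fl gives back every stored value of the field. The plain-Python sketch below models this. The documents are hypothetical (not the sample data above), and a prefix check bounded by the 2 to 50 length limits stands in for the assumed edge n-gram match. matching_values only illustrates the output we are after; in this entry, highlighting is what produces it on the Solr side.

```python
MIN_GRAM, MAX_GRAM = 2, 50  # n-gram length bounds described above


def prefix_match(value: str, phrase: str) -> bool:
    """Stand-in for the assumed edge n-gram match: the lowercased phrase must
    be a prefix of the lowercased value, with a length inside the bounds."""
    prefix = phrase.lower()
    return MIN_GRAM <= len(prefix) <= MAX_GRAM and value.lower().startswith(prefix)


def search(docs: dict[str, list[str]], phrase: str) -> dict[str, list[str]]:
    """Return every matching document with ALL values of the field, which is
    what requesting fl=features_autocomplete gives us."""
    return {
        doc_id: values
        for doc_id, values in docs.items()
        if any(prefix_match(v, phrase) for v in values)
    }


def matching_values(values: list[str], phrase: str) -> list[str]:
    """What autocomplete actually needs: only the values that matched."""
    return [v for v in values if prefix_match(v, phrase)]


# Hypothetical documents, not the article's sample data.
docs = {
    "1": ["Single Window", "Wooden frame", "Double glazing"],
    "2": ["Double door", "Steel frame"],
}

if __name__ == "__main__":
    result = search(docs, "sing")
    print(result)                                # {'1': ['Single Window', 'Wooden frame', 'Double glazing']}
    print(matching_values(result["1"], "sing"))  # ['Single Window']
```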

Query with highlighting

Now we will make use of the Apache Solr highlighting component.

Changed query

We will add the following parameters to our previous query:

The whole query now looks like this:

A few words about the parameters that were used:

  • hl=true – we inform Solr that we want to use highlighting,
  • hl.fl=features_autocomplete – we tell Solr which field should be used for highlighting,
  • hl.simple.pre= – setting hl.simple.pre to an empty value (the default is <em>) tells Solr that we don’t want to mark the beginning of the highlighted fragment,
  • hl.simple.post= – setting hl.simple.post to an empty value (the default is </em>) tells Solr that we don’t want to mark the end of the highlighted fragment.
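These parameters can also be assembled programmatically. The sketch below uses only Python’s standard library (urllib.parse.urlencode). Several parts are assumptions not shown in this entry: the host, port, and core name in BASE_URL; wt=json; and the q/fl form, borrowed from the reader comment at the end of this entry (q=features_autocomplete:...&fl=features_autocomplete). The escape_query_chars helper is hypothetical. It backslash-escapes query-syntax characters and whitespace, similar in spirit to SolrJ’s ClientUtils.escapeQueryChars, on the assumption that the standard Lucene query parser treats an escaped space as part of the term.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core name to your setup.
BASE_URL = "http://localhost:8983/solr/autocomplete/select"

# Characters with a special meaning in the Lucene query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text: str) -> str:
    """Hypothetical helper: backslash-escape query-syntax characters and
    whitespace so a prefix such as 'single w' is parsed as a single term."""
    return "".join("\\" + ch if ch in _SPECIAL or ch.isspace() else ch for ch in text)


def autocomplete_url(prefix: str) -> str:
    """Build the autocomplete request with highlighting enabled."""
    params = {
        "q": "features_autocomplete:" + escape_query_chars(prefix),
        "fl": "features_autocomplete",
        "wt": "json",                      # assumption: JSON response format
        "hl": "true",                      # turn highlighting on
        "hl.fl": "features_autocomplete",  # field used for highlighting
        "hl.simple.pre": "",               # no marker before the fragment
        "hl.simple.post": "",              # no marker after the fragment
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(autocomplete_url("single w"))
```

Note that urlencode keeps the empty values, so the request still carries hl.simple.pre= and hl.simple.post=, which is what overrides the default markers.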

Modified query results

After querying Solr with the modified query, the following results were returned:

As you can see, the highlighting section contains exactly the information we are interested in 🙂
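On the client side, the suggestions are then read from the highlighting section instead of from the documents. The sketch below assumes a JSON response (wt=json), in which Solr returns highlighting as an object keyed by each document’s unique key, mapping field names to lists of snippets. The response body is a hypothetical example, not the output shown in this entry. Also keep in mind that Solr returns one snippet per field by default (hl.snippets=1), so if several values of the same document can match, you may need to raise hl.snippets.

```python
import json

# Hypothetical response body for a highlighted autocomplete query with empty
# hl.simple.pre / hl.simple.post (not the article's actual output).
RESPONSE = """
{
  "response": {"numFound": 1, "start": 0, "docs": [
    {"features_autocomplete": ["Single Window", "Wooden frame", "Double glazing"]}
  ]},
  "highlighting": {
    "1": {"features_autocomplete": ["Single Window"]}
  }
}
"""


def suggestions(body: str, field: str = "features_autocomplete") -> list[str]:
    """Collect the highlighted values of `field`, de-duplicated, in the order
    in which they appear in the response."""
    highlighting = json.loads(body).get("highlighting", {})
    seen: list[str] = []
    for fields in highlighting.values():
        for snippet in fields.get(field, []):
            if snippet not in seen:
                seen.append(snippet)
    return seen


if __name__ == "__main__":
    print(suggestions(RESPONSE))  # ['Single Window']
```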

Summary

Of course, we need to remember that the approach proposed in this entry is not the only way to build a working autocomplete feature on data stored in multi-valued fields. In the next entry on this topic, we will show how to use faceting to get the same results, provided we can accept some small drawbacks.

2 thoughts on “Autocomplete on multivalued fields using highlighting”

  • 17 May 2016 at 07:22

    I am unable to configure auto-complete conf in case of multiple fields. I am working on solr 5.3. Auto-complete in case of single field is working fine. I just wanna know if this method is suitable to use for solr 5.3.

  • 31 May 2016 at 08:30

    How to get accurate results for multiple words?

    For Ex: if my query is “q=features_autocomplete:single+w&fl=features_autocomplete”

    I want the result to be only “Single Window”, i don’t want “Single door” in that.
