“Car sale application” – solr.ReversedWildcardFilter – let’s optimize wildcard queries (part 8)

“Car sale application” users have started to use wildcard queries more and more often, which forced us to think about optimizing them. solr.ReversedWildcardFilter comes to the rescue.


The solr.ReversedWildcardFilter filter produces additional tokens, which are reversed copies of the original tokens, and indexes them so that leading wildcard queries run faster. The filter supports the following init arguments:

  • withOriginal – if true, then produce both original and reversed tokens at the same positions. If false, then produce only reversed tokens.
  • maxPosAsterisk – maximum position (1-based) of the asterisk wildcard (‘*’) that triggers the reversal of query term. Asterisk that occurs at positions higher than this value will not cause the reversal of query term.
  • maxPosQuestion – maximum position (1-based) of the question mark wildcard (‘?’) that triggers the reversal of query term.
  • maxFractionAsterisk – additional parameter that triggers the reversal if asterisk (‘*’) position is less than this fraction of the query token length.
  • minTrailing – minimum number of trailing characters in query token after the last wildcard character. For good performance this should be set to a value larger than 1.
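To illustrate why reversed tokens help, here is a minimal Python sketch of the idea (a toy model, not Solr code). The marker character, the helper names and the sample words are assumptions made for this example only. A leading wildcard such as *dx becomes a cheap prefix match once both the indexed token and the query term are reversed.

```python
from fnmatch import fnmatchcase

# Assumed marker prepended to reversed tokens so that they cannot
# collide with ordinary tokens (illustrative value only).
MARKER = "\u0001"


def index_tokens(token, with_original=True):
    """Return the tokens that would be indexed for one input token."""
    reversed_token = MARKER + token[::-1]
    return [token, reversed_token] if with_original else [reversed_token]


def reverse_query(term):
    """Reverse a wildcard term: '*dx' becomes MARKER + 'xd*'."""
    return MARKER + term[::-1]


# Hypothetical sample words, not the data used in this article.
index = set()
for word in ["landrover", "rdx", "mdx"]:
    index.update(index_tokens(word))

# The leading wildcard is now a trailing one, i.e. a prefix match.
pattern = reverse_query("*dx")
matches = sorted(t for t in index if fnmatchcase(t, pattern))

# Strip the marker and reverse back to see the original words.
print([t[1:][::-1] for t in matches])
```

With withOriginal set to false only the marked, reversed tokens would end up in the index, so ordinary (non-reversed) queries against that field would no longer match.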

schema.xml changes

A new filter is added to the “text” field type:

The solr.ReversedWildcardFilterFactory filter is added only to the index analyzer. We do not define any arguments in the filter definition, because we would like to use the default configuration, which is:

  • withOriginal – “true”, we would like to produce original tokens as well
  • maxPosAsterisk – 2
  • maxPosQuestion – 1
  • maxFractionAsterisk – 0.0f (disabled)
  • minTrailing – 2
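The way these settings decide whether a query term gets reversed can be sketched as follows. This is a simplified Python model written from the parameter descriptions above, not Solr's actual source, so edge cases may behave differently in Solr; the function name should_reverse and its argument names are hypothetical.

```python
def should_reverse(term, max_pos_asterisk=2, max_pos_question=1,
                   max_fraction_asterisk=0.0, min_trailing=2):
    """Decide whether a wildcard term would be matched against reversed tokens."""
    first_q = term.find("?")
    first_a = term.find("*")
    if first_q == -1 and first_a == -1:
        return False  # not a wildcard term at all

    # Number of characters after the last wildcard character.
    last_wildcard = max(term.rfind("?"), term.rfind("*"))
    trailing = len(term) - last_wildcard - 1
    if trailing < min_trailing:
        return False

    # Positions in the parameter descriptions are 1-based.
    if first_q != -1 and first_q + 1 <= max_pos_question:
        return True
    if first_a != -1 and first_a + 1 <= max_pos_asterisk:
        return True

    # Optional rule: asterisk within the leading fraction of the term.
    if (max_fraction_asterisk > 0 and first_a != -1
            and first_a < len(term) * max_fraction_asterisk):
        return True
    return False


for term in ["lan*", "*dx", "r?x"]:
    print(term, should_reverse(term))
```

Under this model, and with the default values, only *dx is reversed: lan* has no trailing characters after the wildcard, and r?x has just one.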

Sample data

Let’s index some sample data:

Let’s create queries

Let me remind you that the default search field is the “content” field, which contains, among others, the “make” and “model” fields. To analyse the query results and the behaviour of the solr.ReversedWildcardFilter filter, we will set the “stored” attribute of the “content” field to “true”. We will also add the debugQuery query parameter, which will allow us to find out which tokens (original or reversed) are used during query processing.
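If you would rather build the request URLs in a script than type them by hand, a small Python sketch could look like the one below. The base URL is an assumption (adjust host, port and path to your own installation), and build_query_url is a hypothetical helper; only the q, fl and debugQuery parameters come from the queries used in this article.

```python
from urllib.parse import urlencode, parse_qs

# Assumed location of the Solr select handler; change it for your setup.
BASE_URL = "http://localhost:8983/solr/select"


def build_query_url(q):
    """Build a select URL that returns id and content plus debug information."""
    params = {"q": q, "fl": "id,content", "debugQuery": "on"}
    # urlencode escapes the wildcard characters so that they survive transport.
    return BASE_URL + "?" + urlencode(params)


urls = [build_query_url(q) for q in ["lan*", "*dx", "r?x"]]
for url in urls:
    print(url)
```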

  1. ?q=lan*&fl=id,content&debugQuery=on

    We have used asterisk wildcard (‘*’) at the end of the query (position = 4), so the original tokens were used:

  2. ?q=*dx&fl=id,content&debugQuery=on

    We have used the asterisk wildcard (‘*’) at the beginning of the query (position = 1), and there are two trailing characters after the last wildcard. That’s why the reversed tokens were used:

    As we can see, the reversed tokens have a special prefix in order to avoid collisions and false matches.

  3. ?q=r?x&fl=id,content&debugQuery=on

    We have used the question mark wildcard (‘?’) at position 2, which is higher than maxPosQuestion, and there is only one trailing character after the wildcard. The original tokens were used:

The end

Thanks to the solr.ReversedWildcardFilter filter, we have successfully optimized wildcard queries. “Car sale application” users can now effectively use them 🙂
