RankField & Rank Query Parser

One of the additions to Solr that we haven't talked about yet is a new field type called RankField, together with the Rank Query Parser that can leverage it. Combined, they let us introduce scoring based on the content of the document in an optimized way. Let's have a quick look at what this pair gives us.

The Idea Behind Rank Query Parser

The idea behind the Rank Query Parser is to use information stored in the document to modify the score of the matching documents. It provides a subset of what the Function Query Parser already offers, but it can also be used with the BlockMax-WAND algorithm for improved query performance.

The RankField

Using RankField is very simple. We need to define the appropriate field type, a field using that field type, and of course, populate it with data. Let’s assume we have the following document structure:

{
  "id" : 1,
  "name": "RankField and RankQueryParser",
  "type": "post",
  "views": 1000 
}

We have the document identifier, its name, its type, and the number of views. The last field is the one we are interested in: in addition to using it for display purposes, we would also like to use it for ranking. Our schema could look as follows:

<field name="id" type="string" />
<field name="name" type="text_ws" />
<field name="type" type="string" />
<field name="views" type="rank" />

We also need to define the rank type, which could look as follows:

<fieldType name="rank" class="solr.RankField" />

That is everything we need – we are ready to go.
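If the collection uses a managed schema, the same field type and field can also be added through the Schema API instead of editing the schema by hand. Below is a minimal Python sketch that only builds the request; the collection name blog and the local Solr URL are hypothetical.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr/blog"  # hypothetical host and collection


def build_schema_request(base_url: str) -> urllib.request.Request:
    # Schema API commands run in order, so the rank type exists before the views field uses it
    commands = {
        "add-field-type": {"name": "rank", "class": "solr.RankField"},
        "add-field": {"name": "views", "type": "rank"},
    }
    return urllib.request.Request(
        f"{base_url}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_schema_request(SOLR_URL)
```

Passing the returned request to urllib.request.urlopen sends it to a running Solr instance.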

Using the Rank Query Parser

To use the Rank Query Parser and include the views field in the score calculation, we could run a query similar to the following one:

q=_query_:{!rank f='views' function='log'}
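When the query is sent over HTTP, the local params syntax has to be URL-encoded. Here is a small Python sketch that builds the parameters for the select handler; the helper names and the collection URL are hypothetical, and any extra keyword arguments are simply passed through as local params such as weight or scalingFactor.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr/blog"  # hypothetical host and collection


def rank_query_params(field: str, function: str, **function_params) -> dict:
    # Local params of the rank query, e.g. f='views' function='log';
    # extra keyword arguments (weight, scalingFactor, pivot, ...) are appended as-is.
    local_params = [f"f='{field}'", f"function='{function}'"]
    local_params += [f"{name}='{value}'" for name, value in function_params.items()]
    return {
        "q": "_query_:{!rank " + " ".join(local_params) + "}",
        "fl": "score,*",  # return the score together with the stored fields
    }


def select_url(base_url: str, params: dict) -> str:
    # urlencode escapes the braces, quotes, and spaces of the local params syntax
    return f"{base_url}/select?{urlencode(params)}"


print(select_url(SOLR_URL, rank_query_params("views", "log")))
```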

Assuming our index contains the following two documents:

[
  {
    "id" : 1,
    "name": "RankField and RankQueryParser",
    "type": "post",
    "views": 1000 
  },
  {
    "id" : 2,
    "name": "Lucene and Solr 8.6.1 were released",
    "type": "announcement",
    "views": 10
  }
]
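To try this out, the two documents can be sent to the update handler as a JSON array. A minimal sketch using only the Python standard library; the collection URL is hypothetical:

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr/blog"  # hypothetical host and collection

DOCS = [
    {"id": 1, "name": "RankField and RankQueryParser", "type": "post", "views": 1000},
    {"id": 2, "name": "Lucene and Solr 8.6.1 were released", "type": "announcement", "views": 10},
]


def build_update_request(base_url: str, docs: list) -> urllib.request.Request:
    # POST the documents as a JSON array; commit=true makes them searchable right away
    return urllib.request.Request(
        f"{base_url}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_request(SOLR_URL, DOCS)
# urllib.request.urlopen(request) would send it to a running Solr instance
```

Committing on every request is convenient for a small test like this one, but not something to do when indexing in bulk.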

Our results would look like this:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":3,
    "params":{
      "q":"_query_:{!rank f='views' function='log'}",
      "fl":"score,*"}},
  "response":{"numFound":2,"start":0,"maxScore":6.908755,"numFoundExact":true,"docs":[
      {
        "id":"1",
        "name":"RankField and RankQueryParser",
        "type":"post",
        "_version_":1678886835690930176,
        "score":6.908755},
      {
        "id":"2",
        "name": "Lucene and Solr 8.6.1 were released",
        "type":"announcement",
        "_version_":1678886835758039040,
        "score":2.3978953}]
  }}

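Notice that the scores are not the constant 1.0 that a match all query would give every document. Solr applied the log function to the value of the views field of each matching document. With the default weight and scalingFactor of 1, the score is the natural logarithm of 1 + views: ln(1001) ≈ 6.9088 for the first document and ln(11) ≈ 2.3979 for the second one.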

Performance

Of course, the above behavior can easily be achieved with the standard Function Query Parser, but the key point of the Rank Query Parser is that it lets Solr use the BlockMax-WAND algorithm to improve query performance. To enable it, we need to add the minExactCount parameter to our query, which defines how many hits Solr must count accurately. Once that many hits have been counted, Solr may skip documents that cannot make it into the top N results for the query.
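To make the skipping idea more concrete, here is a toy Python model. It is a simplified sketch of the principle, not Lucene's BlockMax-WAND implementation, which works on postings and precomputed impacts rather than Python lists: documents are grouped into blocks that know their maximum score, and once min_exact_count hits have been counted and the top N is full, any block whose maximum cannot beat the weakest of the current top N is skipped without scoring. The function name and data layout are hypothetical.

```python
import heapq
from typing import List, Tuple

Block = List[Tuple[str, float]]  # (doc_id, score) pairs; a stand-in for a block of postings


def top_n_with_skipping(blocks: List[Block], n: int, min_exact_count: int):
    """Return (top_docs, hits_counted, count_is_exact) for a toy block-skipping collector."""
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top n (score, doc_id)
    counted = 0  # documents that were actually scored and counted
    skipped = 0  # documents never looked at
    for block in blocks:
        # Lucene precomputes per-block maximum scores at index time;
        # computing it here only keeps the toy example self-contained.
        block_max = max((score for _, score in block), default=float("-inf"))
        can_skip = counted >= min_exact_count and len(heap) == n
        if can_skip and block_max <= heap[0][0]:
            # No document in this block can enter the top n, so it is skipped entirely.
            skipped += len(block)
            continue
        for doc_id, score in block:
            counted += 1
            if len(heap) < n:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    top_docs = [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
    return top_docs, counted, skipped == 0


blocks = [[("a", 5.0), ("b", 1.0)], [("c", 0.5), ("d", 0.7)], [("e", 9.0)]]
print(top_n_with_skipping(blocks, n=2, min_exact_count=2))
```

With n=2 and min_exact_count=2, the middle block (scores 0.5 and 0.7) is skipped: only 3 of the 5 matching documents are counted, so the count is not exact, while the top 2 documents are still the correct ones.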

The response from Solr when the minExactCount parameter is used looks as follows:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1,
    "params":{
      "q":"_query_:{!rank f='views' function='log'}",
      "fl":"score,*",
      "minExactCount":"1"}},
  "response":{"numFound":2,"start":0,"maxScore":6.908755,"numFoundExact":true,"docs":[
      {
        "id":"1",
        "name":"RankField and RankQueryParser",
        "type":"post",
        "_version_":1678886835690930176,
        "score":6.908755},
      {
        "id":"2",
        "name":"Lucene and Solr 8.6.1 were released",
        "type":"announcement",
        "_version_":1678886835758039040,
        "score":2.3978953}]
  }}

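Note the numFoundExact attribute in the response section. It tells us whether numFound is exact: when minExactCount is used, Solr only guarantees an accurate count up to that value, so numFound may be an approximation, in which case numFoundExact is set to false. In our tiny two-document example the count is still exact. We will talk about the BlockMax-WAND algorithm in Solr in a dedicated blog post in the next few weeks, so stay tuned if you would like to read about it. There are some pros and cons to it that I think are worth discussing.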
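On the client side, this means numFound should only be presented as an exact count when numFoundExact is true. A small Python helper, with a hypothetical name, that formats the hit count from a JSON response such as the ones above; treating a response without numFoundExact as exact is an assumption of this sketch.

```python
import json


def describe_hit_count(response_body: str) -> str:
    """Turn numFound/numFoundExact from a Solr JSON response into a human-readable count."""
    response = json.loads(response_body)["response"]
    num_found = response["numFound"]
    # Assumption: a response without numFoundExact is treated as exact.
    if response.get("numFoundExact", True):
        return f"{num_found} results"
    # numFoundExact=false: counting stopped early, so numFound is only a lower bound.
    return f"at least {num_found} results"
```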

Available Functions

At the time of writing, there are three functions available that we can use with the Rank Query Parser:

  • log – the logarithmic function, accepting the weight and scalingFactor attributes
  • satu – the saturation function, accepting the pivot and weight attributes
  • sigm – the sigmoid function, accepting the pivot, weight, and exponent attributes
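The formulas behind these functions are documented for Lucene's FeatureField, which, as far as I know, is what RankField relies on under the hood. With S being the rank field value, log computes weight · ln(scalingFactor + S), satu computes weight · S / (S + pivot), and sigm computes weight · S^exponent / (S^exponent + pivot^exponent). The Python sketch below mirrors them; the function names and the default weight and scalingFactor of 1 are assumptions of the sketch, not Solr API.

```python
import math


# S is the value of the rank field; default weight/scaling_factor of 1.0 are assumptions.
def log_score(s: float, weight: float = 1.0, scaling_factor: float = 1.0) -> float:
    # Logarithm: keeps growing with s, but more and more slowly
    return weight * math.log(scaling_factor + s)


def satu_score(s: float, pivot: float, weight: float = 1.0) -> float:
    # Saturation: approaches weight as s grows and equals weight / 2 when s == pivot
    return weight * s / (s + pivot)


def sigm_score(s: float, pivot: float, exponent: float, weight: float = 1.0) -> float:
    # Sigmoid: like saturation, but the exponent controls how steep the curve is around pivot
    return weight * s ** exponent / (s ** exponent + pivot ** exponent)


for views in (10, 100, 1000):
    print(views,
          round(log_score(views), 3),
          round(satu_score(views, pivot=100), 3),
          round(sigm_score(views, pivot=100, exponent=2), 3))
```

log_score(1000) and log_score(10) give ln(1001) and ln(11), matching the 6.908755 and 2.3978953 scores in the first response. Because satu and sigm never exceed weight, they are a better fit when a very large field value should not dominate the rest of the score.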

You can use one of those functions to scale the scoring factor and adjust how the rank field value affects the scoring.

Conclusions

Although we already had the ability to use function queries and include field values in the score, with RankField and the Rank Query Parser we can now also benefit from the BlockMax-WAND algorithm. This can improve query performance in situations where we don't need the exact number of matching documents and are happy with only the top N results. Something worth considering.
