<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>rankqueryparser &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/rankqueryparser/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 15:22:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>RankField &#038; Rank Query Parser</title>
		<link>https://solr.pl/en/2020/09/28/rankfield-rank-query-parser/</link>
					<comments>https://solr.pl/en/2020/09/28/rankfield-rank-query-parser/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 28 Sep 2020 14:22:14 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[query parser]]></category>
		<category><![CDATA[rank]]></category>
		<category><![CDATA[rankfield]]></category>
		<category><![CDATA[rankqueryparser]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=1036</guid>

					<description><![CDATA[One of the additions to Solr that we haven&#8217;t talked about yet is the new field type called&#160;RankField&#160;and the&#160;Rank Query Parser&#160;that can leverage it. Together they can be used to introduce scoring based on the content of the document]]></description>
										<content:encoded><![CDATA[
<p>One of the additions to Solr that we haven&#8217;t talked about yet is the new field type called&nbsp;<strong>RankField</strong>&nbsp;and the&nbsp;<strong>Rank Query Parser</strong>&nbsp;that can leverage it. Together they can be used to introduce scoring based on the content of the document in an optimized way. Let&#8217;s have a quick look at what this pair gives us.</p>



<span id="more-1036"></span>



<h2 class="wp-block-heading">The Idea Behind Rank Query Parser</h2>



<p>The idea behind the&nbsp;<strong>Rank Query Parser</strong>&nbsp;is that it lets us use information stored in a document to modify that document&#8217;s score. It provides a subset of what the&nbsp;<strong>Function Query Parser</strong>&nbsp;already offers, but it can also be used with the BlockMax-WAND algorithm for improved query performance.</p>



<h2 class="wp-block-heading">The RankField</h2>



<p>Using&nbsp;<strong>RankField</strong>&nbsp;is very simple. We need to define the appropriate field type, a field using that field type, and of course, populate it with data. Let&#8217;s assume we have the following document structure:</p>



<pre class="wp-block-code"><code class="">{
  "id" : 1,
  "name": "RankField and RankQueryParser",
  "type": "post",
  "views": 1000 
}</code></pre>



<p>We have the document identifier, the name of the document, its type, and the number of views. We will be interested in the last field. In addition to using it for display purposes, we would also like to use it for ranking. Our schema could look as follows:</p>



<pre class="wp-block-code"><code class="">&lt;field name="id" type="string" />
&lt;field name="name" type="text_ws" />
&lt;field name="type" type="string" />
&lt;field name="views" type="rank" /></code></pre>



<p>We also need to define the&nbsp;<strong>rank</strong>&nbsp;type, which could look as follows:</p>



<pre class="wp-block-code"><code class="">&lt;fieldType name="rank" class="solr.RankField" /></code></pre>



<p>That is everything we need &#8211; we are ready to go.</p>



<h2 class="wp-block-heading">Using the Rank Query Parser</h2>



<p>To use the&nbsp;<strong>Rank Query Parser</strong>&nbsp;and include the&nbsp;<strong>views</strong>&nbsp;field in the score calculation, we could run a query similar to the following one:</p>



<pre class="wp-block-code"><code class="">q=_query_:{!rank f='views' function='log'}</code></pre>



<p>Let&#8217;s assume we have indexed the following two documents:</p>



<pre class="wp-block-code"><code class="">[
  {
    "id" : 1,
    "name": "RankField and RankQueryParser",
    "type": "post",
    "views": 1000 
  },
  {
    "id" : 2,
    "name": "Lucene and Solr 8.6.1 were released",
    "type": "announcement",
    "views": 10
  }
]</code></pre>



<p>Our results would look like this:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":3,
    "params":{
      "q":"_query_:{!rank f='views' function='log'}",
      "fl":"score,*"}},
  "response":{"numFound":2,"start":0,"maxScore":6.908755,"numFoundExact":true,"docs":[
      {
        "id":"1",
        "name":"RankField and RankQueryParser",
        "type":"post",
        "_version_":1678886835690930176,
        "score":6.908755},
      {
        "id":"2",
        "name": "Lucene and Solr 8.6.1 were released",
        "type":"announcement",
        "_version_":1678886835758039040,
        "score":2.3978953}]
  }}</code></pre>



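<p>You can see that the query matched both documents, yet unlike a&nbsp;<strong>match all</strong>&nbsp;query, which gives a score of&nbsp;<strong>1.0</strong>&nbsp;to every matching document, each document received a different score. Solr took the&nbsp;<strong>log</strong>&nbsp;function and applied it to the&nbsp;<strong>views</strong>&nbsp;value of each matching document. Note that the&nbsp;<strong>views</strong>&nbsp;field itself is not among the returned fields, even though we asked for all of them.</p>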
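


<p>The scores in the response can be reproduced by hand. The sketch below assumes that the&nbsp;<strong>log</strong>&nbsp;function computes the weight multiplied by the natural logarithm of the scaling factor plus the field value, as documented for the Lucene FeatureField, and that both&nbsp;<strong>weight</strong>&nbsp;and&nbsp;<strong>scalingFactor</strong>&nbsp;are 1 when not specified. The numbers returned by Solr are consistent with that assumption:</p>



<pre class="wp-block-code"><code class="">score = weight * ln(scalingFactor + views)

id 1: 1 * ln(1 + 1000) = ln(1001) ≈ 6.908755
id 2: 1 * ln(1 + 10)   = ln(11)   ≈ 2.397895</code></pre>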



<h2 class="wp-block-heading">Performance</h2>



<p>Of course, the above behavior can easily be achieved with the standard&nbsp;<strong>Function Query Parser</strong>, but the key point of the&nbsp;<strong>Rank Query Parser</strong>&nbsp;is that it can use the BlockMax-WAND algorithm to improve the performance of our query. To do this we need to add the&nbsp;<strong>minExactCount</strong>&nbsp;parameter to our request. It defines how many hits Solr must count accurately. Once that number is reached, Solr may skip documents whose score is too low to enter the top N results.</p>
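


<p>A minimal sketch of such a request, shown as one parameter per line and reusing the query from the previous section. The value of&nbsp;<strong>1</strong>&nbsp;is only an illustration. In practice it would be chosen based on how many exact hits the application really needs:</p>



<pre class="wp-block-code"><code class="">q=_query_:{!rank f='views' function='log'}
fl=score,*
minExactCount=1</code></pre>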



<p>The response from Solr when the&nbsp;<strong>minExactCount</strong>&nbsp;parameter is used looks as follows:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1,
    "params":{
      "q":"_query_:{!rank f='views' function='log'}",
      "fl":"score,*",
      "minExactCount":"1"}},
  "response":{"numFound":2,"start":0,"maxScore":6.908755,"numFoundExact":true,"docs":[
      {
        "id":"1",
        "name":"RankField and RankQueryParser",
        "type":"post",
        "_version_":1678886835690930176,
        "score":6.908755},
      {
        "id":"2",
        "name":"Lucene and Solr 8.6.1 were released",
        "type":"announcement",
        "_version_":1678886835758039040,
        "score":2.3978953}]
  }}</code></pre>



<p>Note the&nbsp;<strong>numFoundExact</strong>&nbsp;attribute in the&nbsp;<strong>response</strong>&nbsp;section, which tells us whether&nbsp;<strong>numFound</strong>&nbsp;is an exact count. In our tiny two-document index it is still&nbsp;<strong>true</strong>. We will talk about the BlockMax-WAND algorithm in Solr in the next few weeks in a dedicated blog post, so stay tuned if you would like to read about it. There are some pros and cons to it that I think are worth discussing.</p>



<h2 class="wp-block-heading">Available Functions</h2>



<p>At the time of writing, there are three functions that we can use with the&nbsp;<strong>Rank Query Parser</strong>:</p>



<ul class="wp-block-list"><li><strong>log</strong>&nbsp;&#8211; the logarithmic function, which accepts&nbsp;<strong>weight</strong>&nbsp;and&nbsp;<strong>scalingFactor</strong>&nbsp;attributes</li><li><strong>satu</strong>&nbsp;&#8211; the saturation function accepting the&nbsp;<strong>pivot</strong>&nbsp;and&nbsp;<strong>weight</strong>&nbsp;attributes</li><li><strong>sigm</strong>&nbsp;&#8211; the sigmoid function accepting the&nbsp;<strong>pivot</strong>,&nbsp;<strong>weight</strong>, and&nbsp;<strong>exponent</strong>&nbsp;attributes</li></ul>



<p>You can use one of those functions to scale the scoring factor and adjust how the rank field value affects the scoring.</p>
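


<p>The attributes are passed as additional local parameters of the parser. The queries below are only a sketch: the attribute names come from the list above, while the values are arbitrary illustrations, not recommendations. Assuming the formulas documented for the Lucene FeatureField, with S being the field value,&nbsp;<strong>log</strong>&nbsp;computes weight * ln(scalingFactor + S),&nbsp;<strong>satu</strong>&nbsp;computes weight * S / (S + pivot), and&nbsp;<strong>sigm</strong>&nbsp;computes weight * S^exponent / (S^exponent + pivot^exponent).</p>



<pre class="wp-block-code"><code class="">q=_query_:{!rank f='views' function='log' weight='2' scalingFactor='10'}
q=_query_:{!rank f='views' function='satu' pivot='100'}
q=_query_:{!rank f='views' function='sigm' pivot='100' exponent='0.5'}</code></pre>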



<h2 class="wp-block-heading">Conclusions</h2>



<p>We could already use function queries to include a field value in the score, but with the&nbsp;<strong>Rank Query Parser</strong>&nbsp;we can now also take advantage of the BlockMax-WAND algorithm. This improves query performance in situations where we don&#8217;t need the exact number of hits and are happy with only the top N results. Something worth considering.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2020/09/28/rankfield-rank-query-parser/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
