<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dismax &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/dismax-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Wed, 11 Nov 2020 22:41:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>What can we use Dismax tie parameter for?</title>
		<link>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/</link>
					<comments>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 06 Feb 2012 22:40:53 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[dismax]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tie]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=439</guid>

					<description><![CDATA[The Dismax query parser has been with Solr for a long time. Most of the time we use parameters like qf, pf&#160;or mm&#160;forgetting about a very useful parameter which allows us to control how the lower scoring fields are treated &#8211;]]></description>
										<content:encoded><![CDATA[<p>The Dismax query parser has been part of Solr for a long time. Most of the time we use parameters like <em>qf</em>, <em>pf</em>&nbsp;or <em>mm</em>&nbsp;and forget about a very useful parameter that lets us control how the lower-scoring fields are treated: the <em>tie</em>&nbsp;parameter.</p>
<p><span id="more-439"></span></p>
<h3>Tie</h3>
<p>The <em>tie</em>&nbsp;parameter controls how much the lower-scoring fields affect the score for a given word. If we set <em>tie</em>&nbsp;to 0.0, only the highest-scoring field matters when the score is calculated. If we set it to 0.99, the lower-scoring fields have almost the same impact on the score as the highest-scoring one. So let&#8217;s check if that actually works.</p>
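<p>Before looking at real data, here is a minimal Python sketch of that rule, assuming the per-field scores for one word are already known. The function name <em>dismax_term_score</em> and the sample scores are made up for illustration; they are not part of Solr.</p>

```python
def dismax_term_score(field_scores, tie):
    """Combine per-field scores for one query word: max + tie * (sum of the others)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical per-field scores for a single word (for example title, author, description).
scores = [0.8, 0.3, 0.1]

only_best = dismax_term_score(scores, tie=0.0)    # only the best field counts
nearly_sum = dismax_term_score(scores, tie=0.99)  # close to a plain sum of all fields
print(only_best, nearly_sum)
```

<p>With <em>tie</em>&nbsp;set to 0.0 the result is just the best field score, and as <em>tie</em>&nbsp;approaches 1.0 it approaches the sum of all field scores.</p>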
<h3>Data structure and data example</h3>
<p>To test how the <em>tie</em>&nbsp;parameter works I&#8217;ve chosen a simple index structure that describes products in an e-commerce shop, in a simplified form of course:
</p>
<pre class="brush:xml">&lt;field name="id" type="string" indexed="true" stored="true" required="true" /&gt;
&lt;field name="title" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="description" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="author" type="text_ws" indexed="true" stored="true" multiValued="true" /&gt;</pre>
<p>The&nbsp;<em>text_ws</em>&nbsp;type was defined as follows:
</p>
<pre class="brush:xml">&lt;fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100"&gt;
 &lt;analyzer&gt;
  &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
  &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
 &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>The example documents look like this:
</p>
<pre class="brush:xml">&lt;add&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;1&lt;/field&gt;
  &lt;field name="title"&gt;First test book&lt;/field&gt;
  &lt;field name="description"&gt;This is a description of the first test book by Joe and Jane Blow&lt;/field&gt;
  &lt;field name="author"&gt;Joe Blow&lt;/field&gt;
  &lt;field name="author"&gt;Jane Blow&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;2&lt;/field&gt;
  &lt;field name="title"&gt;Second test book&lt;/field&gt;
  &lt;field name="description"&gt;This is a description of the second test book by Joe Blow&lt;/field&gt;
  &lt;field name="author"&gt;Joe Blow&lt;/field&gt;
 &lt;/doc&gt;
&lt;/add&gt;</pre>
<h3>Tie == 0.01 result</h3>
<p>Let&#8217;s start the test. The first query was the following one:
</p>
<pre class="brush:xml">defType=dismax&amp;qf=title^1000 description author^10&amp;tie=0.01&amp;fl=id,score&amp;debugQuery=on&amp;indent=true&amp;q=joe blow book</pre>
<p>The above query returned the following results (visualization &#8211; <a href="http://explain.solr.pl/explains/cf0wnkpj" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/explains/cf0wnkpj</a>):
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;8&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="fl"&gt;id,score&lt;/str&gt;
    &lt;str name="debugQuery"&gt;on&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="tie"&gt;0.01&lt;/str&gt;
    &lt;str name="q"&gt;joe blow book&lt;/str&gt;
    &lt;str name="qf"&gt;title^1000 description author^10&lt;/str&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="2" start="0" maxScore="0.07342677"&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.07342677&lt;/float&gt;
    &lt;str name="id"&gt;2&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.073365316&lt;/float&gt;
    &lt;str name="id"&gt;1&lt;/str&gt;
  &lt;/doc&gt;
&lt;/result&gt;
&lt;lst name="debug"&gt;
  &lt;str name="rawquerystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="querystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="parsedquery"&gt;+((DisjunctionMaxQuery((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.01) DisjunctionMaxQuery((author:blow^10.0 | title:blow^1000.0 | description:blow)~0.01) DisjunctionMaxQuery((author:book^10.0 | title:book^1000.0 | description:book)~0.01))~3) ()&lt;/str&gt;
  &lt;str name="parsedquery_toString"&gt;+(((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.01 (author:blow^10.0 | title:blow^1000.0 | description:blow)~0.01 (author:book^10.0 | title:book^1000.0 | description:book)~0.01)~3) ()&lt;/str&gt;
  &lt;lst name="explain"&gt;
    &lt;str name="2"&gt;
0.07342677 = (MATCH) sum of:
  0.07342677 = (MATCH) sum of:
    8.957935E-4 = (MATCH) max plus 0.01 times others of:
      8.9543534E-4 = (MATCH) weight(author:joe^10.0 in 1), product of:
        0.0024097771 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:joe in 1), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.5817415E-5 = (MATCH) weight(description:joe in 1), product of:
        2.4097772E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 1), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    8.957935E-4 = (MATCH) max plus 0.01 times others of:
      8.9543534E-4 = (MATCH) weight(author:blow^10.0 in 1), product of:
        0.0024097771 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:blow in 1), product of:
          1.0 = tf(termFreq(author:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.5817415E-5 = (MATCH) weight(description:blow in 1), product of:
        2.4097772E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 1), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    0.07163518 = (MATCH) max plus 0.01 times others of:
      0.07163482 = (MATCH) weight(title:book^1000.0 in 1), product of:
        0.2409777 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 1), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=1)
      3.5817415E-5 = (MATCH) weight(description:book in 1), product of:
        2.4097772E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 1), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
&lt;/str&gt;
    &lt;str name="1"&gt;
0.073365316 = (MATCH) sum of:
  0.073365316 = (MATCH) sum of:
    7.1670645E-4 = (MATCH) max plus 0.01 times others of:
      7.163483E-4 = (MATCH) weight(author:joe^10.0 in 0), product of:
        0.0024097771 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(author:joe in 0), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.5817415E-5 = (MATCH) weight(description:joe in 0), product of:
        2.4097772E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 0), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.0010134276 = (MATCH) max plus 0.01 times others of:
      0.0010130694 = (MATCH) weight(author:blow^10.0 in 0), product of:
        0.0024097771 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.42039964 = (MATCH) fieldWeight(author:blow in 0), product of:
          1.4142135 = tf(termFreq(author:blow)=2)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.5817415E-5 = (MATCH) weight(description:blow in 0), product of:
        2.4097772E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 0), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.07163518 = (MATCH) max plus 0.01 times others of:
      0.07163482 = (MATCH) weight(title:book^1000.0 in 0), product of:
        0.2409777 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 0), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=0)
      3.5817415E-5 = (MATCH) weight(description:book in 0), product of:
        2.4097772E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 0), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    &lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<h4 class="brush:xml">First document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie001_doc12.png"><img fetchpriority="high" decoding="async" class="alignnone size-full wp-image-2088" title="tie001_doc1" src="http://solr.pl/wp-content/uploads/2012/02/tie001_doc12.png" alt="" width="600" height="248"></a></p>
<h4>Second document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie001_doc21.png"><img decoding="async" class="alignnone size-full wp-image-2089" title="tie001_doc2" src="http://solr.pl/wp-content/uploads/2012/02/tie001_doc21.png" alt="" width="600" height="248"></a></p>
<h4>What can we say about that?</h4>
<p>As you can see, when we passed 0.01 as the value of the <em>tie</em>&nbsp;parameter, the field with the highest score for a given query word dominates the score for that word. A good example of that behavior is the word <em>book</em>&nbsp;in the first document on the results list. The score for that word is&nbsp;<code>0.07163518</code>, which was calculated as the score of the highest-scoring field (the <em>title</em>&nbsp;field) plus the sum of the scores of the remaining fields multiplied by <em>tie</em>.</p>
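<p>The arithmetic can be checked by hand. The short Python sketch below copies the two field scores for the word <em>book</em>&nbsp;from the explain output above; the variable names are mine, and the tolerance is an assumption that accounts for Solr calculating with 32-bit floats.</p>

```python
import math

# Values copied from the tie=0.01 explain output for the word "book"
# in the first document on the results list (id 2).
title_score = 0.07163482          # weight(title:book^1000.0)
description_score = 3.5817415e-5  # weight(description:book)
tie = 0.01

# "max plus 0.01 times others": the best field plus tie times the remaining field.
best = max(title_score, description_score)
other = min(title_score, description_score)
clause_score = best + tie * other
print(clause_score)

# Solr reports 0.07163518; compare with a small tolerance.
print(math.isclose(clause_score, 0.07163518, abs_tol=1e-7))
```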
<h3>Tie == 0.99 result</h3>
<p>The second query sent to Solr looked as follows:
</p>
<pre class="brush:xml">defType=dismax&amp;qf=title^1000 description author^10&amp;tie=0.99&amp;fl=id,score&amp;debugQuery=on&amp;indent=true&amp;q=joe blow book</pre>
<p>This resulted in the following Solr results (visualization &#8211; <a href="http://explain.solr.pl/explains/1w7b06lv" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/explains/1w7b06lv</a>):
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;15&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="fl"&gt;id,score&lt;/str&gt;
    &lt;str name="debugQuery"&gt;on&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="tie"&gt;0.99&lt;/str&gt;
    &lt;str name="q"&gt;joe blow book&lt;/str&gt;
    &lt;str name="qf"&gt;title^1000 description author^10&lt;/str&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="2" start="0" maxScore="0.07352995"&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.07352995&lt;/float&gt;
    &lt;str name="id"&gt;2&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.0734685&lt;/float&gt;
    &lt;str name="id"&gt;1&lt;/str&gt;
  &lt;/doc&gt;
&lt;/result&gt;
&lt;lst name="debug"&gt;
  &lt;str name="rawquerystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="querystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="parsedquery"&gt;+((DisjunctionMaxQuery((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.99) DisjunctionMaxQuery((author:blow^10.0 | title:blow^1000.0 | description:blow)~0.99) DisjunctionMaxQuery((author:book^10.0 | title:book^1000.0 | description:book)~0.99))~3) ()&lt;/str&gt;
  &lt;str name="parsedquery_toString"&gt;+(((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.99 (author:blow^10.0 | title:blow^1000.0 | description:blow)~0.99 (author:book^10.0 | title:book^1000.0 | description:book)~0.99)~3) ()&lt;/str&gt;
  &lt;lst name="explain"&gt;
    &lt;str name="2"&gt;
0.07352995 = (MATCH) sum of:
  0.07352995 = (MATCH) sum of:
    9.308678E-4 = (MATCH) max plus 0.99 times others of:
      8.9540955E-4 = (MATCH) weight(author:joe^10.0 in 1), product of:
        0.0024097078 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:joe in 1), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.581638E-5 = (MATCH) weight(description:joe in 1), product of:
        2.4097077E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 1), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    9.308678E-4 = (MATCH) max plus 0.99 times others of:
      8.9540955E-4 = (MATCH) weight(author:blow^10.0 in 1), product of:
        0.0024097078 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:blow in 1), product of:
          1.0 = tf(termFreq(author:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.581638E-5 = (MATCH) weight(description:blow in 1), product of:
        2.4097077E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 1), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    0.071668215 = (MATCH) max plus 0.99 times others of:
      0.07163276 = (MATCH) weight(title:book^1000.0 in 1), product of:
        0.24097076 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 1), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=1)
      3.581638E-5 = (MATCH) weight(description:book in 1), product of:
        2.4097077E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 1), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
&lt;/str&gt;
    &lt;str name="1"&gt;
0.0734685 = (MATCH) sum of:
  0.0734685 = (MATCH) sum of:
    7.517859E-4 = (MATCH) max plus 0.99 times others of:
      7.1632763E-4 = (MATCH) weight(author:joe^10.0 in 0), product of:
        0.0024097078 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(author:joe in 0), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.581638E-5 = (MATCH) weight(description:joe in 0), product of:
        2.4097077E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 0), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.0010484984 = (MATCH) max plus 0.99 times others of:
      0.0010130403 = (MATCH) weight(author:blow^10.0 in 0), product of:
        0.0024097078 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.42039964 = (MATCH) fieldWeight(author:blow in 0), product of:
          1.4142135 = tf(termFreq(author:blow)=2)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.581638E-5 = (MATCH) weight(description:blow in 0), product of:
        2.4097077E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 0), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.071668215 = (MATCH) max plus 0.99 times others of:
      0.07163276 = (MATCH) weight(title:book^1000.0 in 0), product of:
        0.24097076 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 0), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=0)
      3.581638E-5 = (MATCH) weight(description:book in 0), product of:
        2.4097077E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 0), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    &lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<h4>First document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie099_doc11.png"><img decoding="async" class="alignnone size-full wp-image-2091" title="tie099_doc1" src="http://solr.pl/wp-content/uploads/2012/02/tie099_doc11.png" alt="" width="600" height="245"></a></p>
<h4>Second document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie099_doc21.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-2092" title="tie099_doc2" src="http://solr.pl/wp-content/uploads/2012/02/tie099_doc21.png" alt="" width="600" height="248"></a></p>
<h4>What can we say about that?</h4>
<p>As you can see, the scores of the result documents changed. Let&#8217;s have a look at the same document and the same word, <em>book</em>. In this case we sent 0.99 as the value of the <em>tie</em>&nbsp;parameter, and the score of that word increased compared to the score we got with <em>tie</em>&nbsp;set to 0.01. Of course, the change is caused not only by the <em>tie</em>&nbsp;parameter but also by normalization (the <em>queryNorm</em>&nbsp;value is slightly different), but let&#8217;s forget about that to keep things simple <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> So, in the second case, we see the score of&nbsp;<code>0.071668215</code>, which is the score of the <em>title</em>&nbsp;field plus the score of the other fields multiplied by 0.99 (the <em>tie</em>&nbsp;parameter value).</p>
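<p>The same rule explains the total score of the document, which is the sum of the three per-word scores. The Python sketch below copies the field scores for the document with id 2 from the tie=0.99 explain output; the dictionary layout is only my way of writing the numbers down, and the tolerance is again an assumption about 32-bit float rounding.</p>

```python
import math

TIE = 0.99

# (best field score, other field score) for each query word, copied from the
# tie=0.99 explain output for the document with id 2.
clauses = {
    "joe": (8.9540955e-4, 3.581638e-5),   # author vs. description
    "blow": (8.9540955e-4, 3.581638e-5),  # author vs. description
    "book": (0.07163276, 3.581638e-5),    # title vs. description
}

# Each word contributes max + tie * others; the document score is their sum.
document_score = sum(best + TIE * other for best, other in clauses.values())
print(document_score)

# Solr reports 0.07352995 for this document.
print(math.isclose(document_score, 0.07352995, abs_tol=1e-7))
```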
<h3>To sum up</h3>
<p>As you can see, the <em>tie</em>&nbsp;parameter lets us control how the score is calculated for the DisjunctionMaxQuery. In the extreme case, when we want only the highest-scoring field to contribute to the score, we can set the <em>tie</em>&nbsp;parameter to 0.0. <em>Tie</em>&nbsp;lets us decide how the low-scoring fields are treated when the score of a document is calculated, and thus where the document ends up on the results list we get from Solr when using the Dismax query parser.</p>
<h3>In case you are wondering</h3>
<p>In case you are wondering what we used to show you the diagrams &#8211; please go to&nbsp;<a href="http://explain.solr.pl/help" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/help</a> and have a look; maybe&nbsp;<a href="http://explain.solr.pl/" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/</a> will be helpful in your case.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr and PhraseQuery &#8211; phrase bonus in query stage</title>
		<link>https://solr.pl/en/2010/07/14/solr-and-phrasequery-phrase-bonus-in-query-stage/</link>
					<comments>https://solr.pl/en/2010/07/14/solr-and-phrasequery-phrase-bonus-in-query-stage/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Wed, 14 Jul 2010 09:19:38 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[boosting]]></category>
		<category><![CDATA[dismax]]></category>
		<category><![CDATA[edismax]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[phrase]]></category>
		<category><![CDATA[phrase query]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[standard]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=54</guid>

					<description><![CDATA[In the majority of system implementations I dealt with, sooner or later, there was a problem &#8211; search results tuning. One of the simplest ways to improve the search results quality was phrase boosting. Having the three most popular query]]></description>
										<content:encoded><![CDATA[<p>In the majority of system implementations I dealt with, sooner or later, there was a problem &#8211; search results tuning. One of the simplest ways to improve the search results quality was phrase boosting. Having the three most popular query parsers in Solr and the variety of parameters to control them, I thought it would be a good idea to check how they behave and how they affect performance.</p>
<p><span id="more-54"></span></p>
<p>In the current <em>trunk</em> of Solr we have three query parsers:</p>
<ul>
<li>Standard Solr Query Parser &#8211; default parser for Solr based on Lucene query parser</li>
<li>DisMax Query Parser</li>
<li>Extended DisMax Query Parser</li>
</ul>
<p>Each of the mentioned query parsers has its own capabilities when it comes to phrase boosting at query time. I won&#8217;t mention index-time term proximity in this post &#8211; I&#8217;ll get back to it some other time. So, about the parsers now.</p>
<p><strong>Standard Solr Query Parser</strong></p>
<p>This parser is based on the standard Lucene query parser and extends its parent&#8217;s capabilities. When it comes to phrase boosting, we don&#8217;t have much choice. Let&#8217;s say that our system is a search system for a large Internet library, where users can rate books, leave comments and discuss books in the library forums. Our goal is to index all the data generated by the users and our suppliers and then present this data in our search results. When a user searches for &#8220;Java design patterns&#8221;, we want to show them the books that have those words in the document. No problem, let&#8217;s make a Solr query like this:</p>
<p><code>q=java+design+patterns</code></p>
<p>So we get the results, and we could say that our search engine is behaving well and that we don&#8217;t need to improve search quality. But I would add another part to the query: a part that favors documents that contain the phrase (the query words next to each other in the document) in the searchable fields. It&#8217;s an easy step; our modified query would look like this:<br />
<code><br />
q=java+design+patterns+OR+"java+design+patterns"^30</code></p>
<p>By adding that additional query part (<em>+OR+&quot;java+design+patterns&quot;^30</em>) we modified our search results: the first position is now taken by books that have the exact phrase in the search fields. The Lucene query generated by the parser looks like this:
</p>
<pre class="brush:xml">&lt;str name="parsedquery"&gt;name:java name:design name:patterns PhraseQuery(name:"java design patterns"^30.0)&lt;/str&gt;
&lt;str name="parsedquery_toString"&gt;name:java name:design name:patterns name:"java design patterns"^30.0&lt;/str&gt;</pre>
<p>The search results for the above query are as follows:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
   &lt;int name="status"&gt;0&lt;/int&gt;
   &lt;int name="QTime"&gt;0&lt;/int&gt;
   &lt;lst name="params"&gt;
      &lt;str name="q"&gt;java design patterns OR "java design patterns"^30&lt;/str&gt;
      &lt;str name="fl"&gt;score,id,name&lt;/str&gt;
   &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="5" start="0" maxScore="1.2399161"&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;1.2399161&lt;/float&gt;
      &lt;str name="id"&gt;1&lt;/str&gt;
      &lt;str name="name"&gt;Java design patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.010219089&lt;/float&gt;
      &lt;str name="id"&gt;2&lt;/str&gt;
      &lt;str name="name"&gt;Design patterns java&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.010219089&lt;/float&gt;
      &lt;str name="id"&gt;3&lt;/str&gt;
      &lt;str name="name"&gt;Design java patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.010219089&lt;/float&gt;
      &lt;str name="id"&gt;4&lt;/str&gt;
      &lt;str name="name"&gt;Patterns design java&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.010219089&lt;/float&gt;
      &lt;str name="id"&gt;5&lt;/str&gt;
      &lt;str name="name"&gt;Patterns java design&lt;/str&gt;
   &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
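<p>If the query is assembled in application code, the phrase part can be appended automatically. The Python sketch below shows one way to do it; the helper name <em>with_phrase_boost</em> is hypothetical, and it assumes the user input contains no quotes or other query syntax, because it does no escaping.</p>

```python
from urllib.parse import urlencode


def with_phrase_boost(user_query, boost):
    """Append the same words as a boosted phrase: words OR "words"^boost."""
    return f'{user_query} OR "{user_query}"^{boost}'


params = {
    "q": with_phrase_boost("java design patterns", 30),
    "fl": "score,id,name",
}
print(params["q"])

# URL-encoded form of the parameters, ready to append to the select handler URL.
print(urlencode(params))
```

<p>The resulting <em>q</em>&nbsp;value is the same as the one echoed in the <em>params</em>&nbsp;section of the response above.</p>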
<p><strong>DisMax Query Parser</strong></p>
<p>In addition to constructing queries in the manner described above, we can use the <strong>pf</strong> parameter and modify its behavior with the <strong>ps</strong> parameter. The <strong>pf</strong> parameter lists the fields in which phrases will be looked for. It is often used in a manner analogous to the <strong>qf</strong> parameter, which specifies the list of searchable fields. In addition, we should specify the boost for the phrase; otherwise the default boost will be used. A query using DisMax would look like this:</p>
<p><code>q=java+design+patterns&amp;defType=dismax&amp;qf=name&amp;pf=name^30&amp;ps=0</code></p>
<p>While the query passed to Lucene looks as follows:
</p>
<pre class="brush:xml">&lt;str name="parsedquery"&gt;+((DisjunctionMaxQuery((name:java)) DisjunctionMaxQuery((name:design)) DisjunctionMaxQuery((name:patterns)))~3) DisjunctionMaxQuery((name:"java design patterns"^30.0))&lt;/str&gt;
&lt;str name="parsedquery_toString"&gt;+(((name:java) (name:design) (name:patterns))~3) (name:"java design patterns"^30.0)&lt;/str&gt;</pre>
<p>The results for the query thus constructed are as follows:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
   &lt;int name="status"&gt;0&lt;/int&gt;
   &lt;int name="QTime"&gt;0&lt;/int&gt;
   &lt;lst name="params"&gt;
      &lt;str name="pf"&gt;name^30&lt;/str&gt;
      &lt;str name="fl"&gt;id,name,score&lt;/str&gt;
      &lt;str name="q"&gt;java design patterns&lt;/str&gt;
      &lt;str name="qf"&gt;name&lt;/str&gt;
      &lt;str name="defType"&gt;dismax&lt;/str&gt;
      &lt;str name="ps"&gt;0&lt;/str&gt;
   &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="5" start="0" maxScore="1.2399161"&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;1.2399161&lt;/float&gt;
      &lt;str name="id"&gt;1&lt;/str&gt;
      &lt;str name="name"&gt;Java design patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.013625451&lt;/float&gt;
      &lt;str name="id"&gt;2&lt;/str&gt;
      &lt;str name="name"&gt;Design patterns java&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.013625451&lt;/float&gt;
      &lt;str name="id"&gt;3&lt;/str&gt;
      &lt;str name="name"&gt;Design java patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.013625451&lt;/float&gt;
      &lt;str name="id"&gt;4&lt;/str&gt;
      &lt;str name="name"&gt;Patterns design java&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.013625451&lt;/float&gt;
      &lt;str name="id"&gt;5&lt;/str&gt;
      &lt;str name="name"&gt;Patterns java design&lt;/str&gt;
   &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>It is noteworthy that the order of results is the same for both methods. This follows from the fact that the phrase has been identified only in the document with the id of 1. Note that the <em>score</em> value of the first document does not differ between the two methods. The other documents, at positions 2 to 5, keep the same positions in both cases but have different <em>score</em> values because of the difference in the query passed to Lucene.</p>
<p>I used the <strong>ps</strong> parameter (set to 0) without explaining why. When you use the <strong>pf</strong> parameter (and pf2, but more on that later), the <strong>ps</strong> parameter sets the <em>Phrase Slop</em> &#8211; the maximum distance between words that still allows them to form a phrase. For instance, <strong>ps=2</strong> means that the words can be at most two positions apart and still form a phrase. Note, however, that although both &#8220;Java sample design patterns&#8221; and &#8220;Java design patterns&#8221; will match as a phrase, the document titled &#8220;Java design patterns&#8221; will have a higher <em>score</em> value, even with <strong>ps=2</strong>, because its terms are located closer together.</p>
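<p>To make the idea of phrase slop more concrete, here is a simplified Python model of how much slop a text needs in order to match a phrase. It is only a sketch and not Lucene's actual scorer: it assumes whitespace tokenization, lowercasing, and that every phrase term occurs once in the text (only the first occurrence is considered). The function name <em>required_slop</em> is hypothetical.</p>

```python
def required_slop(phrase, text):
    """Return the smallest slop at which `text` matches `phrase`.

    Simplified model: each term's position in the text is compared with its
    offset in the phrase, and the spread of those shifts is the slop needed.
    Returns None when a phrase term is missing from the text.
    """
    positions = {}
    for position, term in enumerate(text.lower().split()):
        positions.setdefault(term, position)  # keep the first occurrence only

    shifts = []
    for offset, term in enumerate(phrase.lower().split()):
        if term not in positions:
            return None
        shifts.append(positions[term] - offset)
    return max(shifts) - min(shifts)


for title in ("Java design patterns", "Java sample design patterns"):
    print(title, required_slop("java design patterns", title))
```

<p>Under this model the title &#8220;Java design patterns&#8221; needs a slop of 0 and &#8220;Java sample design patterns&#8221; needs a slop of 1, so both match with <strong>ps=2</strong>, while only the first one matches with <strong>ps=0</strong>.</p>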
<p><strong>Extended DisMax Query Parser</strong></p>
<p>Unfortunately, eDisMax is not available unless you use trunk. Still, a query using the eDisMax <em>Enhanced Term Proximity Boosting</em> would look like this:</p>
<p><code>q=java+design+patterns&amp;defType=edismax&amp;qf=name&amp;pf2=name^30&amp;ps=0</code></p>
<p>The above query creates the following query to Lucene:
</p>
<pre class="brush:xml">&lt;str name="parsedquery"&gt;+(DisjunctionMaxQuery((name:java)) DisjunctionMaxQuery((name:design)) DisjunctionMaxQuery((name:patterns))) (DisjunctionMaxQuery((name:"java design"^30.0)) DisjunctionMaxQuery((name:"design patterns"^30.0)))&lt;/str&gt;
&lt;str name="parsedquery_toString"&gt;+((name:java) (name:design) (name:patterns)) ((name:"java design"^30.0) (name:"design patterns"^30.0))&lt;/str&gt;</pre>
<p>As seen, in addition to the standard DisjunctionMaxQuery produced by DisMax (and thus by its extended version), the extended DisMax parser also produced two additional queries &#8211; the ones responsible for <em>enhanced term proximity boosting</em>. The additional queries boost pairs of words created from the terms in the user query. In the presented case the created pairs were &#8220;java design&#8221; and &#8220;design patterns&#8221;. As you can guess, the most significant documents in the result list will be those containing both pairs, followed by documents containing one of the pairs, and finally documents containing none. As proof, here is the result of the above query sent to Solr:</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
   &lt;int name="status"&gt;0&lt;/int&gt;
   &lt;int name="QTime"&gt;0&lt;/int&gt;
   &lt;lst name="params"&gt;
      &lt;str name="fl"&gt;id,name,score&lt;/str&gt;
      &lt;str name="q"&gt;java design patterns&lt;/str&gt;
      &lt;str name="qf"&gt;name&lt;/str&gt;
      &lt;str name="pf2"&gt;name^30&lt;/str&gt;
      &lt;str name="defType"&gt;edismax&lt;/str&gt;
      &lt;str name="ps"&gt;0&lt;/str&gt;
   &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="5" start="0" maxScore="1.1705827"&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;1.1705827&lt;/float&gt;
      &lt;str name="id"&gt;1&lt;/str&gt;
      &lt;str name="name"&gt;Java design patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.3034844&lt;/float&gt;
      &lt;str name="id"&gt;2&lt;/str&gt;
      &lt;str name="name"&gt;Design patterns java&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.3034844&lt;/float&gt;
      &lt;str name="id"&gt;5&lt;/str&gt;
      &lt;str name="name"&gt;Patterns java design&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.014451639&lt;/float&gt;
      &lt;str name="id"&gt;3&lt;/str&gt;
      &lt;str name="name"&gt;Design java patterns&lt;/str&gt;
   &lt;/doc&gt;
   &lt;doc&gt;
      &lt;float name="score"&gt;0.014451639&lt;/float&gt;
      &lt;str name="id"&gt;4&lt;/str&gt;
      &lt;str name="name"&gt;Patterns design java&lt;/str&gt;
   &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>As you can see, the first document has not changed its position. The second and third places are taken by the documents that contain one of the pairs generated by the parser. As a result, the documents with ids 2 and 5 have the same <em>score</em> value. The result list is closed by the documents that contain only the individual terms in the searchable fields.</p>
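<p>The grouping of the results can be reproduced with a short Python sketch that builds the word pairs from the user query and counts how many of them each title contains. This only illustrates the pairing idea and is not how the parser or Lucene scoring is implemented; it assumes whitespace tokenization and lowercasing, and the function names are hypothetical.</p>

```python
def word_pairs(user_query):
    """Return the adjacent word pairs of the query, as pf2 would boost them."""
    terms = user_query.lower().split()
    return [" ".join(pair) for pair in zip(terms, terms[1:])]


def count_matching_pairs(user_query, title):
    """Count the query word pairs that occur as exact phrases in the title."""
    padded = f" {title.lower()} "  # padding so only whole words are matched
    return sum(1 for pair in word_pairs(user_query) if f" {pair} " in padded)


titles = {
    1: "Java design patterns",
    2: "Design patterns java",
    3: "Design java patterns",
    4: "Patterns design java",
    5: "Patterns java design",
}

for doc_id, title in titles.items():
    print(doc_id, title, count_matching_pairs("java design patterns", title))
```

<p>The document with id 1 contains both pairs, the documents with ids 2 and 5 contain one pair each, and the documents with ids 3 and 4 contain none, which matches the three score levels in the response above.</p>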
<p><strong>Performance</strong></p>
<p>In any case, keep in mind that individual features affect the performance of Solr-based applications. I decided to run a simple performance test. The assumptions are quite simple &#8211; index data from Wikipedia and, for each phrase boost method, create five queries, each assembled from two to six tokens. The Solr cache was disabled and Solr was restarted after each query. Each result is the arithmetic mean of 10 repetitions of each test. Before the test results, a few words about the index:</p>
<ul>
<li>Number of documents in the index: 1,177,239</li>
<li>Number of segments: 1</li>
<li>Number of terms: 18,506,646</li>
<li>Number of term/document pairs: 230,297,212</li>
<li>Number of tokens: 418,135,268</li>
<li>The size of the index: 4.6GB (optimized)</li>
<li>Lucene version used to build the index: 4.0-dev 964000</li>
</ul>
<p>Phrases that were selected for each iteration of the test:</p>
<ul>
<li>Iteration I: &#8220;Great Peter&#8221;</li>
<li>Iteration II: &#8220;World War Two&#8221;</li>
<li>Iteration III: &#8220;World War Two Germany&#8221;</li>
<li>Iteration IV: &#8220;Move Time Eastern Poland Reformation&#8221;</li>
<li>Iteration V: &#8220;Change Winter Cloths To Summer Cloths Now&#8221;</li>
</ul>
<p>The results were as follows:</p>
[table “1” not found /]<br />

<p>Please note that the reported results concern only performance and do not recommend any particular method of phrase boosting. The choice of method is a matter of requirements and implementation. As for the results, you can see that the DisMax method is the quickest one.</p>
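<p>For readers who want to repeat a similar measurement, the averaging step of the test described earlier (the arithmetic mean of 10 repetitions) can be sketched in Python as follows. This is not the harness used for the test: <em>run_query</em> is a hypothetical callable that sends one query to Solr, and disabling the cache and restarting Solr between runs are left out.</p>

```python
import statistics
import time


def mean_query_time(run_query, repetitions=10):
    """Return the arithmetic mean of the wall-clock time of `run_query`.

    `run_query` is a hypothetical zero-argument callable that executes a
    single query; any setup between runs is the caller's responsibility.
    """
    timings = []
    for _ in range(repetitions):
        started = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - started)
    return statistics.mean(timings)


# Placeholder workload standing in for a real request to Solr.
average_seconds = mean_query_time(lambda: sum(range(1000)))
print(average_seconds)
```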
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/07/14/solr-and-phrasequery-phrase-bonus-in-query-stage/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
