<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>search &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/search/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 14:14:55 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr 6.5 and large stored fields &#8211; quick look</title>
		<link>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/</link>
					<comments>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 01 May 2017 13:14:28 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=927</guid>

					<description><![CDATA[As we know Solr has a few caches, for example &#8211; filterCache for filters, queryResultCache for query results caching and of course the documentCache for caching documents for fast retrieval. Today we will focus on the last of the mentioned]]></description>
										<content:encoded><![CDATA[<p>As we know, Solr has a few caches, for example <em>filterCache</em> for filters, <em>queryResultCache</em> for query results and of course <em>documentCache</em> for caching documents for fast retrieval. Today we will focus on the last of these and on what can be done to use it more effectively.</p>
<p><span id="more-927"></span></p>
<h3>The problem</h3>
<p>When <em>documentCache</em> is present in <em>solrconfig.xml</em>, the first time a field is retrieved from Lucene Solr will cache its value along with the document in the <em>documentCache</em>. This can be very expensive, especially for large stored fields &#8211; imagine a situation where you have documents OCRed from a book and you show the content of the pages. If you don&#8217;t reuse such data, which basically means a low hit ratio in the <em>documentCache</em>, Solr itself will produce more garbage and the JVM garbage collector will have a harder time cleaning it up. That can lead to higher CPU usage and worse performance of Solr in general. Let&#8217;s look at what we can do with such large stored fields.</p>
<h3>Marking the field as large</h3>
<p>Starting with Solr 6.5 we can add an additional property to the field definition, called <em>large</em>, which takes a value of <em>true</em> or <em>false</em> and defaults to <em>false</em>. A field that we want to mark as large has to be defined with <em>stored=&#8221;true&#8221;</em> and <em>multiValued=&#8221;false&#8221;</em>. In such a case, setting <em>large=&#8221;true&#8221;</em> on the field definition will keep the field value from being cached inside the <em>documentCache</em>.</p>
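<p>As a minimal sketch of the constraints described above, the following builds the JSON body of a Schema API <em>add-field</em> command for a large field. It assumes that the Schema API accepts the <em>large</em> property in the same way a field definition in the schema file does. The <em>large_field_command</em> helper and the <em>text_general</em> type name are hypothetical and used only for illustration. The script only prints the command and does not contact Solr.</p>

```python
import json


def large_field_command(name, field_type):
    """Build an add-field command for a field marked as large (illustrative helper)."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,    # assumed type name, adjust to your schema
            "stored": True,        # a large field has to be stored
            "multiValued": False,  # a large field has to be single-valued
            "large": True,         # keep the value out of the documentCache
        }
    }


command = large_field_command("body", "text_general")
print(json.dumps(command, indent=2))
```

<p>The printed JSON could then be sent to the <em>schema</em> endpoint of a collection, or the same attributes could be set directly on the field definition in the schema file.</p>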
<h3>Noticing the difference</h3>
<p>Because this is a <em>quick look</em> type of post, I don&#8217;t want to go into too much detail, but I would like to compare two collections with the same data. Each collection has the same set of fields:</p>
<ul>
<li><em>id</em> &#8211; identifier of the document,</li>
<li><em>name</em> &#8211; name of the document,</li>
<li><em>body</em> &#8211; text of the document, which can be very, very large.</li>
</ul>
<p>One collection will have <em>large=&#8221;true&#8221;</em> set on the <em>body</em> field and the other won&#8217;t have that property set. We will index a few large documents into each of them and see how the <em>documentCache</em> behaves.</p>
<p>So here are the commands to set up those two collections using the files from the Solr.pl GitHub account (<a href="https://github.com/solrpl/">https://github.com/solrpl/</a>). First set up one collection and gather statistics, then remove all the files, restart Solr, create the second collection and gather statistics again. The commands are as follows:
</p>
<pre class="brush:xml">$ mkdir /tmp/solr
$ mkdir /tmp/solr/collection_with_large
$ mkdir /tmp/solr/collection_without_large
$ wget -O /tmp/solr/data.xml https://github.com/solrpl/blog/tree/master/posts/large_field/data.xml
$ wget -O /tmp/solr/collection_with_large/managed-schema https://github.com/solrpl/blog/tree/master/posts/large_field/collection_with_large/managed-schema
$ wget -O /tmp/solr/collection_with_large/solrconfig.xml https://github.com/solrpl/blog/tree/master/posts/large_field/collection_with_large/solrconfig.xml
$ wget -O /tmp/solr/collection_without_large/managed-schema https://github.com/solrpl/blog/tree/master/posts/large_field/collection_without_large/managed-schema
$ wget -O /tmp/solr/collection_without_large/solrconfig.xml https://github.com/solrpl/blog/tree/master/posts/large_field/collection_without_large/solrconfig.xml
$ bin/solr zk upconfig -z localhost:9983 -n config_with_large -d /tmp/solr/collection_with_large
$ bin/solr create_collection -c collection_with_large -n config_with_large -shards 1 -replicationFactor 1
$ curl -XPOST 'localhost:8983/solr/collection_with_large/update?commit=true' -H 'Content-Type:application/xml' --data-binary @/tmp/solr/data.xml
$ curl 'localhost:8983/solr/collection_with_large/select?q=*:*'</pre>
<p>And now let&#8217;s create the second collection using the downloaded data:
</p>
<pre class="brush:xml">$ bin/solr zk upconfig -z localhost:9983 -n config_without_large -d /tmp/solr/collection_without_large
$ bin/solr create_collection -c collection_without_large -n config_without_large -shards 1 -replicationFactor 1
$ curl -XPOST 'localhost:8983/solr/collection_without_large/update?commit=true' -H 'Content-Type:application/xml' --data-binary @/tmp/solr/data.xml
$ curl 'localhost:8983/solr/collection_without_large/select?q=*:*'</pre>
<p>And now, let&#8217;s check the usage of the <em>documentCache</em> that we&#8217;ve gathered. So we have this for the collection with the <em>body</em> field marked as <em>large=&#8221;true&#8221;</em>:</p>
<p><a href="http://solr.pl/wp-content/uploads/2017/04/field_with_large.png"><img decoding="async" class="aligncenter wp-image-3945 size-medium" src="http://solr.pl/wp-content/uploads/2017/04/field_with_large-300x65.png" alt="" width="300" height="65"></a></p>
<p>And we have this for the collection with the <em>body</em> field without the <em>large=&#8221;true&#8221;</em> property:</p>
<p><a href="http://solr.pl/wp-content/uploads/2017/04/field_without_large.png"><img decoding="async" class="aligncenter wp-image-3946 size-medium" src="http://solr.pl/wp-content/uploads/2017/04/field_without_large-300x70.png" alt="" width="300" height="70"></a></p>
<p>As you can see, the field marked with <em>large=&#8221;true&#8221;</em> was not put into the <em>documentCache</em> directly, but only as a lazily loaded large field, which is what we were aiming for. This means that we can still use the <em>documentCache</em> without worrying about Solr putting the large stored fields there, which was the case in the second example.</p>
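<p>The statistics in the screenshots can also be fetched over HTTP. The sketch below only builds the URL of the <em>admin/mbeans</em> handler for each collection. It assumes a default local Solr at <em>localhost:8983</em> and that the handler is available under its usual implicit path. The helper name <em>document_cache_stats_url</em> is made up for this example.</p>

```python
from urllib.parse import urlencode


def document_cache_stats_url(base_url, collection):
    """Return the URL that reports documentCache statistics for a collection."""
    params = {
        "stats": "true",         # include statistics, not only the bean list
        "cat": "CACHE",          # restrict the output to the cache category
        "key": "documentCache",  # restrict the output to a single cache
        "wt": "json",
    }
    return base_url + "/" + collection + "/admin/mbeans?" + urlencode(params)


for name in ("collection_with_large", "collection_without_large"):
    print(document_cache_stats_url("http://localhost:8983/solr", name))
```

<p>Each printed URL can be passed to <em>curl</em> after running the query against the given collection.</p>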
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Simple photo search</title>
		<link>https://solr.pl/en/2012/02/20/simple-photo-search/</link>
					<comments>https://solr.pl/en/2012/02/20/simple-photo-search/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 20 Feb 2012 22:41:31 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[cell]]></category>
		<category><![CDATA[exif]]></category>
		<category><![CDATA[extract]]></category>
		<category><![CDATA[photo]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solr cell]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=441</guid>

					<description><![CDATA[Recently we had a chance to help with a non-commercial project which included search as its part. One of the assumptions, although not the key ones, was the photo search functionality, so that the user could find the pictures fast]]></description>
										<content:encoded><![CDATA[<p>Recently we had a chance to help with a non-commercial project that included search as one of its parts. One of the assumptions, although not a key one, was photo search functionality, so that users could find pictures quickly and accurately. Because the search had to work with the metadata of JPEG files, the idea was simple &#8211; use Apache Solr with Apache Tika.</p>
<p><span id="more-441"></span></p>
<h3>Assumptions</h3>
<p>Assumptions were quite simple &#8211; the user should be able to find photos by their file name, author and other data available in EXIF, like aperture, shutter speed, focal length or ISO value. Another thing was that Solr should take care of extracting the metadata from JPEG files, so this was definitely something we wanted to use Solr Cell for. As you can see, those assumptions were simple.</p>
<h3>Index structure</h3>
<p>The index structure was very simple and contained only the most needed fields. The fields section of the <em>schema.xml</em> file looked as follows:
</p>
<pre class="brush:xml">&lt;field name="id" type="string" indexed="true" stored="true" required="true" /&gt;
&lt;field name="name" type="text" indexed="true" stored="true" /&gt;
&lt;field name="author" type="text" indexed="true" stored="true" /&gt;
&lt;field name="iso" type="text" indexed="true" stored="true" multiValued="true" /&gt;
&lt;field name="iso_string" type="text" indexed="true" stored="true" multiValued="true" /&gt;
&lt;field name="aperture" type="double" indexed="true" stored="true" /&gt;
&lt;field name="exposure" type="string" indexed="true" stored="true" /&gt;
&lt;field name="exposure_time" type="double" indexed="true" stored="true" /&gt;
&lt;field name="focal" type="string" indexed="true" stored="true" /&gt;
&lt;field name="focal_35" type="string" indexed="true" stored="true" /&gt;
&lt;dynamicField name="ignored_*" type="string" indexed="false" stored="false" multiValued="true" /&gt;</pre>
<p>The dynamic field was added to ignore the data we weren&#8217;t interested in. A <em>copyField</em> was also introduced to copy the <em>iso</em> field value to the <em>iso_string</em> field to enable faceting.</p>
<h3>Solr configuration</h3>
<p>The following handler definition was added to the <em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler"&gt;
 &lt;lst name="defaults"&gt;
  &lt;str name="uprefix"&gt;ignored_&lt;/str&gt;
  &lt;str name="lowernames"&gt;true&lt;/str&gt;
  &lt;str name="captureAttr"&gt;true&lt;/str&gt;
  &lt;str name="fmap.stream_name"&gt;name&lt;/str&gt;
  &lt;str name="fmap.artist"&gt;author&lt;/str&gt;
  &lt;str name="fmap.exif_isospeedratings"&gt;iso&lt;/str&gt;
  &lt;str name="fmap.exif_fnumber"&gt;aperture&lt;/str&gt;
  &lt;str name="fmap.exposure_time"&gt;exposure&lt;/str&gt;
  &lt;str name="fmap.exif_exposuretime"&gt;exposure_time&lt;/str&gt;
  &lt;str name="fmap.focal_length"&gt;focal&lt;/str&gt;
  &lt;str name="fmap.focal_length_35"&gt;focal_35&lt;/str&gt;
 &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>A few words about the configuration. The <em>uprefix</em> parameter tells Solr which prefix it should add to the names of fields that are not defined in the schema. In the above case, such fields will be prefixed with <em>ignored_</em>, so they will be matched by the dynamic field and thus will be neither indexed nor stored (<em>stored=&#8221;false&#8221;</em> and <em>indexed=&#8221;false&#8221;</em>). The <em>lowernames</em> parameter set to <em>true</em> causes all field names to be lowercased. The <em>captureAttr</em> parameter tells Solr to capture the attributes of the elements that Tika produces. The remaining parameters define the mapping between the fields returned by Tika and the fields in the index. For example, <em>fmap.exif_fnumber</em> with the value of <em>aperture</em> tells Solr to place the value of the Tika <em>exif_fnumber</em> field in the <em>aperture</em> index field.</p>
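<p>To make the interplay of <em>lowernames</em>, <em>fmap.*</em> and <em>uprefix</em> easier to follow, here is a simplified model of the name mapping in plain Python. It is not Solr code: the <em>map_field</em> function is a hypothetical helper, the Tika field names used in the loop are assumed examples, and the order of the steps (normalize, map, then prefix unknown names) is an assumption about how the handler behaves.</p>

```python
import re

# Fields defined in the schema shown earlier (the dynamic field is left out).
SCHEMA_FIELDS = {
    "id", "name", "author", "iso", "iso_string", "aperture",
    "exposure", "exposure_time", "focal", "focal_35",
}

# The fmap.* entries from the handler defaults.
FMAP = {
    "stream_name": "name",
    "artist": "author",
    "exif_isospeedratings": "iso",
    "exif_fnumber": "aperture",
    "exposure_time": "exposure",
    "exif_exposuretime": "exposure_time",
    "focal_length": "focal",
    "focal_length_35": "focal_35",
}


def map_field(tika_name, fmap=FMAP, schema_fields=SCHEMA_FIELDS,
              uprefix="ignored_", lowernames=True):
    """Return the index field name for a Tika field name (simplified model)."""
    name = tika_name
    if lowernames:
        # Lowercase and replace every non-alphanumeric character with "_".
        name = re.sub(r"[^a-z0-9]", "_", name.lower())
    name = fmap.get(name, name)   # explicit mapping, if one is configured
    if name not in schema_fields:
        name = uprefix + name     # unknown fields land in ignored_*
    return name


# Assumed example names, not taken from real Tika output.
for tika_name in ("exif:FNumber", "stream_name", "Content-Type"):
    print(tika_name, "maps to", map_field(tika_name))
```

<p>With this model a mapped field ends up in its target index field, while anything without a mapping and without a schema field gets the <em>ignored_</em> prefix and is caught by the dynamic field.</p>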
<h4>Additional, needed libraries</h4>
<p>In order for the above configuration to work we need some additional libraries (similar to the ones described in <a href="http://solr.pl/en/2012/01/23/document-language-identification/" target="_blank" rel="noopener noreferrer">language identification</a>). From the <em>dist</em> directory of the Solr distribution we copy the <em>apache-solr-cell-3.5.0.jar</em> file to the <em>tikaLib</em> directory, which should be created at the same level as the <em>webapps</em> directory in the Solr deployment (of course this is just an example). Next we add the following line to the <em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;lib dir="../tikaLib/" /&gt;</pre>
<p>The above tells Solr to include all the libraries from the given directory. Next we need to copy all the jar files from the <em>contrib/extraction/</em> directory of the Solr distribution to the created <em>tikaLib</em> directory. No additional <em>solrconfig.xml</em> changes are needed.</p>
<h3>Data indexing</h3>
<p>The assumption was that there would be about 10,000 new photos a week that would need to be indexed. Those photos would be stored in a shared file system location. A simple bash script was responsible for choosing the files that needed to be indexed, and it ran the following command for each file:
</p>
<pre class="brush:bash">curl "http://solrmaster:8983/solr/photos/update/extract?literal.id=9926&amp;commit=true" -F "myfile=@Wisla_2011_10_10.JPG"</pre>
<p>The above command sends a file named Wisla_2011_10_10.JPG to the <em>/update/extract</em> handler and tells Solr to run a <em>commit</em> after processing it. In addition, the unique identifier of the document is set (the <em>literal.id</em> parameter).</p>
<h3>Queries</h3>
<p>In addition to some standard filtering by author or other attributes of the photo, the search itself was also expected to work. Yeah, just work <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> We decided that if we were the users of the application, we would want fields like author or file name to be important. So we decided to start with the following query:
</p>
<pre class="brush:xml">q=jan+kowalski+wisla&amp;qf=name^100+author^1000+iso+aperture+exposure_time+focal&amp;defType=dismax</pre>
<p>As you can see, the query is simple. Two fields in the index are more valuable than the others &#8211; the name of the photo and its author. Their importance was set by adding query-time boosts. The rest of the fields have no boost, so the default boost of 1 applies.</p>
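<p>The same query string can be assembled from a table of boosts, which makes later tuning a matter of changing numbers. This is only a sketch: <em>dismax_params</em> is a hypothetical helper, and the boosts simply repeat the values from the query above.</p>

```python
from urllib.parse import urlencode


def dismax_params(user_query, boosts):
    """Build URL-encoded dismax parameters from a field-to-boost mapping."""
    qf = " ".join(
        # A boost of 1 is the default, so it is left out of the qf value.
        field if boost == 1 else field + "^" + str(boost)
        for field, boost in boosts.items()
    )
    return urlencode({"q": user_query, "qf": qf, "defType": "dismax"})


boosts = {
    "name": 100,
    "author": 1000,
    "iso": 1,
    "aperture": 1,
    "exposure_time": 1,
    "focal": 1,
}
print(dismax_params("jan kowalski wisla", boosts))
```

<p>The output is URL-encoded, so the boost character appears in its escaped form, but Solr receives the same <em>qf</em> value as in the hand-written query.</p>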
<h3>To sum up</h3>
<p>The described deployment is really simple. The application works and so does the search <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The next steps will be JVM and Solr tuning. One of the most important things will be looking at user behavior and tuning the searches to make the search experience as good as possible. But let&#8217;s leave that for another solr.pl post.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/02/20/simple-photo-search/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What can we use Dismax tie parameter for?</title>
		<link>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/</link>
					<comments>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 06 Feb 2012 22:40:53 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[dismax]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tie]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=439</guid>

					<description><![CDATA[The Dismax query parser has been with Solr for a long time. Most of the time we use parameters like qf, pf&#160;or mm&#160;forgetting about a very useful parameter which allows us to control how the lower scoring fields are treated &#8211;]]></description>
										<content:encoded><![CDATA[<p>The Dismax query parser has been with Solr for a long time. Most of the time we use parameters like <em>qf</em>, <em>pf</em>&nbsp;or <em>mm</em>, forgetting about a very useful parameter which allows us to control how the lower scoring fields are treated &#8211; the <em>tie</em>&nbsp;parameter.</p>
<p><span id="more-439"></span></p>
<h3>Tie</h3>
<p>The <em>tie</em>&nbsp;parameter allows one to control how the lower scoring fields affect the score for a given word. If we set <em>tie</em>&nbsp;to 0.0, only the highest scoring field will matter during score calculation. However, if we set it to 0.99, the lower scoring fields will have almost the same impact on the score as the highest scoring field. So let&#8217;s check if that actually works.</p>
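<p>The rule described above can be written down in a few lines. The sketch below is a plain Python model of the calculation for a single query word, not Solr code. The function name <em>dismax_term_score</em> and the per-field scores are made up for illustration.</p>

```python
def dismax_term_score(field_scores, tie):
    """Score of one query word: the best field plus tie times all the other fields."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Made-up scores of one word in three fields.
scores = [0.8, 0.3, 0.1]
for tie in (0.0, 0.5, 0.99):
    print("tie =", tie, "score =", dismax_term_score(scores, tie))
```

<p>With <em>tie</em> at 0.0 the result equals the best field alone, and the closer <em>tie</em> gets to 1.0, the closer the result gets to a plain sum of all the field scores.</p>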
<h3>Data structure and data example</h3>
<p>To test how the <em>tie</em>&nbsp;parameter works I&#8217;ve chosen a simple index structure describing products in an e-commerce shop, in a simplified form of course:
</p>
<pre class="brush:xml">&lt;field name="id" type="string" indexed="true" stored="true" required="true" /&gt;
&lt;field name="title" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="description" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="author" type="text_ws" indexed="true" stored="true" multiValued="true" /&gt;</pre>
<p>The&nbsp;<em>text_ws</em>&nbsp;type was defined as follows:
</p>
<pre class="brush:xml">&lt;fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100"&gt;
 &lt;analyzer&gt;
  &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
  &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
 &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>The example documents look like this:
</p>
<pre class="brush:xml">&lt;add&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;1&lt;/field&gt;
  &lt;field name="title"&gt;First test book&lt;/field&gt;
  &lt;field name="description"&gt;This is a description of the first test book by Joe and Jane Blow&lt;/field&gt;
  &lt;field name="author"&gt;Joe Blow&lt;/field&gt;
  &lt;field name="author"&gt;Jane Blow&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;2&lt;/field&gt;
  &lt;field name="title"&gt;Second test book&lt;/field&gt;
  &lt;field name="description"&gt;This is a description of the second test book by Joe Blow&lt;/field&gt;
  &lt;field name="author"&gt;Joe Blow&lt;/field&gt;
 &lt;/doc&gt;
&lt;/add&gt;</pre>
<h3>Tie == 0.01 result</h3>
<p>Let&#8217;s start the test. The first query was the following one:
</p>
<pre class="brush:xml">defType=dismax&amp;qf=title^1000 description author^10&amp;tie=0.01&amp;fl=id,score&amp;debugQuery=on&amp;indent=true&amp;q=joe blow book</pre>
<p>The above resulted in the following Solr results (visualization &#8211; <a href="http://explain.solr.pl/explains/cf0wnkpj" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/explains/cf0wnkpj</a>):
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;8&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="fl"&gt;id,score&lt;/str&gt;
    &lt;str name="debugQuery"&gt;on&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="tie"&gt;0.01&lt;/str&gt;
    &lt;str name="q"&gt;joe blow book&lt;/str&gt;
    &lt;str name="qf"&gt;title^1000 description author^10&lt;/str&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="2" start="0" maxScore="0.07342677"&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.07342677&lt;/float&gt;
    &lt;str name="id"&gt;2&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.073365316&lt;/float&gt;
    &lt;str name="id"&gt;1&lt;/str&gt;
  &lt;/doc&gt;
&lt;/result&gt;
&lt;lst name="debug"&gt;
  &lt;str name="rawquerystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="querystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="parsedquery"&gt;+((DisjunctionMaxQuery((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.01) DisjunctionMaxQuery((author:blow^10.0 | title:blow^1000.0 | description:blow)~0.01) DisjunctionMaxQuery((author:book^10.0 | title:book^1000.0 | description:book)~0.01))~3) ()&lt;/str&gt;
  &lt;str name="parsedquery_toString"&gt;+(((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.01 (author:blow^10.0 | title:blow^1000.0 | description:blow)~0.01 (author:book^10.0 | title:book^1000.0 | description:book)~0.01)~3) ()&lt;/str&gt;
  &lt;lst name="explain"&gt;
    &lt;str name="2"&gt;
0.07342677 = (MATCH) sum of:
  0.07342677 = (MATCH) sum of:
    8.957935E-4 = (MATCH) max plus 0.01 times others of:
      8.9543534E-4 = (MATCH) weight(author:joe^10.0 in 1), product of:
        0.0024097771 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:joe in 1), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.5817415E-5 = (MATCH) weight(description:joe in 1), product of:
        2.4097772E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 1), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    8.957935E-4 = (MATCH) max plus 0.01 times others of:
      8.9543534E-4 = (MATCH) weight(author:blow^10.0 in 1), product of:
        0.0024097771 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:blow in 1), product of:
          1.0 = tf(termFreq(author:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.5817415E-5 = (MATCH) weight(description:blow in 1), product of:
        2.4097772E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 1), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    0.07163518 = (MATCH) max plus 0.01 times others of:
      0.07163482 = (MATCH) weight(title:book^1000.0 in 1), product of:
        0.2409777 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 1), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=1)
      3.5817415E-5 = (MATCH) weight(description:book in 1), product of:
        2.4097772E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 1), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
&lt;/str&gt;
    &lt;str name="1"&gt;
0.073365316 = (MATCH) sum of:
  0.073365316 = (MATCH) sum of:
    7.1670645E-4 = (MATCH) max plus 0.01 times others of:
      7.163483E-4 = (MATCH) weight(author:joe^10.0 in 0), product of:
        0.0024097771 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(author:joe in 0), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.5817415E-5 = (MATCH) weight(description:joe in 0), product of:
        2.4097772E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 0), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.0010134276 = (MATCH) max plus 0.01 times others of:
      0.0010130694 = (MATCH) weight(author:blow^10.0 in 0), product of:
        0.0024097771 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.42039964 = (MATCH) fieldWeight(author:blow in 0), product of:
          1.4142135 = tf(termFreq(author:blow)=2)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.5817415E-5 = (MATCH) weight(description:blow in 0), product of:
        2.4097772E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 0), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.07163518 = (MATCH) max plus 0.01 times others of:
      0.07163482 = (MATCH) weight(title:book^1000.0 in 0), product of:
        0.2409777 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 0), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=0)
      3.5817415E-5 = (MATCH) weight(description:book in 0), product of:
        2.4097772E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0532142E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 0), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    &lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<h4 class="brush:xml">First document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie001_doc12.png"><img fetchpriority="high" decoding="async" class="alignnone size-full wp-image-2088" title="tie001_doc1" src="http://solr.pl/wp-content/uploads/2012/02/tie001_doc12.png" alt="" width="600" height="248"></a></p>
<h4>Second document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie001_doc21.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-2089" title="tie001_doc2" src="http://solr.pl/wp-content/uploads/2012/02/tie001_doc21.png" alt="" width="600" height="248"></a></p>
<h4>What can we say about that?</h4>
<p>As you can see, when we passed the 0.01 value to the <em>tie</em>&nbsp;parameter, the field with the highest score for a given query word dominates the score for that word. A good example of that behavior is the word <em>book</em>&nbsp;in the first document on the results list. The score for that word is&nbsp;<code>0.07163518</code>, which was calculated as the score of the highest scoring field (the <em>title</em>&nbsp;field, <code>0.07163482</code>) plus the sum of the scores of the remaining fields (here only <em>description</em>, <code>3.5817415E-5</code>) multiplied by <em>tie</em>.</p>
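<p>The numbers from the explain output above can be checked by hand. The short script below recomputes the score of the word <em>book</em> and the weight of <em>author:joe</em> for the first document on the results list, using only values copied from that output. The helper name <em>max_plus_tie</em> is made up, and small differences in the last digits are expected because Solr reports single-precision floats.</p>

```python
import math

TIE = 0.01


def max_plus_tie(field_scores, tie):
    """The 'max plus tie times others' rule from the explain output."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Word "book": title and description weights taken from the explain output.
book_score = max_plus_tie([0.07163482, 3.5817415e-5], TIE)

# weight(author:joe^10.0) is queryWeight times fieldWeight.
query_weight = 10.0 * 0.5945349 * 4.0532142e-4   # boost * idf * queryNorm
field_weight = 1.0 * 0.5945349 * 0.625           # tf * idf * fieldNorm
author_joe = query_weight * field_weight

print("book:", book_score,
      math.isclose(book_score, 0.07163518, rel_tol=1e-5))
print("author:joe:", author_joe,
      math.isclose(author_joe, 8.9543534e-4, rel_tol=1e-5))
```

<p>Both comparisons use a relative tolerance, so they check agreement with the reported values rather than exact equality.</p>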
<h3>Tie == 0.99 result</h3>
<p>The second query sent to Solr looked as follows:
</p>
<pre class="brush:xml">defType=dismax&amp;qf=title^1000 description author^10&amp;tie=0.99&amp;fl=id,score&amp;debugQuery=on&amp;indent=true&amp;q=joe blow book</pre>
<p>This resulted in the following Solr results (visualization &#8211; <a href="http://explain.solr.pl/explains/1w7b06lv" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/explains/1w7b06lv</a>):
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;15&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="fl"&gt;id,score&lt;/str&gt;
    &lt;str name="debugQuery"&gt;on&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="tie"&gt;0.99&lt;/str&gt;
    &lt;str name="q"&gt;joe blow book&lt;/str&gt;
    &lt;str name="qf"&gt;title^1000 description author^10&lt;/str&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="2" start="0" maxScore="0.07352995"&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.07352995&lt;/float&gt;
    &lt;str name="id"&gt;2&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;float name="score"&gt;0.0734685&lt;/float&gt;
    &lt;str name="id"&gt;1&lt;/str&gt;
  &lt;/doc&gt;
&lt;/result&gt;
&lt;lst name="debug"&gt;
  &lt;str name="rawquerystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="querystring"&gt;joe blow book&lt;/str&gt;
  &lt;str name="parsedquery"&gt;+((DisjunctionMaxQuery((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.99) DisjunctionMaxQuery((author:blow^10.0 | title:blow^1000.0 | description:blow)~0.99) DisjunctionMaxQuery((author:book^10.0 | title:book^1000.0 | description:book)~0.99))~3) ()&lt;/str&gt;
  &lt;str name="parsedquery_toString"&gt;+(((author:joe^10.0 | title:joe^1000.0 | description:joe)~0.99 (author:blow^10.0 | title:blow^1000.0 | description:blow)~0.99 (author:book^10.0 | title:book^1000.0 | description:book)~0.99)~3) ()&lt;/str&gt;
  &lt;lst name="explain"&gt;
    &lt;str name="2"&gt;
0.07352995 = (MATCH) sum of:
  0.07352995 = (MATCH) sum of:
    9.308678E-4 = (MATCH) max plus 0.99 times others of:
      8.9540955E-4 = (MATCH) weight(author:joe^10.0 in 1), product of:
        0.0024097078 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:joe in 1), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.581638E-5 = (MATCH) weight(description:joe in 1), product of:
        2.4097077E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 1), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    9.308678E-4 = (MATCH) max plus 0.99 times others of:
      8.9540955E-4 = (MATCH) weight(author:blow^10.0 in 1), product of:
        0.0024097078 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.3715843 = (MATCH) fieldWeight(author:blow in 1), product of:
          1.0 = tf(termFreq(author:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.625 = fieldNorm(field=author, doc=1)
      3.581638E-5 = (MATCH) weight(description:blow in 1), product of:
        2.4097077E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 1), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
    0.071668215 = (MATCH) max plus 0.99 times others of:
      0.07163276 = (MATCH) weight(title:book^1000.0 in 1), product of:
        0.24097076 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 1), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=1)
      3.581638E-5 = (MATCH) weight(description:book in 1), product of:
        2.4097077E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 1), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=1)
&lt;/str&gt;
    &lt;str name="1"&gt;
0.0734685 = (MATCH) sum of:
  0.0734685 = (MATCH) sum of:
    7.517859E-4 = (MATCH) max plus 0.99 times others of:
      7.1632763E-4 = (MATCH) weight(author:joe^10.0 in 0), product of:
        0.0024097078 = queryWeight(author:joe^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(author:joe in 0), product of:
          1.0 = tf(termFreq(author:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.581638E-5 = (MATCH) weight(description:joe in 0), product of:
        2.4097077E-4 = queryWeight(description:joe), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:joe in 0), product of:
          1.0 = tf(termFreq(description:joe)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.0010484984 = (MATCH) max plus 0.99 times others of:
      0.0010130403 = (MATCH) weight(author:blow^10.0 in 0), product of:
        0.0024097078 = queryWeight(author:blow^10.0), product of:
          10.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.42039964 = (MATCH) fieldWeight(author:blow in 0), product of:
          1.4142135 = tf(termFreq(author:blow)=2)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=author, doc=0)
      3.581638E-5 = (MATCH) weight(description:blow in 0), product of:
        2.4097077E-4 = queryWeight(description:blow), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:blow in 0), product of:
          1.0 = tf(termFreq(description:blow)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    0.071668215 = (MATCH) max plus 0.99 times others of:
      0.07163276 = (MATCH) weight(title:book^1000.0 in 0), product of:
        0.24097076 = queryWeight(title:book^1000.0), product of:
          1000.0 = boost
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.29726744 = (MATCH) fieldWeight(title:book in 0), product of:
          1.0 = tf(termFreq(title:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.5 = fieldNorm(field=title, doc=0)
      3.581638E-5 = (MATCH) weight(description:book in 0), product of:
        2.4097077E-4 = queryWeight(description:book), product of:
          0.5945349 = idf(docFreq=2, maxDocs=2)
          4.0530972E-4 = queryNorm
        0.14863372 = (MATCH) fieldWeight(description:book in 0), product of:
          1.0 = tf(termFreq(description:book)=1)
          0.5945349 = idf(docFreq=2, maxDocs=2)
          0.25 = fieldNorm(field=description, doc=0)
    &lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<h4>First document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie099_doc11.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-2091" title="tie099_doc1" src="http://solr.pl/wp-content/uploads/2012/02/tie099_doc11.png" alt="" width="600" height="245"></a></p>
<h4>Second document</h4>
<p><a href="http://solr.pl/wp-content/uploads/2012/02/tie099_doc21.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-2092" title="tie099_doc2" src="http://solr.pl/wp-content/uploads/2012/02/tie099_doc21.png" alt="" width="600" height="248"></a></p>
<h4>What can we say about that?</h4>
<p>As you can see, the scores of the returned documents changed. Let&#8217;s look at the same document and the same word, <em>book</em>. This time we sent 0.99 as the value of the <em>tie</em> parameter, and the score for that word increased compared to the score we got with a <em>tie</em> of 0.01. Of course, the change is caused not only by the <em>tie</em> parameter but also by normalization, but let&#8217;s leave that aside to keep things simple <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> So, in the second case, the score for that word is <code>0.071668215</code>, which is the score of the highest scoring field (<em>title</em>, <code>0.07163276</code>) plus the scores of the other matching fields (here only <em>description</em>, <code>3.581638E-5</code>) multiplied by 0.99 (the <em>tie</em> parameter value).</p>
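<p>To make the arithmetic concrete, here is a minimal Python sketch of the &#8220;max plus tie times others&#8221; rule shown in the explain output. The function name <code>dismax_term_score</code> is our own and is not part of Solr or Lucene; the input values are copied from the explain section for the document with id 2 and the word <em>book</em>.</p>
<pre>def dismax_term_score(field_scores, tie):
    """Return the best field score plus tie times the sum of the other field scores."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Per-field scores for the word "book", taken from the explain output above.
title_book = 0.07163276
description_book = 3.581638e-5

score = dismax_term_score([title_book, description_book], tie=0.99)
print(score)</pre>
<p>The printed value should agree with the <code>0.071668215</code> reported by Solr up to small rounding differences, because Solr computes these values with single precision floats.</p>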
<h3>To sum up</h3>
<p>As you can see, the <em>tie</em> parameter allows us to control how the score is calculated for the DisjunctionMaxQuery. In the extreme case, when we want only the highest scoring field to contribute to the score, we can set the <em>tie</em> parameter to 0.0. <em>Tie</em> lets us decide how much the lower scoring fields count when the score of a document is calculated, and thus where the document ends up on the results list we get from Solr when using the Dismax query parser.</p>
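<p>If you would like to experiment with different <em>tie</em> values yourself, the sketch below builds a Dismax request URL in Python. The host, port and handler path (<code>http://localhost:8983/solr/select</code>) and the helper name <code>build_dismax_url</code> are assumptions made for illustration; the <em>qf</em> value is inferred from the boosts visible in the parsed query above, so adjust all of them to your own setup.</p>
<pre>from urllib.parse import urlencode


def build_dismax_url(base_url, query, qf, tie):
    """Build a Dismax request URL that asks for debug (explain) output."""
    params = {
        "q": query,
        "defType": "dismax",
        "qf": qf,
        "tie": tie,
        "fl": "id,score",
        "debugQuery": "true",
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance; tie=0.0 means only the best field counts.
url = build_dismax_url(
    "http://localhost:8983/solr/select",
    "joe blow book",
    "author^10 title^1000 description",
    0.0,
)
print(url)</pre>
<p>Sending the same request with a few different <em>tie</em> values and comparing the explain sections is a quick way to see how much the lower scoring fields contribute in your own index.</p>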
<h3>In case you are wondering</h3>
<p>In case you are wondering what we used to produce the diagrams, please have a look at&nbsp;<a href="http://explain.solr.pl/help" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/help</a>. Perhaps&nbsp;<a href="http://explain.solr.pl/" target="_blank" rel="noopener noreferrer">http://explain.solr.pl/</a> will be helpful in your case as well.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/02/06/what-can-we-use-dismax-tie-parameter-for/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
