<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>highlighting &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/highlighting-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Thu, 12 Nov 2020 12:51:18 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Autocomplete on multivalued fields using highlighting</title>
		<link>https://solr.pl/en/2013/02/25/autocomplete-on-multivalued-fields-using-highlighting/</link>
					<comments>https://solr.pl/en/2013/02/25/autocomplete-on-multivalued-fields-using-highlighting/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 25 Feb 2013 12:50:33 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[autocomplete]]></category>
		<category><![CDATA[highlighting]]></category>
		<category><![CDATA[multivalued]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=536</guid>

					<description><![CDATA[One of the recent topics I came across was an autocomplete feature based on Solr multi-valued fields (for example, this question was asked on Stack Overflow). Let&#8217;s look at the possibilities we have. Multiple cores vs single core One of the]]></description>
										<content:encoded><![CDATA[<p>One of the recent topics I came across was an autocomplete feature based on Solr multi-valued fields (for example, this question was asked on <a href="http://stackoverflow.com/questions/14865417/autocomplete-feature-using-solr4-on-multivalued-fields">Stack Overflow</a>). Let&#8217;s look at the possibilities we have.</p>
<p><span id="more-536"></span></p>
<h2>Multiple cores vs single core</h2>
<p>One of the first things to consider is whether we can use a dedicated core or collection for autocomplete. If we can, we should go that way. There are several reasons in favor of this approach: for example, such a collection will be smaller than the one holding the searchable data and its term count should be lower, so queries should be faster. Of course, we have to take care of the additional configuration and indexing, but that&#8217;s not too much of a problem, right? In this entry we will look at situations where a separate core is not an option, for example because of filtering that needs to be done.</p>
<p>Please also note that in this entry we assume that whole phrases should be shown to the user.</p>
<h2>Configuration</h2>
<p>Let&#8217;s start from the configuration.</p>
<h3>Index structure</h3>
<p>Let&#8217;s assume that we want to suggest phrases from a multi-valued field. Let&#8217;s call that field <em>features</em>. The configuration of all the fields in the index is as follows:
</p>
<pre class="brush:xml">&lt;fields&gt;
 &lt;field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /&gt;
 &lt;field name="features" type="string" indexed="true" stored="true" multiValued="true"/&gt;
 &lt;field name="features_autocomplete" type="text_autocomplete" indexed="true" stored="true" multiValued="true"/&gt;

 &lt;field name="_version_" type="long" indexed="true" stored="true"/&gt;
&lt;/fields&gt;</pre>
<p>As you can see, for the autocomplete feature we will use the field named <em>features_autocomplete</em>. The <em>_version_</em> field is needed by some of the Solr 4.0 (and newer) features, and because of that it is present in our index.</p>
<h3>Field values copying</h3>
<p>In addition to the above configuration, we also want to copy the data from the <em>features</em> field to the <em>features_autocomplete</em> one. To do that, we will use the Solr copy field feature by adding the following section to the <em>schema.xml</em> file:
</p>
<pre class="brush:xml">&lt;copyField source="features" dest="features_autocomplete"/&gt;</pre>
<h3>Field type &#8211; text_autocomplete</h3>
<p>Finally, let&#8217;s have a look at the last piece of the configuration, the definition of the <em>text_autocomplete</em> type:
</p>
<pre class="brush:xml">&lt;fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100"&gt;
 &lt;analyzer type="index"&gt;
  &lt;tokenizer class="solr.KeywordTokenizerFactory"/&gt;
  &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
  &lt;filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" /&gt;
 &lt;/analyzer&gt;
 &lt;analyzer type="query"&gt;
  &lt;tokenizer class="solr.KeywordTokenizerFactory" /&gt;
  &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
 &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>As you can see, during indexing Solr will create edge n-grams (prefixes) from each value indexed in the <em>features_autocomplete</em> field, starting at the minimum length of 2 and ending at the maximum length of 50. Because the keyword tokenizer keeps the whole value as a single token, the prefixes are built from the entire phrase.</p>
<p>During querying we only lowercase the query phrase; nothing else is needed in our case.</p>
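<p>To make the analysis concrete, here is a small worked example. It is derived by hand from the analyzer definition above, not copied from Solr output. The value <em>Single door</em> is kept as one token, lowercased, and then expanded into the following terms (quoted so that the trailing space in one of them stays visible):</p>
<pre class="brush:xml">"si"
"sin"
"sing"
"singl"
"single"
"single "
"single d"
"single do"
"single doo"
"single door"</pre>
<p>A query for <em>sing</em> is only lowercased, so it matches the third term directly, without any wildcard or prefix query.</p>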
<h3>Sample data</h3>
<p>Our sample data looks like this:
</p>
<pre class="brush:xml">&lt;add&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;1&lt;/field&gt;
  &lt;field name="features"&gt;Multiple windows&lt;/field&gt;
  &lt;field name="features"&gt;Single door&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;2&lt;/field&gt;
  &lt;field name="features"&gt;Single window&lt;/field&gt;
  &lt;field name="features"&gt;Single door&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;3&lt;/field&gt;
  &lt;field name="features"&gt;Multiple windows&lt;/field&gt;
  &lt;field name="features"&gt;Multiple doors&lt;/field&gt;
 &lt;/doc&gt;
&lt;/add&gt;</pre>
<h2>Initial query</h2>
<p>Let&#8217;s look at the queries now.</p>
<h3>In the beginning</h3>
<p>Let&#8217;s start with a simple query that would return the data we need if we were using a single-valued field. The query looks as follows:
</p>
<pre class="brush:xml">q=features_autocomplete:sing&amp;fl=features_autocomplete</pre>
<h3>Query results</h3>
<p>The results we get from such a query, for our example data, should look like this:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;3&lt;/int&gt;
  &lt;lst name="params"&gt;
   &lt;str name="fl"&gt;features_autocomplete&lt;/str&gt;
   &lt;str name="q"&gt;features_autocomplete:sing&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="2" start="0"&gt;
 &lt;doc&gt;
  &lt;arr name="features_autocomplete"&gt;
   &lt;str&gt;Single window&lt;/str&gt;
   &lt;str&gt;Single door&lt;/str&gt;
  &lt;/arr&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;arr name="features_autocomplete"&gt;
   &lt;str&gt;Multiple windows&lt;/str&gt;
   &lt;str&gt;Single door&lt;/str&gt;
  &lt;/arr&gt;
 &lt;/doc&gt;
 &lt;/result&gt;
&lt;/response&gt;</pre>
<h3>A short comment</h3>
<p>As we can see, the results are not satisfactory, because in addition to the value that matched our query we got all the other values stored in the multi-valued field. We would like to get only the values that match. Is this possible? Yes, it is, with a little trick. Let&#8217;s modify our query to use highlighting.</p>
<h2>Query with highlighting</h2>
<p>So now we will make use of the Apache Solr highlighting module.</p>
<h3>Changed query</h3>
<p>What we will do is add the following part to our previous query:
</p>
<pre class="brush:xml">hl=true&amp;hl.fl=features_autocomplete&amp;hl.simple.pre=&amp;hl.simple.post=</pre>
<p>So the whole query looks like this:
</p>
<pre class="brush:xml">q=features_autocomplete:sing&amp;fl=features_autocomplete&amp;hl=true&amp;hl.fl=features_autocomplete&amp;hl.simple.pre=&amp;hl.simple.post=</pre>
<p>A few words about the parameters that were used:</p>
<ul>
<li><em>hl=true</em> &#8211; we inform Solr that we want to use highlighting,</li>
<li><em>hl.fl=features_autocomplete</em> &#8211; we tell Solr which field should be used for highlighting,</li>
<li><em>hl.simple.pre=</em> &#8211; setting <em>hl.simple.pre</em> to an empty value tells Solr that we don&#8217;t want to mark the beginning of the highlighted fragment,</li>
<li><em>hl.simple.post=</em> &#8211; setting <em>hl.simple.post</em> to an empty value tells Solr that we don&#8217;t want to mark the end of the highlighted fragment.</li>
</ul>
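<p>If we don&#8217;t want to send these parameters with every request, one option is to set them as defaults of a dedicated request handler in <em>solrconfig.xml</em>. The fragment below is only a sketch: the handler name <em>/autocomplete</em> is a hypothetical one that does not appear in the original configuration, and it assumes that empty default values behave the same way as empty request parameters, which is worth verifying on your Solr version.</p>
<pre class="brush:xml">&lt;!-- Hypothetical handler name; not part of the original configuration --&gt;
&lt;requestHandler name="/autocomplete" class="solr.SearchHandler"&gt;
 &lt;lst name="defaults"&gt;
  &lt;str name="fl"&gt;features_autocomplete&lt;/str&gt;
  &lt;str name="hl"&gt;true&lt;/str&gt;
  &lt;str name="hl.fl"&gt;features_autocomplete&lt;/str&gt;
  &lt;!-- Empty values: no markers around the highlighted fragment --&gt;
  &lt;str name="hl.simple.pre"&gt;&lt;/str&gt;
  &lt;str name="hl.simple.post"&gt;&lt;/str&gt;
 &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>With such a handler in place, the request would only need to carry the <em>q</em> parameter.</p>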
<h3>Modified query results</h3>
<p>After querying Solr with the modified query, the following results were returned:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;4&lt;/int&gt;
  &lt;lst name="params"&gt;
   &lt;str name="fl"&gt;features_autocomplete&lt;/str&gt;
   &lt;str name="q"&gt;features_autocomplete:sing&lt;/str&gt;
   &lt;str name="hl.simple.pre"/&gt;
   &lt;str name="hl.simple.post"/&gt;
   &lt;str name="hl.fl"&gt;features_autocomplete&lt;/str&gt;
   &lt;str name="hl"&gt;true&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="2" start="0"&gt;
 &lt;doc&gt;
  &lt;arr name="features_autocomplete"&gt;
   &lt;str&gt;Single window&lt;/str&gt;
   &lt;str&gt;Single door&lt;/str&gt;
  &lt;/arr&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;arr name="features_autocomplete"&gt;
   &lt;str&gt;Multiple windows&lt;/str&gt;
   &lt;str&gt;Single door&lt;/str&gt;
  &lt;/arr&gt;
 &lt;/doc&gt;
 &lt;/result&gt;
 &lt;lst name="highlighting"&gt;
  &lt;lst name="2"&gt;
   &lt;arr name="features_autocomplete"&gt;
    &lt;str&gt;Single window&lt;/str&gt;
   &lt;/arr&gt;
  &lt;/lst&gt;
  &lt;lst name="1"&gt;
   &lt;arr name="features_autocomplete"&gt;
    &lt;str&gt;Single door&lt;/str&gt;
   &lt;/arr&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
&lt;/response&gt;</pre>
<p>As you can see, the highlighting section contains the information that we are interested in.</p>
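<p>One detail in the response above is worth noting: document 2 contains two values starting with <em>sing</em>, yet only one of them is listed in the highlighting section. A likely explanation is that the <em>hl.snippets</em> parameter defaults to 1, so Solr returns at most one fragment per field for each document. The query below is a sketch that asks for more fragments; the value 10 is an arbitrary assumption and the result has not been verified against the sample index:</p>
<pre class="brush:xml">q=features_autocomplete:sing&amp;fl=features_autocomplete&amp;hl=true&amp;hl.fl=features_autocomplete&amp;hl.simple.pre=&amp;hl.simple.post=&amp;hl.snippets=10</pre>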
<h2>Summary</h2>
<p>Of course, we need to remember that the approach proposed in this entry is not the only way to build a working autocomplete feature on data in multi-valued fields. In the next entry on this topic we will show how faceting can be used to get the same results, provided we can accept some small drawbacks.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2013/02/25/autocomplete-on-multivalued-fields-using-highlighting/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
