<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>4.0 &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/4-0-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Wed, 11 Nov 2020 22:44:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr 4.0 and Polish language analysis</title>
		<link>https://solr.pl/en/2012/04/02/solr-4-0-and-polish-language-analysis/</link>
					<comments>https://solr.pl/en/2012/04/02/solr-4-0-and-polish-language-analysis/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 02 Apr 2012 21:43:51 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[hunspell]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[morfologik]]></category>
		<category><![CDATA[polish]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=449</guid>

					<description><![CDATA[Because Polish language analysis functionality is present in Lucene (and Solr) for some time I decided to take a look and compare it on the basis of upcoming Lucene and Solr 4.0. Options At the time of writing, the following]]></description>
										<content:encoded><![CDATA[<p>Polish language analysis has been available in Lucene (and Solr) for some time, so I decided to take a look at the options and compare them on the basis of the upcoming Lucene and Solr 4.0.</p>
<p><span id="more-449"></span></p>
<h3>Options</h3>
<p>At the time of writing, the following options were present when it comes to analyzing Polish:</p>
<ul>
<li>Use Stempel library (available since Solr 3.1)</li>
<li>Use Hunspell and Polish dictionaries (available since Solr 3.5)</li>
<li>Use Morfologik library (will be available in Solr 4.0, <a href="https://issues.apache.org/jira/browse/SOLR-3272" target="_blank" rel="noopener noreferrer">SOLR-3272</a>).</li>
</ul>
<h3>Configuration</h3>
<p>Let&#8217;s look at how to configure each of the above options in Solr (please remember that all the following configuration examples are based on Solr 4.0).</p>
<h4>Stempel</h4>
<p>In order to add Polish stemming using the Stempel library, we just need to add the following filter to our type definition:
</p>
<pre class="brush:xml">&lt;filter class="solr.StempelPolishStemFilterFactory" /&gt;</pre>
<p>In addition to that, you need to add the <em>lucene-analyzers-stempel-4.0.jar</em> and <em>apache-solr-analysis-extras-4.0.jar</em> libraries to&nbsp;<em>SOLR_HOME/lib</em>. It&#8217;s also a good idea to use <em>solr.LowerCaseFilterFactory</em> before the Stempel filter.</p>
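<p>To put the filter in context, a complete field type might look like the sketch below. The type name <em>text_pl_stempel</em>, the choice of <em>solr.StandardTokenizerFactory</em> and the <em>positionIncrementGap</em> value are assumptions made for illustration, not a tested setup:</p>
<pre class="brush:xml">&lt;!-- Hypothetical field type; the name and the tokenizer are assumptions --&gt;
&lt;fieldType name="text_pl_stempel" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory" /&gt;
    &lt;!-- lowercase the tokens before they reach the stemmer --&gt;
    &lt;filter class="solr.LowerCaseFilterFactory" /&gt;
    &lt;filter class="solr.StempelPolishStemFilterFactory" /&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>A single <em>analyzer</em> element is used here, so the same chain would run at index and query time, which is usually what you want for stemming.</p>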
<h4>Hunspell</h4>
<p>Similar to the configuration above, to use Hunspell you need to add a new filter to your type definition, for example:
</p>
<pre class="brush:xml">&lt;filter class="solr.HunspellStemFilterFactory" dictionary="pl_PL.dic" affix="pl_PL.aff" ignoreCase="true" /&gt;</pre>
<p>The <em>dictionary</em> and <em>affix</em> parameters specify the dictionary and affix files that we want to use. The <em>ignoreCase</em> parameter set to <em>true</em> tells Hunspell to ignore character case. You can find Hunspell dictionaries at the following URL: <a href="http://wiki.services.openoffice.org/wiki/Dictionaries" target="_blank" rel="noopener noreferrer">http://wiki.services.openoffice.org/wiki/Dictionaries</a>.</p>
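<p>As a rough sketch, the Hunspell filter could be embedded in a field type like the one below. The type name <em>text_pl_hunspell</em> and the tokenizer are assumptions, and the example assumes that <em>pl_PL.dic</em> and <em>pl_PL.aff</em> have been copied to the <em>conf</em> directory so that Solr can load them:</p>
<pre class="brush:xml">&lt;!-- Hypothetical field type; the name and the tokenizer are assumptions --&gt;
&lt;fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory" /&gt;
    &lt;!-- dictionary and affix files are assumed to be in the conf directory --&gt;
    &lt;filter class="solr.HunspellStemFilterFactory" dictionary="pl_PL.dic" affix="pl_PL.aff" ignoreCase="true" /&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>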
<h4>Morfologik</h4>
<p>As in the two examples above, all you need to change in your <em>schema.xml</em> is to add a new filter, this time the following one:
</p>
<pre class="brush:xml">&lt;filter class="solr.MorfologikFilterFactory" dictionary="MORFOLOGIK" /&gt;</pre>
<p>The <em>dictionary</em> parameter tells Solr which dictionary you would like to use. You can choose one of the following three:</p>
<ul>
<li>MORFOLOGIK</li>
<li>MORFEUSZ</li>
<li>COMBINED</li>
</ul>
<p>In addition to that, you need to add the following libraries to <em>SOLR_HOME/lib</em>: <em>lucene-analyzers-morfologik-4.0.jar</em>, <em>apache-solr-analysis-extras-4.0.jar</em>, <em>morfologik-fsa-1.5.2.jar</em>, <em>morfologik-polish-1.5.2.jar</em> and <em>morfologik-stemming-1.5.2.jar</em>.</p>
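<p>Instead of copying the jars, it should also be possible to point Solr at them with <em>lib</em> directives in <em>solrconfig.xml</em>. The sketch below is only an illustration: the directory paths are assumptions about where the jars live in your installation and need to be adjusted:</p>
<pre class="brush:xml">&lt;!-- All paths below are assumptions; adjust them to your installation --&gt;
&lt;lib dir="../../contrib/analysis-extras/lib" regex="morfologik-.*\.jar" /&gt;
&lt;lib dir="../../contrib/analysis-extras/lucene-libs" regex="lucene-analyzers-morfologik-.*\.jar" /&gt;
&lt;lib dir="../../dist/" regex="apache-solr-analysis-extras-.*\.jar" /&gt;</pre>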
<h3>Results Comparison</h3>
<p>Of course I wasn&#8217;t able to judge the results of analysis from the above three filters on the whole Polish language corpus, and that&#8217;s why I decided to choose four words to see how each of the filters behaves. Those words are: &#8220;<em>urodzić urodzony urodzona urodzeni</em>&#8221; (these words are variations of the word <em>born</em> in Polish). The results are as follows:</p>
<h4>Stempel</h4>
<p>The terms I got from Stempel were the following ones:
</p>
<pre>[urodzić] [urodzo] [urodzona] [urodzeni]</pre>
<p>Not all of them are words, but you have to remember that Stempel is a stemmer, and because of that it produces stems which can be different from the actual words or their root forms. What matters is that the words we are interested in are processed to the same tokens, which allows Lucene/Solr to find them. With that in mind, I have to say that the results of analysis using Stempel are not as good as I would like them to be. For example, by searching for the word <em>urodzić</em> you won&#8217;t be able to find documents with words like <em>urodzona</em> or <em>urodzeni</em>.</p>
<h4>Hunspell</h4>
<p>The results of Hunspell analysis were as follows:
</p>
<pre>[urodzić, urodzić] [urodzony, urodzić] [urodzić] [urodzić, urodzony, urodzenie]</pre>
<p>Comparing the results I got when using Hunspell to those Stempel produced, we can see the difference. Our sample query for the word <em>urodzić</em> would find documents with words like <em>urodzony</em>, <em>urodzona</em> and <em>urodzeni</em>, which is quite nice. You can also notice that for three of the words we got more than one term at the same position. The results I got when using Hunspell are OK and I think they should satisfy most users (they do satisfy me), but let&#8217;s have a look at the filter newly introduced in Lucene and Solr &#8211; Morfologik.</p>
<h4>Morfologik</h4>
<p>The results of Morfologik analysis were as follows:
</p>
<pre>[urodzić] [urodzony, urodzić] [urodzić] [urodzić, urodzony]</pre>
<p>Again, if you compare those to the ones I got when using Hunspell you can hardly see the difference (of course in this particular case). The only difference between Hunspell and Morfologik is in the last word, for which we got different results. In my opinion the results achieved with Morfologik are satisfying.</p>
<h3>Performance</h3>
<p>The performance test was done in a simple manner &#8211; for each filter I indexed 5 million documents, where all the text fields were based on Polish language analysis with the appropriate filter (in addition to some standard filters like stopwords, synonyms and so on). Every time, the indexing was done on a clean Solr 4.0 instance. Because I used the Data Import Handler, I sent a commit every 100k documents. The index contained several fields, but the actual index structure was not crucial for the test, as I indexed the same set of documents every time. Following are the test results:</p>
[table “21” not found /]<br />

<p><strong>Warning:</strong> At the time of writing, according to the <a href="https://issues.apache.org/jira/browse/SOLR-3245">SOLR-3245</a> JIRA issue, there is a problem with Hunspell performance with Polish dictionaries and Solr 4.0. I&#8217;m almost certain that this situation will be resolved by the time Solr 4.0 is released, but right now the performance of Hunspell with Polish dictionaries and Solr 4.0 may not be sufficient.</p>
<h3>Short Summary</h3>
<p>Despite not having performance results for Hunspell (because I don&#8217;t count the ones I have right now as correct) we can see that Hunspell and Morfologik are good candidates for Polish language analysis. Looking at Morfologik, we have similar performance to Stempel, but the Morfologik results are better in my opinion, and that will make your users happier.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/04/02/solr-4-0-and-polish-language-analysis/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 4.0: Realtime GET</title>
		<link>https://solr.pl/en/2012/01/09/solr-4-0-realtime-get/</link>
					<comments>https://solr.pl/en/2012/01/09/solr-4-0-realtime-get/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 09 Jan 2012 20:57:41 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[get]]></category>
		<category><![CDATA[near]]></category>
		<category><![CDATA[nrt]]></category>
		<category><![CDATA[real]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=392</guid>

					<description><![CDATA[The next functionality I decided to look at, from the upcoming Solr 4.0, is the so called &#8220;Realtime Get&#8221;. It allows you to see the data even though it was not yet added to the index, thus before the commit&#160;operation]]></description>
										<content:encoded><![CDATA[<p>The next functionality I decided to look at, from the upcoming Solr 4.0, is the so-called &#8220;Realtime Get&#8221;. It allows you to see the data even though it was not yet added to the index, that is, before the <em>commit</em>&nbsp;operation is sent to Solr. Let&#8217;s see how it works.</p>
<p><span id="more-392"></span></p>
<h3>Some theory</h3>
<p>Updating data in Lucene and Solr has one disadvantage &#8211; when you submit index updates they can&#8217;t be seen until the <em>commit</em>&nbsp;operation is run. The problem is that <em>commit</em>&nbsp;is costly in terms of performance, and frequent committing may cause performance problems. So, when you need your data to be visible right after being changed, you may be forced to choose between performance and fast updates. In order to address that, Lucene and Solr are working towards enabling <em>Near Real Time</em>&nbsp;(NRT) searching. Lucene already offers that possibility; in Solr 4.0 we will be able to use it too, and more besides.</p>
<h3>Configuration</h3>
<p>In order to use <em>Realtime Get</em>&nbsp;functionality we need to configure the following Solr features:</p>
<h4>Transaction log</h4>
<p>The first thing to configure is the transaction log writing. In order to do that you need to add the following to your <em>updateHandler</em>&nbsp;configuration:
</p>
<pre class="brush:xml">&lt;updateLog&gt;
  &lt;str name="dir"&gt;${solr.data.dir:}&lt;/str&gt;
&lt;/updateLog&gt;</pre>
<p>The above entry says that the directory holding the transaction log will be located in the same directory as the index directory.</p>
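<p>For context, the sketch below shows where the entry sits inside <em>solrconfig.xml</em>. It assumes the <em>solr.DirectUpdateHandler2</em> update handler class and the <em>${solr.data.dir:}</em> property as the log location; treat it as an illustration rather than a complete configuration:</p>
<pre class="brush:xml">&lt;!-- Assumed surrounding element; only the updateLog part is required by Realtime Get --&gt;
&lt;updateHandler class="solr.DirectUpdateHandler2"&gt;
  &lt;!-- transaction log, kept next to the index in the data directory --&gt;
  &lt;updateLog&gt;
    &lt;str name="dir"&gt;${solr.data.dir:}&lt;/str&gt;
  &lt;/updateLog&gt;
&lt;/updateHandler&gt;</pre>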
<h4>Realtime Get handler</h4>
<p>The second thing that needs to be done to see <em>Realtime Get</em>&nbsp;in action is the appropriate handler configuration (or adding the component to an already defined handler). To do that, add the following to your&nbsp;<em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;requestHandler name="/get" class="solr.RealTimeGetHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;str name="omitHeader"&gt;true&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>There is nothing unusual in the above entry - it just adds a new request handler implemented by the <em>solr.RealTimeGetHandler</em>&nbsp;class, which enables checking the transaction log.</p>
<h3>Action</h3>
<p>To check how&nbsp;<em>Realtime Get</em>&nbsp;works I decided to do a simple test. The first thing I did was to index one file (one of those available in the&nbsp;<em>exampledocs</em>&nbsp;directory) with the following bash command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/update' -d @hd.xml -H 'Content-type:application/xml'</pre>
<p>Of course I did not send the <em>commit</em>&nbsp;operation after indexing. As expected, the following query:
</p>
<pre class="brush:bash">http://localhost:8983/solr/select?q=*:*</pre>
<p>didn't return any search results. So let's check if the handler registered as&nbsp;<em>/get</em>&nbsp;will be able to get us some results. In order to do that I sent the following query:
</p>
<pre class="brush:bash">http://localhost:8983/solr/get?id=SP2514N</pre>
<p>As a result I got the following document:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;doc name="doc"&gt;
  &lt;str name="id"&gt;SP2514N&lt;/str&gt;
  &lt;str name="name"&gt;Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133&lt;/str&gt;
  &lt;str name="manu"&gt;Samsung Electronics Co. Ltd.&lt;/str&gt;
  &lt;str name="manu_id_s"&gt;samsung&lt;/str&gt;
  &lt;arr name="cat"&gt;
    &lt;str&gt;electronics&lt;/str&gt;
    &lt;str&gt;hard drive&lt;/str&gt;
  &lt;/arr&gt;
  &lt;arr name="features"&gt;
    &lt;str&gt;7200RPM, 8MB cache, IDE Ultra ATA-133&lt;/str&gt;
    &lt;str&gt;NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor&lt;/str&gt;
  &lt;/arr&gt;
  &lt;float name="price"&gt;92.0&lt;/float&gt;
  &lt;int name="popularity"&gt;6&lt;/int&gt;
  &lt;bool name="inStock"&gt;true&lt;/bool&gt;
  &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
  &lt;str name="store"&gt;35.0752,-97.032&lt;/str&gt;&lt;/doc&gt;
&lt;/response&gt;</pre>
<p>So Solr returned a document that hadn't been added to the index yet - nice!</p>
<h3>Usage possibilities</h3>
<p>You probably noticed that in order to fetch a document with the <em>/get</em>&nbsp;handler I needed to provide its unique identifier (or a list of identifiers). That's true: <em>Realtime Get</em>&nbsp;doesn't support searching, because it was not created to support full searching. This functionality can show us the updates of documents whose identifiers are known (for example, the ones already in the index), for instance by adding the component used by <em>solr.RealTimeGetHandler</em> to any of your defined handlers. And the good news is that you don't have to worry about update performance - <em>Realtime Get</em>&nbsp;is very fast. So, if one of your problems is frequent updates, you can look to the future with a smile <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>Last few words</h3>
<p>The <em>Realtime Get</em>&nbsp;functionality brings new possibilities when it comes to Solr, and it is also a step on the road to SolrCloud. With the transaction log one can implement automatic cluster node recovery or NRT updates of instances. As you can see, Solr 4.0 is not only about search, but also about data storage and bringing Solr closer to NoSQL solutions.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/01/09/solr-4-0-realtime-get/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 4.0: DocTransformers first look</title>
		<link>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/</link>
					<comments>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 05 Dec 2011 20:55:51 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[doc]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[first]]></category>
		<category><![CDATA[first look]]></category>
		<category><![CDATA[look]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[transformer]]></category>
		<category><![CDATA[transformers]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=386</guid>

					<description><![CDATA[In todays entry we will look at the next feature that will come with version 4.0 of Apache Solr. We will look at the functionality which enables us to modify the fields in Solr result list. Do I need it]]></description>
										<content:encoded><![CDATA[<p>In today&#8217;s entry we will look at the next feature that will come with version 4.0 of Apache Solr: the functionality which enables us to modify the fields in the Solr result list.</p>
<p><span id="more-386"></span></p>
<h3>Do I need it?</h3>
<p>Till now, we didn&#8217;t have much choice when it comes to the results returned by Solr. When Solr 4.0 is published we will be given a new tool, the so-called <em>DocTransformers</em>. This feature enables us to modify the fields of the documents returned by Solr in the search results. Looking at what is available now, we can for example change the names of the returned fields or mark the documents that were added by the <em>QueryElevationComponent</em>. Right now there are only a few implementations, but implementing your own <em>DocTransformer</em> is not hard.</p>
<h3>What is already available?</h3>
<p>At the time of writing, the following transformers are available:</p>
<ul>
<li>One that enables you to mark the documents that were added by the <em>QueryElevationComponent</em>.</li>
<li>One that enables you to add the explain information to the document.</li>
<li>One that enables you to add static value as a field of the document.</li>
<li>One that enables you to add the id of the shard from which the document was fetched.</li>
<li>One that enables you to add the <em>docid</em> as the document field (identifier used by Lucene).</li>
</ul>
<h3>How to use DocTransformers?</h3>
<p>Let&#8217;s look at how to use <em>DocTransformers</em>. To do that I downloaded the <em>trunk</em> version of Apache Solr (4.0) from the svn repository and ran the example deployment. Next, I indexed the example data and ran the following query:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select?q=encoded&amp;fl=name,score,[docid],[explain]</pre>
<p>If you look at the <em>fl</em> parameter you will notice that we told Solr that we want the <em>name</em> field in the results, the <em>score</em> of the document and two <em>DocTransformers</em>: <em>[docid]</em> and <em>[explain]</em>. As a result I got the following XML:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;2&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="q"&gt;encoded&lt;/str&gt;
    &lt;str name="fl"&gt;name,score,[docid],[explain]&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="2" start="0" maxScore="0.50524884"&gt;
 &lt;doc&gt;
  &lt;str name="name"&gt;Test with some GB18030 encoded characters&lt;/str&gt;
  &lt;float name="score"&gt;0.50524884&lt;/float&gt;
  &lt;int name="[docid]"&gt;0&lt;/int&gt;
  &lt;str name="[explain]"&gt;
  0.50524884 = (MATCH) weight(text:encoded in 0) [DefaultSimilarity], result of:
    0.50524884 = score(doc=0,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.5052488 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.15625 = fieldNorm(doc=0)
  &lt;/str&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;str name="name"&gt;Test with some UTF-8 encoded characters&lt;/str&gt;
  &lt;float name="score"&gt;0.4041991&lt;/float&gt;
  &lt;int name="[docid]"&gt;25&lt;/int&gt;
  &lt;str name="[explain]"&gt;
  0.4041991 = (MATCH) weight(text:encoded in 25) [DefaultSimilarity], result of:
    0.4041991 = score(doc=25,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.40419903 = fieldWeight in 25, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.125 = fieldNorm(doc=25)
  &lt;/str&gt;
 &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>As you can see, Solr did what we asked for.</p>
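<p>If you want the transformers applied without repeating them in every request, the same <em>fl</em> value could presumably be set as a handler default in <em>solrconfig.xml</em>. The handler name <em>/select-explain</em> below is made up for this sketch:</p>
<pre class="brush:xml">&lt;!-- Hypothetical handler; the name is an assumption --&gt;
&lt;requestHandler name="/select-explain" class="solr.SearchHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;!-- same field list as in the query above, including both transformers --&gt;
    &lt;str name="fl"&gt;name,score,[docid],[explain]&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>A request can still override the default by passing its own <em>fl</em> parameter.</p>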
<h3>Your own implementation</h3>
<p>Let&#8217;s discuss how to implement your own <em>DocTransformer</em>. Below, you have an example class named <em>RenameFieldsTransformer</em> from the <em>org.apache.solr.response.transform</em> package in the Apache Solr source code. In general, all you have to do is override the following two methods of the <em>DocTransformer</em> class from the <em>org.apache.solr.response.transform</em> package:</p>
<ul>
<li><code>String getName()</code> &#8211; method returning the transformer&#8217;s name,</li>
<li><code>void transform(SolrDocument doc, int docid)</code> &#8211; method which makes the actual transformation.</li>
</ul>
<p>Implementation looks like this:
</p>
<pre class="brush:java">public class RenameFieldsTransformer extends DocTransformer {
 final NamedList&lt;String&gt; rename;

 public RenameFieldsTransformer( NamedList&lt;String&gt; rename ) {
  this.rename = rename;
 }

 @Override
 public String getName() {
  StringBuilder str = new StringBuilder();
  str.append( "Rename[" );
  for( int i=0; i&lt; rename.size(); i++ ) {
   if( i &gt; 0 ) {
    str.append( "," );
   }
   str.append( rename.getName(i) ).append( "&gt;&gt;" ).append( rename.getVal( i ) );
  }
  str.append( "]" );
  return str.toString();
 }

 @Override
 public void transform(SolrDocument doc, int docid) {
  for( int i=0; i&lt;rename.size(); i++ ) {
   Object v = doc.remove( rename.getName(i) );
   if( v != null ) {
    doc.setField(rename.getVal(i), v);
   }
  }
 }
}</pre>
<p>The code shown above enables us to rename the fields returned in the results. As you can see, the <em>transform</em> method iterates through all the values in the <em>rename</em> class variable. The <em>rename</em> variable consists of name-value pairs, which are the field name and the name it should have after the transformation. You must also remember that in order to use your own transformer you need to add its configuration to the <em>solrconfig.xml</em> file. Here is an example that can be found on the Solr wiki page:
</p>
<pre class="brush:xml">&lt;transformer name="elevated" class="org.apache.solr.response.transform.EditorialMarkerFactory" /&gt;</pre>
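<p>A transformer factory can also take parameters. As a sketch, the static value transformer mentioned earlier might be registered as shown below; the name <em>fixedvalue</em> and the value <em>5</em> are arbitrary choices for this illustration, and the exact configuration may differ in the final release:</p>
<pre class="brush:xml">&lt;!-- The name and the value are arbitrary; chosen for illustration only --&gt;
&lt;transformer name="fixedvalue" class="org.apache.solr.response.transform.ValueAugmenterFactory"&gt;
  &lt;int name="value"&gt;5&lt;/int&gt;
&lt;/transformer&gt;</pre>
<p>With such an entry in place, adding <em>[fixedvalue]</em> to the <em>fl</em> parameter should add the configured value as a field of every returned document.</p>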
<h3>To sum up</h3>
<p>You should remember that the described functionality is marked as experimental and its behavior can change by the time Lucene and Solr 4.0 are released. We will get back to this topic as soon as Solr 4.0 is released.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Hierarchical faceting &#8211; Pivot facets in trunk</title>
		<link>https://solr.pl/en/2010/10/25/hierarchical-faceting-pivot-facets-in-trunk/</link>
					<comments>https://solr.pl/en/2010/10/25/hierarchical-faceting-pivot-facets-in-trunk/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 25 Oct 2010 12:17:29 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[facet]]></category>
		<category><![CDATA[grouping]]></category>
		<category><![CDATA[hierarchical]]></category>
		<category><![CDATA[pivot]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[trunk]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=90</guid>

					<description><![CDATA[In a large number of implementations which I took part in, sooner or later, the question arise &#8211; what can we do to get faceting as a tree structure. Of course there some tricks for that, however, their use was]]></description>
										<content:encoded><![CDATA[<p>In a large number of implementations I took part in, sooner or later the question arises: what can we do to get faceting as a tree structure? Of course there are some tricks for that; however, they required modifying the data and appropriate processing of the results on the application side. This was neither particularly functional nor especially comfortable. However, a few days ago Solr 4.0 was enhanced with the code tracked as <a href="https://issues.apache.org/jira/browse/SOLR-792" target="_blank" rel="noopener noreferrer">SOLR-792</a> in JIRA. Let&#8217;s see how to get the faceting results as a tree.</p>
<p><span id="more-90"></span></p>
<p>Important Note &#8211; at this point this functionality is only available in Solr 4.0, which is the development version. To use this version you need to download the code from the trunk of the Lucene/Solr SVN repository.</p>
<h3>A few words at the beginning</h3>
<p>In many projects I had the opportunity to work on there was a need for hierarchical faceting. One of the simplest examples is the requirement to show cities within provinces, along with the number of documents both in each province and in each city. Until recently, it was impossible to achieve such functionality without changing the structure of the data. Now it is possible <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>Indexing</h3>
<p>In order not to unnecessarily complicate the example, I decided to use the sample XML documents that are available in the <em>/exampledocs</em> directory of the example deployment. I also didn&#8217;t modify the <em>schema.xml</em> or <em>solrconfig.xml</em> files, so the configuration is standard. That&#8217;s all when it comes to configuration, so we can start the indexing process (I ran the command from the <em>$SOLR_HOME/exampledocs/</em> directory):
</p>
<pre class="brush:bash">./post.sh *.xml</pre>
<p>After several screens of information, we have our data indexed.</p>
<h3>The mechanism</h3>
<p>It is not difficult to use hierarchical faceting. The Solr developers gave us two additional parameters on top of the ones we already know:</p>
<ul>
<li><em>facet.pivot</em> &#8211; a comma-separated list of fields that specifies which fields to use, and in what order, when calculating the structure,</li>
<li><em>facet.pivot.mincount</em> &#8211; the minimum number of documents required for a result to be included in the faceting results. The default value is 1.</li>
</ul>
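<p>Both parameters can be passed in the query, but they could presumably also be set as handler defaults in <em>solrconfig.xml</em>. The sketch below assumes a made-up handler name <em>/pivot</em> and uses the <em>cat</em> and <em>inStock</em> fields from the example schema:</p>
<pre class="brush:xml">&lt;!-- Hypothetical handler; the name is an assumption --&gt;
&lt;requestHandler name="/pivot" class="solr.SearchHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;str name="facet"&gt;true&lt;/str&gt;
    &lt;!-- first level: cat, second level: inStock --&gt;
    &lt;str name="facet.pivot"&gt;cat,inStock&lt;/str&gt;
    &lt;!-- 1 is the default, shown here only for clarity --&gt;
    &lt;str name="facet.pivot.mincount"&gt;1&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>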
<p>So let&#8217;s try it.</p>
<h3>Queries</h3>
<p>Let&#8217;s start with two fields. I query for all the documents in the index and add the <em>facet.pivot=cat,inStock</em> parameter to tell Solr that I want to get the results of hierarchical faceting, where the first level of the hierarchy is the <em>cat</em> field and the second level is the <em>inStock</em> field. The query looks as follows:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select/?q=*:*&amp;facet=true&amp;facet.pivot=cat,inStock</pre>
<p>To shorten the listing I omitted the part responsible for the search results along with a header.
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
.
.
.
&lt;result name="response" numFound="19" start="0"/&gt;
&lt;lst name="facet_counts"&gt;
  &lt;lst name="facet_queries"/&gt;
  &lt;lst name="facet_fields"/&gt;
  &lt;lst name="facet_dates"/&gt;
  &lt;lst name="facet_ranges"/&gt;
  &lt;lst name="facet_pivot"&gt;
    &lt;arr name="cat,inStock"&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;electronics&lt;/str&gt;
        &lt;int name="count"&gt;17&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;13&lt;/int&gt;
          &lt;/lst&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;false&lt;/bool&gt;
            &lt;int name="count"&gt;4&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;memory&lt;/str&gt;
        &lt;int name="count"&gt;6&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;6&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;connector&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;false&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;graphics card&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;false&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;hard drive&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;monitor&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;search&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;software&lt;/str&gt;
        &lt;int name="count"&gt;2&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;2&lt;/int&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<p>The presentation of the faceting results is different in this case. For each first-level entry we get the field name (the tag with the attribute <em>name=&quot;field&quot;</em>), the value (the tag with the attribute <em>name=&quot;value&quot;</em>) and the number of documents (the tag with the attribute <em>name=&quot;count&quot;</em>). Next comes the second level of the hierarchy (the tag with the attribute <em>name=&quot;pivot&quot;</em>). The second level contains the same elements as the first one &#8211; field name, value and the number of documents with the given value.</p>
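<p>To illustrate the structure described above, here is a minimal Python sketch that walks such a hierarchy. It assumes the pivot facets have already been parsed into nested lists and dictionaries with the same keys as in the XML (field, value, count, pivot); that layout and the function name <em>walk_pivot</em> are my assumptions for the example. The data is a fragment of the response shown above.</p>
<pre class="brush:python"># A fragment of the response above as plain Python structures (assumed layout).
pivot = [
    {"field": "cat", "value": "electronics", "count": 17, "pivot": [
        {"field": "inStock", "value": True, "count": 13},
        {"field": "inStock", "value": False, "count": 4},
    ]},
    {"field": "cat", "value": "memory", "count": 6, "pivot": [
        {"field": "inStock", "value": True, "count": 6},
    ]},
]

def walk_pivot(entries, depth=0):
    """Yield (depth, field, value, count) for every entry, parents first."""
    for entry in entries:
        yield depth, entry["field"], entry["value"], entry["count"]
        # Entries on the last level have no nested "pivot" list.
        yield from walk_pivot(entry.get("pivot", []), depth + 1)

for depth, field, value, count in walk_pivot(pivot):
    # Indent every level so that the hierarchy is visible in the output.
    print("  " * depth + f"{field}={value} ({count})")</pre>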
<p>Let&#8217;s see how this mechanism deals with more levels of depth. To check that, I ran the following query:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select/?q=*:*&amp;facet=true&amp;facet.pivot=cat,inStock,features</pre>
<p>Again I omitted the response header and the search results, leaving only the faceting results. In addition, because of the length of the faceting results, I show only one entry of the first level:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
.
.
.
&lt;result name="response" numFound="19" start="0"/&gt;
&lt;lst name="facet_counts"&gt;
  &lt;lst name="facet_queries"/&gt;
  &lt;lst name="facet_fields"/&gt;
  &lt;lst name="facet_dates"/&gt;
  &lt;lst name="facet_ranges"/&gt;
  &lt;lst name="facet_pivot"&gt;
    &lt;arr name="cat,inStock,features"&gt;
      &lt;lst&gt;
        &lt;str name="field"&gt;cat&lt;/str&gt;
        &lt;str name="value"&gt;electronics&lt;/str&gt;
        &lt;int name="count"&gt;17&lt;/int&gt;
        &lt;arr name="pivot"&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;true&lt;/bool&gt;
            &lt;int name="count"&gt;13&lt;/int&gt;
            &lt;arr name="pivot"&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;2&lt;/str&gt;
                &lt;int name="count"&gt;7&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;3&lt;/str&gt;
                &lt;int name="count"&gt;7&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;lcd&lt;/str&gt;
                &lt;int name="count"&gt;5&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;x&lt;/str&gt;
                &lt;int name="count"&gt;5&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;ca&lt;/str&gt;
                &lt;int name="count"&gt;4&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;latenc&lt;/str&gt;
                &lt;int name="count"&gt;4&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;tft&lt;/str&gt;
                &lt;int name="count"&gt;4&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;v&lt;/str&gt;
                &lt;int name="count"&gt;4&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;0&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;1&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;25&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;30&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;5&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;7&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;8&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;time&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;up&lt;/str&gt;
                &lt;int name="count"&gt;3&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;000&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;19&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;20&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;2336&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;27&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;275&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;6&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;75&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;activ&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;built&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;cach&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;color&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;flash&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;heat&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;heatspread&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;matrix&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;mb&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;ms&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;photo&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;resolut&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;seek&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;speed&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;spreader&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;unbuff&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;usb&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
            &lt;/arr&gt;
          &lt;/lst&gt;
          &lt;lst&gt;
            &lt;str name="field"&gt;inStock&lt;/str&gt;
            &lt;bool name="value"&gt;false&lt;/bool&gt;
            &lt;int name="count"&gt;4&lt;/int&gt;
            &lt;arr name="pivot"&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;0&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;1&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;16&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;2&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;20&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;3&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;9&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;90&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;adapt&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;car&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;clock&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;direct&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;directx&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;dual&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;dvi&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;express&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;gddr&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;ghz&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;gl&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;gpu&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;gpuvpu&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;hdtv&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;mb&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;mhz&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;open&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;opengl&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;out&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;pci&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;power&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;vpu&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;white&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
              &lt;lst&gt;
                &lt;str name="field"&gt;features&lt;/str&gt;
                &lt;str name="value"&gt;x&lt;/str&gt;
                &lt;int name="count"&gt;2&lt;/int&gt;
              &lt;/lst&gt;
            &lt;/arr&gt;
          &lt;/lst&gt;
        &lt;/arr&gt;
      &lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<p>As the example shows, Solr had no problem calculating the hierarchy correctly in this case either. In terms of the data returned, the response is almost the same as in the previous example; it only contains one more level of depth.</p>
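<p>With more levels it is sometimes handy to flatten the hierarchy into complete paths. The sketch below is only an illustration under the assumption that the response has been parsed into nested dictionaries (for example from a JSON response); the function name <em>leaf_paths</em> is hypothetical, and the data is a small fragment of the listing above.</p>
<pre class="brush:python"># A small fragment of the three-level response above (assumed layout).
pivot = [
    {"field": "cat", "value": "electronics", "count": 17, "pivot": [
        {"field": "inStock", "value": True, "count": 13, "pivot": [
            {"field": "features", "value": "lcd", "count": 5},
            {"field": "features", "value": "tft", "count": 4},
        ]},
        {"field": "inStock", "value": False, "count": 4, "pivot": [
            {"field": "features", "value": "dvi", "count": 2},
        ]},
    ]},
]

def leaf_paths(entries, prefix=()):
    """Map every full path of values, e.g. (cat, inStock, features), to its count."""
    result = {}
    for entry in entries:
        path = prefix + (entry["value"],)
        children = entry.get("pivot")
        if children:
            # Not the last level yet, so descend and extend the path.
            result.update(leaf_paths(children, path))
        else:
            result[path] = entry["count"]
    return result

for path, count in leaf_paths(pivot).items():
    print(path, count)</pre>
<p>The function does not depend on the number of levels, so the same code can be used for the two-field query as well.</p>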
<h3>A few words at the end</h3>
<p>In my opinion this is one of the more useful features for an &#8220;<em>ordinary</em>&#8221; user. Unfortunately, so far it is only available in the development version of Solr. I have not found any information about whether there are plans to port this functionality to version 1.5 of Solr, which lives in the <em>branch_3x</em> branch in SVN. However, the important thing is that this functionality was committed, so sooner or later Solr users will be able to use it.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/10/25/hierarchical-faceting-pivot-facets-in-trunk/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Quick look &#8211; FieldCollapsing</title>
		<link>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/</link>
					<comments>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 20 Sep 2010 12:12:39 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[collapsing]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[fieldcollapsing]]></category>
		<category><![CDATA[grouping]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[lucene 4.0]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solr 4.0]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=77</guid>

					<description><![CDATA[FieldCollapsing, or in other words grouping of search results, has just been committed to the svn repository. I decided to take a look at this functionality and see how it works. I want to begin with brief information &#8211; FieldCollapsing]]></description>
										<content:encoded><![CDATA[<p>FieldCollapsing, or in other words grouping of search results, has just been committed to the svn repository. I decided to take a look at this functionality and see how it works.</p>
<p><span id="more-77"></span></p>
<p>I want to begin with a brief note &#8211; FieldCollapsing is only available in version 4.0 of Solr, which is the development version of the Solr project, and it&#8217;s rather unlikely to be transferred to version 3.X.</p>
<h3>FieldCollapsing &#8211; what is it?</h3>
<p>Imagine that our index contains information about companies from different cities. We want to show our users one company (or, for example, two or three) from each city, of course only companies that meet the search criteria. How can we do that? Just use the FieldCollapsing mechanism. It allows the returned results to be grouped based on field contents. Each group can be represented by a single document or by a fixed number of documents.</p>
<h3>Parameters</h3>
<p>As with most features available in Solr, the behavior of the FieldCollapsing mechanism can be configured through a number of parameters. Here they are:</p>
<ul>
<li><em>group</em> &#8211; setting this parameter to <em>true</em> enables the FieldCollapsing mechanism. The default value is <em>false</em>.</li>
<li><em>group.field</em> &#8211; the name of the field whose contents are used for grouping.</li>
<li><em>group.func</em> &#8211; a function definition; documents are grouped by the value this function returns.</li>
<li><em>group.limit</em> &#8211; the number of documents returned in each group. The default is 1.</li>
<li><em>group.sort</em> &#8211; specifies how to sort the documents within each group. The default value is <em>score desc</em>.</li>
</ul>
<p>It is worth noting that the rows parameter passed to the query determines the number of groups returned in the search results, not the number of individual documents. The behaviour of the sort parameter also changes: it tells Solr how to sort the groups, not the individual documents. Groups will be sorted based on the field contents of the first document in each group.</p>
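<p>As an illustration of how these parameters fit together, here is a minimal Python sketch that builds a grouping request. The base URL is the local example instance assumed throughout this post, the helper name <em>group_query_url</em> is hypothetical, and the chosen values (two documents per group, ten groups) are arbitrary.</p>
<pre class="brush:python">from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed address of a local example Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select/"

def group_query_url(field, limit=1, group_sort="score desc", rows=10, query="*:*"):
    """Build a select URL that groups the results by the given field."""
    params = [
        ("q", query),
        ("group", "true"),
        ("group.field", field),
        # Number of documents returned inside every group.
        ("group.limit", str(limit)),
        # Order of the documents inside every group.
        ("group.sort", group_sort),
        # With grouping enabled, rows counts groups, not documents.
        ("rows", str(rows)),
    ]
    return SOLR_SELECT + "?" + urlencode(params)

url = group_query_url("inStock", limit=2)
print(url)
# Decode the query string again to inspect the parameters that were sent.
print(parse_qs(urlsplit(url).query))</pre>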
<h3>Search Results</h3>
<p>The search results are different from those we are accustomed to. They are grouped according to the parameters that we have passed. The main element of the search results is no longer a document &#8211; when we use FieldCollapsing, the main element is a group of documents. The documents themselves are shown within the groups (their number is defined by the group.limit parameter). For example, sending the following query:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select/?q=*:*&amp;group=true&amp;group.field=inStock&amp;indent=true</pre>
<p>to a Solr instance whose index was created by indexing all the XML documents from the <em>exampledocs</em> directory results in the following response:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;0&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="group.field"&gt;inStock&lt;/str&gt;
    &lt;str name="group"&gt;true&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="q"&gt;*:*&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;lst name="grouped"&gt;
  &lt;lst name="inStock"&gt;
    &lt;int name="matches"&gt;19&lt;/int&gt;
    &lt;arr name="groups"&gt;
     &lt;lst&gt;
        &lt;str name="groupValue"&gt;T&lt;/str&gt;
        &lt;result name="doclist" numFound="15" start="0"&gt;
          &lt;doc&gt;
            &lt;arr name="cat"&gt;&lt;str&gt;electronics&lt;/str&gt;&lt;str&gt;hard drive&lt;/str&gt;&lt;/arr&gt;
            &lt;arr name="features"&gt;&lt;str&gt;7200RPM, 8MB cache, IDE Ultra ATA-133&lt;/str&gt;&lt;str&gt;NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor&lt;/str&gt;&lt;/arr&gt;
            &lt;str name="id"&gt;SP2514N&lt;/str&gt;
            &lt;bool name="inStock"&gt;true&lt;/bool&gt;
            &lt;str name="manu"&gt;Samsung Electronics Co. Ltd.&lt;/str&gt;
            &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
            &lt;str name="name"&gt;Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133&lt;/str&gt;
            &lt;int name="popularity"&gt;6&lt;/int&gt;
            &lt;float name="price"&gt;92.0&lt;/float&gt;
            &lt;str name="store"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;double name="store_0_d"&gt;45.17614&lt;/double&gt;
            &lt;double name="store_1_d"&gt;-93.87341&lt;/double&gt;
            &lt;str name="store_lat_lon"&gt;45.17614,-93.87341&lt;/str&gt;
          &lt;/doc&gt;
        &lt;/result&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="groupValue"&gt;F&lt;/str&gt;
        &lt;result name="doclist" numFound="4" start="0"&gt;
          &lt;doc&gt;
            &lt;arr name="cat"&gt;&lt;str&gt;electronics&lt;/str&gt;&lt;str&gt;connector&lt;/str&gt;&lt;/arr&gt;
            &lt;arr name="features"&gt;&lt;str&gt;car power adapter, white&lt;/str&gt;&lt;/arr&gt;
            &lt;str name="id"&gt;F8V7067-APL-KIT&lt;/str&gt;
            &lt;bool name="inStock"&gt;false&lt;/bool&gt;
            &lt;str name="manu"&gt;Belkin&lt;/str&gt;
            &lt;date name="manufacturedate_dt"&gt;2005-08-01T16:30:25Z&lt;/date&gt;
            &lt;str name="name"&gt;Belkin Mobile Power Cord for iPod w/ Dock&lt;/str&gt;
            &lt;int name="popularity"&gt;1&lt;/int&gt;
            &lt;float name="price"&gt;19.95&lt;/float&gt;
            &lt;str name="store"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;double name="store_0_d"&gt;45.17614&lt;/double&gt;
            &lt;double name="store_1_d"&gt;-93.87341&lt;/double&gt;
            &lt;str name="store_lat_lon"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;float name="weight"&gt;4.0&lt;/float&gt;
          &lt;/doc&gt;
        &lt;/result&gt;
      &lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
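<p>To show how such a response can be consumed, below is a minimal Python sketch. It assumes the grouped section has already been parsed into nested dictionaries that mirror the XML above (the exact layout and the function name <em>summarize_groups</em> are my assumptions), and it keeps only the document identifiers from the example.</p>
<pre class="brush:python"># The grouped section of the response above as plain Python structures
# (assumed layout, reduced to the document identifiers).
grouped = {
    "inStock": {
        "matches": 19,
        "groups": [
            {"groupValue": "T",
             "doclist": {"numFound": 15, "docs": [{"id": "SP2514N"}]}},
            {"groupValue": "F",
             "doclist": {"numFound": 4, "docs": [{"id": "F8V7067-APL-KIT"}]}},
        ],
    }
}

def summarize_groups(grouped, field):
    """Return (group value, documents in group, ids of returned documents)."""
    summary = []
    for group in grouped[field]["groups"]:
        doclist = group["doclist"]
        # Only group.limit documents are returned, numFound counts all of them.
        ids = [doc["id"] for doc in doclist["docs"]]
        summary.append((group["groupValue"], doclist["numFound"], ids))
    return summary

for value, found, ids in summarize_groups(grouped, "inStock"):
    print(value, found, ids)</pre>
<p>In this example the group sizes (15 and 4) add up to the 19 matches, because both groups fit into the returned result.</p>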
<h3>At the end</h3>
<p>This is an interesting feature that will certainly find use in some systems. Please note, however, that the functionality will be developed further. So far there is no support for distributed search or for grouping on multivalued fields. At this time there&#8217;s no point in performance testing: first, because of the changes that will still come to the mechanism, and second, because Lucene and Solr 4.0 are both still in development. However, I will definitely be watching how this functionality evolves <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
