<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>document &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/document-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 14:14:55 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr 6.5 and large stored fields &#8211; quick look</title>
		<link>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/</link>
					<comments>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 01 May 2017 13:14:28 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=927</guid>

					<description><![CDATA[As we know Solr has a few caches, for example &#8211; filterCache for filters, queryResultCache for query results caching and of course the documentCache for caching documents for fast retrieval. Today we will focus on the last of the mentioned]]></description>
										<content:encoded><![CDATA[<p>Solr has a few caches: the <em>filterCache</em> for filters, the <em>queryResultCache</em> for query results and, of course, the <em>documentCache</em> for caching documents for fast retrieval. Today we will focus on the last of these and on what can be done to make better use of it.</p>
<p><span id="more-927"></span></p>
<h3>The problem</h3>
<p>When <em>documentCache</em> is present in <em>solrconfig.xml</em>, the first time a field is retrieved from Lucene, Solr caches its value along with the document in the <em>documentCache</em>. This can be very expensive, especially for large stored fields. Imagine a situation where your documents are OCRed book pages and you display the content of those pages. If such data is rarely reused, that is, if the hit ratio of the <em>documentCache</em> is low, caching it only produces more garbage inside Solr and gives the JVM garbage collector a harder time cleaning it up. That can lead to higher CPU usage and worse Solr performance in general. Let&#8217;s look at what we can do with such large stored fields.</p>
<h3>Marking the field as large</h3>
<p>Starting with Solr 6.5, we can add an additional property to the field definition, called <em>large</em>, which takes a value of <em>true</em> or <em>false</em> and defaults to <em>false</em>. A field that we want to mark as large must be defined with <em>stored="true"</em> and <em>multiValued="false"</em>. In that case, setting <em>large="true"</em> on the field definition keeps the field value from being cached inside the <em>documentCache</em>.</p>
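<p>To make this concrete, a field definition along these lines could go into the <em>managed-schema</em> file. This is only a minimal sketch: the <em>body</em> field name matches the test described below, while the <em>text_general</em> type and <em>indexed="true"</em> are assumptions and should be adjusted to your own schema.</p>
<pre class="brush:xml">&lt;!-- Sketch only: the type name and the indexed flag are assumed, not taken from the test configuration --&gt;
&lt;field name="body" type="text_general" indexed="true" stored="true" multiValued="false" large="true" /&gt;</pre>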
<h3>Noticing the difference</h3>
<p>Because this is a <em>quick look</em> type of post, I don&#8217;t want to go into too much detail, but I would like to compare two collections holding the same data. Each collection has the same set of fields:</p>
<ul>
<li><em>id</em> &#8211; identifier of the document,</li>
<li><em>name</em> &#8211; name of the document,</li>
<li><em>body</em> &#8211; text of the document, which can be very, very large.</li>
</ul>
<p>One collection will have <em>large="true"</em> set on the <em>body</em> field and the other won&#8217;t have that property set. We will index a few large documents and see how the <em>documentCache</em> behaves.</p>
<p>Here are the commands to set up those two collections using files from the Solr.pl GitHub account (<a href="https://github.com/solrpl/">https://github.com/solrpl/</a>). First set up one collection and gather the statistics; then remove all the files, restart Solr, create the second collection and gather the statistics again. The commands are as follows:
</p>
<pre class="brush:xml">$ mkdir /tmp/solr
$ mkdir /tmp/solr/collection_with_large
$ mkdir /tmp/solr/collection_without_large
$ wget -O /tmp/solr/data.xml https://github.com/solrpl/blog/raw/master/posts/large_field/data.xml
$ wget -O /tmp/solr/collection_with_large/managed-schema https://github.com/solrpl/blog/raw/master/posts/large_field/collection_with_large/managed-schema
$ wget -O /tmp/solr/collection_with_large/solrconfig.xml https://github.com/solrpl/blog/raw/master/posts/large_field/collection_with_large/solrconfig.xml
$ wget -O /tmp/solr/collection_without_large/managed-schema https://github.com/solrpl/blog/raw/master/posts/large_field/collection_without_large/managed-schema
$ wget -O /tmp/solr/collection_without_large/solrconfig.xml https://github.com/solrpl/blog/raw/master/posts/large_field/collection_without_large/solrconfig.xml
$ bin/solr zk upconfig -z localhost:9983 -n config_with_large -d /tmp/solr/collection_with_large
$ bin/solr create_collection -c collection_with_large -n config_with_large -shards 1 -replicationFactor 1
$ curl -XPOST 'localhost:8983/solr/collection_with_large/update?commit=true' -H 'Content-Type:application/xml' --data-binary @/tmp/solr/data.xml
$ curl 'localhost:8983/solr/collection_with_large/select?q=*:*'</pre>
<p>And now let&#8217;s create the second collection using the downloaded data:
</p>
<pre class="brush:xml">$ bin/solr zk upconfig -z localhost:9983 -n config_without_large -d /tmp/solr/collection_without_large
$ bin/solr create_collection -c collection_without_large -n config_without_large -shards 1 -replicationFactor 1
$ curl -XPOST 'localhost:8983/solr/collection_without_large/update?commit=true' -H 'Content-Type:application/xml' --data-binary @/tmp/solr/data.xml
$ curl 'localhost:8983/solr/collection_without_large/select?q=*:*'</pre>
<p>Now let&#8217;s check the <em>documentCache</em> statistics that we&#8217;ve gathered. This is what we have for the collection with the <em>body</em> field marked as <em>large="true"</em>:</p>
<p><a href="http://solr.pl/wp-content/uploads/2017/04/field_with_large.png"><img decoding="async" class="aligncenter wp-image-3945 size-medium" src="http://solr.pl/wp-content/uploads/2017/04/field_with_large-300x65.png" alt="" width="300" height="65"></a></p>
<p>And we have this for the collection with the <em>body</em> field without the <em>large="true"</em> property:</p>
<p><a href="http://solr.pl/wp-content/uploads/2017/04/field_without_large.png"><img decoding="async" class="aligncenter wp-image-3946 size-medium" src="http://solr.pl/wp-content/uploads/2017/04/field_without_large-300x70.png" alt="" width="300" height="70"></a></p>
<p>As you can see, the field marked with <em>large="true"</em> was not put into the <em>documentCache</em> directly, but only as a lazily loaded large field, which is what we were aiming for. This means that we can still use the <em>documentCache</em> without worrying about Solr putting large stored fields there, as happened in the second example.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2017/05/01/solr-6-5-and-large-stored-fields-quick-look/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Document language identification</title>
		<link>https://solr.pl/en/2012/01/23/document-language-identification/</link>
					<comments>https://solr.pl/en/2012/01/23/document-language-identification/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 23 Jan 2012 20:59:03 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[3.5]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[identification]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tika]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=396</guid>

					<description><![CDATA[One of the features of the latest Solr version (3.5) is the ability to identify the language of a document during indexing. In today&#8217;s entry we will see how Apache Solr works together with Apache Tika to identify the]]></description>
										<content:encoded><![CDATA[<p>One of the features of the latest Solr version (<a href="http://solr.pl/en/2011/11/27/apache-lucene-and-solr-3-5/" target="_blank" rel="noopener noreferrer">3.5</a>) is the ability to identify the language of a document during indexing. In today&#8217;s entry we will see how Apache Solr works together with Apache Tika to identify the language of the documents.</p>
<p><span id="more-396"></span></p>
<h3>At the beginning</h3>
<p>You should remember that the described functionality was introduced in Solr 3.5.</p>
<h3>Assumptions</h3>
<p>We will be using two fields to identify the document language:&nbsp;<em>title</em>&nbsp;and&nbsp;<em>body</em>. We want to store the information of the detected language in the <em>lang</em>&nbsp;field.</p>
<h3>Index structure</h3>
<p>The structure of our index is of course simplified and contains only the fields needed for the test. The field definition part of the <em>schema.xml</em> file looks like this:
</p>
<pre class="brush:xml">&lt;field name="id" type="string" indexed="true" stored="true" required="true" /&gt;
&lt;field name="title" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="body" type="text_ws" indexed="true" stored="true" /&gt;
&lt;field name="lang" type="string" indexed="true" stored="true" /&gt;</pre>
<p>All the fields are marked as <em>stored="true"</em> for simplicity.</p>
<h3>Update request processor configuration</h3>
<p>To use the language identification feature we need to configure a Solr update request processor. We will use the one based on Apache Tika (there is a second implementation based on <a href="http://code.google.com/p/language-detection/">http://code.google.com/p/language-detection/</a>). To configure the processor we add the following to the <em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;updateRequestProcessorChain name="langid"&gt;
  &lt;processor name="langid" class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"&gt;
    &lt;lst name="defaults"&gt;
      &lt;str name="langid.fl"&gt;title,body&lt;/str&gt;
      &lt;str name="langid.langField"&gt;lang&lt;/str&gt;
    &lt;/lst&gt;
  &lt;/processor&gt;
  &lt;processor class="solr.LogUpdateProcessorFactory" /&gt;
  &lt;processor class="solr.RunUpdateProcessorFactory" /&gt;
&lt;/updateRequestProcessorChain&gt;</pre>
<p>Other parameters of the <em>TikaLanguageIdentifierUpdateProcessorFactory</em>&nbsp;are described on Apache Solr wiki pages available at the following URL address:&nbsp;<a href="http://wiki.apache.org/solr/LanguageDetection">http://wiki.apache.org/solr/LanguageDetection</a>.</p>
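<p>As an illustration of those other parameters, the processor definition could be extended roughly as shown below. This is a sketch under assumptions: <em>langid.whitelist</em> and <em>langid.fallback</em> are parameter names as I recall them from the wiki page mentioned above, and the values are made up for our three test languages, so please verify them against that page before use.</p>
<pre class="brush:xml">&lt;processor name="langid" class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory"&gt;
  &lt;lst name="defaults"&gt;
    &lt;str name="langid.fl"&gt;title,body&lt;/str&gt;
    &lt;str name="langid.langField"&gt;lang&lt;/str&gt;
    &lt;!-- Assumed values: accept only these detected languages --&gt;
    &lt;str name="langid.whitelist"&gt;en,pl,de&lt;/str&gt;
    &lt;!-- Assumed value: language code to use when nothing acceptable is detected --&gt;
    &lt;str name="langid.fallback"&gt;en&lt;/str&gt;
  &lt;/lst&gt;
&lt;/processor&gt;</pre>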
<h3>Additional libraries</h3>
<p>For the update request processor to work we need some additional libraries. From the <em>dist</em> directory of the Apache Solr distribution we copy <em>apache-solr-langid-3.5.0.jar</em> to a directory named, for example, <em>tikaLib</em>, which we create on the same level as the <em>webapps</em> directory. Then we add the following line to the <em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;lib dir="../tikaLib/" regex="apache-solr-langid-\d.*\.jar" /&gt;</pre>
<p>The next library we need is the Tika jar with all the goodies (<em>tika-app-1.0.jar</em>), which can be downloaded from <a href="http://tika.apache.org/">http://tika.apache.org/</a>. We place it in the same <em>tikaLib</em> directory and add the following entry to the <em>solrconfig.xml</em> file:
</p>
<pre class="brush:xml">&lt;lib dir="../tikaLib/" regex="tika-app-1.0.jar" /&gt;</pre>
<h3>Test documents</h3>
<p>For the testing purposes I decided to prepare three documents. The first was in English, the second one in Polish and the third one in German. Their content was downloaded from Wikipedia. They look as follows:</p>
<h4>tika_en.xml</h4>
<pre class="brush:xml">&lt;add&gt;
&lt;doc&gt;
  &lt;field name="id"&gt;1&lt;/field&gt;
  &lt;field name="title"&gt;Water&lt;/field&gt;
  &lt;field name="body"&gt;Water is a chemical substance with the chemical formula H2O. A water molecule contains one oxygen and two hydrogen atoms connected by covalent bonds. Water is a liquid at ambient conditions, but it often co-exists on Earth with its solid state, ice, and gaseous state (water vapor or steam). Water also exists in a liquid crystal state near hydrophilic surfaces.[1][2] Under nomenclature used to name chemical compounds, Dihydrogen monoxide is the scientific name for water, though it is almost never used.&lt;/field&gt;
&lt;/doc&gt;
&lt;/add&gt;</pre>
<h4>tika_pl.xml</h4>
<pre class="brush:xml">&lt;add&gt;
&lt;doc&gt;
  &lt;field name="id"&gt;2&lt;/field&gt;
  &lt;field name="title"&gt;Woda&lt;/field&gt;
  &lt;field name="body"&gt;Woda (tlenek wodoru; nazwa systematyczna IUPAC: oksydan) – związek chemiczny o wzorze H2O, występujący w warunkach standardowych w stanie ciekłym. W stanie gazowym wodę określa się mianem pary wodnej, a w stałym stanie skupienia – lodem. Słowo woda jako nazwa związku chemicznego może się odnosić do każdego stanu skupienia.&lt;/field&gt;
&lt;/doc&gt;
&lt;/add&gt;</pre>
<h4>tika_de.xml</h4>
<pre class="brush:xml">&lt;add&gt;
&lt;doc&gt;
  &lt;field name="id"&gt;3&lt;/field&gt;
  &lt;field name="title"&gt;Wasser&lt;/field&gt;
  &lt;field name="body"&gt;Wasser (H2O) ist eine chemische Verbindung aus den Elementen Sauerstoff (O) und Wasserstoff (H). Wasser ist die einzige chemische Verbindung auf der Erde, die in der Natur in allen drei Aggregatzuständen vorkommt. Die Bezeichnung Wasser wird dabei besonders für den flüssigen Aggregatzustand verwendet. Im festen (gefrorenen) Zustand spricht man von Eis, im gasförmigen Zustand von Wasserdampf.&lt;/field&gt;
&lt;/doc&gt;
&lt;/add&gt;</pre>
<h3>More testing</h3>
<p>To index the data I used the following shell commands:
</p>
<pre class="brush:xml">curl 'http://localhost:8983/solr/update?update.chain=langid' --data-binary @tika_pl.xml -H 'Content-type:application/xml'
curl 'http://localhost:8983/solr/update?update.chain=langid' --data-binary @tika_en.xml -H 'Content-type:application/xml'
curl 'http://localhost:8983/solr/update?update.chain=langid' --data-binary @tika_de.xml -H 'Content-type:application/xml'
curl 'http://localhost:8983/solr/update?update.chain=langid' --data-binary '&lt;commit/&gt;' -H 'Content-type:application/xml'</pre>
<p>It is worth noting the additional <em>update.chain=langid</em> parameter added to the request. This parameter tells Solr which update request processor chain to use when indexing the data. In the example we told Solr to use the chain we defined above.</p>
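<p>If you don&#8217;t want to pass the parameter with every request, the chain can probably be set as a default on the update handler instead. The fragment below is a sketch that assumes the <em>/update</em> handler is defined with <em>solr.XmlUpdateRequestHandler</em>, as in the example <em>solrconfig.xml</em> shipped with Solr 3.x; check your own handler definition before copying it.</p>
<pre class="brush:xml">&lt;!-- Sketch: handler name and class are assumed to match the example configuration --&gt;
&lt;requestHandler name="/update" class="solr.XmlUpdateRequestHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;!-- Use the langid chain unless the request says otherwise --&gt;
    &lt;str name="update.chain"&gt;langid&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>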
<h3>Indexed data</h3>
<p>So let&#8217;s have a look at the indexed data. We will do that by running the following query: <em>q=*:*&amp;indent=true</em>.
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;0&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="q"&gt;*:*&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;result name="response" numFound="3" start="0"&gt;
  &lt;doc&gt;
    &lt;str name="body"&gt;Woda (tlenek wodoru; nazwa systematyczna IUPAC: oksydan) – związek chemiczny o wzorze H2O, występujący w warunkach standardowych w stanie ciekłym. W stanie gazowym wodę określa się mianem pary wodnej, a w stałym stanie skupienia – lodem. Słowo woda jako nazwa związku chemicznego może się odnosić do każdego stanu skupienia.&lt;/str&gt;
    &lt;str name="id"&gt;2&lt;/str&gt;
    &lt;str name="lang"&gt;pl&lt;/str&gt;
    &lt;str name="title"&gt;Woda&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;str name="body"&gt;Water is a chemical substance with the chemical formula H2O. A water molecule contains one oxygen and two hydrogen atoms connected by covalent bonds. Water is a liquid at ambient conditions, but it often co-exists on Earth with its solid state, ice, and gaseous state (water vapor or steam). Water also exists in a liquid crystal state near hydrophilic surfaces.[1][2] Under nomenclature used to name chemical compounds, Dihydrogen monoxide is the scientific name for water, though it is almost never used.&lt;/str&gt;
    &lt;str name="id"&gt;1&lt;/str&gt;
    &lt;str name="lang"&gt;en&lt;/str&gt;
    &lt;str name="title"&gt;Water&lt;/str&gt;
  &lt;/doc&gt;
  &lt;doc&gt;
    &lt;str name="body"&gt;Wasser (H2O) ist eine chemische Verbindung aus den Elementen Sauerstoff (O) und Wasserstoff (H). Wasser ist die einzige chemische Verbindung auf der Erde, die in der Natur in allen drei Aggregatzuständen vorkommt. Die Bezeichnung Wasser wird dabei besonders für den flüssigen Aggregatzustand verwendet. Im festen (gefrorenen) Zustand spricht man von Eis, im gasförmigen Zustand von Wasserdampf.&lt;/str&gt;
    &lt;str name="id"&gt;3&lt;/str&gt;
    &lt;str name="lang"&gt;de&lt;/str&gt;
    &lt;str name="title"&gt;Wasser&lt;/str&gt;
  &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>As you can see, Solr, with the help of Tika, was able to identify the languages of the indexed documents. Let&#8217;s not be too optimistic, though: mistakes do happen, especially with multi-language documents.</p>
<h3>To sum up</h3>
<p>Remember that the language identification feature is not perfect and can make mistakes. Also remember that the longer the document, the better the detection works. The obvious limitation is that we can&#8217;t use language identification at query time, but that is not a problem specific to Solr and Tika. You can work around it by determining the language from your user, their web browser or the place they are located in.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/01/23/document-language-identification/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 4.0: DocTransformers first look</title>
		<link>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/</link>
					<comments>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 05 Dec 2011 20:55:51 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[doc]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[first]]></category>
		<category><![CDATA[first look]]></category>
		<category><![CDATA[look]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[transformer]]></category>
		<category><![CDATA[transformers]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=386</guid>

					<description><![CDATA[In today&#8217;s entry we will look at the next feature that will come with version 4.0 of Apache Solr. We will look at the functionality which enables us to modify the fields in Solr result list. Do I need it]]></description>
										<content:encoded><![CDATA[<p>In today&#8217;s entry we will look at the next feature that will come with version 4.0 of Apache Solr: the functionality which enables us to modify the fields in the Solr result list.</p>
<p><span id="more-386"></span></p>
<h3>Do I need it?</h3>
<p>Until now, we didn&#8217;t have much choice when it came to the results returned by Solr. When Solr 4.0 is released we will get a new tool, the so-called <em>DocTransformers</em>. This feature enables us to modify the fields of the documents returned by Solr in the search results. Looking at what is available now, we can for example change the names of the returned fields or mark the documents that were added by the <em>QueryElevationComponent</em>. Right now there are only a few implementations, but implementing your own <em>DocTransformer</em> is not hard.</p>
<h3>What is already available?</h3>
<p>At the time of writing, the following transformers are available:</p>
<ul>
<li>One that enables you to mark the documents that were added by the <em>QueryElevationComponent</em>.</li>
<li>One that enables you to add the explain information to the document.</li>
<li>One that enables you to add static value as a field of the document.</li>
<li>One that enables you to add the id of the shard from which the document was fetched.</li>
<li>One that enables you to add the <em>docid</em> as the document field (identifier used by Lucene).</li>
</ul>
<h3>How to use DocTransformers?</h3>
<p>Let&#8217;s look at how to use <em>DocTransformers</em>. To do that I&#8217;ve downloaded the <em>trunk</em> version of Apache Solr (4.0) from the svn repository and run the example deployment. Next, I&#8217;ve indexed the example data and run the following query:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select?q=encoded&amp;fl=name,score,[docid],[explain]</pre>
<p>If you look at the <em>fl</em> parameter you will notice that we told Solr that we want the <em>name</em> field in the results, the <em>score</em> of the document and two <em>DocTransformers</em>: <em>[docid]</em> and <em>[explain]</em>. As a result I got the following XML:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;2&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="q"&gt;encoded&lt;/str&gt;
    &lt;str name="fl"&gt;name,score,[docid],[explain]&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="2" start="0" maxScore="0.50524884"&gt;
 &lt;doc&gt;
  &lt;str name="name"&gt;Test with some GB18030 encoded characters&lt;/str&gt;
  &lt;float name="score"&gt;0.50524884&lt;/float&gt;
  &lt;int name="[docid]"&gt;0&lt;/int&gt;
  &lt;str name="[explain]"&gt;
  0.50524884 = (MATCH) weight(text:encoded in 0) [DefaultSimilarity], result of:
    0.50524884 = score(doc=0,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.5052488 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.15625 = fieldNorm(doc=0)
  &lt;/str&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;str name="name"&gt;Test with some UTF-8 encoded characters&lt;/str&gt;
  &lt;float name="score"&gt;0.4041991&lt;/float&gt;
  &lt;int name="[docid]"&gt;25&lt;/int&gt;
  &lt;str name="[explain]"&gt;
  0.4041991 = (MATCH) weight(text:encoded in 25) [DefaultSimilarity], result of:
    0.4041991 = score(doc=25,freq=1.0 = termFreq=1), product of:
      1.0000001 = queryWeight, product of:
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.3092536 = queryNorm
      0.40419903 = fieldWeight in 25, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1
        3.2335923 = idf(docFreq=2, maxDocs=28)
        0.125 = fieldNorm(doc=25)
  &lt;/str&gt;
 &lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>As you can see, Solr did what we asked for.</p>
<h3>Your own implementation</h3>
<p>Let&#8217;s discuss how to implement your own <em>DocTransformer</em>. Below is an example class named <em>RenameFieldsTransformer</em> from the <em>org.apache.solr.response.transform</em> package in the Apache Solr source code. In general, all you have to do is override the following two methods of the <em>DocTransformer</em> class from the <em>org.apache.solr.response.transform</em> package:</p>
<ul>
<li><code>String getName()</code> &#8211; method returning the transformer&#8217;s name,</li>
<li><code>void transform(SolrDocument doc, int docid)</code> &#8211; method which makes the actual transformation.</li>
</ul>
<p>Implementation looks like this:
</p>
<pre class="brush:java">package org.apache.solr.response.transform;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.NamedList;

public class RenameFieldsTransformer extends DocTransformer {
 // Pairs of: current field name and the name the field should have in the response
 final NamedList&lt;String&gt; rename;

 public RenameFieldsTransformer( NamedList&lt;String&gt; rename ) {
  this.rename = rename;
 }

 // Builds a descriptive name listing all configured renames
 @Override
 public String getName() {
  StringBuilder str = new StringBuilder();
  str.append( "Rename[" );
  for( int i=0; i&lt; rename.size(); i++ ) {
   if( i &gt; 0 ) {
    str.append( "," );
   }
   str.append( rename.getName(i) ).append( "&gt;&gt;" ).append( rename.getVal( i ) );
  }
  str.append( "]" );
  return str.toString();
 }

 // Moves each configured field's value to its new name, if the document has that field
 @Override
 public void transform(SolrDocument doc, int docid) {
  for( int i=0; i&lt;rename.size(); i++ ) {
   Object v = doc.remove( rename.getName(i) );
   if( v != null ) {
    doc.setField(rename.getVal(i), v);
   }
  }
 }
}</pre>
<p>The code shown above enables us to rename the fields returned in the results. As you can see, the <em>transform</em> method iterates through all the values in the <em>rename</em> class variable. The <em>rename</em> variable consists of name-value pairs: the current field name and the name it should have after the transformation. You must also remember that in order to use your own transformer you need to add its configuration to the <em>solrconfig.xml</em> file. Here is an example which can be found on the Solr wiki page:
</p>
<pre class="brush:xml">&lt;transformer name="elevated" class="org.apache.solr.response.transform.EditorialMarkerFactory" /&gt;</pre>
<h3>To sum up</h3>
<p>You should remember that the described functionality is marked as experimental and its behavior can change by the time Lucene and Solr 4.0 are released. We will get back to this topic as soon as Solr 4.0 is released.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Optimization &#8211; document cache</title>
		<link>https://solr.pl/en/2011/08/29/optimization-document-cache/</link>
					<comments>https://solr.pl/en/2011/08/29/optimization-document-cache/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 29 Aug 2011 19:48:04 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[document cache]]></category>
		<category><![CDATA[documentCache]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=366</guid>

					<description><![CDATA[A few months ago (here) we looked at filterCache. I&#8217;ve decided to update the optimization topic and take a look at the documentCache. What it contains ? So let&#8217;s start with information about the information that documentCache holds. So documentCache]]></description>
										<content:encoded><![CDATA[<p>A few months ago (<a href="http://solr.pl/en/2011/02/07/optimization-filter-cache/" target="_blank" rel="noopener noreferrer">here</a>) we looked at <em>filterCache</em>. I&#8217;ve decided to update the optimization topic and take a look at the <em>documentCache</em>.</p>
<p><span id="more-366"></span></p>
<h3>What does it contain?</h3>
<p>Let&#8217;s start with what the <em>documentCache</em> holds: it contains Lucene documents that were fetched from the index. So little and so much.</p>
<h3>What is it used for?</h3>
<p>Every object (Lucene document) stored in the <em>documentCache</em> contains a list of references to the fields that are stored with the document. Thanks to this, once a document has been fetched and put into the cache, it doesn&#8217;t have to be fetched again while processing another query. This is why the number of I/O operations is reduced when rendering the query results list.</p>
<h3>What to remember when using documentCache?</h3>
<p>When using the <em>documentCache</em> you have to remember two important things:</p>
<ol>
<li><em>documentCache</em> can&#8217;t be autowarmed because it operates on identifiers that change after every <em>commit </em>operation.</li>
<li>If you use lazy field loading (<em>enableLazyFieldLoading=true</em>), the <em>documentCache</em> functionality is somewhat limited: the document stored in the <em>documentCache</em> will contain only those fields that were passed to the <em>fl</em> parameter. If a later query asks for additional fields of a document already in the cache, those additional fields will be fetched from the index.</li>
</ol>
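<p>For reference, lazy field loading is switched on in the <em>query</em> section of <em>solrconfig.xml</em>. The fragment below is a minimal sketch showing only that one setting; the rest of the section is omitted.</p>
<pre class="brush:xml">&lt;query&gt;
  &lt;!-- Stored fields that were not requested are loaded only when they are actually needed --&gt;
  &lt;enableLazyFieldLoading&gt;true&lt;/enableLazyFieldLoading&gt;
&lt;/query&gt;</pre>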
<h3>Definition</h3>
<p>The standard <em>documentCache </em>definition looks like this:
</p>
<pre class="brush:xml">&lt;documentCache
      class="solr.FastLRUCache"
      size="16384"
      initialSize="16384"/&gt;</pre>
<p>Let&#8217;s recall those parameters:</p>
<ul>
<li><em>class</em> &#8211; class implementing the cache,</li>
<li><em>size</em> &#8211; the maximum cache size,</li>
<li><em>initialSize</em> &#8211; initial size of the cache.</li>
</ul>
<h3>How to configure it?</h3>
<p>The usual question about a cache: what size should I set? According to the Solr wiki (<a href="http://wiki.apache.org/solr/SolrCaching#documentCache" target="_blank" rel="noopener noreferrer">http://wiki.apache.org/solr/SolrCaching#documentCache</a>), the maximum size shouldn&#8217;t be less than the product of the number of concurrent queries and the maximum number of documents fetched by a single query. This simple relation should ensure that Solr won&#8217;t have to re-fetch documents from the index while processing a request.</p>
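<p>As a worked example with made-up numbers: assuming at most 20 concurrent queries, each fetching at most 50 documents, the cache should hold at least 20 * 50 = 1000 entries. Both figures are assumptions for illustration only; measure your own traffic before choosing a size.</p>
<pre class="brush:xml">&lt;!-- Sketch: 20 concurrent queries * 50 documents per query = 1000 entries (assumed numbers) --&gt;
&lt;documentCache
      class="solr.FastLRUCache"
      size="1000"
      initialSize="1000"/&gt;</pre>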
<h3>Last few words</h3>
<p>In the case of the <em>documentCache</em> we don&#8217;t have to worry about how we construct our queries in order to use the cache properly. But please remember that the <em>documentCache</em> requires memory: the more stored fields you have in the index, the more memory it needs.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/08/29/optimization-document-cache/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
