<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>field &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/field-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 15:12:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Category Routed Aliases</title>
		<link>https://solr.pl/en/2019/10/21/category-routed-aliases/</link>
					<comments>https://solr.pl/en/2019/10/21/category-routed-aliases/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 21 Oct 2019 14:12:01 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[collection]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[routed]]></category>
		<category><![CDATA[routing]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=1004</guid>

					<description><![CDATA[Through the lifetime of Solr we were given the possibility to work with cores, then collections and finally aliases &#8211; the alternative names for collections. Aliases allow the user to give your collection a new, virtual name and group multiple]]></description>
										<content:encoded><![CDATA[
<p>Through the lifetime of Solr we were given the possibility to work with cores, then collections and finally aliases &#8211; the alternative names for collections. Aliases allow you to give your collection a new, virtual name and to group multiple collections under that single virtual name. This isolates the real collection name from the name that the client application is using, which makes it possible to change the collection in the background without bringing down the whole cluster and making your application or product unavailable. In Solr we can use two groups of aliases:</p>



<span id="more-1004"></span>



<ul class="wp-block-list"><li>standard aliases that group collections under a virtual name, </li><li>routed aliases that route your requests. </li></ul>



<p>The routed aliases can be further divided into two categories &#8211; time routed aliases and category routed aliases. We will be looking into the second category in this blog post.</p>



<h2 class="wp-block-heading">Why Do I Need Category Routed Aliases?</h2>



<p>The problem with standard aliases is the indexing limitation. If you create an alias that covers two collections, you cannot really control where the data will be put &#8211; for example, in Solr 8.2 the first collection from the alias will be used. Let me demonstrate how that works. </p>



<p>First, let&#8217;s create two collections using API v2. We will start with the first collection:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/v2/c' -d '{ 
 "create" : {
  "name" : "test_1",
  "numShards" : 1,
  "replicationFactor" : 1
 }
}'</code></pre>



<p>And then the second collection:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/v2/c' -d '{ 
 "create" : {
  "name" : "test_2",
  "numShards" : 1,
  "replicationFactor" : 1
 }
}'</code></pre>



<p>So we have two collections, one called <em>test_1</em> and the second one called <em>test_2</em>. Now let&#8217;s create an alias called <em>test</em> that groups those two collections. We can do that using the following command:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/v2/c' -d '{ 
 "create-alias" : {
  "name" : "test",
  "collections" : [ "test_1", "test_2" ]
 }
}'</code></pre>



<p>Now let&#8217;s index a document through the alias by using the following command:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/solr/test/update?commit=true' -d '[
 {
  "id" : 1,
  "name" : "Test indexing"
 }
]'</code></pre>



<p>That command will be successful in Solr 8.2 and the document will be put into the <em>test_1</em> collection. You can check it yourself <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>



<p>And of course this is not the only problem. If you would like to physically separate the data of multiple tenants, the application that sends data to Solr has to take care of that itself. You could still use the alias for the reading part though.</p>



<p>To overcome that problem we can use category routed aliases and let SolrCloud do the work. </p>



<h2 class="wp-block-heading">Category Routed Aliases</h2>



<p>The idea behind category routed aliases is to let Solr manage the alias, and the collections grouped under it, based on the value of a certain, defined field. As a result we basically get partitioning based on that field &#8211; for example on the company name. That way, each company that we index data for will have its own collection and we will not have to take care of that manually during indexing or querying.</p>



<figure class="wp-block-image"><img decoding="async" src="https://solr.pl/wp-content/uploads/2019/10/Category-Routing-Routing.png" alt="" class="wp-image-4738"/><figcaption>Simplified view over category routed aliases</figcaption></figure>



<h2 class="wp-block-heading">Creating Category Routed Aliases</h2>



<p>Creating category routed aliases is slightly different from what we did with the standard aliases. We do not start by creating the collections &#8211; those will be created automatically. Instead we start with the alias creation. </p>



<p>Let&#8217;s create our first category based alias by using the following command:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/v2/c' -d '{ 
 "create-alias" : {
  "name" : "test_cra_company",
  "router" : {
   "name" : "category",
   "field" : "company_name",
   "maxCardinality" : 2
  },
  "create-collection" : {
   "numShards" : "1",
   "replicationFactor" : "1",
   "config" : "_default"
  }
 }
}'</code></pre>



<p>Let&#8217;s stop for a minute here. We&#8217;ve used the <em>create-alias</em> command to create a new alias called <em>test_cra_company</em>. We provided a few properties here. First of all you can see the <em>router</em> object that includes three properties:</p>



<ul class="wp-block-list"><li><em>name</em> &#8211; the name of the router to use. For now Solr supports <em>time</em> and <em>category</em>. The <em>time</em> router creates a time routed alias, while the <em>category</em> router creates a category routed alias. </li><li><em>field</em> &#8211; the name of the field that will be used for routing. </li><li><em>maxCardinality</em> &#8211; the maximum number of unique values that the field can take, which protects us from creating too many collections.</li></ul>



<p>Because the routed aliases will create collections in the background we need to provide the properties for the collections. We do that using the <em>create-collection</em> object in our request. We provided the number of shards, the replication factor and the name of the configuration that will be used. </p>
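<p>If you create such aliases from code rather than by hand, the request body can be assembled programmatically. The snippet below is only a sketch: the helper name <em>build_cra_request</em> is hypothetical, and it does nothing more than rebuild the JSON shown above, which you would then POST to the same <em>/v2/c</em> endpoint.</p>

```python
import json


def build_cra_request(alias, field, max_cardinality, config="_default"):
    """Return the JSON body of a v2 create-alias call that uses the category router.

    Hypothetical helper: it mirrors the request from the post and talks to nothing.
    """
    return {
        "create-alias": {
            "name": alias,
            "router": {
                "name": "category",          # the category router
                "field": field,              # field whose value selects the collection
                "maxCardinality": max_cardinality,
            },
            # Properties of the collections Solr creates in the background.
            "create-collection": {
                "numShards": "1",
                "replicationFactor": "1",
                "config": config,
            },
        }
    }


body = build_cra_request("test_cra_company", "company_name", 2)
print(json.dumps(body, indent=1))
```

<p>Keeping the router settings and the <em>create-collection</em> settings in one function makes it harder to forget one of them when you add another alias.</p>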



<p><strong>In the examples, for simplicity, we are using the default, data-driven schema. This shouldn&#8217;t be used for production when using time or category routed aliases.</strong></p>



<h2 class="wp-block-heading">How Do Updates in Routed Aliases Work Internally?</h2>



<p>When Solr processes an update request, an UpdateRequestProcessor is initialized. In SolrCloud, as in our case, this is the DistributedUpdateProcessor, at least when we use standard aliases. When time or category routed aliases are used, the RoutedAliasUpdateProcessor is put in front of the actual DistributedUpdateProcessor. This happens automatically, so you don&#8217;t have to configure it yourself. The RoutedAliasUpdateProcessor is then responsible for routing the data to the appropriate collection. As we saw, without any kind of routed alias Solr will just take the first collection on the alias list and perform the update operation there.</p>



<p>What you should also know is that a special placeholder collection called <code>test_cra_company__CRA__NEW_CATEGORY_ROUTED_ALIAS_WAITING_FOR_DATA__TEMP</code> will be created and eventually deleted. The collections that are created in the background will be named <code>test_cra_company__CRA__&lt;VALUE_OF_THE_FIELD&gt;</code>, so for example <code>test_cra_company__CRA__company1</code> if the <code>company_name</code> field has the value <code>company1</code>. This imposes naming limitations, which we will talk about later in the blog post.</p>
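<p>The naming scheme is simple enough to reproduce on the client side, for example when you want to know which collection a given category ends up in. The two helper functions below are hypothetical and assume only the <em>alias__CRA__value</em> pattern described above; they do not model any clean-up Solr may apply to the field value.</p>

```python
CRA_SEPARATOR = "__CRA__"


def cra_collection_name(alias, category_value):
    """Name of the collection backing one category of a category routed alias."""
    return f"{alias}{CRA_SEPARATOR}{category_value}"


def category_from_collection(alias, collection):
    """Inverse of cra_collection_name: recover the category value."""
    prefix = alias + CRA_SEPARATOR
    if not collection.startswith(prefix):
        raise ValueError(f"{collection} does not belong to alias {alias}")
    return collection[len(prefix):]


print(cra_collection_name("test_cra_company", "company1"))
```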



<h2 class="wp-block-heading">Indexing Data With Routed Aliases</h2>



<p>Let&#8217;s now see the alias in action. To do that we will index two documents using the following commands:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/solr/test_cra_company/update?commit=true' -d '[
 {
  "id" : 1,
  "name" : "Test indexing",
  "company_name" : "company1"
 }
]'</code></pre>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/solr/test_cra_company/update?commit=true' -d '[
 {
  "id" : 2,
  "name" : "Test indexing",
  "company_name" : "company2"
 }
]'</code></pre>



<p>Remember the <em>maxCardinality</em> property that we&#8217;ve set to the value of <em>2</em>? If we try to use a third company name, for example using the following command:</p>



<pre class="wp-block-code"><code class="">curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/solr/test_cra_company/update?commit=true' -d '[
 {
  "id" : 3,
  "name" : "Test indexing",
  "company_name" : "company3"
 }
]'</code></pre>



<p>Solr will get back to us with the following error:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "status":400,
    "QTime":1},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Max cardinality 2 reached for Category Routed Alias: test_cra_company",
    "code":400}}</code></pre>
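<p>To make the behaviour easier to reason about, here is a toy, in-memory model of what we have just observed. It is not Solr&#8217;s implementation: the class <em>ToyCategoryRouter</em> is hypothetical and only imitates the visible effect of <em>maxCardinality</em>, where known categories keep working and a new category beyond the limit is rejected.</p>

```python
class ToyCategoryRouter:
    """In-memory imitation of category routing with a cardinality limit."""

    def __init__(self, alias, max_cardinality):
        self.alias = alias
        self.max_cardinality = max_cardinality
        self.collections = {}  # category value -> collection name

    def route(self, category_value):
        """Return the collection for a category, creating the mapping if allowed."""
        if category_value not in self.collections:
            if len(self.collections) >= self.max_cardinality:
                raise ValueError(
                    f"Max cardinality {self.max_cardinality} reached "
                    f"for Category Routed Alias: {self.alias}"
                )
            self.collections[category_value] = f"{self.alias}__CRA__{category_value}"
        return self.collections[category_value]


router = ToyCategoryRouter("test_cra_company", 2)
for company in ("company1", "company2", "company3"):
    try:
        print(company, "goes to", router.route(company))
    except ValueError as error:
        print(company, "rejected:", error)
```

<p>Note that documents for <em>company1</em> and <em>company2</em> can still be indexed after the limit is reached; only values that would require a new collection are refused.</p>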



<h2 class="wp-block-heading">Queries</h2>



<p>We can also try to search on our data, for example using a match all query:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test_cra_company/select?q=*:*</code></pre>



<p>Solr will return both of the indexed documents:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 12,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 1,
    "docs": [
      {
        "id": "1",
        "name": [
          "Test indexing"
        ],
        "company_name": [
          "company1"
        ],
        "_version_": 1647547697554522000
      },
      {
        "id": "2",
        "name": [
          "Test indexing"
        ],
        "company_name": [
          "company2"
        ],
        "_version_": 1647547721335177200
      }
    ]
  }
}</code></pre>



<p>We can also filter our query as we would usually do:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test_cra_company/select?q=*:*&amp;fq=company_name:company2</code></pre>



<p>And the response will contain only a single document, which is expected:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 25,
    "params": {
      "q": "*:*",
      "fq": "company_name:company2"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1,
    "docs": [
      {
        "id": "2",
        "name": [
          "Test indexing"
        ],
        "company_name": [
          "company2"
        ],
        "_version_": 1647547721335177200
      }
    ]
  }
}</code></pre>



<p>As we can see, everything works as intended. </p>



<p>The collections that were created during our simple test look as follows:</p>



<figure class="wp-block-image"><img decoding="async" src="https://solr.pl/wp-content/uploads/2019/10/Screenshot-2019-10-16-at-14.44.43-1024x82.png" alt="" class="wp-image-4735"/></figure>



<h2 class="wp-block-heading">Limitations</h2>



<p>There are a few limitations when it comes to the routed aliases, especially the category routed aliases. </p>



<p>The first limitation is naming. The values of the field that we use for routing need to be ASCII based. Otherwise Solr will not be able to use the value in the collection name and we will run into issues. Keep that in mind when choosing the routing field.</p>



<p>The second thing is the deletion of the alias or of the collections that it manages. There is no automated way of doing that, so you can&#8217;t easily remove a category. The procedure for removing a category would be:</p>



<ul class="wp-block-list"><li>Ensuring that no more documents will arrive with the category that we want to remove.</li><li>Modifying the alias definition in ZooKeeper by removing the collection that is responsible for handling the data of the category that we want to remove. </li><li>Deleting the collection using the Solr API &#8211; you need to remove it from the alias first, otherwise Solr will fail to delete it.</li></ul>



<p>Currently (in Solr 8.2 as of writing this post) the update is distributed to an appropriate collection based on the value of the category, while the query is run against all the collections defined by the alias. Improving the query time execution is mentioned as one of the possible improvements for the routed aliases feature in Solr.</p>



<p>Finally, remember that collection creation takes time, usually up to 3 seconds, depending on how loaded your SolrCloud cluster is. Keep that in mind when designing and implementing a system that will use Solr with routed aliases. Your system needs to be able to handle a longer delay when a new value appears in a document for the first time and leads to collection creation.</p>



<h2 class="wp-block-heading">Summary</h2>



<p>As you can see, category routed aliases provide a very nice and convenient way of automatically creating collections based on the value of a field. So if we need that and we want Solr to take care of it for us &#8211; this is one of the ways to go, especially in the newer Solr versions. Is the feature perfect? No. Is there room for improvement? Yes, and possible improvements are already mentioned in the official Solr documentation. Hopefully we will see them in the next Solr versions. </p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2019/10/21/category-routed-aliases/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 4.1: Stored fields compression</title>
		<link>https://solr.pl/en/2012/11/19/solr-4-1-stored-fields-compression/</link>
					<comments>https://solr.pl/en/2012/11/19/solr-4-1-stored-fields-compression/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 19 Nov 2012 22:53:02 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.1]]></category>
		<category><![CDATA[compres]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[stored]]></category>
		<category><![CDATA[stored field]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=478</guid>

					<description><![CDATA[Despite the fact that Lucene and Solr 4.0 is still very fresh we decided that its time to take a look at the changes in the approaching 4.1 version. One of those changes will be stored fields compression to decrease]]></description>
										<content:encoded><![CDATA[<p>Despite the fact that Lucene and Solr 4.0 are still very fresh, we decided that it&#8217;s time to take a look at the changes in the approaching 4.1 version. One of those changes will be stored fields compression, which decreases the size of the index when we use such fields. Let&#8217;s take a look and see how that works.</p>
<p><span id="more-478"></span></p>
<h3>Some theory</h3>
<p>If our index contains many stored fields, they can consume most of the space compared to the other information in the index. How do you know how much space the <em>stored</em> fields take? It&#8217;s easy &#8211; just go to the directory that holds your index and check how much space the files with the <em>.fdt</em> extension take. Although stored fields don&#8217;t influence the search performance directly, your I/O subsystem and its cache can be forced to work much harder because of the larger amount of data on the disk. Because of that your queries can take longer to execute and you may need more time to index your data.</p>
<p>With the incoming release of Lucene 4.1 stored fields will be compressed with the use of the LZ4 algorithm (<a href="http://code.google.com/p/lz4/">http://code.google.com/p/lz4/</a>), which should decrease the size of the index when we use a high number of stored fields, but also shouldn&#8217;t be CPU demanding when it comes to compression and decompression.</p>
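<p>Checking the <em>.fdt</em> files by hand gets tedious with many segments, so the check can be scripted. The sketch below uses hypothetical function names and assumes that the index directory holds plain segment files (with compound files the stored fields data would be packed inside them and this would undercount).</p>

```python
from pathlib import Path


def stored_fields_bytes(index_dir):
    """Total size in bytes of all *.fdt files directly inside index_dir."""
    return sum(p.stat().st_size for p in Path(index_dir).glob("*.fdt"))


def stored_fields_share(index_dir):
    """Fraction (0.0 to 1.0) of the index size taken by *.fdt files."""
    total = sum(p.stat().st_size for p in Path(index_dir).iterdir() if p.is_file())
    return stored_fields_bytes(index_dir) / total if total else 0.0


# Usage, with a path of your own (the one below is only a placeholder):
# print(stored_fields_share("/var/solr/data/collection1/data/index"))
```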
<h3>Test data</h3>
<p>To test the discussed functionality we&#8217;ve used the Polish Wikipedia articles dump from 2012.11.10 (<a href="http://dumps.wikimedia.org/plwiki/20121110/plwiki-20121110-pages-articles.xml.bz2" target="_blank" rel="noopener noreferrer">http://dumps.wikimedia.org/plwiki/20121110/plwiki-20121110-pages-articles.xml.bz2</a>). The unpacked XML file was about 4.7GB on disk.</p>
<h3>Index structure</h3>
<p>We&#8217;ve used the following index structure to index the above data:
</p>
<pre class="brush:xml">&lt;field name="id" type="string"  indexed="true" stored="true" required="true"/&gt;
&lt;field name="title" type="text" indexed="true" stored="true"/&gt;
&lt;field name="revision" type="int" indexed="true" stored="true"/&gt;
&lt;field name="user" type="string" indexed="true" stored="true"/&gt;
&lt;field name="userId" type="int" indexed="true" stored="true"/&gt;
&lt;field name="text" type="text" indexed="true" stored="true"/&gt;
&lt;field name="timestamp" type="date" indexed="true" stored="true"/&gt;
&lt;field name="_version_" type="long" indexed="true" stored="true"/&gt;</pre>
<h3>DIH configuration</h3>
<p>We&#8217;ve used the following DIH configuration in order to index Wikipedia data:
</p>
<pre class="brush:xml">&lt;dataConfig&gt;
 &lt;dataSource type="FileDataSource" encoding="UTF-8" /&gt;
 &lt;document&gt;
  &lt;entity name="page" processor="XPathEntityProcessor" stream="true" forEach="/mediawiki/page/" url="/home/data/wikipedia/plwiki-20121110-pages-articles.xml" transformer="RegexTransformer,DateFormatTransformer"&gt;
   &lt;field column="id" xpath="/mediawiki/page/id" /&gt;
   &lt;field column="title" xpath="/mediawiki/page/title" /&gt;
   &lt;field column="revision" xpath="/mediawiki/page/revision/id" /&gt;
   &lt;field column="user" xpath="/mediawiki/page/revision/contributor/username" /&gt;
   &lt;field column="userId" xpath="/mediawiki/page/revision/contributor/id" /&gt;
   &lt;field column="text" xpath="/mediawiki/page/revision/text" /&gt;
   &lt;field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" /&gt;
   &lt;field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/&gt;
  &lt;/entity&gt;
 &lt;/document&gt;
&lt;/dataConfig&gt;</pre>
<h3>Indexing time</h3>
<p>In both cases the indexing time was very similar for the same number of documents (there were 1,301,394 documents after indexing). With <strong>Solr 4.0</strong> indexing took <strong>14 minutes and 33 seconds</strong>. With <strong>Solr 4.1</strong> indexing took <strong>14 minutes and 43 seconds</strong>. As you can see Solr 4.1 was slightly slower, but because I ran the tests on my laptop, we can assume that the indexing performance is very similar.</p>
<h3>Index size</h3>
<p>The size of the index is what interests us the most in this case. With <strong>Solr</strong> <strong>4.0 </strong>the index created from the Wikipedia data was about <strong>5.1GB</strong> &#8211; <strong>5,464,809,863</strong> bytes. With Solr 4.1 the index weighed approximately <strong>3.24GB</strong> &#8211; <strong>3,480,457,399</strong> bytes. So when comparing the index created by Solr 4.0 to the one created by Solr 4.1 we got an index that is about<strong> 36%</strong> smaller.</p>
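<p>For those who want to redo the arithmetic, the saving can be computed directly from the two byte counts given above. The variable names are mine; the numbers are the ones from the test. The result comes out at a little over 36%, in line with the roughly one-third reduction reported above.</p>

```python
SIZE_SOLR_40 = 5_464_809_863  # index size in bytes with Solr 4.0
SIZE_SOLR_41 = 3_480_457_399  # index size in bytes with Solr 4.1

saved_bytes = SIZE_SOLR_40 - SIZE_SOLR_41
reduction = saved_bytes / SIZE_SOLR_40

GIB = 1024 ** 3  # the rounded sizes in the text match binary gigabytes
print(f"Solr 4.0: {SIZE_SOLR_40 / GIB:.2f} GiB")
print(f"Solr 4.1: {SIZE_SOLR_41 / GIB:.2f} GiB")
print(f"Saved:    {saved_bytes} bytes ({reduction:.1%})")
```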
<h3>Wrapping up</h3>
<p>You can clearly see that the gain from compressing stored fields is quite big. Although we need additional CPU cycles to handle compression, we benefit from less pressure on the I/O subsystem, and in our test that gain clearly outweighed the loss of a few CPU cycles. After seeing this I&#8217;m not surprised that stored fields compression is turned on by default in Lucene 4.1 and thus in Solr 4.1 too. If you would like to turn off that behavior you&#8217;ll need to implement your own codec &#8211; one that doesn&#8217;t use compression, at least for now. You don&#8217;t need to fork the Lucene code to do that though, and this again shows how powerful flexible indexing is.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/11/19/solr-4-1-stored-fields-compression/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 3.6: CurrencyField</title>
		<link>https://solr.pl/en/2012/03/19/solr-3-6-currencyfield/</link>
					<comments>https://solr.pl/en/2012/03/19/solr-3-6-currencyfield/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 19 Mar 2012 22:43:17 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[currency]]></category>
		<category><![CDATA[currencyfield]]></category>
		<category><![CDATA[exchange]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=447</guid>

					<description><![CDATA[The incoming Solr 3.6 will bring us an interesting feature in the form of currency handling. Some may ask &#8220;What for ? We can just use float and we can use it for currency handling&#8221;. So let&#8217;s take a look]]></description>
										<content:encoded><![CDATA[<p>The incoming Solr 3.6 will bring us an interesting feature in the form of currency handling. Some may ask &#8220;What for? We can just use a float field for currency handling&#8221;. So let&#8217;s take a look at <em>solr.CurrencyField</em>, which will be introduced in Solr 3.6.</p>
<p><span id="more-447"></span></p>
<h3>Configuration</h3>
<p>Let&#8217;s start with the field type configuration, which is quite typical when it comes to Solr. We need to add another field type declaration to the <em>schema.xml </em>file, like the following:
</p>
<pre class="brush:xml">&lt;fieldType class="solr.CurrencyField" name="currencyField" defaultCurrency="USD" currencyConfig="currencyExchange.xml" /&gt;</pre>
<p>In the above configuration we see two additional attributes (compared to the usual type definition) that define the <em>currencyField</em> behavior. The first is the <em>defaultCurrency</em> attribute, which defines the default currency for a given field. It determines the form in which data will be written into the index (changing this attribute value requires reindexing). The second attribute, <em>currencyConfig</em>, points to the exchange rates file. It&#8217;s worth remembering that the second attribute only makes sense with the default exchange provider (<em>FileExchangeRateProvider</em>) distributed with Solr.</p>
<h3>Exchange configuration file for FileExchangeRateProvider</h3>
<p>So now, let&#8217;s look at the contents of the <em>currencyExchange.xml</em> file, which provides the exchange rates for the default exchange provider:
</p>
<pre class="brush:xml">&lt;currencyConfig version="1.0"&gt;
 &lt;rates&gt;
  &lt;rate from="USD" to="PLN" rate="3.1"/&gt;
  &lt;rate from="EUR" to="PLN" rate="2.5"/&gt;
  &lt;rate from="USD" to="EUR" rate="2.5"/&gt;
 &lt;/rates&gt;
&lt;/currencyConfig&gt;</pre>
<p>As you can see the structure is quite simple, we define the base currency (<em>from</em>), output currency (<em>to</em>) and the exchange rate (<em>rate</em>). So it&#8217;s quite simple <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>Data indexing</h3>
<p>In order to index data into a field of the defined <em>currencyField</em> type we should specify the value followed by a comma character and the currency code. For example:
</p>
<pre class="brush:xml">&lt;field name="price"&gt;21.99,EUR&lt;/field&gt;</pre>
<h3>Querying</h3>
<p>Querying works more or less like indexing. You have to pass two pieces of information to Solr &#8211; the value and the currency. Let&#8217;s look at two example filters using currency, the first one a simple term query and the second one a range query:
</p>
<pre class="brush:xml">fq=price:29.99,PLN</pre>
<pre class="brush:xml">fq=price:[10.00 TO 29.99,EUR]</pre>
<p>As you can see, after setting the value (or range) we have to specify the comma character and the currency we are interested in. What&#8217;s more, we are not bound to the default currency &#8211; we can query other currencies as long as we have the exchange rate defined. This means that Solr will make the exchange calculation for us. Please remember that, for now, the values returned in the results will only contain the default currency and there is no way to change that behavior.</p>
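<p>To get a feeling for the exchange calculation, here is a deliberately simplified model that uses the rates from our <em>currencyExchange.xml</em> file. It is only an illustration: the <em>convert</em> function is hypothetical, it looks up direct rates only, and it says nothing about how Solr stores or compares the values internally.</p>

```python
# Direct exchange rates copied from the currencyExchange.xml example.
RATES = {
    ("USD", "PLN"): 3.1,
    ("EUR", "PLN"): 2.5,
    ("USD", "EUR"): 2.5,
}


def convert(amount, source, target, rates=RATES):
    """Convert amount from source to target currency using a direct rate."""
    if source == target:
        return amount
    try:
        return amount * rates[(source, target)]
    except KeyError:
        raise LookupError(f"no exchange rate defined from {source} to {target}") from None


print(convert(10.0, "USD", "PLN"))
```

<p>A pair that has no rate defined, such as PLN to USD in our file, raises an error in this sketch, which mirrors the requirement that the exchange rate has to be defined before you can query in a given currency.</p>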
<h3>Your own currency provider</h3>
<p>In addition to the default exchange provider, Solr will enable us to plug in our own implementation. In order to do that we need to develop a class that implements the <em>org.apache.solr.schema.ExchangeRateProvider</em> interface and let Solr know about it by passing the class name as the value of the <em>providerClass</em> attribute defined for our currency type. Assuming that we have the class <em>pl.solr.schema.DynamicRateExchangeProvider</em> that implements the mentioned interface and we would like to use it, our field type definition would look something like this:
</p>
<pre class="brush:xml">&lt;fieldType class="solr.CurrencyField" name="currencyField" defaultCurrency="USD" providerClass="pl.solr.schema.DynamicRateExchangeProvider" /&gt;</pre>
<p>I like this feature, as we are not bound to an XML exchange file: we can develop our own implementation which, for example, uses a web service to dynamically read data from some external source.</p>
<h3>What&#8217;s left to implement?</h3>
<p>When the post was published you could not use range faceting on fields based on <em>CurrencyField</em> type.</p>
<h3>To sum up</h3>
<p>In my opinion the <em>CurrencyField </em>is a functionality worth waiting for. Instead of calculating exchange rates in our application we can let Solr do that for us. In addition to that we will be able to plug in our own implementations of exchange providers, which will enable us to write connectors to different systems and just watch how Solr does all the work for us <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/03/19/solr-3-6-currencyfield/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Solr 4.0: new fl parameter functionalities &#8211; first look</title>
		<link>https://solr.pl/en/2011/11/22/solr-4-0-new-fl-parameter-functionalities-first-look/</link>
					<comments>https://solr.pl/en/2011/11/22/solr-4-0-new-fl-parameter-functionalities-first-look/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Tue, 22 Nov 2011 20:54:35 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[.0]]></category>
		<category><![CDATA[alias]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[fields]]></category>
		<category><![CDATA[fl]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[pseudo]]></category>
		<category><![CDATA[pseudo fields]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=382</guid>

					<description><![CDATA[In connection with the work of slowly upcoming release of Apache Solr version 4.0 I thought that it is time to bring some light on the functionalities that you will get into your own hands with the release of Apache]]></description>
										<content:encoded><![CDATA[<p>With the release of Apache Solr 4.0 slowly approaching, I thought that it is time to shed some light on the functionalities that you will get into your own hands with that version. The first change we will look at is a simple but useful functionality called <em>pseudo fields</em>, together with additional features related to the <em>fl</em> parameter.</p>
<p><span id="more-382"></span></p>
<h3>Let&#8217;s begin</h3>
<p>Apache Solr 4.0 slightly changes how the <em>fl</em> parameter can be used. In 4.0 this parameter can be added to the query multiple times and Solr will take all the values into consideration. Sometimes it will be useful, at least in my case.</p>
<h3>Custom field names</h3>
<p>In Solr 4.0 you will be able to rename the fields returned in the results. Imagine that, depending on the query, we would like to return fields such as <em>price_en</em>, <em>price_pl</em> or <em>price_fr</em> under the single name <em>price</em>. In Solr 4.0 we can do this by adding the following parameter to the query:
</p>
<pre class="brush:xml">fl=price:price_pl</pre>
<p>This will cause the field <em>price_pl</em> to be returned as <em>price</em>.</p>
<h3>All fields with a common name start</h3>
<p>If we want Solr to return all the fields whose names start with the word <em>price</em> (useful for dynamic fields), all we need to do is add the following parameter to the query:
</p>
<pre class="brush:xml">fl=price*</pre>
<h3>Returning function values</h3>
<p>The last feature we will look at today is the ability to add the result of a function as a field in the documents returned by Solr. Thus, in Solr 4.0 we will have the option to return values such as the sum of prices or the calculated distance between two points. Quite useful. To use this functionality, you have to add the appropriate function call to the <em>fl</em> parameter, for example:
</p>
<pre class="brush:xml">fl=*,stock:sum(stockMain,stockShop)</pre>
<p>The above will result in Solr returning all the fields for the document (value *) and a field named <em>stock</em>, which will be the sum of two fields: <em>stockShop</em> and <em>stockMain</em>.</p>
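<p>The distance case mentioned earlier could be sketched in a similar way. The example below assumes a spatial field named <em>store</em> and an arbitrary reference point, and the alias <em>distance</em> is made up for illustration; <em>geodist</em>, <em>sfield</em> and <em>pt</em> are the standard Solr spatial function and its parameters:
</p>
<pre class="brush:xml">q=*:*&amp;fl=id,distance:geodist()&amp;sfield=store&amp;pt=45.15,-93.85</pre>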
<h3>A few words at the end</h3>
<p>In addition to the new features mentioned above, there is one more thing connected to the <em>fl</em> parameter &#8211; the DocTransformer. However, I decided to leave it for a separate blog post.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/11/22/solr-4-0-new-fl-parameter-functionalities-first-look/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Quick look &#8211; FieldCollapsing</title>
		<link>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/</link>
					<comments>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 20 Sep 2010 12:12:39 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.0]]></category>
		<category><![CDATA[collapsing]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[fieldcollapsing]]></category>
		<category><![CDATA[grouping]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[lucene 4.0]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solr 4.0]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=77</guid>

					<description><![CDATA[FieldCollapsing, or in other words grouping of search results, has just been committed to the svn repository. I decided to take a look at this functionality and see how it works. I want to begin with brief information &#8211; FieldCollapsing]]></description>
										<content:encoded><![CDATA[<p>FieldCollapsing, or in other words grouping of search results, has just been committed to the svn repository. I decided to take a look at this functionality and see how it works.</p>
<p><span id="more-77"></span></p>
<p>I want to begin with a brief note &#8211; FieldCollapsing is only available in version 4.0 of Solr, which is the development version of the project, and it&#8217;s rather unlikely to be transferred to version 3.X.</p>
<h3>FieldCollapsing &#8211; what is it?</h3>
<p>Imagine that our index contains information about companies from different cities. We want to show our users one (or, for example, two or three) companies from each city, of course only those that meet the search criteria. How do we do that? We just use the FieldCollapsing mechanism. It allows the returned results to be grouped based on field contents. Each group can be limited to a single document or to a fixed number of documents.</p>
<h3>Parameters</h3>
<p>As with most features available in Solr, the behavior of the FieldCollapsing mechanism can be configured through a number of parameters. Here they are:</p>
<ul>
<li><em>group</em> &#8211; setting this parameter to <em>true</em> enables the FieldCollapsing mechanism. The default value is <em>false</em>.</li>
<li><em>group.field</em> &#8211; the name of the field whose contents the results will be grouped on.</li>
<li><em>group.func</em> &#8211; a function whose result the results will be grouped on.</li>
<li><em>group.limit</em> &#8211; the number of documents returned in each group. The default is 1.</li>
<li><em>group.sort</em> &#8211; specifies how to sort the documents within each group. The default value is <em>score desc</em>.</li>
</ul>
<p>It is worth noting that the <em>rows</em> parameter passed with the query determines the number of groups returned in the search results, not the number of individual documents. The behaviour of the <em>sort</em> parameter also changes: it tells Solr how to sort the groups, not the individual documents. Groups will be sorted based on the field contents of the first document in each group.</p>
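<p>To make the interplay of these parameters more concrete, here is a sketch of a query (the field names <em>inStock</em>, <em>price</em> and <em>popularity</em> are assumed to exist in the index) that asks for at most five groups, each holding up to two documents sorted by price, with the groups themselves ordered by the popularity of their first document:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select/?q=*:*&amp;group=true&amp;group.field=inStock&amp;group.limit=2&amp;group.sort=price+asc&amp;rows=5&amp;sort=popularity+desc</pre>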
<h3>Search Results</h3>
<p>The search results differ from what we are used to. They are grouped according to the parameters we have passed. The main element of the search results is no longer a document &#8211; when we use FieldCollapsing, the main element is a group of documents. The documents are shown within the groups (their number is defined by the <em>group.limit</em> parameter). For example, sending the following query:
</p>
<pre class="brush:xml">http://localhost:8983/solr/select/?q=*:*&amp;group=true&amp;group.field=inStock&amp;indent=true</pre>
<p>to a Solr instance whose index was created by indexing all the XML documents from the <em>exampledocs</em> directory will return the following response:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;0&lt;/int&gt;
  &lt;lst name="params"&gt;
    &lt;str name="group.field"&gt;inStock&lt;/str&gt;
    &lt;str name="group"&gt;true&lt;/str&gt;
    &lt;str name="indent"&gt;true&lt;/str&gt;
    &lt;str name="q"&gt;*:*&lt;/str&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;lst name="grouped"&gt;
  &lt;lst name="inStock"&gt;
    &lt;int name="matches"&gt;19&lt;/int&gt;
    &lt;arr name="groups"&gt;
     &lt;lst&gt;
        &lt;str name="groupValue"&gt;T&lt;/str&gt;
        &lt;result name="doclist" numFound="15" start="0"&gt;
          &lt;doc&gt;
            &lt;arr name="cat"&gt;&lt;str&gt;electronics&lt;/str&gt;&lt;str&gt;hard drive&lt;/str&gt;&lt;/arr&gt;
            &lt;arr name="features"&gt;&lt;str&gt;7200RPM, 8MB cache, IDE Ultra ATA-133&lt;/str&gt;&lt;str&gt;NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor&lt;/str&gt;&lt;/arr&gt;
            &lt;str name="id"&gt;SP2514N&lt;/str&gt;
            &lt;bool name="inStock"&gt;true&lt;/bool&gt;
            &lt;str name="manu"&gt;Samsung Electronics Co. Ltd.&lt;/str&gt;
            &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
            &lt;str name="name"&gt;Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133&lt;/str&gt;
            &lt;int name="popularity"&gt;6&lt;/int&gt;
            &lt;float name="price"&gt;92.0&lt;/float&gt;
            &lt;str name="store"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;double name="store_0_d"&gt;45.17614&lt;/double&gt;
            &lt;double name="store_1_d"&gt;-93.87341&lt;/double&gt;
            &lt;str name="store_lat_lon"&gt;45.17614,-93.87341&lt;/str&gt;
          &lt;/doc&gt;
        &lt;/result&gt;
      &lt;/lst&gt;
      &lt;lst&gt;
        &lt;str name="groupValue"&gt;F&lt;/str&gt;
        &lt;result name="doclist" numFound="4" start="0"&gt;
          &lt;doc&gt;
            &lt;arr name="cat"&gt;&lt;str&gt;electronics&lt;/str&gt;&lt;str&gt;connector&lt;/str&gt;&lt;/arr&gt;
            &lt;arr name="features"&gt;&lt;str&gt;car power adapter, white&lt;/str&gt;&lt;/arr&gt;
            &lt;str name="id"&gt;F8V7067-APL-KIT&lt;/str&gt;
            &lt;bool name="inStock"&gt;false&lt;/bool&gt;
            &lt;str name="manu"&gt;Belkin&lt;/str&gt;
            &lt;date name="manufacturedate_dt"&gt;2005-08-01T16:30:25Z&lt;/date&gt;
            &lt;str name="name"&gt;Belkin Mobile Power Cord for iPod w/ Dock&lt;/str&gt;
            &lt;int name="popularity"&gt;1&lt;/int&gt;
            &lt;float name="price"&gt;19.95&lt;/float&gt;
            &lt;str name="store"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;double name="store_0_d"&gt;45.17614&lt;/double&gt;
            &lt;double name="store_1_d"&gt;-93.87341&lt;/double&gt;
            &lt;str name="store_lat_lon"&gt;45.17614,-93.87341&lt;/str&gt;
            &lt;float name="weight"&gt;4.0&lt;/float&gt;
          &lt;/doc&gt;
        &lt;/result&gt;
      &lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/lst&gt;
&lt;/lst&gt;
&lt;/response&gt;</pre>
<h3>At the end</h3>
<p>This is an interesting feature that will certainly find use in some systems. However, please note that this functionality will be developed further. So far there is no support for distributed search or for grouping on multivalued fields. At this time there is no point in performance testing, first because of the changes that will come to the mechanism, and second because Lucene and Solr 4.0 are both still in development. However, I will definitely be watching how this functionality evolves <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/09/20/quick-look-fieldcollapsing/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What is schema.xml?</title>
		<link>https://solr.pl/en/2010/08/16/what-is-schema-xml/</link>
					<comments>https://solr.pl/en/2010/08/16/what-is-schema-xml/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 16 Aug 2010 12:05:34 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[field]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schema.xml]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[token]]></category>
		<category><![CDATA[tokenizer]]></category>
		<category><![CDATA[type]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=64</guid>

					<description><![CDATA[One of the configuration files that describe every Solr deployment is the schema.xml file. It describes one of the most important aspects of a deployment &#8211; the structure of the index. The information contained in this file allows you to]]></description>
										<content:encoded><![CDATA[<p>One of the configuration files that describe every Solr deployment is the <em>schema.xml</em> file. It describes one of the most important aspects of a deployment &#8211; the structure of the index. The information contained in this file allows you to control how Solr behaves when indexing data and when running queries. <em>Schema.xml</em> is not only the structure of the index; it also contains detailed information about data types, which have a large influence on Solr&#8217;s behavior and are usually neglected. This entry tries to give some insight into <em>schema.xml</em>.</p>
<p><span id="more-64"></span></p>
<p>The <em>schema.xml</em> file consists of several parts:</p>
<ul>
<li>version,</li>
<li>type definitions,</li>
<li>field definitions,</li>
<li>copyField section,</li>
<li>additional definitions.</li>
</ul>
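<p>As a rough sketch of how these parts are laid out (this assumes the layout used by the example schema shipped with Solr at the time; the contents of the elements are left out here), the skeleton of the file looks more or less like this:
</p>
<pre class="brush:xml">&lt;schema name="example" version="1.3"&gt;
   &lt;types&gt;
      &lt;!-- type definitions (fieldType elements) --&gt;
   &lt;/types&gt;
   &lt;fields&gt;
      &lt;!-- field definitions (field and dynamicField elements) --&gt;
   &lt;/fields&gt;
   &lt;!-- additional definitions: uniqueKey, defaultSearchField, solrQueryParser, similarity --&gt;
   &lt;!-- copyField section --&gt;
&lt;/schema&gt;</pre>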
<h3>Version</h3>
<p>The first thing we come across in the <em>schema.xml</em> file is the version. It tells Solr how to treat some of the attributes in the <em>schema.xml</em> file. The definition is as follows:
</p>
<pre class="brush:xml">&lt;schema name="example" version="1.3"&gt;</pre>
<p>Please note that this is not the version of the schema from the perspective of your project. At this point Solr supports four versions of the <em>schema.xml</em> file:</p>
<ul>
<li>1.0 &#8211; the <em>multiValued</em> attribute does not exist; all fields are multivalued by default.</li>
<li>1.1 &#8211; introduced the <em>multiValued</em> attribute, whose default value is <em>false</em>.</li>
<li>1.2 &#8211; introduced the <em>omitTermFreqAndPositions</em> attribute, whose default value is <em>true</em> for all fields except text fields.</li>
<li>1.3 &#8211; removed the optional compression of fields.</li>
</ul>
<h3>Type definitions</h3>
<p>Type definitions can be logically divided into two separate groups &#8211; simple types and complex types. Simple types, as opposed to complex types, do not have filters and a tokenizer defined.</p>
<p><strong>Simple types</strong></p>
<p>The first thing we see in the <em>schema.xml</em> file after the version are the type definitions. Each type is described by a number of attributes that define its behavior. First, the attributes that describe every type and are mandatory:</p>
<ul>
<li><em>name </em>&#8211; name of the type (required attribute).</li>
<li><em>class</em> &#8211; the class responsible for the implementation. Please note that classes delivered with the standard Solr distribution have names with the &#8216;solr&#8217; prefix.</li>
</ul>
<p>Besides the two mentioned above, types can have the following optional attributes:</p>
<ul>
<li><em>sortMissingLast</em> &#8211; attribute specifying how documents without a value in a field based on this type should be treated when sorting. When set to <em>true</em>, documents without a value in a field of this type will always be at the end of the results list, regardless of sort order. The default attribute value is <em>false</em>. The attribute can be used only for types that Lucene treats as a string.</li>
<li><em>sortMissingFirst</em> &#8211; attribute specifying how documents without a value in a field based on this type should be treated when sorting. When set to <em>true</em>, documents without a value in a field of this type will always be at the beginning of the results list, regardless of sort order. The default attribute value is <em>false</em>. The attribute can be used only for types that Lucene treats as a string.</li>
<li><em>omitNorms</em> &#8211; attribute specifying whether norms should be omitted. When set to <em>true</em>, length normalization and index-time boosting are disabled for the field.</li>
<li><em>omitTermFreqAndPositions</em> &#8211; attribute specifying whether term frequencies and term positions should be omitted from the index.</li>
<li><em>indexed</em> &#8211; attribute specifying whether fields based on this type should be indexed, that is, whether they can be searched and sorted on.</li>
<li><em>positionIncrementGap</em> &#8211; attribute specifying how many positions Lucene should skip between the values of a multivalued field.</li>
</ul>
<p>It is worth remembering that with the default settings of the <em>sortMissingLast</em> and <em>sortMissingFirst</em> attributes, Lucene places documents with an empty field value at the beginning of the results for an ascending sort and at the end of the results for a descending sort.</p>
<p>There is one more option for simple types, but only for those based on the <em>Trie*Field</em> classes:</p>
<ul>
<li><em>precisionStep</em> &#8211; attribute specifying the precision step in bits. The smaller the value, the more terms are indexed for each value, which makes queries based on numerical ranges faster. This, however, also increases the size of the index. Set the attribute value to 0 to disable indexing at various precisions.</li>
</ul>
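<p>As an illustration, the two definitions below are a sketch modelled on the example schema shipped with Solr (the type names <em>int</em> and <em>tint</em> are just labels chosen here). The first indexes each value at a single precision, while the second indexes it at several precisions, trading index size for faster range queries:
</p>
<pre class="brush:xml">&lt;fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/&gt;
&lt;fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/&gt;</pre>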
<p>An example of a simple type definition:
</p>
<pre class="brush:xml">&lt;fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/&gt;</pre>
<p><strong>Complex types</strong></p>
<p>In addition to simple types, the <em>schema.xml</em> file may include types consisting of a tokenizer and filters. The tokenizer is responsible for dividing the contents of the field into tokens, while the filters are responsible for further token analysis. For example, a type responsible for dealing with texts in Polish could consist of a tokenizer that divides the text into words based on whitespace, commas and periods. The filters for that type could be responsible for lowercasing the generated tokens, dividing them further (for example on dashes), and then reducing the tokens to their base form.</p>
<p>Complex types, like simple types, have a name (the <em>name</em> attribute) and a class responsible for the implementation (the <em>class</em> attribute). They can also have the other attributes described for simple types (on the same basis). In addition, however, complex types can define the tokenizer and filters to be used at indexing time and at query time. As most of you know, for a given phase (indexing or query) there can be many filters defined but only one tokenizer. For example, this is what the text type definition looks like in the example schema provided with Solr:
</p>
<pre class="brush:xml">&lt;fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"&gt;
   &lt;analyzer type="index"&gt;
      &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
      &lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /&gt;
      &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/&gt;
      &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
      &lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&gt;
      &lt;filter class="solr.PorterStemFilterFactory"/&gt;
   &lt;/analyzer&gt;
   &lt;analyzer type="query"&gt;
      &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
      &lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/&gt;
      &lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /&gt;
      &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/&gt;
      &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
      &lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&gt;
      &lt;filter class="solr.PorterStemFilterFactory"/&gt;
   &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>It is worth noting that there is an additional attribute for the text field type:</p>
<ul>
<li> <em>autoGeneratePhraseQueries</em></li>
</ul>
<p>This attribute tells the query parser how to behave when a filter divides a token. Some filters (such as <em>WordDelimiterFilter</em>) can divide a token into a set of tokens. Setting the attribute to <em>true</em> (the default value) will automatically generate phrase queries in such cases. For example, <em>WordDelimiterFilter</em> will divide the word &#8220;wi-fi&#8221; into two tokens, &#8220;wi&#8221; and &#8220;fi&#8221;. With <em>autoGeneratePhraseQueries</em> set to <em>true</em>, the query sent to Lucene will look like <code>field:"wi fi"</code>, while with the attribute set to <em>false</em> the Lucene query will look like <code>field:wi OR field:fi</code>. However, please note that this attribute only behaves well with tokenizers based on white space.</p>
<p>Let&#8217;s return to the type definition. As you can see, the example I gave has two main sections:
</p>
<pre class="brush:xml">&lt;analyzer type="index"&gt;</pre>
<p>and
</p>
<pre class="brush:xml">&lt;analyzer type="query"&gt;</pre>
<p>The first section defines the analysis used when indexing documents, while the second defines the analysis used for queries against fields based on this type. Note that if you want to use the same definition for both the indexing and the query phase, you can replace the two sections with a single <em>analyzer</em> section without the <em>type</em> attribute. Our definition will then look like this:
</p>
<pre class="brush:xml">&lt;fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"&gt;
   &lt;analyzer&gt;
      &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
      &lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /&gt;
      &lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/&gt;
      &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
      &lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&gt;
      &lt;filter class="solr.PorterStemFilterFactory"/&gt;
   &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>As I mentioned, the definition of each complex type contains a tokenizer and (though not necessarily) a series of filters. I will not describe every filter and tokenizer available in Solr. This information is available at the following address: <a href="http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters" target="_blank" rel="noopener noreferrer">http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters</a>.</p>
<p>Finally, I want to add one important thing. Starting from Solr 1.4, the tokenizer does not need to be the first mechanism that deals with the analysis of the field. Solr 1.4 introduced a new kind of filter &#8211; <em>CharFilters</em> &#8211; that operate on the field contents before the tokenizer and pass the result on to it. It is worth knowing about, because it might come in useful.</p>
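<p>A minimal sketch of a type that uses a CharFilter could look like the one below. The type name <em>text_html</em> is made up for this example; <em>solr.HTMLStripCharFilterFactory</em> is one of the CharFilters delivered with Solr, and here it would strip HTML markup from the field contents before the tokenizer sees them:
</p>
<pre class="brush:xml">&lt;fieldType name="text_html" class="solr.TextField" positionIncrementGap="100"&gt;
   &lt;analyzer&gt;
      &lt;charFilter class="solr.HTMLStripCharFilterFactory"/&gt;
      &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
      &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
   &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>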
<p><strong>Multi-dimensional types</strong></p>
<p>I left a little addition for the end &#8211; a novelty in Solr 1.4 &#8211; multi-dimensional fields, that is, fields consisting of a number of other fields. Generally speaking, the idea behind this type of field was simple &#8211; to store pairs, triples or larger tuples of related values in Solr, such as geographical point coordinates. In practice this is realized by means of dynamic fields, but let me not get into the implementation details. A sample definition of a type that consists of two fields:
</p>
<pre class="brush:xml">&lt;fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/&gt;</pre>
<p>In addition to the standard attributes, <em>name</em> and <em>class</em>, there are two others:</p>
<ul>
<li><em>dimension</em> &#8211; the number of dimensions (used by the <em>solr.PointType</em> class).</li>
<li><em>subFieldSuffix</em> &#8211; the suffix that will be added to the dynamic fields created by that type. It is important to remember that a field based on the presented type will create three fields in the index &#8211; the actual field (for example named <em>mylocation</em>) and two additional dynamic fields.</li>
</ul>
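<p>To illustrate, here is a minimal sketch of how such a type might be used. The field name <em>mylocation</em> and the <em>double</em> type are assumptions made for this example; the <em>*_d</em> dynamic field has to exist so that the sub-fields can be created:
</p>
<pre class="brush:xml">&lt;field name="mylocation" type="location" indexed="true" stored="true"/&gt;
&lt;dynamicField name="*_d" type="double" indexed="true" stored="false"/&gt;</pre>
<p>With such definitions, a value like <em>45.17614,-93.87341</em> sent to <em>mylocation</em> would presumably also be indexed in the sub-fields <em>mylocation_0_d</em> and <em>mylocation_1_d</em>.</p>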
<h3>Field definitions</h3>
<p>Field definitions are another section of the <em>schema.xml</em> file, the one that in theory should interest us the most when designing a Solr index. As a rule, we find two kinds of field definitions here:</p>
<ol>
<li>Static Fields</li>
<li>Dynamic Fields</li>
</ol>
<p>Solr treats these two kinds of fields differently. Static fields are fields that are available under a single name. Dynamic fields are fields that are available under many names &#8211; their name is actually a simple pattern (a name starting or ending with the &#8216;*&#8217; sign). Please note that Solr first tries to match a static field and only then a dynamic field. In addition, if the field name matches more than one dynamic field definition, Solr will select the one with the longer name pattern.</p>
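<p>The matching rules can be sketched with the following definitions. All names and types here are hypothetical and only serve to illustrate the order of matching described in this post:
</p>
<pre class="brush:xml">&lt;field name="price_i" type="float" indexed="true" stored="true"/&gt;
&lt;dynamicField name="*_i" type="int" indexed="true" stored="true"/&gt;
&lt;dynamicField name="*_count_i" type="long" indexed="true" stored="true"/&gt;</pre>
<p>Under these definitions a field named <em>price_i</em> would use the static definition, <em>views_count_i</em> would match the longer pattern <em>*_count_i</em>, and <em>quantity_i</em> would fall back to <em>*_i</em>.</p>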
<p>Returning to the definition of the fields (both static and dynamic), they consist of the following attributes:</p>
<ul>
<li><em>name </em>&#8211; the name of the field (required attribute).</li>
<li><em>type </em>&#8211; type of field, which is one of the pre-defined types (required attribute).</li>
<li><em>indexed</em> &#8211; whether the field is to be indexed (set to <em>true</em> if you want to search or sort on this field).</li>
<li><em>stored</em> &#8211; whether the original value is to be stored (set to <em>true</em> if you want to retrieve the original value of the field).</li>
<li><em>omitNorms</em> &#8211; whether norms are to be omitted (set to <em>true</em> for fields that need neither length normalization nor index-time boosting; fields used for full-text search usually need norms).</li>
<li><em>termVectors</em> &#8211; set to <em>true</em> when you want to keep so-called term vectors. The default value is <em>false</em>. Some features require setting this attribute to <em>true</em> (e.g. <em>MoreLikeThis</em> or <em>FastVectorHighlighting</em>).</li>
<li><em>termPositions</em> &#8211; set to <em>true</em> if you want to keep term positions with the term vector. Setting it to <em>true</em> will increase the size of the index.</li>
<li><em>termOffsets</em> &#8211; set to <em>true</em> if you want to keep term offsets with the term vector. Setting it to <em>true</em> will increase the size of the index.</li>
<li><em>default</em> &#8211; the default value given to the field when the document does not provide any value for it.</li>
</ul>
<p>Here are some examples of field definitions:
</p>
<pre class="brush:xml">&lt;field name="id" type="string" indexed="true" stored="true" required="true" /&gt;
&lt;field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" /&gt;
&lt;field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/&gt;
&lt;dynamicField name="*_i" type="int" indexed="true" stored="true"/&gt;</pre>
<p>And finally, one more thing to remember. In addition to the attributes listed above, in a field definition we can override the attributes that have been defined for its type (e.g. whether a field is to be multiValued &#8211; see the field called <em>timestamp</em> in the example above). This can be useful when you need a specific field whose behavior is slightly different from that of its type (as in the example, where only the multiValued attribute differs). Of course, keep in mind the limitations imposed on the individual attributes associated with types.</p>
<h3>CopyField section</h3>
<p>In short, this section is responsible for copying the contents of fields to other fields. We define the field whose value should be copied and the destination field. Please note that copying takes place before the field value is analyzed. An example copyField definition:
</p>
<pre class="brush:xml">&lt;copyField source="category" dest="text"/&gt;</pre>
<p>For the sake of accuracy, the attributes mean:</p>
<ul>
<li>source &#8211; the source field,</li>
<li>dest &#8211; the destination field.</li>
</ul>
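<p>A typical use is gathering several fields into one catch-all search field. The sketch below assumes that the fields <em>category</em> and <em>name</em> exist in the schema; note that the destination field should be multivalued, because it can receive more than one value per document:
</p>
<pre class="brush:xml">&lt;field name="text" type="text" indexed="true" stored="false" multiValued="true"/&gt;
&lt;copyField source="category" dest="text"/&gt;
&lt;copyField source="name" dest="text"/&gt;</pre>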
<h3>Additional definitions</h3>
<p><strong>1. Unique key definition</strong></p>
<p>The definition of a unique key makes it possible to unambiguously identify a document. Defining a unique key is not necessary, but it is recommended. A sample definition:
</p>
<pre class="brush:xml">&lt;uniqueKey&gt;id&lt;/uniqueKey&gt;</pre>
<p><strong>2. Default search field definition</strong></p>
<p>This section defines the default search field, which Solr uses when you have not specified any field in the query. A sample definition:
</p>
<pre class="brush:xml">&lt;defaultSearchField&gt;content&lt;/defaultSearchField&gt;</pre>
<p><strong>3. Default logical operator definition</strong></p>
<p>This section defines the default logical operator that will be used between query terms. A sample definition looks as follows:
</p>
<pre class="brush:xml">&lt;solrQueryParser defaultOperator="OR" /&gt;</pre>
<p>Possible values are: <em>OR </em>and <em>AND</em>.</p>
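<p>To see how these two settings could work together, consider the queries sketched below. With the sample definitions shown above, the first one should be interpreted as <code>content:apache OR content:solr</code>. The second one overrides both defaults for a single request with the standard <em>q.op</em> and <em>df</em> parameters (the <em>title</em> field is only an assumption) and should be interpreted as <code>title:apache AND title:solr</code>:
</p>
<pre class="brush:xml">q=apache solr
q=apache solr&amp;q.op=AND&amp;df=title</pre>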
<p><strong>4. Defining similarity</strong></p>
<p>Finally, we define the similarity that we will use. It is rather a topic for another post, but you should know that, if necessary, you can change the default similarity (currently in Solr trunk there are already two similarity classes). A sample definition is as follows:
</p>
<pre class="brush:xml">&lt;similarity class="pl.solr.similarity.CustomSimilarity" /&gt;</pre>
<h3>A few words at the end</h3>
<p>The information presented above should give some insight into what the <em>schema.xml</em> file is and what its different sections are responsible for. Soon I will try to write about what you should avoid when designing the index.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/08/16/what-is-schema-xml/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
