<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>structure &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/structure/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Thu, 12 Nov 2020 12:59:24 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>Solr 4.2: Index structure reading API</title>
		<link>https://solr.pl/en/2013/05/20/solr-4-2-index-structure-reading-api/</link>
					<comments>https://solr.pl/en/2013/05/20/solr-4-2-index-structure-reading-api/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 20 May 2013 11:58:51 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[4.2]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schema api]]></category>
		<category><![CDATA[schema.xml]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[structure]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=554</guid>

					<description><![CDATA[With the release of Solr 4.2 we&#8217;ve got the possibility to use the HTTP protocol to get information about Solr index structure. Of course, if one wanted to do that prior to Solr 4.2 it could be achieved by fetching]]></description>
										<content:encoded><![CDATA[<p>With the release of Solr 4.2 we gained the ability to read the Solr index structure over HTTP. Of course, this was possible before Solr 4.2 as well: one could fetch the <em>schema.xml</em> file, parse it and extract the needed information. Since Solr 4.2, however, there is a dedicated API that returns this information without the need to parse the whole <em>schema.xml</em> file.</p>
<p><span id="more-554"></span></p>
<h3>Possibilities</h3>
<p>Let&#8217;s look at the new API by example.</p>
<h4>Getting information in XML format</h4>
<p>Many Solr users are used to getting their data in XML format, at least when using the Solr HTTP API. However, the schema API uses JSON as the default format. To get the data in XML format in any of the examples below, you&#8217;ll need to append the <em>wt=xml</em> parameter to the call, for example:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/fieldtypes?wt=xml'</pre>
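<p>As a minimal sketch of how a client could build such calls, the Python snippet below assembles schema API URLs and adds the <em>wt</em> parameter only when a non-default format is requested. The <em>schema_url</em> helper, its parameters and its default base address are illustrative assumptions and not part of Solr; only the URL layout and the <em>wt=xml</em> parameter come from the example above.</p>
<pre class="brush:python">from urllib.parse import quote, urlencode

# Hypothetical helper (not part of Solr): builds a schema API URL.
def schema_url(resource, name=None, wt=None,
               base="http://localhost:8983/solr", core="collection1"):
    url = f"{base}/{core}/schema/{resource}"
    if name is not None:
        # Keep "*" unescaped so wildcard names stay readable in the URL.
        url += "/" + quote(name, safe="*")
    if wt is not None:
        # The API answers in JSON by default; wt selects another format.
        url += "?" + urlencode({"wt": wt})
    return url

print(schema_url("fieldtypes"))
print(schema_url("fieldtypes", wt="xml"))</pre>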
<h4>Defined fields information</h4>
<p>Let&#8217;s start by looking at how to fetch information about the fields that are defined in Solr. In order to do that we have the following possibilities:</p>
<ol>
<li>Get information about all the fields defined in the index</li>
<li>Get information about a single, explicitly named field</li>
</ol>
<p>In the first case we should use the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/fields'</pre>
<p>In the second case we should add the <em>/</em> character and the field name to the above command. For example, to get information about the <em>author</em> field we should use the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/fields/author'</pre>
<p>The Solr response to the first command will look similar to this:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "fields":[{
      "name":"_version_",
      "type":"long",
      "indexed":true,
      "stored":true},
    {
      "name":"author",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"cat",
      "type":"string",
      "multiValued":true,
      "indexed":true,
      "stored":true},
    {
      "name":"category",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"id",
      "type":"string",
      "multiValued":false,
      "indexed":true,
      "required":true,
      "stored":true,
      "uniqueKey":true},
    {
      "name":"url",
      "type":"text_general",
      "indexed":true,
      "stored":true},
    {
      "name":"weight",
      "type":"float",
      "indexed":true,
      "stored":true}]}</pre>
<p>On the other hand the response for the second command would be as follows:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "field":{
    "name":"author",
    "type":"text_general",
    "indexed":true,
    "stored":true}}</pre>
<h4>Getting information about defined dynamic fields</h4>
<p>Just as we can get information about the fields defined in the <em>schema.xml</em>, we can get information about dynamic fields. Again we have two options:</p>
<ol>
<li>Get information about all dynamic fields</li>
<li>Get information about a specific dynamic field pattern</li>
</ol>
<p>In order to get all the information about dynamic fields we should use the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/dynamicfields'</pre>
<p>In order to get information about a specific pattern we append the <em>/</em> character followed by the pattern, for example like this:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/dynamicfields/random_*'</pre>
<p>Solr will return the following response for the first query:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "dynamicfields":[{
      "name":"*_coordinate",
      "type":"tdouble",
      "indexed":true,
      "stored":false},
    {
      "name":"ignored_*",
      "type":"ignored",
      "multiValued":true},
    {
      "name":"random_*",
      "type":"random"},
    {
      "name":"*_p",
      "type":"location",
      "indexed":true,
      "stored":true},
    {
      "name":"*_c",
      "type":"currency",
      "indexed":true,
      "stored":true}]}</pre>
<p>And the following response will be returned for the second command:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "dynamicfield":{
    "name":"random_*",
    "type":"random"}}</pre>
<h4>Getting field types</h4>
<p>As you have probably guessed, in a way similar to the examples described above, we can also get information about the field types defined in our <em>schema.xml</em> file. We can fetch the following information:</p>
<ol>
<li>All the field types defined in the <em>schema.xml</em> file</li>
<li>A single type</li>
</ol>
<p>To get all the defined field types we should run the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/fieldtypes'</pre>
<p>To get information about a single type we should again add the <em>/</em> character and append the field type name to it, for example like this:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/fieldtypes/text_gl'</pre>
<p>Solr will return the following information in response to the first command:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "fieldTypes":[{
      "name":"alphaOnlySort",
      "class":"solr.TextField",
      "sortMissingLast":true,
      "omitNorms":true,
      "analyzer":{
        "class":"solr.TokenizerChain",
        "tokenizer":{
          "class":"solr.KeywordTokenizerFactory"},
        "filters":[{
            "class":"solr.LowerCaseFilterFactory"},
          {
            "class":"solr.TrimFilterFactory"},
          {
            "class":"solr.PatternReplaceFilterFactory",
            "replace":"all",
            "replacement":"",
            "pattern":"([^a-z])"}]},
      "fields":[],
      "dynamicFields":[]},
    {
      "name":"boolean",
      "class":"solr.BoolField",
      "sortMissingLast":true,
      "fields":["inStock"],
      "dynamicFields":["*_bs",
        "*_b"]},
    {
      "name":"text_gl",
      "class":"solr.TextField",
      "positionIncrementGap":"100",
      "analyzer":{
        "class":"solr.TokenizerChain",
        "tokenizer":{
          "class":"solr.StandardTokenizerFactory"},
        "filters":[{
            "class":"solr.LowerCaseFilterFactory"},
          {
            "class":"solr.StopFilterFactory",
            "words":"lang/stopwords_gl.txt",
            "ignoreCase":"true",
            "enablePositionIncrements":"true"},
          {
            "class":"solr.GalicianStemFilterFactory"}]},
      "fields":[],
      "dynamicFields":[]},
    {
      "name":"tlong",
      "class":"solr.TrieLongField",
      "precisionStep":"8",
      "positionIncrementGap":"0",
      "fields":[],
      "dynamicFields":["*_tl"]}]}</pre>
<p>In response to the second command Solr will return the following:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "fieldType":{
    "name":"text_gl",
    "class":"solr.TextField",
    "positionIncrementGap":"100",
    "analyzer":{
      "class":"solr.TokenizerChain",
      "tokenizer":{
        "class":"solr.StandardTokenizerFactory"},
      "filters":[{
          "class":"solr.LowerCaseFilterFactory"},
        {
          "class":"solr.StopFilterFactory",
          "words":"lang/stopwords_gl.txt",
          "ignoreCase":"true",
          "enablePositionIncrements":"true"},
        {
          "class":"solr.GalicianStemFilterFactory"}]},
    "fields":[],
    "dynamicFields":[]}}</pre>
<p>As you can see, the response is quite detailed: we get the full definition of each field type and, in addition, the list of fields (both dynamic and non-dynamic) that use the given type.</p>
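<p>One practical use of the <em>fields</em> and <em>dynamicFields</em> lists is spotting field types that nothing refers to. The following Python sketch assumes the response shape shown above and works on a shortened copy of it (analyzer details omitted); the <em>unused_types</em> function is a name invented for this example.</p>
<pre class="brush:python">import json

# Shortened copy of the /schema/fieldtypes response shown above.
SAMPLE = """
{"fieldTypes": [
  {"name": "alphaOnlySort", "class": "solr.TextField", "fields": [], "dynamicFields": []},
  {"name": "boolean", "class": "solr.BoolField", "fields": ["inStock"],
   "dynamicFields": ["*_bs", "*_b"]},
  {"name": "text_gl", "class": "solr.TextField", "fields": [], "dynamicFields": []},
  {"name": "tlong", "class": "solr.TrieLongField", "fields": [], "dynamicFields": ["*_tl"]}]}
"""

def unused_types(field_types):
    # A type counts as unused when no field and no dynamic field refers to it.
    return [t["name"] for t in field_types
            if not t.get("fields") and not t.get("dynamicFields")]

field_types = json.loads(SAMPLE)["fieldTypes"]
print(unused_types(field_types))</pre>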
<h4>Retrieving information about copyFields</h4>
<p>In addition to what we&#8217;ve discussed so far, we can get information about the copyFields section of the <em>schema.xml</em> file. In order to do that one should run the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/collection1/schema/copyfields'</pre>
<p>And in response we will get the following data:
</p>
<pre class="brush:plain">{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "copyfields":[{
      "source":"author",
      "dest":"text"},
    {
      "source":"cat",
      "dest":"text"},
    {
      "source":"content",
      "dest":"text"},
    {
      "source":"content_type",
      "dest":"text"},
    {
      "source":"description",
      "dest":"text"},
    {
      "source":"features",
      "dest":"text"},
    {
      "source":"author",
      "dest":"author_s",
      "destDynamicBase":"*_s"}]}</pre>
<h3>The future</h3>
<p>In Solr 4.3 the described API was improved and is now being prepared to allow not only reading the index structure, but also modifying it with HTTP requests. We can expect that feature in one of the upcoming versions of Apache Solr, at least for changes that do not conflict with already indexed data, so in my opinion it is worth waiting for, at least for those who need it.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2013/05/20/solr-4-2-index-structure-reading-api/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>5 sins of schema.xml modifications</title>
		<link>https://solr.pl/en/2010/08/30/5-sins-of-schema-xml-modifications/</link>
					<comments>https://solr.pl/en/2010/08/30/5-sins-of-schema-xml-modifications/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 30 Aug 2010 12:08:35 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[attribute]]></category>
		<category><![CDATA[attributes]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[index structure]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[mistake]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schema.xml]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[structure]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=71</guid>

					<description><![CDATA[I made a promise and here it is &#8211; the entry on the most common mistakes made when designing a Solr index, that is, when you create or modify the schema.xml file for your deployment. Feel free to read on 😉]]></description>
										<content:encoded><![CDATA[<p>I made a promise and here it is &#8211; the entry on the most common mistakes made when designing a Solr index, that is, when you create or modify the <em>schema.xml</em> file for your deployment. Feel free to read on <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p><span id="more-71"></span></p>
<p>Each of us knows what the schema.xml file is and what it is for (if not, I invite you to read the entry located at: <a href="http://solr.pl/2010/08/16/what-is-schema-xml/?lang=en" target="_blank" rel="noopener noreferrer">http://solr.pl/2010/08/16/what-is-schema-xml/?lang=en</a>). What are the most frequently committed errors when creating or updating this file? I have personally come across the following:</p>
<h3>1. Trash in the configuration</h3>
<p>For me, the first principle is to keep the <em>schema.xml</em> file in the simplest possible form. Linked to this is a very important issue &#8211; this file should not be synonymous with chaos. In other words, do not clutter it with unnecessary comments, unused types, fields and so on. Order in the structure of the <em>schema.xml</em> file not only makes the file easier to maintain and modify, but also assures us that no unnecessary information will be stored in the Solr index.</p>
<h3>2. Cosmetic changes to the default configuration</h3>
<p>How many of those who use Solr in their daily work took the default <em>schema.xml</em> file supplied with the example Solr deployment and only slightly modified its contents &#8211; for example, changing only the names of the fields? I should raise my hand too, because I did it once. This is a pretty big mistake. Someone may ask why. Are you sure you need English stemming when implementing search for content written in Polish? I think not. The same applies to field and type attributes such as term vectors.</p>
<h3>3. No updates</h3>
<p>Sometimes I come across search-based applications where an upgrade of Solr is not followed by an update of the <em>schema.xml</em> file. If it is a conscious decision, dictated for example by a costly or even impossible re-indexing of all data, I understand the situation. But there are cases where an update would bring only benefits and its cost would be minimal (e.g. a less expensive re-index or slight changes in the application). Do not be afraid to update the <em>schema.xml</em> file &#8211; whether that means updating fields, updating types, or adding newer features. A good example is the migration from Solr 1.3 to version 1.4 &#8211; the newer version introduced significant changes to numeric types, and migrating to the new types would greatly improve the performance of queries using those types (such as range queries).</p>
<h3>4. &#8220;I&#8217;ll use it one day&#8221;</h3>
<p>Adding new types without removing the ones that are no longer needed, and doing the same with fields or <em>copyField</em> definitions. Most of us think that an old definition may be useful in the future, but remember that each type needs some extra memory in Solr and each field takes up space in the index. My small piece of advice &#8211; if you stop using a type, a field, or anything else in your configuration files (not only in <em>schema.xml</em>), simply remove it. Applying this principle throughout the life cycle of the applications using Solr will ensure that the index stays in optimal condition, and a few months after implementing another feature you will not be left puzzled, digging into the application code to determine whether a field is still used in some forgotten code fragment.</p>
<h3>5. Attributes, attributes and again attributes</h3>
<p>Storing original values and adding term vectors and their properties are just examples of things we don&#8217;t need in every deployment. Sometimes the index holds more than the application requires. A larger index means lower performance, at least in some cases (e.g. indexing). It is worth considering whether you really need all the information that you tell Solr to calculate and store. The effect of removing information that is unnecessary from our point of view may surprise us. Sometimes it is worth a try ;)</p>
<p>Feel free to comment; I will eagerly read about what else we should pay attention to when modifying the schema.xml file.</p>
<p>Finally, I think it is worth mentioning the article <em>&#8220;The Seven Deadly Sins of Solr&#8221;</em>, published on the LucidImagination website at: <a href="http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr" target="_blank" rel="noopener noreferrer">http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr</a>. It describes bad practices when working with Solr. In my opinion it is an interesting read and I highly recommend it.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/08/30/5-sins-of-schema-xml-modifications/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
