<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>configuration &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/configuration/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Wed, 11 Nov 2020 19:45:58 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr filters: PatternReplaceCharFilter</title>
		<link>https://solr.pl/en/2011/05/09/solr-filters-patternreplacecharfilter/</link>
					<comments>https://solr.pl/en/2011/05/09/solr-filters-patternreplacecharfilter/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 09 May 2011 18:45:16 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[configuration]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=266</guid>

					<description><![CDATA[Continuing the overview of the filters included in Solr, today we look at PatternReplaceCharFilter. As you might guess, the task of this filter is to replace the parts of the input stream that match a given regular expression. You have]]></description>
										<content:encoded><![CDATA[<p>Continuing the overview of the filters included in Solr, today we look at PatternReplaceCharFilter.</p>
<p>As you might guess, the task of this filter is to replace the parts of the input stream that match a given regular expression.</p>
<p><span id="more-266"></span></p>
<p>The filter accepts the following parameters:</p>
<ul>
<li><em>pattern</em> (required) &#8211; the regular expression that selects the text to be replaced</li>
<li><em>replacement</em> (default: &#8220;&#8221;) &#8211; the value used as a replacement for each fragment that matched the regular expression</li>
<li><em>blockDelimiters</em> &#8211; text marking the start of a new section of the input, used to optimize buffering (see Advanced Parameters below)</li>
<li><em>maxBlockChars</em> (default: 10000, must be greater than 0) &#8211; size of the buffer used for pattern matching</li>
</ul>
<h2>Use examples</h2>
<p>Using the filter is simple &#8211; we add its definition to the field type definition in the <em>schema.xml</em> file, for example:
</p>
<pre class="brush:xml">&lt;fieldType name="textCharNorm" class="solr.TextField"&gt;
  &lt;analyzer&gt;
    &lt;charFilter class="solr.PatternReplaceCharFilterFactory" …/&gt;
    &lt;charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/&gt;
    &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>Below are examples of definitions for different cases.</p>
<h3>Cut pieces of text</h3>
<p>Just specify, in the <em>pattern</em> attribute, what you want to cut. For example:
</p>
<pre class="brush:xml">&lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#TAG" /&gt;</pre>
<p>This removes every occurrence of &#8220;#TAG&#8221; from the field content, because the default replacement is an empty string.</p>
<h3>Text fragments replacement</h3>
<p>Similar to the case above, but this time the matched text is replaced with a different string:
</p>
<pre class="brush:xml">&lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#TAG" replacement="[CENZORED]"/&gt;</pre>
<h3>Changing patterns</h3>
<p>The two cases above were trivial. The real strength of this filter is its support for regular expressions. (You use regular expressions, right?) The following example is simple &#8211; it hides all numbers by replacing each of them with an asterisk. Numbers separated by hyphens are treated as a single number.
</p>
<pre class="brush:xml">&lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\\d+-*\\d+)+" replacement="*"/&gt;</pre>
<h3>Text Manipulation</h3>
<p>The replacement doesn&#8217;t have to be plain text. This filter supports references to the groups captured by the pattern; for details, refer to the documentation of regular expressions. In the following example, every run of repeated characters is replaced with a single character.
</p>
<pre class="brush:xml">&lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(.)\\1" replacement="$1"/&gt;</pre>
<h2>Advanced Parameters</h2>
<p>So far I have not described the following parameters: <em>blockDelimiters</em> and <em>maxBlockChars</em>. If you look at the source code, you will see that they are related to the way the filter is implemented. A <em>CharFilter</em> processes the input character by character, while pattern matching needs an internal buffer holding more characters. <em>maxBlockChars</em> sets the size of that buffer. You do not have to worry about it as long as your pattern never needs to match a piece of text longer than 10,000 characters. <em>blockDelimiters</em> can further optimize how the buffer is filled. It is useful when the content of the analyzed field is divided into sections (for example CSV values, sentences, etc.). It is a text that tells the scanner a new section starts, so parts matched in the previous section are no longer needed.</p>
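<p>To make the two parameters more concrete, here is a sketch of a field type that sets both of them explicitly. It assumes a hypothetical field holding comma-separated values in which a match never spans a comma; the type name <em>csvNoTags</em>, the comma delimiter and the buffer size of 1000 are illustrative choices, not recommended values. Because both parameters are tied to the implementation described above, check whether the Solr version you use still honors them.</p>
<pre class="brush:xml">&lt;!-- Hypothetical type: comma-separated values with "#TAG" removed from each value --&gt;
&lt;fieldType name="csvNoTags" class="solr.TextField"&gt;
  &lt;analyzer&gt;
    &lt;!-- Assumption: matches never cross a comma, so a comma may close a buffered block,
         and 1000 characters is enough for any single value --&gt;
    &lt;charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="#TAG" replacement=""
                blockDelimiters="," maxBlockChars="1000"/&gt;
    &lt;!-- Split the cleaned text into tokens on commas --&gt;
    &lt;tokenizer class="solr.PatternTokenizerFactory" pattern=","/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>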
<h2>Limits</h2>
<p>An important limitation of the filter is that it manipulates the input data directly and does not keep information about the original text. This means that if the filter removes a portion of the string or adds a new fragment, the tokenizer will not notice it, and the positions of tokens in the original text will not be recorded correctly. Keep this in mind when using queries that depend on the relative positions of tokens, or when using highlighting.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/05/09/solr-filters-patternreplacecharfilter/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>6 sins of solrconfig.xml modifications</title>
		<link>https://solr.pl/en/2010/09/13/6-sins-of-solrconfig-xml-modifications/</link>
					<comments>https://solr.pl/en/2010/09/13/6-sins-of-solrconfig-xml-modifications/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 13 Sep 2010 12:11:32 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[configuration]]></category>
		<category><![CDATA[proper configuration]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrconfig]]></category>
		<category><![CDATA[solrconfig.xml]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=75</guid>

					<description><![CDATA[The solrconfig.xml file is another file that defines the behavior of Solr. Unlike schema.xml, which describes the structure of the index, solrconfig.xml determines the functionality available in Solr. Just like in the case of schema.xml, we can distinguish a number]]></description>
										<content:encoded><![CDATA[<p>The <em>solrconfig.xml</em> file is another file that defines the behavior of Solr. Unlike <em>schema.xml</em>, which describes the structure of the index, <em>solrconfig.xml</em> determines the functionality available in Solr. Just like in the case of <em>schema.xml</em>, we can distinguish a number of standard mistakes made by those who deploy Solr, and I&#8217;m not talking only about people with little Solr experience. Read on to learn about some of them.</p>
<p><span id="more-75"></span></p>
<p>At the beginning I want to point out that the following examples are not all the mistakes that can be made; they are only examples of what should be considered when using Solr.</p>
<h2>1. I&#8217;m sure I&#8217;ll need it</h2>
<p>As with the <em>schema.xml</em> file, I suggest minimalism in <em>solrconfig.xml</em> too. If we know that we will only use responses in JSON format, there is no need to configure additional response formats. I often come across situations where someone sets up Solr with all the possible handlers, response writers and a number of additional features, even though they do not know what some of them do. Although the standard configuration elements do not use much memory, remember that a minimalist <em>solrconfig.xml</em> file is definitely easier to maintain than one bloated to the point of being unmanageable.</p>
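<p>As an illustration of such minimalism, the fragment below sketches what an application that only consumes JSON might keep: a single search handler that returns JSON by default and the JSON response writer. It is a fragment, not a complete <em>solrconfig.xml</em>; the handler name <em>standard</em> is an assumption borrowed from the old example configuration, and everything the application does not use is simply left out.</p>
<pre class="brush:xml">&lt;!-- One search handler, used when no other handler is requested --&gt;
&lt;requestHandler name="standard" class="solr.SearchHandler" default="true"&gt;
  &lt;lst name="defaults"&gt;
    &lt;!-- The application reads JSON only, so make it the default response format --&gt;
    &lt;str name="wt"&gt;json&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;

&lt;!-- The only response writer the application needs --&gt;
&lt;queryResponseWriter name="json" class="solr.JSONResponseWriter"/&gt;</pre>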
<h2>2. Why should I cache?</h2>
<p>An extreme case, but a true one. I was once asked whether the Solr cache is necessary if the application using Solr has its own cache. My answer was yes, of course. People who do not know Solr imagine its cache as just another, sometimes unnecessary, layer for storing search results. However, please note that in addition to HTTP-based caching, Solr has its own cache &#8211; more precisely, Solr has more than one type of cache. When we tune Solr caches to our needs, we monitor the test servers: how the hits are distributed and whether each cache is too big or too small. Please note that configuring Solr caches is not a one-time process; from time to time we must look at the statistics and possibly update our configuration.</p>
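<p>For reference, here is a sketch of the cache section of <em>solrconfig.xml</em> as it looked in the Solr versions this entry was written for. The three caches are the standard ones; the class <em>solr.LRUCache</em> and all the sizes are assumptions for illustration, so take the class names from the example configuration of your Solr release and derive the sizes from the statistics of your own test servers.</p>
<pre class="brush:xml">&lt;query&gt;
  &lt;!-- Caches the sets of documents matching filter queries (fq) --&gt;
  &lt;filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/&gt;
  &lt;!-- Caches ordered lists of document ids returned for queries --&gt;
  &lt;queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/&gt;
  &lt;!-- Caches stored fields of documents; it is not autowarmed --&gt;
  &lt;documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/&gt;
&lt;/query&gt;</pre>
<p>If the statistics show a low hit ratio together with many evictions, the cache is probably too small; a cache that never fills up is probably too big.</p>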
<h2>3. Because you need to know how to warm up</h2>
<p>Solr takes a few minutes to start and replication lasts forever, even though the index is relatively small. So the question arises &#8211; why? Look at the <em>solrconfig.xml</em> file and we have a winner &#8211; a huge number of warming queries, both those that run at startup and those that warm up a new searcher after data replication. We must remember not to overdo the number of these queries, because the effect is counterproductive &#8211; despite a potentially good warm-up, Solr will run poorly or not at all.</p>
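<p>Warming queries are defined as listeners in <em>solrconfig.xml</em>. The sketch below shows the two events mentioned above: <em>firstSearcher</em> (run at startup) and <em>newSearcher</em> (run when a new searcher is opened, for example after replication). The queries themselves, including the hypothetical <em>price</em> sort field, are placeholders; the point is to keep such lists short and limited to queries that really reflect the traffic.</p>
<pre class="brush:xml">&lt;query&gt;
  &lt;!-- Run whenever a new searcher is opened, e.g. after replication or a commit --&gt;
  &lt;listener event="newSearcher" class="solr.QuerySenderListener"&gt;
    &lt;arr name="queries"&gt;
      &lt;lst&gt;&lt;str name="q"&gt;*:*&lt;/str&gt;&lt;str name="sort"&gt;price asc&lt;/str&gt;&lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/listener&gt;
  &lt;!-- Run once at startup, when there is no previous searcher to warm from --&gt;
  &lt;listener event="firstSearcher" class="solr.QuerySenderListener"&gt;
    &lt;arr name="queries"&gt;
      &lt;lst&gt;&lt;str name="q"&gt;solr&lt;/str&gt;&lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/listener&gt;
&lt;/query&gt;</pre>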
<h2>4. I&#8217;ll save it in the configuration</h2>
<p>Sometimes I come across an approach where the person using Solr wants to store all the query parameters in the configuration files, even those that change from query to query. This approach leads to many handler definitions that barely differ from each other &#8211; the only difference is the set of parameters &#8211; and the application must &#8220;remember&#8221; which handler to use with which query. Of course, if you want to add some static or default parameters, such an approach is absolutely correct. In my opinion, however, creating dozens of handlers that differ only in certain parameters or their values is the wrong decision. Let&#8217;s give the application using Solr a little freedom and make it responsible for building its queries.</p>
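<p>A sketch of the balanced approach: the handler stores only the static defaults, and the application sends whatever changes from query to query. The handler name <em>/products</em> and the fields <em>name</em> and <em>description</em> are hypothetical.</p>
<pre class="brush:xml">&lt;requestHandler name="/products" class="solr.SearchHandler"&gt;
  &lt;!-- Static defaults shared by all product searches --&gt;
  &lt;lst name="defaults"&gt;
    &lt;str name="defType"&gt;dismax&lt;/str&gt;
    &lt;str name="qf"&gt;name^2 description&lt;/str&gt;
    &lt;int name="rows"&gt;20&lt;/int&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>Parameters such as <em>q</em>, <em>fq</em>, <em>sort</em> or <em>start</em> are then passed by the application with each request, and a default such as <em>rows</em> can still be overridden when needed.</p>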
<h2>5. Why do I need a newer version?</h2>
<p>As with the file that describes the structure of the index, in the case of the <em>solrconfig.xml</em> file it is worth taking the time to look at what has changed since the last deployed version of Solr. As you know, Solr is developed pretty fast, and thus its configuration tends to change. From my experience I know that, for various reasons (such as tight deadlines or lack of Solr knowledge), configuration files are usually left untouched when a deployment is updated. I&#8217;ll repeat once again &#8211; try to update the configuration files; it takes a little time, and you can only gain by doing so.</p>
<h2>6. The default configuration is optimal for me</h2>
<p>This time I left the most common mistake for last. It is a very frequently repeated error, and I am not the only one who has noticed it. I emphasize this again &#8211; it is worth taking a moment (sometimes a bit longer) to adjust the configuration files to our needs. In a large number of cases, the configuration you prepare yourself will fit your deployment much better than the default configuration that comes with Solr.</p>
<h2>Finally</h2>
<p>As with the entry on errors in the <em>schema.xml</em> file (<a href="http://solr.pl/2010/08/30/5-sins-of-schema-xml-modifications/?lang=en" target="_blank" rel="noopener noreferrer">http://solr.pl/2010/08/30/5-sins-of-schema-xml-modifications/?lang=en</a>), I recommend the entry titled &#8220;<em>The Seven Deadly Sins of Solr</em>&#8221;, which can be read at: <a href="http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/">http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/</a>. It is well worth the time.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/09/13/6-sins-of-solrconfig-xml-modifications/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
