{"id":295,"date":"2020-11-11T20:57:46","date_gmt":"2020-11-11T19:57:46","guid":{"rendered":"http:\/\/sematext.solr.pl\/?page_id=295"},"modified":"2020-11-11T20:57:47","modified_gmt":"2020-11-11T19:57:47","slug":"solr-4-0-cookbook","status":"publish","type":"page","link":"https:\/\/solr.pl\/en\/solr-4-0-cookbook\/","title":{"rendered":"Solr 4.0 Cookbook"},"content":{"rendered":"<p><a href=\"http:\/\/www.packtpub.com\/apache-solr-4-cookbook\/book\" target=\"_blank\" rel=\"noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-2683\" src=\"https:\/\/solr.pl\/wp-content\/uploads\/2013\/01\/cookbook_4_cover.png\" alt=\"cookbook_4_cover\" width=\"125\" height=\"152\"><\/a>The book is totally focused on the 4.0 version of Apache Solr enterprise search server. The content is divided into ten thematic chapters, just like with the previous version of the book. Each chapter consists of a few to several subsections. The book follows the cookbook convention, which means that it is not an A-to-Z guide to Solr &#8211; it is a collection of ready-made solutions to some of the problems that can be encountered while working with Solr.<\/p>\n<p>The book includes topics such as:<\/p>\n<ul>\n<li>SolrCloud &#8211; configuration, usage and administration panel<\/li>\n<li>Document language detection<\/li>\n<li>Indexing data in different formats<\/li>\n<li>Faceting<\/li>\n<li>Results grouping<\/li>\n<li>Performance improvement techniques<\/li>\n<li>Real life situations, like auto complete or relevance improvements<\/li>\n<li>And much more \ud83d\ude42<\/li>\n<\/ul>\n<p>If you are interested, please refer to the Packt Publishing page: <a href=\"http:\/\/www.packtpub.com\/apache-solr-4-cookbook\/book\">http:\/\/www.packtpub.com\/apache-solr-4-cookbook\/book<\/a>.<\/p>\n<h2>Errata<\/h2>\n<p>We would like the book to be received as well as possible, and because we have found some mistakes in it, we decided to write a 
little errata. We sincerely apologize for all the errors and mistakes.<\/p>\n<h3>Chapter 1<\/h3>\n<h4>Running Solr on Jetty<\/h4>\n<p>On page 6 a <strong>context<\/strong> directory is mentioned &#8211; it should be <strong>contexts<\/strong>.<\/p>\n<p>The example showing how to increase the header buffer size is based on Jetty 6. If you are using a newer Jetty, such as Jetty 8, please use the <strong>requestHeaderSize<\/strong> property instead of <strong>headerBufferSize<\/strong>. The example will then look like this:<\/p>\n<pre class=\"brush:xml\">&lt;Set name=\"requestHeaderSize\"&gt;32768&lt;\/Set&gt;<\/pre>\n<h4>Installing a standalone Zookeeper<\/h4>\n<p>If you install more than a single Zookeeper instance, you need to create a file called <em>myid<\/em> in the <em>data<\/em> directory of each Zookeeper installation. This file should contain the identifier of that Zookeeper server. More information can be found on the following web page: <a href=\"http:\/\/zookeeper.apache.org\/doc\/r3.3.3\/zookeeperStarted.html\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/zookeeper.apache.org\/doc\/r3.3.3\/zookeeperStarted.html<\/a>.<br \/>\nFound by: Marek Rogozi\u0144ski (<a href=\"https:\/\/twitter.com\/nnegativ\">@nnegativ<\/a>)<\/p>\n<h4>How to fetch and index web pages<\/h4>\n<p>On page 28, the <em>schema.xml<\/em> example should match its description, so it should look like this:<\/p>\n<pre>&lt;schema name=\"nutch\" version=\"1.5\"&gt;<\/pre>\n<p>Found by: Marek Rogozi\u0144ski (<a href=\"https:\/\/twitter.com\/nnegativ\">@nnegativ<\/a>)<\/p>\n<h3>Chapter 2<\/h3>\n<h4><strong>How to properly configure Data Import Handler with JDBC<\/strong><\/h4>\n<p>On page 43 there is the following sentence: &#8220;To check the status of the indexing process, you can run the command once again.&#8221;. This is only true while DIH is running; if it is not, then another indexing process will start. 
In order to check the status, one can simply run the following command:<\/p>\n<pre class=\"brush:xml\">curl http:\/\/localhost:8983\/solr\/dataimport<\/pre>\n<p>Found by: Artyom Lukanin (<a href=\"https:\/\/twitter.com\/avlukanin\">@avlukanin<\/a>)<\/p>\n<h4>How to properly configure Data Import Handler with JDBC<\/h4>\n<p>On page 43 in the <em>db-data-config.xml<\/em> example there is the following code snippet:<\/p>\n<pre class=\"brush:xml\">&lt;field column=\"description\" name=\"description\" \/&gt;<\/pre>\n<p>It should be:<\/p>\n<pre class=\"brush:xml\">&lt;field column=\"desc\" name=\"description\" \/&gt;<\/pre>\n<p>Found by: Artyom Lukanin (<a href=\"https:\/\/twitter.com\/avlukanin\">@avlukanin<\/a>)<\/p>\n<h4>How to modify data while importing with Data Import Handler<\/h4>\n<p>On page 54 in the <em>db-data-config.xml<\/em> example there is the following code snippet:<\/p>\n<pre>row.remove('name');<\/pre>\n<p>It should be:<\/p>\n<pre>row.remove('user_name');<\/pre>\n<p>Found by: Felipe Besson<\/p>\n<h4>Handling multiple currencies<\/h4>\n<p>On page 59, there is a small typo. The last sentence in the introduction to the recipe should be: &#8220;On the other hand, you can use the new functionality introduced in Solr 4.0 and create a field that will use the provided currency exchange rates&#8221;.<\/p>\n<p>Found by: Felipe Besson<\/p>\n<h3>Chapter 3<\/h3>\n<h4>Eliminating XML and HTML tags from text<\/h4>\n<p>On page 73 the value of the <i>html<\/i> field of the example document should be wrapped in a <i>CDATA<\/i> section, just as it is in the code you can download. 
The example document should look like this:<\/p>\n<pre class=\"brush:xml\">&lt;add&gt;\n &lt;doc&gt;\n  &lt;field name=\"id\"&gt;1&lt;\/field&gt;\n  &lt;field name=\"html\"&gt;&lt;![CDATA[&lt;html&gt;&lt;head&gt;&lt;title&gt;My page&lt;\/title&gt;&lt;\/head&gt;&lt;body&gt;&lt;p&gt;This is a &lt;b&gt;my&lt;\/b&gt; &lt;i&gt;sample&lt;\/i&gt; page&lt;\/body&gt;&lt;\/html&gt;]]&gt;&lt;\/field&gt;\n &lt;\/doc&gt;\n&lt;\/add&gt;<\/pre>\n<p>Found by: Marek Rogozi\u0144ski (<a href=\"https:\/\/twitter.com\/nnegativ\">@nnegativ<\/a>)<\/p>\n<h4>Changing words to other words<\/h4>\n<p>On page 79, there is a small typo. The third sentence of the &#8220;How it works&#8230;&#8221; section should be &#8220;The second one should be of interest to us right now&#8221;.<br \/>\nFound by: Felipe Besson<\/p>\n<h4>Storing geographical points in the index<\/h4>\n<p>On page 90 there is a sentence missing before the last example. Currently it reads &#8220;<em>(&#8230;) can add data to index:<\/em>&#8221; and it should read &#8220;<em>(&#8230;) can add data to index. Now let&#8217;s look again at the query<\/em>&#8221;.<br \/>\nFound by: Marek Rogozi\u0144ski (<a href=\"https:\/\/twitter.com\/nnegativ\">@nnegativ<\/a>)<\/p>\n<h4>Using your own stemming dictionary<\/h4>\n<p>On page 102, the last sentence should be: &#8220;Now, the fields that are based on the <em>text_english<\/em> type will be stemmed&#8221;. 
The word <em>not<\/em> is not needed.<br \/>\nFound by: Marek Rogozi\u0144ski (<a href=\"https:\/\/twitter.com\/nnegativ\">@nnegativ<\/a>)<\/p>\n<h3>Chapter 4<\/h3>\n<h4>How to search for a phrase, not a single word<\/h4>\n<p>On page 114 the sentence &#8220;This means that you want documents that could have an additional word between the word <em>2010<\/em> and <em>report<\/em>&#8221; should be &#8220;This means that you want documents that could have an additional word between the word <em>2012<\/em> and <em>report<\/em>&#8221;.<br \/>\nFound by: Felipe Besson<\/p>\n<h3>Chapter 7<\/h3>\n<h4>Setting up two collections inside a single cluster<\/h4>\n<p>In both examples showing how to upload collection configuration to ZooKeeper there is a mistake &#8211; there should be a space character between the <strong>confname<\/strong> parameter and its value. Those examples should look like this:<\/p>\n<pre class=\"brush:xml\">cloud-scripts\/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir \/usr\/share\/config\/books\/conf -confname bookscollection\ncloud-scripts\/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir \/usr\/share\/config\/users\/conf -confname userscollection<\/pre>\n<h4>Managing your SolrCloud cluster<\/h4>\n<p>In the example showing how to upload collection configuration to ZooKeeper there is a mistake &#8211; there should be a space character between the <strong>confname<\/strong> parameter and its value. The example should look like this:<\/p>\n<pre class=\"brush:xml\">cloud-scripts\/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir \/usr\/share\/config\/books\/conf -confname bookscollection<\/pre>\n<h3>Chapter 10<\/h3>\n<h4>How to get the documents with all the query words to the top of the results set<\/h4>\n<p>On page 298 the <em>\/better<\/em> handler configuration is missing spaces in some properties. 
The properties missing spaces are:<\/p>\n<pre class=\"brush:xml\">&lt;str name=\"q\"&gt;_query_:\"{!edismaxqf=$qfQuery mm=$mmQuerypf=pfQuerybg=$boostQuery v=$mainQuery}\"&lt;\/str&gt;\n&lt;str name=\"boostQuery\"&gt;_query_:\"{!edismaxqf=$boostQueryQf mm=100% v=$mainQuery\"^100000&lt;\/str&gt;<\/pre>\n<p>The correct version is as follows:<\/p>\n<pre class=\"brush:xml\">&lt;str name=\"q\"&gt;_query_:\"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}\"&lt;\/str&gt;\n&lt;str name=\"boostQuery\"&gt;_query_:\"{!edismax qf=$boostQueryQf mm=100% v=$mainQuery}\"^100000&lt;\/str&gt;<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The book is totally focused on the 4.0 version of Apache Solr enterprise search server. The content is divided into ten thematic chapters, just like with the previous version of the book. Each chapter consists of a few to several<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":0,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-295","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/pages\/295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=295"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/pages\/295\/revisions"}],"predecessor-version":[{"id":296,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/pages\/295\/revisions\/296"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}