Solr 4.0 Cookbook

The book is focused entirely on version 4.0 of the Apache Solr enterprise search server. The content is divided into ten thematic chapters, just like the previous edition of the book. Each chapter consists of several subsections. The book follows the cookbook convention, which means that it is not an A-to-Z guide to Solr; it is a set of ready-made solutions to problems that can be encountered while working with Solr.

The book includes topics such as:

  • SolrCloud – configuration, usage and administration panel
  • Document language detection
  • Indexing data in different formats
  • Faceting
  • Results grouping
  • Performance improvement techniques
  • Real-life situations, such as autocomplete or relevance improvements
  • And much more 🙂

If you are interested, please refer to the Packt Publishing page: http://www.packtpub.com/apache-solr-4-cookbook/book.

Errata

We would like the book to be received as well as possible, and because we have found some mistakes in it, we decided to write a short errata. We sincerely apologize for all the errors and mistakes.

Chapter 1

Running Solr on Jetty

On page 6 a context directory is mentioned; it should be contexts.

The example showing how to increase the header buffer size is based on Jetty 6. If you are using a newer Jetty, such as Jetty 8, please use the requestHeaderSize property instead of headerBufferSize. The example will then look like this:

<Set name="requestHeaderSize">32768</Set>

Installing a standalone Zookeeper

If you install more than a single ZooKeeper instance, you need to create a file called myid in the data directory of each ZooKeeper installation. This file should contain the identifier of that ZooKeeper server. More information about this can be found on the following web page: http://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html.
Found by: Marek Rogoziński (@nnegativ)
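
As a minimal sketch of the step above, the following Python snippet writes a myid file into a ZooKeeper data directory. The function name write_myid and the path in the usage comment are assumptions made for illustration; the data directory is whatever dataDir points to in your own zoo.cfg.

```python
from pathlib import Path


def write_myid(data_dir: str, server_id: int) -> Path:
    """Create the myid file that identifies one ZooKeeper server in an ensemble."""
    # ZooKeeper expects the id to be an integer between 1 and 255.
    if not 1 <= server_id <= 255:
        raise ValueError("server_id must be between 1 and 255")
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    myid = directory / "myid"
    # The file holds nothing but the id, written as plain ASCII text.
    myid.write_text(f"{server_id}\n", encoding="ascii")
    return myid


# Hypothetical usage, once per server and with a different id on each one:
# write_myid("/var/lib/zookeeper", 1)
```

The same id has to match the server.N entry for that machine in zoo.cfg, so it is worth keeping both in one place in whatever provisioning script you use.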

How to fetch and index web pages

On page 28, the schema.xml example should match its description, so it should look like this:

<schema name="nutch" version="1.5">

Found by: Marek Rogoziński (@nnegativ)

Chapter 2

How to properly configure Data Import Handler with JDBC

On page 43 there is the following sentence: “To check the status of the indexing process, you can run the command once again.” This is only true while DIH is still working; if it is not, then another indexing process will start. To check the status, you can simply run the following command:

curl http://localhost:8983/solr/dataimport

Found by: Artyom Lukanin (@avlukanin)
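
The same check can be scripted. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the handler is registered under /dataimport as in the command above and that the JSON response carries a top-level status field set to "busy" while an import is running. It passes command=status explicitly so that the request can never be mistaken for a new import.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the Data Import Handler, taken from the curl example.
DIH_URL = "http://localhost:8983/solr/dataimport"


def status_url(base_url: str = DIH_URL) -> str:
    """Build a URL that asks DIH for its status without starting an import."""
    return base_url + "?" + urlencode({"command": "status", "wt": "json"})


def is_import_running(response: dict) -> bool:
    """Return True when the decoded status response reports a running import."""
    return response.get("status") == "busy"


def fetch_status(base_url: str = DIH_URL) -> dict:
    """Fetch and decode the status response (needs a running Solr instance)."""
    with urlopen(status_url(base_url)) as resp:
        return json.load(resp)
```

With Solr running, is_import_running(fetch_status()) can be polled until it returns False.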

How to properly configure Data Import Handler with JDBC

On page 43 in the db-data-config.xml example there is the following code snippet:

<field column="description" name="description" />

It should be:

<field column="desc" name="description" />

Found by: Artyom Lukanin (@avlukanin)

How to modify data while importing with Data Import Handler

On page 54 in the db-data-config.xml example there is the following code snippet:

row.remove('name');

It should be:

row.remove('user_name');

Found by: Felipe Besson

Handling multiple currencies

On page 59, there is a small typo. The last sentence in the introduction to the recipe should be: “On the other hand, you can use the new functionality introduced in Solr 4.0 and create a field that will use the provided currency exchange rates”.

Found by: Felipe Besson

Chapter 3

Eliminating XML and HTML tags from text

On page 73 the value of the html field in the example document should be wrapped in a CDATA section, just as it is in the code you can download. The example document should look like this:

<add>
 <doc>
  <field name="id">1</field>
  <field name="html"><![CDATA[<html><head><title>My page</title></head><body><p>This is a <b>my</b> <i>sample</i> page</body></html>]]></field>
 </doc>
</add>

Found by: Marek Rogoziński (@nnegativ)
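
If you generate such documents from code rather than by hand, the CDATA section can be produced by an XML library. The following is a minimal sketch using Python's standard xml.dom.minidom module; the function name build_add_document is hypothetical, and the field names id and html are taken from the example document.

```python
from xml.dom.minidom import Document


def build_add_document(doc_id: str, html: str) -> str:
    """Return a Solr <add> message whose html field is wrapped in CDATA."""
    # A CDATA section cannot contain "]]>", so reject such input early.
    if "]]>" in html:
        raise ValueError("html must not contain ']]>'")
    dom = Document()
    add = dom.createElement("add")
    dom.appendChild(add)
    doc = dom.createElement("doc")
    add.appendChild(doc)

    id_field = dom.createElement("field")
    id_field.setAttribute("name", "id")
    id_field.appendChild(dom.createTextNode(doc_id))
    doc.appendChild(id_field)

    html_field = dom.createElement("field")
    html_field.setAttribute("name", "html")
    # The markup is stored verbatim, so the XML parser does not read it as tags.
    html_field.appendChild(dom.createCDATASection(html))
    doc.appendChild(html_field)

    return add.toxml()
```

Escaping the markup as ordinary text (for example with createTextNode) would give Solr the same field value; CDATA simply keeps the example readable.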

Changing words to other words

On page 79, there is a small typo. The third sentence of the “How it works…” section should be “The second one should be of interest to us right now”.
Found by: Felipe Besson

Storing geographical points in the index

On page 90 there is a sentence missing before the last example. Currently it is “(…) can add data to index:” and it should be “(…) can add data to index. Now let’s look again at the query”.
Found by: Marek Rogoziński (@nnegativ)

Using your own stemming dictionary

On page 102, the last sentence should be: “Now, the fields that are based on the text_english type will be stemmed”. The word “not” is not needed.
Found by: Marek Rogoziński (@nnegativ)

Chapter 4

How to search for a phrase, not a single word

On page 114 the sentence “This means that you want documents that could have an additional word between the word 2010 and report” should be “This means that you want documents that could have an additional word between the word 2012 and report”.
Found by: Felipe Besson

Chapter 7

Setting up two collections inside a single cluster

In both examples showing how to upload collection configuration to ZooKeeper there is a mistake – there should be a space character between the confname parameter and its value. Those examples should look like this:

cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/users/conf -confname userscollection
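
One way to avoid this kind of missing-space mistake when the upload is scripted is to pass every option and every value as a separate list item. The sketch below shows the idea in Python; the function names are hypothetical, and the script path and option names are taken from the examples above. The command name upconfig is passed as its own argument after -cmd.

```python
import subprocess


def upconfig_command(zkhost, confdir, confname, script="cloud-scripts/zkcli.sh"):
    """Build the zkcli.sh argument list; options and values are separate items."""
    return [
        script,
        "-cmd", "upconfig",
        "-zkhost", zkhost,
        "-confdir", confdir,
        "-confname", confname,
    ]


def upload_config(zkhost, confdir, confname):
    """Run zkcli.sh; raises CalledProcessError if the script exits with an error."""
    subprocess.run(upconfig_command(zkhost, confdir, confname), check=True)


# Hypothetical usage for the two collections from the recipe:
# upload_config("localhost:2181", "/usr/share/config/books/conf", "bookscollection")
# upload_config("localhost:2181", "/usr/share/config/users/conf", "userscollection")
```

Because the arguments are never joined into a single string by hand, a parameter cannot accidentally run into its value.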

Managing your SolrCloud cluster

In the example showing how to upload a collection configuration to ZooKeeper there is a mistake: there should be a space character between the confname parameter and its value. The example should look like this:

cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection

Chapter 10

How to get the documents with all the query words to the top of the results set

On page 298 the /better handler configuration is missing spaces in some properties. The affected properties are:

<str name="q">_query_:"{!edismaxqf=$qfQuery mm=$mmQuerypf=pfQuerybg=$boostQuery v=$mainQuery}"</str>
<str name="boostQuery">_query_:"{!edismaxqf=$boostQueryQf mm=100% v=$mainQuery"^100000</str>

The correct version is as follows:

<str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=$pfQuery bq=$boostQuery v=$mainQuery}"</str>
<str name="boostQuery">_query_:"{!edismax qf=$boostQueryQf mm=100% v=$mainQuery}"^100000</str>
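
Since both properties read the user's text through v=$mainQuery, a request to this handler only needs to supply a mainQuery parameter. The following is a minimal sketch of building such a request URL in Python; the function name is hypothetical, and the host, port and path are assumptions that depend on how your Solr instance and core are set up.

```python
from urllib.parse import urlencode

# Assumed address of the /better handler; adjust it to your own deployment.
BETTER_HANDLER_URL = "http://localhost:8983/solr/better"


def better_query_url(main_query: str, base_url: str = BETTER_HANDLER_URL) -> str:
    """Build a request URL that passes the user's text as the mainQuery parameter."""
    # urlencode takes care of spaces and other characters that need escaping.
    return base_url + "?" + urlencode({"mainQuery": main_query})
```

The handler then substitutes the value wherever $mainQuery is referenced, so the client does not need to know about the boosting logic.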