The book focuses entirely on version 4.0 of the Apache Solr enterprise search server. The content is divided into ten thematic chapters, just as in the previous edition of the book. Each chapter consists of a few to several subsections. The book follows the cookbook convention, which means that it is not an A-to-Z guide to Solr; it is a set of ready-made solutions to some of the problems that can be encountered while working with Solr.
The book includes topics such as:
- SolrCloud – configuration, usage and administration panel
- Document language detection
- Indexing data in different formats
- Faceting
- Results grouping
- Performance improvement techniques
- Real life situations, like auto complete or relevance improvements
- And much more
If you are interested, please refer to the Packt Publishing page: http://www.packtpub.com/apache-solr-4-cookbook/book.
Errata
We want the book to be received as well as possible, and because we have found some mistakes in it, we decided to write a short errata. We sincerely apologize for all the errors and mistakes.
Chapter 1
Running Solr on Jetty
On page 6 a context directory is mentioned; it should be contexts.
The example showing how to increase the header buffer size is based on Jetty 6. If you are using a newer Jetty, such as Jetty 8, please use the requestHeaderSize property instead of headerBufferSize. The example will then look like this:
<Set name="requestHeaderSize">32768</Set>
Installing a standalone Zookeeper
When installing more than a single ZooKeeper instance, you need to create a file called myid in the data directory of each ZooKeeper installation. This file should contain the identifier of that ZooKeeper server. More information can be found on the following web page: http://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html.
Found by: Marek Rogoziński (@nnegativ)
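The myid step described above can be sketched in a few lines of Python. This is only an illustration: the helper name write_myid and the temporary zk1, zk2, zk3 directories are hypothetical, and the 1 to 255 range check reflects the limit given in the ZooKeeper documentation. Each identifier is expected to match the corresponding server.N entry in zoo.cfg.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir: Path, server_id: int) -> Path:
    """Write the ZooKeeper server identifier into <data_dir>/myid."""
    # Assumption: the ZooKeeper docs ask for an id between 1 and 255.
    if not 1 <= server_id <= 255:
        raise ValueError("server id must be between 1 and 255")
    data_dir.mkdir(parents=True, exist_ok=True)
    myid_path = data_dir / "myid"
    # The file holds a single line with nothing but the id.
    myid_path.write_text(f"{server_id}\n", encoding="ascii")
    return myid_path


# Hypothetical three-node ensemble laid out under a temporary directory.
base = Path(tempfile.mkdtemp())
created = [write_myid(base / f"zk{n}" / "data", n) for n in (1, 2, 3)]
for path in created:
    print(path, "->", path.read_text().strip())
```

In a real installation the data_dir argument would be the dataDir configured for that ZooKeeper instance.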
How to fetch and index web pages
On page 28, the example describing the schema.xml file should match its description, so it should look like this:
<schema name="nutch" version="1.5">
Found by: Marek Rogoziński (@nnegativ)
Chapter 2
How to properly configure Data Import Handler with JDBC
On page 43 there is the following sentence: “To check the status of the indexing process, you can run the command once again.” This is only true while DIH is working; if it is not, then another indexing process will start. To check the status, you can run the following command:
curl http://localhost:8983/solr/dataimport
Found by: Artyom Lukanin (@avlukanin)
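The status check above can also be scripted. The sketch below is an assumption-laden illustration, not part of the book: it passes command=status explicitly, asks for a JSON response with wt=json, and assumes the response carries a status field whose value is idle or busy. The helper names and the sample response bodies are hypothetical and were not captured from a real server.

```python
import json
from urllib.parse import urlencode

# Handler URL taken from the erratum above.
DIH_URL = "http://localhost:8983/solr/dataimport"


def status_url(base_url: str = DIH_URL) -> str:
    """Build a URL that only asks for the import status and starts nothing."""
    return base_url + "?" + urlencode({"command": "status", "wt": "json"})


def is_import_running(response_body: str) -> bool:
    """Return True when the (assumed) JSON response reports a busy handler."""
    return json.loads(response_body).get("status") == "busy"


# Illustrative response bodies, not real server output.
sample_idle = '{"status": "idle", "statusMessages": {}}'
sample_busy = '{"status": "busy", "statusMessages": {}}'

print(status_url())
print(is_import_running(sample_idle), is_import_running(sample_busy))
```

Against a running Solr instance, the body could be fetched with urllib.request.urlopen(status_url()) and passed to is_import_running.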
How to properly configure Data Import Handler with JDBC
On page 43 in the db-data-config.xml example there is the following code snippet:
<field column="description" name="description" />
It should be:
<field column="desc" name="description" />
Found by: Artyom Lukanin (@avlukanin)
How to modify data while importing with Data Import Handler
On page 54 in the db-data-config.xml example there is the following code snippet:
row.remove('name');
It should be:
row.remove('user_name');
Found by: Felipe Besson
Handling multiple currencies
On page 59, there is a small typo. The last sentence in the introduction to the recipe should be: “On the other hand, you can use the new functionality introduced in Solr 4.0 and create a field that will use the provided currency exchange rates”.
Found by: Felipe Besson
Chapter 3
Eliminating XML and HTML tags from text
On page 73 the value of the html field in the example document should be wrapped in a CDATA section, just as it is in the code you can download. The example document should look like this:
<add>
  <doc>
    <field name="id">1</field>
    <field name="html"><![CDATA[<html><head><title>My page</title></head><body><p>This is a <b>my</b> <i>sample</i> page</body></html>]]></field>
  </doc>
</add>
Found by: Marek Rogoziński (@nnegativ)
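Why the CDATA section matters can be shown with any XML parser. The sketch below uses Python's standard library parser rather than Solr itself, so it only illustrates general XML behavior; the helper names build_doc and html_field_text are hypothetical. It compares the CDATA form, an entity-escaped form, and the raw form of the same field value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The HTML value from the example document above.
HTML_VALUE = (
    "<html><head><title>My page</title></head><body>"
    "<p>This is a <b>my</b> <i>sample</i> page</body></html>"
)


def build_doc(field_body: str) -> str:
    """Wrap a field body in the <add><doc> envelope from the example."""
    return (
        '<add><doc><field name="id">1</field>'
        f'<field name="html">{field_body}</field></doc></add>'
    )


def html_field_text(xml_doc: str) -> str:
    """Parse the document and return the text of the html field."""
    root = ET.fromstring(xml_doc)
    return root.find("./doc/field[@name='html']").text


with_cdata = build_doc(f"<![CDATA[{HTML_VALUE}]]>")
with_escape = build_doc(escape(HTML_VALUE))

# Without CDATA or escaping, the HTML tags are read as XML elements,
# and the unclosed <p> makes the document ill-formed.
try:
    html_field_text(build_doc(HTML_VALUE))
    raw_parses = True
except ET.ParseError:
    raw_parses = False

print(html_field_text(with_cdata) == HTML_VALUE)
print(html_field_text(with_escape) == HTML_VALUE)
print(raw_parses)
```

Escaping the value with entities is an alternative to CDATA that yields the same field text after parsing.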
Changing words to other words
On page 79, there is a small typo. The third sentence of the “How it works…” section should be “The second one should be of interest to us right now”.
Found by: Felipe Besson
Storing geographical points in the index
On page 90 there is a sentence missing before the last example. Currently it is “(…) can add data to index:” and it should be “(…) can add data to index. Now let’s look again at the query”.
Found by: Marek Rogoziński (@nnegativ)
Using your own stemming dictionary
On page 102, the last sentence should be: “Now, the fields that are based on the text_english type will be stemmed”. The word “not” should be removed.
Found by: Marek Rogoziński (@nnegativ)
Chapter 4
How to search for a phrase, not a single word
On page 114 the sentence “This means that you want documents that could have an additional word between the word 2010 and report” should be “This means that you want documents that could have an additional word between the word 2012 and report”.
Found by: Felipe Besson
Chapter 7
Setting up two collections inside a single cluster
In both examples showing how to upload collection configuration to ZooKeeper there is a mistake – there should be a space character between the confname parameter and its value. Those examples should look like this:
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/users/conf -confname userscollection
Managing your SolrCloud cluster
In the example showing how to upload collection configuration to ZooKeeper there is a mistake – there should be a space character between the confname parameter and its value. The example should look like this:
cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 -confdir /usr/share/config/books/conf -confname bookscollection
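The effect of the missing space can be illustrated by splitting the command line into arguments the way a shell would. This is a rough sketch using Python's shlex module, not zkcli's own argument parser, and the option_value helper is hypothetical. It assumes the -cmd upconfig form of the command.

```python
import shlex
from typing import Optional


def option_value(command: str, option: str) -> Optional[str]:
    """Return the argument that follows `option`, or None if it is absent."""
    tokens = shlex.split(command)
    if option in tokens:
        index = tokens.index(option)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None


correct = (
    "cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:2181 "
    "-confdir /usr/share/config/books/conf -confname bookscollection"
)
# The form with the space missing: option and value fuse into one token.
broken = correct.replace("-confname bookscollection", "-confnamebookscollection")

print(option_value(correct, "-confname"))
print(option_value(broken, "-confname"))
```

With the space in place the value is found as a separate argument; without it, no -confname option is seen at all.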
Chapter 10
How to get the documents with all the query words to the top of the results set
On page 298 the /better handler configuration is missing spaces in some properties. The affected properties are:
<str name="q">_query_:"{!edismaxqf=$qfQuery mm=$mmQuerypf=pfQuerybg=$boostQuery v=$mainQuery}"</str>
<str name="boostQuery">_query_:"{!edismaxqf=$boostQueryQf mm=100% v=$mainQuery"^100000</str>
The correct version is as follows:
<str name="q">_query_:"{!edismax qf=$qfQuery mm=$mmQuery pf=pfQuery bg=$boostQuery v=$mainQuery}"</str>
<str name="boostQuery">_query_:"{!edismax qf=$boostQueryQf mm=100% v=$mainQuery"^100000</str>