6 sins of solrconfig.xml modifications

The solrconfig.xml file is another file that defines Solr’s behavior. Unlike schema.xml, which describes the structure of the index, solrconfig.xml determines the functionality available in Solr. Just as with schema.xml, there are a number of common mistakes made by people who implement Solr, and I’m not talking only about people with little Solr experience. To learn about some of those mistakes, I invite you to read the following entry.

Before I begin, I want to point out that the following examples do not cover every mistake that can be made; they are only examples of what should be considered when using Solr.

1. I’m sure I’ll need it

As with the schema.xml file, I suggest minimalism in solrconfig.xml too. If we know that we will only use responses in JSON format, there is no need to configure additional response formats. I often come across situations where someone sets up Solr with every possible handler, response writer and a number of additional features, even though they do not know what some of them do. Although the standard configuration elements do not use much memory, remember that a minimalist solrconfig.xml file is definitely easier to maintain than one that has been bloated to the limits of the possible.
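As a rough illustration of the minimalist approach, a solrconfig.xml for an application that only ever reads JSON might declare a single search handler and a single response writer. This is a sketch, not a complete file: the handler name /search and the default values are my assumptions, and depending on the Solr version the standard response writers may be registered implicitly even when they are not listed.

```xml
<!-- Sketch only: one search handler that answers in JSON by default. -->
<!-- The name "/search" is hypothetical; use whatever the application calls. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Assumed defaults; adjust them to the application. -->
    <str name="wt">json</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

<!-- The only response writer this application needs. -->
<queryResponseWriter name="json" class="solr.JSONResponseWriter"/>
```

Everything else (extra handlers, other response writers, unused search components) can be left out until there is an actual need for it.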

2. Why should I cache?

An extreme case, but a true one. I was once asked whether the Solr cache is necessary if the application using Solr has a cache of its own. My answer was yes, of course. People who do not know Solr imagine its cache as one more, sometimes unnecessary, layer that stores search results. However, please note that in addition to the HTTP-based caching mechanism, Solr has its own cache or, to be more accurate, more than one type of cache. When we tune the Solr caches to our needs, we should monitor the test servers and look at how the hits are distributed, and whether a given cache is too big or too small. Please note that Solr cache configuration is not a one-time process; from time to time we must take a look at the statistics and possibly update our configuration.
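For reference, Solr’s own caches are declared in the query section of solrconfig.xml. The fragment below is only a sketch: the sizes and autowarm counts are placeholder values, not recommendations, and the cache implementation classes depend on the Solr version (solr.LRUCache and solr.FastLRUCache are the classes used in older releases; newer releases ship different implementations).

```xml
<query>
  <!-- Caches the document sets matched by filter queries (fq). -->
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="128"/>

  <!-- Caches ordered lists of document ids for query + sort combinations. -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="32"/>

  <!-- Caches stored fields of documents; it is not autowarmed. -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```

The hit ratio, evictions and size reported for each cache on the Solr statistics page are the numbers to watch when deciding whether these values should go up or down.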

3. Because you need to know how to warm up

Solr takes a few minutes to start and replication lasts forever, even though the index is relatively small. So the question arises: why? We look at the solrconfig.xml file and we have a winner: a huge number of warming queries, both those that run at startup and those that run while a new searcher is warmed up after data replication. We must remember not to overdo the number of warming queries, because the effect will be counterproductive: despite a potentially good warm-up, Solr will run poorly or not at all.
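Warming queries are configured as listeners in the query section of solrconfig.xml. The sketch below keeps the list deliberately short; the queries and the field names (category, price) are made up for illustration and should be replaced with a handful of queries that are representative of real traffic.

```xml
<query>
  <!-- Runs when a new searcher is opened, e.g. after a commit or replication. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- Hypothetical query: warms a commonly used filter and sort. -->
      <lst>
        <str name="q">*:*</str>
        <str name="fq">category:books</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>

  <!-- Runs once at startup, when there is no previous searcher to autowarm from. -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>
</query>
```

Every query added here is executed each time the corresponding event fires, so the total warm-up time grows with the length of these lists.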

4. I’ll save it in the configuration

Sometimes I meet an approach where the person using Solr wants to store all the query parameters in the configuration files, even those that change from query to query. This approach leads to many handler definitions that barely differ from each other: the only difference is the set of parameters, and the application must “remember” which handler to use for which query. Of course, if you want to add some static or default parameters to the configuration, that approach is absolutely correct. In my opinion, however, it is a wrong decision to create dozens of handlers that differ only in certain parameters or in the values of those parameters. Let’s give the application using Solr a little bit of freedom and make it responsible for building its queries.
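One way to picture the alternative is a single handler that holds only the truly static defaults, with everything that varies sent by the application as request parameters. This is a sketch under my own assumptions: the handler name /search and the field names (name, description, category) are hypothetical.

```xml
<!-- One handler with the static defaults only. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Values that are the same for every query live here. -->
    <str name="defType">dismax</str>
    <str name="qf">name^2 description</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

<!--
  Parameters that change between queries are sent by the application, e.g.:
    /search?q=solr&fq=category:books&sort=price+asc&rows=20
  Values passed in the request override the defaults above.
-->
```

With this layout, adding a new sort order or filter is a change in the application, not a new handler definition in solrconfig.xml.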

5. Why do I need a newer version?

As with the file that describes the structure of the index, it is worth taking the time to look at what has changed in solrconfig.xml since the last deployed version of Solr. As you know, Solr is developed pretty fast, and thus the configuration tends to change. From my experience I know that for various reasons (such as tight deadlines or a lack of Solr knowledge) the configuration files are usually left alone when a deployment is updated. I’ll repeat once again: try to update the configuration files. It takes only a little time, and you can only gain by doing so.

6. The default configuration is optimal for me

This time, I left the most common mistake for last. This is a very frequently repeated error, and I am not the only one who notices it. I emphasize this again: it is worth taking a moment (sometimes it takes a bit longer) to adjust the configuration files to our needs. In a large number of cases, the configuration that you prepare will suit your implementation much better than the default configuration that comes with Solr.

Finally

As with the entry about errors in the schema.xml file (http://solr.pl/2010/08/30/5-sins-of-schema-xml-modifications/?lang=en), I recommend the entry titled “The Seven Deadly Sins of Solr”, which can be read at: http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/. It is well worth the time.

Solr.pl