Today’s entry is dedicated to one type of cache in Solr – the filter cache. I will try to explain what it does, how to configure it, and how to use it efficiently.
Let’s start from the inside. The filterCache stores unordered collections of document identifiers. Of course, these are not the IDs defined as the unique key in the schema.xml file – Solr stores the internal document IDs used by Lucene and Solr, which is worth remembering.
What is it used for?
The main task of the filterCache is to keep the results of applying filters, although that is not its only use. The cache can also support the faceting mechanism (when the TermEnum method is used) and sorting, when the <useFilterForSortedQuery/> option is set to true in the solrconfig.xml file.
The standard filterCache definition is as follows:
<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096" />
You have the following configuration options:
- class – the class that provides the cache implementation. For the filterCache, solr.FastLRUCache is recommended, as it is more efficient when the cache handles many more GET operations than PUT operations.
- size – the maximum number of entries that can be found in the cache.
- initialSize – initial size of the cache.
- autowarmCount – the number of entries that will be carried over from the old cache to the new one during warm-up.
- minSize – the number of entries to which Solr will try to reduce the cache once it becomes full.
- acceptableSize – if Solr cannot reduce the number of entries to the value given by minSize, acceptableSize is the number it will aim for instead.
- cleanupThread – false by default. If set to true, a separate thread will be used to clean the cache.
In most cases, the size, initialSize and autowarmCount parameters are quite sufficient.
How to configure?
The size of the cache should be determined on the basis of the queries that are sent to Solr. The maximum size of the filterCache should be at least as large as the number of distinct filters (fq parameters with their values) that we use. This means that if your application uses, for example, 2000 distinct filters in a given period of time, the size parameter should be set to at least 2000.
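One way to arrive at that number is to count the distinct fq values in a sample of real traffic. The snippet below is only a minimal sketch: the function name `count_distinct_filters` and the sample query strings are hypothetical, and it assumes that two filters count as the same entry when their fq text is identical, which is a simplification of how Solr keys its cache.

```python
from urllib.parse import parse_qs


def count_distinct_filters(query_strings):
    """Return how many distinct fq values appear across the given query strings."""
    distinct = set()
    for query_string in query_strings:
        params = parse_qs(query_string)
        # A request may carry several fq parameters; each one is a separate filter.
        distinct.update(params.get("fq", []))
    return len(distinct)


# Hypothetical sample of logged query strings (not taken from a real log).
sample_log = [
    "q=name:solr&fq=category:book&fq=section:books",
    "q=name:lucene&fq=category:book&fq=section:books",
    "q=name:lucene&fq=category:dvd",
]

print(count_distinct_filters(sample_log))  # prints 3
```

The result is a lower bound for the size parameter over the sampled period; a longer or busier period may well contain more distinct filters.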
However, configuring the cache is not sufficient – we also need to write queries that are able to use it. Take the following query for example:
At first glance, the query looks correct. However, there is a problem – it does not use the filterCache. The entire request will be handled by the queryResultCache and will create a single entry in it. Let’s modify it a bit and send the following query.
What happens now? As in the previous case, an entry will be created in the queryResultCache. Additionally, two entries will be created in the filterCache. Now let’s look at the next query:
This query would create another entry in the queryResultCache and would use the two already existing entries in the filterCache. Thus the execution time of the query would be reduced and the query would be less demanding on I/O.
However, let’s look at the query in the following form:
Solr would not be able to use any information from the cache and would have to collect all the information for the results from the Lucene index.
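The behaviour described in this walkthrough can be imitated with a toy model. Everything in the sketch below is an assumption made for illustration: the `ToyCaches` class is not Solr code, the four query strings are hypothetical stand-ins (all conditions in q, then the same conditions split into fq parameters, then a different q with the same filters, then everything in q again), and real Solr builds its cache keys differently. The model only tracks which entries would be created or reused.

```python
from urllib.parse import parse_qs


class ToyCaches:
    """Toy model: one queryResultCache entry per request, one filterCache entry per fq."""

    def __init__(self):
        self.query_result_cache = set()
        self.filter_cache = set()

    def run(self, query_string):
        params = parse_qs(query_string)
        q = params.get("q", [""])[0]
        fqs = params.get("fq", [])
        # Filters already cached are reused; the rest create new entries.
        hits = [fq for fq in fqs if fq in self.filter_cache]
        created = [fq for fq in fqs if fq not in self.filter_cache]
        self.filter_cache.update(fqs)
        # The whole request (main query plus its filters) gets one result entry.
        self.query_result_cache.add((q, tuple(sorted(fqs))))
        return {"filter_hits": hits, "filter_created": created}


# Hypothetical queries; the field names and values are made up.
HYPOTHETICAL_QUERIES = [
    "q=name:solr+AND+category:book+AND+section:books",
    "q=name:solr&fq=category:book&fq=section:books",
    "q=name:lucene&fq=category:book&fq=section:books",
    "q=name:lucene+AND+category:book+AND+section:books",
]

caches = ToyCaches()
for query in HYPOTHETICAL_QUERIES:
    print(query, "->", caches.run(query))

print(len(caches.query_result_cache), len(caches.filter_cache))  # prints 4 2
```

In this model only the second and third queries touch the filterCache: the second creates two entries and the third reuses both. The first and last queries put every condition into q, so each one only adds an entry to the queryResultCache.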
Last few words
As you can see, configuring the cache correctly does not guarantee that Solr will be able to use it. The efficiency of the final deployment depends on how the queries are sent to Solr. It is worth remembering this when planning an implementation.