Solr 7.6.0 – uninvertible fields

With the recent release of Solr 7.6.0 we got a new option for fields and field types: the uninvertible property. It controls what Solr does when it needs data in an uninverted format, for example for faceting or sorting. Let’s look at what happens with various settings of this new property.

Example data structure

Let’s start with a very simple data structure that we will use for testing:
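The original listing is not available in this copy, so what follows is only a minimal sketch of what such a schema.xml fragment might look like. The field names id and description, and the string and text_general field types, are assumptions made for illustration; the text itself only tells us that there is an identifier and an analyzed title field.

```xml
<!-- Illustrative schema.xml fragment. Everything except the "title" field name is assumed. -->

<!-- Identifier field (the name "id" and the "string" type are assumptions) -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Analyzed text field used for faceting in this post; no uninvertible attribute yet -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Hypothetical third field; its real name is not given in the text -->
<field name="description" type="text_general" indexed="true" stored="true"/>
```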

Nothing special: we have three fields, one of which is the identifier. After indexing some example data, let’s try running faceting on one of the fields, for example title. In our case that shouldn’t be a problem, and the query will execute without any issues. Just for reference, the query looks as follows:

Adding uninvertible

The next thing to try is modifying our simple example by adding the uninvertible property to the title field and setting it to true. Keep in mind that, for backward compatibility reasons, this is the default value for the fields in our data structure. After we modify the schema.xml file and run the query, we still get the results we expect. By the way, the modified data structure should look as follows:
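As a hedged sketch only (the original listing is not available here), the title definition with the property set explicitly might look like the fragment below. The text_general type and the indexed and stored settings are assumptions; the only part taken from the text is uninvertible="true" on the title field.

```xml
<!-- Illustrative only: the type and the indexed/stored settings are assumed -->
<!-- uninvertible="true" matches the default, so behavior should not change -->
<field name="title" type="text_general" indexed="true" stored="true" uninvertible="true"/>
```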

Solr behavior with uninvertible=false

Let’s modify our example one last time and set the uninvertible property for the title field to false:
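A minimal sketch of that change, assuming the title field uses a text_general type and is indexed and stored (these details are not given in the text), could be:

```xml
<!-- Illustrative only: the type and the indexed/stored settings are assumed -->
<!-- uninvertible="false" tells Solr not to uninvert the indexed terms of this field -->
<field name="title" type="text_general" indexed="true" stored="true" uninvertible="false"/>
```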

If we now run our example query against Solr, its behavior changes. Instead of returning the faceting results, Solr returns an empty array. This is because the title field has no doc values, and it cannot have them because it is an analyzed field. With the field set to uninvertible=false, Solr will not build field cache entries for it.

Usage

The uninvertible property should be used when we want to be sure that certain fields cannot be used for faceting on our instance or in our cluster. For example, analyzed fields may contain lots of unique values, and uninverting them can be very memory intensive, leading to high memory usage and long garbage collections. In other words, if you don’t need faceting or sorting on analyzed fields, you can remove that possibility so that no one is able to run such queries in the production environment.
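Because the property is also available on field types, one possible way to apply this policy to every analyzed field at once is to set it on the type rather than on each field. The fragment below is a sketch under assumptions: the type name text_no_uninvert and the analyzer chain are hypothetical and not taken from the original post.

```xml
<!-- Hypothetical field type: every field using it inherits uninvertible="false" -->
<fieldType name="text_no_uninvert" class="solr.TextField" positionIncrementGap="100" uninvertible="false">
  <analyzer>
    <!-- Assumed analysis chain, shown only to make the type complete -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The title field picks up the setting from its type -->
<field name="title" type="text_no_uninvert" indexed="true" stored="true"/>
```

A single field can still override the type-level setting by declaring uninvertible="true" on its own definition.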
