SolrCloud and query execution control

With the release of Solr 7.0 and the introduction of new replica types in addition to the default NRT type, a question appeared: can we control the queries and where they are executed? Can we tell Solr to execute queries only on PULL replicas, or to give TLOG replicas priority? Let’s check that out.

Shards parameter

The first control option that we have in SolrCloud is the shards parameter. It lets us directly control which shards are used for querying. For example, we can pass a logical shard name, such as shard1, as its value.

A query with shards=shard1 is executed only against the replicas grouped under the logical shard1 name. With shards=shard1,shard2,shard3 it is executed on logical shards shard1, shard2 and shard3, while a query whose shards parameter points to the localhost:6683 node is executed only on the shards of the test collection deployed on that node.
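The three variants can be sketched in Python as plain request URLs. The node address http://localhost:8983 (Solr's default port) and the host:port/solr/collection form of the third value are assumptions made for illustration; the snippet only builds and prints the URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed node address and the "test" collection from the text.
SOLR_URL = "http://localhost:8983/solr/test/select"

def select_url(shards: str) -> str:
    """Build a match-all query restricted by the shards parameter."""
    # urlencode percent-encodes ':', ',' and '/'; Solr decodes them on its side.
    return SOLR_URL + "?" + urlencode({"q": "*:*", "shards": shards})

# Only the logical shard1.
only_shard1 = select_url("shard1")
# Logical shards shard1, shard2 and shard3 (comma separated).
three_shards = select_url("shard1,shard2,shard3")
# Only the shards of "test" deployed on the localhost:6683 node (assumed address form).
one_node = select_url("localhost:6683/solr/test")

for url in (only_shard1, three_shards, one_node):
    print(url)
```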

The shards parameter can also load balance a request across instances: alternative replicas for the same shard are listed separated by the pipe (|) character.

With the instances running on ports 6683 and 7783 listed that way, the query is executed on one of them: either the instance on port 6683 or the one on port 7783.
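A minimal sketch of building such a load-balanced shards value in Python. It assumes both instances serve replicas of the same data for the test collection and that the request is sent to a node at http://localhost:8983; both are illustrative assumptions.

```python
from urllib.parse import urlencode

# Assumed replica addresses: both instances hold the same data of "test".
replicas = ["localhost:6683/solr/test", "localhost:7783/solr/test"]

# Joining with "|" forms one load-balanced group: Solr queries one of them, not both.
shards_value = "|".join(replicas)

# Assumed receiving node; "|" is percent-encoded as %7C in the URL.
url = "http://localhost:8983/solr/test/select?" + urlencode({"q": "*:*", "shards": shards_value})
print(shards_value)  # localhost:6683/solr/test|localhost:7783/solr/test
print(url)
```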

Shards.preference parameter

While the shards parameter gives us some control over where a query is executed, it is not exactly what we would like to have. To target a certain type of replica with it, we would first have to fetch the physical layout of the shards and find out which replica is of which type, and this is not something we want to do. Because of that, the shards.preference parameter was introduced to Solr. It allows us to tell Solr which type of replicas should have priority when a query is executed.

For example, to tell Solr that PULL replicas should have priority when the query is executed, add the shards.preference parameter to the query and set it to replica.type:PULL.
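A minimal sketch of such a request using only Python's standard library. The node address http://localhost:8983 and the test collection are assumptions; run_query needs a reachable Solr node (Solr 7 and later return JSON by default), so the example only builds and prints the URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Solr node and collection.
BASE = "http://localhost:8983/solr/test/select"

def prefer_url(preference: str, q: str = "*:*") -> str:
    """Return a select URL that asks Solr to prefer replicas matching `preference`."""
    return BASE + "?" + urlencode({"q": q, "shards.preference": preference})

def run_query(url: str) -> dict:
    """Send the query and decode the JSON response (requires a live Solr node)."""
    with urlopen(url) as resp:
        return json.load(resp)

pull_first = prefer_url("replica.type:PULL")
print(pull_first)
# run_query(pull_first) would return Solr's response when the node is reachable.
```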

The nice thing is that preferences can be chained. We can tell Solr to use PULL replicas first and, if they are not available, to use TLOG replicas, by setting shards.preference to replica.type:PULL,replica.type:TLOG.
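The order of the comma-separated rules matters: the first rule decides, and later rules only break ties. The Python snippet below is a simplified model of that ordering, not Solr's implementation; the replica names and types are made up, and the model ignores how Solr picks among replicas that rank equally.

```python
# Hypothetical replicas of one shard: (core name, replica type).
replicas = [("core_n1", "NRT"), ("core_t2", "TLOG"), ("core_p3", "PULL"), ("core_t4", "TLOG")]

# Same value as shards.preference=replica.type:PULL,replica.type:TLOG
preference = "replica.type:PULL,replica.type:TLOG"
wanted_types = [rule.split(":", 1)[1] for rule in preference.split(",")]

def rank(replica):
    """Sort key: for each rule in order, 0 if the replica matches it, 1 otherwise."""
    _, rtype = replica
    return tuple(0 if rtype == t else 1 for t in wanted_types)

# Tuples compare rule by rule, so the first rule dominates and later ones break ties.
ordered = sorted(replicas, key=rank)
print([name for name, _ in ordered])  # ['core_p3', 'core_t2', 'core_t4', 'core_n1']
```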

We can also specify that PULL replicas should be used first and, if they are not available, local replicas should have priority, with shards.preference set to replica.type:PULL,replica.location:local.
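Expressed as request parameters, the PULL-then-local preference could look like the sketch below. The special value local refers to replicas on the node that receives the request; the node address http://localhost:8983 and the test collection are assumptions.

```python
from urllib.parse import urlencode

# Rules in priority order: PULL replicas first, then replicas on the local node.
rules = ["replica.type:PULL", "replica.location:local"]
params = {"q": "*:*", "shards.preference": ",".join(rules)}

# Assumed receiving node; Solr decodes the percent-escapes on its side.
url = "http://localhost:8983/solr/test/select?" + urlencode(params)
print(params["shards.preference"])  # replica.type:PULL,replica.location:local
print(url)
```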

In addition to the special local value, we can also define priority based on the address of the node that hosts the replicas. For example, if our Solr node at 192.168.1.1 is much more powerful than the others and we would like to prioritize PULL replicas first and that node second, we would set shards.preference to replica.type:PULL,replica.location:http://192.168.1.1.
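According to the Solr Reference Guide, replica.location values are matched as a prefix of the replica URL, which is why the port can be left out. The Python snippet below models only that prefix check on made-up replica URLs; it is not Solr code. It also shows a side effect worth knowing if matching is a plain prefix check: http://192.168.1.1 also matches http://192.168.1.10, and adding the port (for example http://192.168.1.1:8983) avoids that.

```python
# Value from the text: prefer PULL replicas, then replicas on the 192.168.1.1 node.
preference = "replica.type:PULL,replica.location:http://192.168.1.1"

# Hypothetical base URLs of the replicas of one shard.
replica_urls = [
    "http://192.168.1.2:8983/solr",
    "http://192.168.1.1:8983/solr",
    "http://192.168.1.10:8983/solr",
]

# Extract the location rule; split on the first ':' only, because the URL has ':' too.
location = next(
    rule.split(":", 1)[1]
    for rule in preference.split(",")
    if rule.startswith("replica.location:")
)

# Plain prefix matching: note that 192.168.1.10 matches as well.
matching = [u for u in replica_urls if u.startswith(location)]
print(location)  # http://192.168.1.1
print(matching)  # ['http://192.168.1.1:8983/solr', 'http://192.168.1.10:8983/solr']
```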

Summary

The discussed parameters, and shards.preference with its replica.type value in particular, can be very useful when we run SolrCloud with different types of replicas. By telling Solr that we prefer PULL or TLOG replicas, we can lower the query pressure on the NRT replicas, which also do the indexing work, and thus improve the performance of the whole cluster. What’s more, separating the roles of the replicas this way can help us achieve query performance close to what the Solr master-slave architecture provides, without sacrificing all the goodies that come with SolrCloud itself.
