SolrCloud – write and read tolerance

SolrCloud, like most distributed systems, is designed with certain rules in mind. There are also rules that every distributed system is subject to. For example, the CAP theorem states that a system can’t provide availability, data consistency and network partition tolerance at the same time – you can have at most two out of three. Of course, in this blog entry we will not be discussing the principles of distributed systems; instead we will focus on write and read tolerance in SolrCloud.

Write time tolerance

Write tolerance is not a simple topic. First of all, with the introduction of Solr 7.0 we got a variety of replica types. We have NRT replicas, which write data to the transaction log and index the data on each replica. We have TLOG replicas, which write to the transaction log, but instead of indexing the data on their own they use the replication mechanism to pull it from the leader. And finally we have PULL replicas, which do not use the transaction log at all and only use the replication mechanism to periodically pull the data from the shard leader.
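Just as an illustration, these replica types can be mixed within a single collection when it is created, for example through the Collections API (a sketch only; the collection name and replica counts below are made up for the example):

$ curl 'localhost:8983/solr/admin/collections?action=CREATE&name=replica_types_demo&numShards=1&nrtReplicas=1&tlogReplicas=1&pullReplicas=1&maxShardsPerNode=3'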

However, we won’t be analyzing how each of the replica types works. We will focus on NRT replicas, because this type has been around since the beginning of SolrCloud and, what’s more, it is still the default replica type in SolrCloud.

When it comes to NRT replicas, the indexing process is as follows. The leader accepts the data, writes it into its transaction log and sends it to all its replicas (assuming all are of the NRT type). Then each of the replicas writes the data to its own transaction log and returns an acknowledgment. At this point we know that the data is safe. Of course, somewhere in the meantime the data will also be written to the inverted index. But the question is – what will happen when not all shards are available? I would bet on indexing not succeeding, but to be perfectly sure let’s check by starting two Solr instances with the following commands:

$ bin/solr start -c
$ bin/solr start -z localhost:9983 -p 6983

Next, let’s create a collection consisting of two shards:

$ bin/solr create_collection -c test_index -shards 2 -replicationFactor 1
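If we want to, we can also verify that both shards are up by asking the Collections API for the cluster status (just an optional sanity check):

$ curl 'localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test_index'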

Once the collection is created, let’s stop one of the instances:

$ bin/solr stop -p 6983

And finally, let’s try indexing some data using the following command:

$ curl -XPOST -H 'Content-type:application/json' 'localhost:8983/solr/test_index/update' -d '{
 "id" : 2,
 "name" : "Test document"
}'

As we would expect, Solr returns an error:

{
  "responseHeader":{
    "status":503,
    "QTime":4011},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"No registered leader was found after waiting for 4000ms , collection: test_index slice: shard2 saw state=DocCollection(test_index//collections/test_index/state.json/8)={\n  \"pullReplicas\":\"0\",\n  \"replicationFactor\":\"1\",\n  \"shards\":{\n    \"shard1\":{\n      \"range\":\"80000000-ffffffff\",\n      \"state\":\"active\",\n      \"replicas\":{\"core_node3\":{\n          \"core\":\"test_index_shard1_replica_n1\",\n          \"base_url\":\"http://192.168.1.11:8983/solr\",\n          \"node_name\":\"192.168.1.11:8983_solr\",\n          \"state\":\"active\",\n          \"type\":\"NRT\",\n          \"force_set_state\":\"false\",\n          \"leader\":\"true\"}}},\n    \"shard2\":{\n      \"range\":\"0-7fffffff\",\n      \"state\":\"active\",\n      \"replicas\":{\"core_node4\":{\n          \"core\":\"test_index_shard2_replica_n2\",\n          \"base_url\":\"http://192.168.1.11:6983/solr\",\n          \"node_name\":\"192.168.1.11:6983_solr\",\n          \"state\":\"down\",\n          \"type\":\"NRT\",\n          \"force_set_state\":\"false\",\n          \"leader\":\"true\"}}}},\n  \"router\":{\"name\":\"compositeId\"},\n  \"maxShardsPerNode\":\"-1\",\n  \"autoAddReplicas\":\"false\",\n  \"nrtReplicas\":\"1\",\n  \"tlogReplicas\":\"0\"} with live_nodes=[192.168.1.11:8983_solr]",
    "code":503}}

In that case we can’t really do anything. We don’t want to manually route the data, and even if we did, we would have no guarantee that it would end up on one of the available shards. The best we can do is bring the missing shards back to life as soon as possible 😉
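In our simple test setup that just means starting the stopped node again, pointing it at the same ZooKeeper instance as before:

$ bin/solr start -z localhost:9983 -p 6983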

What will happen if we have multiple replicas and only some of them are missing? In that case the write should be successful, and Solr should inform us how many replicas wrote the data (at least in the newest versions) by including the rf parameter in the response. Let’s check that out.

Let’s create another collection, this time with a single shard and two replicas on our two Solr nodes:

$ bin/solr create_collection -c test_index_2 -shards 1 -replicationFactor 2

If we try to index exactly the same document, this time sending it to the test_index_2 collection, Solr will return the following response (results from Solr 7.6.0):

{
  "responseHeader":{
    "rf":2,
    "status":0,
    "QTime":316}}

As we can see, the rf parameter is set to 2, which means that a replication factor of 2 was achieved. In the scope of our collection this means that the write was successful both on the shard leader and on the replica. If we stop the Solr instance running on port 6983 and try to index the same data once again, we get the following response:

{
  "responseHeader":{
    "rf":1,
    "status":0,
    "QTime":4}}

In earlier Solr versions, in order to get information about the achieved replication factor, we had to include the min_rf parameter in the indexing request and set it to a value higher than 1.
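On those older versions the indexing request could look more or less like this (a sketch only; note that min_rf just makes Solr report the achieved replication factor, it does not reject writes that fall below it):

$ curl -XPOST -H 'Content-type:application/json' 'localhost:8983/solr/test_index_2/update?min_rf=2' -d '{
 "id" : 2,
 "name" : "Test document"
}'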

Read time tolerance

When it comes to reads the situation is a bit different. If not all shards are available, we lose visibility over a portion of the data. For example, with a collection of 10 shards, losing one of them means that we lose access to roughly 10% of the data (assuming it is evenly distributed). And of course, during a query Solr will by default not return the remaining 90% of the documents, but will just throw an error. Let’s check if that is true. To do that we will start two Solr instances using the following commands:

$ bin/solr start -c
$ bin/solr start -z localhost:9983 -p 6983

Next, let’s create a simple collection consisting of two shards:

$ bin/solr create_collection -c test -shards 2 -replicationFactor 1

And now, without indexing any data, let’s just stop one instance, the one running on port 6983:

$ bin/solr stop -p 6983

Now all it takes to get an error is to run the following query:

http://localhost:8983/solr/test/select?q=*:*

In response, instead of an empty results list, we will get an error similar to the following one:

{
  "responseHeader":{
    "status":503,
    "QTime":6,
    "params":{
      "q":"*:*"}},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"no servers hosting shard: shard2",
    "code":503}}

OK, the default behavior is good – we get an error, because we don’t have access to all of the data. But what if we would like to show partial results, taking the risk of not delivering the most relevant documents, but also not showing errors or empty pages? To achieve that, Solr gives us two parameters: shards.tolerant and shards.info. If we would like partial results to be returned, we should set the first one to true; if we would like detailed information about each shard, we should set the second one to true. For example:

http://localhost:8983/solr/test/select?q=*:*&shards.tolerant=true&shards.info=true

For the above query Solr will not return an error; partial results will be returned, along with information about the error on one of the shards:

{
  "responseHeader":{
    "zkConnected":true,
    "partialResults":true,
    "status":0,
    "QTime":45,
    "params":{
      "q":"*:*",
      "shards.tolerant":"true",
      "shards.info":"true"}},
  "shards.info":{
    "":{
      "error":"org.apache.solr.common.SolrException: no servers hosting shard: ",
      "trace":"org.apache.solr.common.SolrException: no servers hosting shard: \n\tat org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:165)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)\n\tat org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n",
      "time":0},
    "http://192.168.1.11:8983/solr/test_shard1_replica_n1/":{
      "numFound":0,
      "maxScore":0.0,
      "shardAddress":"http://192.168.1.11:8983/solr/test_shard1_replica_n1/",
      "time":18}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}

As you can see, everything works as we wanted. We got the results and we got the information that they are partial (the partialResults property set to true in the response header), so our application knows that the result set is not complete and that something went wrong. What’s more, we also got full information about which shard is to blame, because we added the shards.info=true parameter to our query.
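On the application side the check can be as simple as looking at that flag. For example, a quick command line sketch using curl and jq (assuming jq is installed) that prints true when the results are partial:

$ curl -s 'localhost:8983/solr/test/select?q=*:*&shards.tolerant=true' | jq '.responseHeader.partialResults // false'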
