Solr 7.2 – rebalancing replicas using UTILIZENODE

With the release of Solr 7.2 we got a new option in the Collections API – the ability to utilize a new or existing node of our cluster based on the autoscaling properties. It basically looks at the replica placement and, using the autoscaling rules defined for the cluster and collection, allows us to automatically move replicas to a given node. Sounds nice in theory, so let's see how it actually works.

Autoscaling API in Solr

If you are not aware of what the Autoscaling API in Solr is, let me quickly get you up to date. That API allows us to set cluster-wide or per-collection rules that tell Solr how it should allocate shards and replicas in the cluster. We want to take node CPU utilization into consideration – sure, we can do that. We want to use the information about the free disk space on the nodes – of course, we can do that. The very nice thing is that the rules can be set both for collections and for the whole cluster. That gives us flexibility – we can set general rules for shard allocation and enforce specific rules per collection if needed.
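
For example, we could combine a rule saying that no node should hold more than one replica of any given shard with a rule limiting the total number of cores per node. A sketch of such a call could look as follows – the exact limits are illustrative, not something this post relies on:

# illustrative cluster policy – adjust the limits to your environment
$ curl -XPOST 'localhost:8983/api/cluster/autoscaling' -H 'Content-Type:application/json' -d '{
 "set-cluster-policy" : [
  {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
  {"cores": "<10", "node": "#ANY"}
 ]
}'

The first rule limits each node to less than two replicas of each shard, while the second one limits each node to less than 10 cores in total.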

However, the allocation rules are only used during shard assignment, which means that once we have the shards assigned and ready, Solr will not use the stored information anymore. At least that was the case until Solr 7.2. With 7.2 we have the ability to rebalance the cluster using the UTILIZENODE command.

Test environment

For ease of repeating the same test I'll be using a very simple environment based on Solr 7.2. I'll start two nodes, the first one using the following command:

$ bin/solr start -c

And the second node using the following command:

$ bin/solr start -z localhost:9983 -p 6683

In addition to that I'll create a single collection called test, which will initially be built of a single shard, and I'll add 3 additional replicas, all on the same node.
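
The collection and the additional replicas can be created more or less like this – the createNodeSet and node parameters pin everything to the first node, and the address is the one from my environment, so adjust it to your setup:

# all replicas are pinned to the first node on purpose
$ curl -XGET 'localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&createNodeSet=192.168.1.15:8983_solr'
$ curl -XGET 'localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=test&shard=shard1&node=192.168.1.15:8983_solr'
$ curl -XGET 'localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=test&shard=shard1&node=192.168.1.15:8983_solr'
$ curl -XGET 'localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=test&shard=shard1&node=192.168.1.15:8983_solr'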

The final cluster state looks as follows:

{"test":{
    "pullReplicas":"0",
    "replicationFactor":"1",
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"1",
    "autoAddReplicas":"false",
    "nrtReplicas":"1",
    "tlogReplicas":"0",
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "core":"test_shard1_replica_n1",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT",
            "leader":"true"},
          "core_node4":{
            "core":"test_shard1_replica_n3",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT"},
          "core_node6":{
            "core":"test_shard1_replica_n5",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT"},
          "core_node8":{
            "core":"test_shard1_replica_n7",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT"}}}}}}

As you can see, all the replicas are on the same node – the one with base_url equal to http://192.168.1.15:8983/solr.
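
By the way, if you would like to check the state of your own cluster in the same way, the CLUSTERSTATUS command from the Collections API will return it, for example:

$ curl -XGET 'localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test'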

So we have the following situation:

  • We have a Solr cluster built of two nodes
  • We have a single collection working in the cluster
  • Our collection is built of a single shard with 4 replicas – one leader and three additional replicas
  • All replicas are assigned to a single node

Basically we are not utilizing one of the nodes. Let’s try changing that situation by using the UTILIZENODE command of the Solr Collections API introduced in Solr 7.2.

Collections API UTILIZENODE at work

To show you how the Autoscaling API works we will first create a cluster-wide preference. By using the following command we will tell Solr to try to evenly balance the cluster based on the number of replicas on each of the nodes:

$ curl -XPOST 'localhost:8983/api/cluster/autoscaling' -H 'Content-Type:application/json' -d '{
 "set-cluster-preferences" : [
  {"minimize": "cores"}
 ]
}'

The above command sets the cluster-wide shard allocation preferences and tells Solr to minimize the number of cores present on each node, which means that Solr will try to balance the cores across all the nodes.
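
We can also read the configuration back to verify that the preferences were stored, simply by sending a GET request to the same endpoint:

$ curl -XGET 'localhost:8983/api/cluster/autoscaling'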

So now, using the UTILIZENODE command from the Collections API we will try to force Solr to automatically move some of the replicas. The call to the Collections API looks as follows:

$ curl -XGET 'localhost:8983/solr/admin/collections?action=UTILIZENODE&node=192.168.1.15:6683_solr'

The node parameter is required, because we need to tell Solr which node we consider underutilized. The processing of the command can take a longer period of time, so please consider running it in async mode.
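
Running it asynchronously is a matter of adding the async parameter with an arbitrary request identifier and polling for the result using the REQUESTSTATUS command. A sketch with a made-up rebalance1 identifier could look as follows:

# rebalance1 is just an example request identifier
$ curl -XGET 'localhost:8983/solr/admin/collections?action=UTILIZENODE&node=192.168.1.15:6683_solr&async=rebalance1'
$ curl -XGET 'localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=rebalance1'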

Once the command finishes its execution we should see the replicas spread across both nodes in the cloud view. The cluster state after rebalancing changed as well, and now looks as follows:

{"test":{
    "pullReplicas":"0",
    "replicationFactor":"1",
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "core_node6":{
            "core":"test_shard1_replica_n5",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT",
            "leader":"true"},
          "core_node8":{
            "core":"test_shard1_replica_n7",
            "base_url":"http://192.168.1.15:8983/solr",
            "node_name":"192.168.1.15:8983_solr",
            "state":"active",
            "type":"NRT"},
          "core_node10":{
            "core":"test_shard1_replica_n9",
            "base_url":"http://192.168.1.15:6683/solr",
            "node_name":"192.168.1.15:6683_solr",
            "state":"active",
            "type":"NRT"},
          "core_node12":{
            "core":"test_shard1_replica_n11",
            "base_url":"http://192.168.1.15:6683/solr",
            "node_name":"192.168.1.15:6683_solr",
            "state":"active",
            "type":"NRT"}}}},
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"1",
    "autoAddReplicas":"false",
    "nrtReplicas":"1",
    "tlogReplicas":"0"}}

As you can see, two replicas have been moved from the 192.168.1.15:8983_solr node to the 192.168.1.15:6683_solr node, which is exactly what our cluster preferences told Solr to do.

Summary

The UTILIZENODE command from the Collections API is a form of automation of cluster management. Of course we could do the same using the ADDREPLICA and DELETEREPLICA commands, but we would have to manually keep the cluster and collection policies in mind and make sure we are not violating any of the settings that are already in place. The UTILIZENODE command gives us all that and allows us to force utilization of a given node in the cluster according to the policies that we have already defined.
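
Just to illustrate the difference, a manual rebalancing of our test collection would be a sequence more or less like the following – add a replica on the new node, wait for it to become active, and remove one of the replicas from the overloaded node. Here core_node2 is just an example replica name taken from the earlier cluster state, and we would have to check the policies and preferences ourselves before running it:

# core_node2 is an example replica name – check the cluster state for the real one
$ curl -XGET 'localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=test&shard=shard1&node=192.168.1.15:6683_solr'
$ curl -XGET 'localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=test&shard=shard1&replica=core_node2'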
