If a collection is very large (or there are several of them) and a full replication is needed, Solr can use all the bandwidth of our network (if the disks are fast enough). This is good in some situations, but bad in others. Imagine that you have some collections being queried and one of them starts to replicate tens of gigabytes of data: all the other collections would suffer. Thankfully, with the release of Solr 5 we got replication throttling, and we can use it to limit the rate at which Solr transfers data during replication.
To show you how replication throttling works, we will compare two use cases: one will copy a 2GB index without any limits and the other will copy the same index, but with throttling enabled. To illustrate that, we will use a SolrCloud deployment with a very simple configuration. Please note that replication throttling is not limited to SolrCloud deployments and can be used just as well in the old master-slave architecture.
Replication without throttling
To show you how network bandwidth is used without any limits, we will use the following standard replication handler configuration:
<requestHandler name="/replication" class="solr.ReplicationHandler"> </requestHandler>
Network usage in that case looked as follows:
Replication with throttling enabled
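Now let’s compare it to replication that is set up to use throttling. The maxWriteMBPerSec parameter caps the write rate, in megabytes per second, used while copying index files: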
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="defaults">
    <str name="maxWriteMBPerSec">0.1</str>
  </lst>
</requestHandler>
Network usage in this case looks as follows:
As we can see, the configuration itself is very simple and it just works. Again, it works in both master-slave and SolrCloud deployments, so no matter which way you go, you can use replication throttling. The only thing that is missing now, at least for me, is a dedicated API that I could use to change the replication configuration without a core reload. Maybe in the future? 😉
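To get a feel for what a given limit means in practice, here is a rough back-of-the-envelope sketch in Python. It is not part of Solr; the function names are made up for illustration. It assumes the copy runs steadily at exactly the configured maxWriteMBPerSec, that 1 MB means 1024 * 1024 bytes (so a 2GB index is 2048 MB), and it ignores any protocol overhead, so treat the numbers as estimates only.

```python
def estimated_copy_seconds(index_size_mb: float, max_write_mb_per_sec: float) -> float:
    """Estimate how long a throttled copy takes, assuming a steady rate.

    Hypothetical helper for illustration; it simply divides size by rate.
    """
    if max_write_mb_per_sec <= 0:
        raise ValueError("max_write_mb_per_sec must be greater than zero")
    return index_size_mb / max_write_mb_per_sec


def format_duration(seconds: float) -> str:
    """Render a number of seconds as hours, minutes and seconds."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    index_size_mb = 2 * 1024  # the 2GB index from the comparison, in MB (assumed 1GB = 1024MB)
    for limit in (0.1, 1.0, 10.0):  # candidate maxWriteMBPerSec values
        seconds = estimated_copy_seconds(index_size_mb, limit)
        print(f"maxWriteMBPerSec={limit}: about {format_duration(seconds)}")
```

With the 0.1 value used in the configuration above, the estimate for a 2GB index comes out at well over five hours, so such a low limit is good for demonstrating the effect, but a real deployment would likely need a higher value chosen against the available bandwidth.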