Introducing Solr Circuit Breakers

With the Solr 8.7 release, we were given a very useful feature called circuit breakers. The circuit breaker design pattern stops execution when certain criteria are met. For example, query execution may be stopped when memory usage is higher than a defined limit, or when the CPU usage on a given node is too high. Let’s look at what Solr 8.7 brings us.

New Functionality

The new functionality aims to prevent the execution of requests that would push a node beyond defined thresholds. For example, when the memory utilization of the JVM reaches 75% you may want to reject new requests hitting Solr so that the ones already being processed can finish.

That’s why Solr 8.7 added the circuit breaker code along with two circuit breaker implementations:

JVM memory-based circuit breaker
CPU utilization based circuit breaker

So when should you use the circuit breaker functionality? When you want to trade request throughput for stability. If you want more stability from your Solr nodes, you include circuit breakers; if you want to go full throttle, you don’t.

Circuit Breaker Configuration

The circuit breaker configuration belongs in solrconfig.xml, inside the circuitBreaker tag:

<circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
...
</circuitBreaker>

The enabled attribute turns the circuit breaker functionality on and off globally. When set to true, the circuit breakers are enabled; when set to false, they are disabled.

At the time of writing this blog post, two circuit breakers are available:

CPU based
JVM memory based

The CPU based one tracks the CPU utilization and checks the average CPU usage over the last minute. If that average crosses the defined threshold, the circuit breaker is tripped and the request is not executed. To enable it we would include the following properties inside the circuitBreaker tag:

<str name="cpuEnabled">true</str>
<str name="cpuThreshold">75</str>

The first property enables the CPU based circuit breaker and the second one specifies the threshold.

The JVM memory-based circuit breaker tracks the memory usage of the JVM and rejects the request if heap usage is above the configured percentage of the maximum heap size, the one defined with Xmx. To enable it we would include the following properties inside the circuitBreaker tag:

<str name="memEnabled">true</str>
<str name="memThreshold">80</str>

The above configuration means that the JVM memory circuit breaker is enabled and will be triggered when the JVM heap usage is above 80%. So if our Solr heap is set to a maximum size of 10G, the circuit breaker will prevent the execution of requests when the usage is above 8G.

One thing to keep in mind is that memThreshold only accepts values between 50 and 95.
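
The threshold arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration of the calculation described above, not Solr's own code; the function name mem_threshold_bytes is made up for this example, and the range check simply mirrors the 50 to 95 limit mentioned above.

```python
def mem_threshold_bytes(max_heap_bytes: int, mem_threshold_pct: int) -> int:
    """Return the heap usage (in bytes) above which requests would be rejected."""
    # memThreshold is described as accepting values between 50 and 95.
    if not 50 <= mem_threshold_pct <= 95:
        raise ValueError("memThreshold must be between 50 and 95")
    # Integer arithmetic keeps the result exact for large heaps.
    return max_heap_bytes * mem_threshold_pct // 100


GIB = 1024 ** 3

# A 10G heap with memThreshold=80 gives a limit of 8G.
limit = mem_threshold_bytes(10 * GIB, 80)
print(limit // GIB, "GiB")
```

The same calculation for a 512M heap and an 80% threshold gives 429496729 bytes.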

Our final configuration would look as follows:

<circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
  <str name="memEnabled">true</str>
  <str name="memThreshold">80</str>
  <str name="cpuEnabled">true</str>
  <str name="cpuThreshold">75</str>
</circuitBreaker>

How it Works

Once you have the circuit breakers defined and working, if they are tripped you will see a response like this:

{
  "responseHeader":{
    "status":503,
    "QTime":0,
    "params":{
      "json":"{\n\t\"query\": \"*:*\",\n\t\"facet\": {\n\t  \"test\": {\n\t    \"terms\": {\n\t      \"field\": \"text\"\n\t    }\n\t  }\n\t}\n}"}},
  "status":"FAILURE",
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Circuit Breakers tripped Memory Circuit Breaker triggered as JVM heap usage values are greater than allocated threshold.Seen JVM heap memory usage 503369312 and allocated threshold 429496729\n",
    "code":503}}

The response carries the 503 status code and a message about the tripped circuit breaker. In the above case, it was the memory circuit breaker. When that happens, the client should back off and wait before sending the next request. An exponential back-off is a good practice. Although the circuit breakers themselves do not add any significant overhead, too many checks can still hurt performance.
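
Here is a minimal sketch of what an exponential back-off on the client side could look like, using only the Python standard library. The function names, the 0.5 second base delay, the 30 second cap and the retry count are all assumptions made for this example; tune them to your own traffic.

```python
import json
import time
import urllib.error
import urllib.request


def backoff_delays(retries, base=0.5, cap=30.0):
    """Return exponentially growing delays: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def query_with_backoff(url, body, retries=5,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """POST a JSON query and retry with exponential back-off on HTTP 503."""
    data = json.dumps(body).encode("utf-8")
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"})
        try:
            with opener(request) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as error:
            # 503 is the status returned when a circuit breaker is tripped.
            # Any other error, or running out of retries, is re-raised.
            if error.code != 503 or attempt == retries:
                raise
            sleep(delays[attempt])
```

A call such as query_with_backoff("http://localhost:8983/solr/circuit/query", {"query": "*:*"}) would then wait 0.5s, 1s, 2s and so on between attempts for as long as Solr keeps answering with 503. The opener and sleep parameters exist only so the function can be exercised without a running Solr node.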

The Downsides

Having circuit breakers in Solr doesn’t mean that you can’t run into out of memory situations, and this is the first downside of the functionality. To demonstrate that, I ran an experiment.

I included the full example on our GitHub account so you can repeat what I did.

I started with a simple, empty Solr 8.7 node with embedded ZooKeeper, and I created a simple collection that had the following circuit breaker configuration:

<circuitBreaker class="solr.CircuitBreakerManager" enabled="true">
  <str name="memEnabled">true</str>
  <str name="memThreshold">80</str>
  <str name="cpuEnabled">true</str>
  <str name="cpuThreshold">75</str>
</circuitBreaker>

Once the circuit collection was created, I indexed 1,000,000 documents using a simple Python script. The documents were randomized, and I tried to make the text field one of very high cardinality. The fields section of the schema.xml looked as follows:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" docValues="true" />
<field name="text" type="text_ws" indexed="true" stored="true" multiValued="false" />
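
The indexing script itself is not reproduced in this post, so here is a rough sketch of what such a script might look like. Everything in it is an assumption made for illustration: the function names, the number and length of random tokens per document, the batch size, and the use of Solr's JSON update handler on localhost:8983.

```python
import json
import random
import string
import urllib.request


def random_document(doc_id, rng, tokens=20, token_length=12):
    """Build one document whose text field holds random whitespace-separated tokens."""
    words = ("".join(rng.choices(string.ascii_lowercase, k=token_length))
             for _ in range(tokens))
    return {"id": str(doc_id), "text": " ".join(words)}


def index_batch(docs, commit=False, base_url="http://localhost:8983/solr/circuit"):
    """Send one batch of documents to the collection's update handler."""
    url = f"{base_url}/update?commit={'true' if commit else 'false'}"
    request = urllib.request.Request(
        url, data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.status


def index_all(total=1_000_000, batch_size=10_000, send=index_batch, seed=42):
    """Generate `total` random documents and send them in batches."""
    rng = random.Random(seed)
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        docs = [random_document(i, rng) for i in range(start, end)]
        # Commit only once, together with the last batch.
        send(docs, commit=(end == total))
```

Calling index_all() against a running node would index the million documents. Random 12-letter tokens make it very unlikely that two documents share a term, which is what drives the cardinality of the text field up.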

After that was done I ran a query like this:

curl -XGET 'localhost:8983/solr/circuit/select?q=*:*&rows=0'

It resulted in a proper response:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":60,
    "params":{
      "q":"*:*",
      "rows":"0"}},
  "response":{"numFound":1000000,"start":0,"numFoundExact":true,"docs":[]
  }}

The second query looked as follows:

curl 'http://localhost:8983/solr/circuit/query' -d '{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "test": {
      "terms": {
	"field": "text"
      }
    }
  }
}'

And then Solr responded with the following:

{
  "responseHeader":{
    "zkConnected":true,
    "status":500,
    "QTime":6893,
    "params":{
      "json":"{\n  \"query\": \"*:*\",\n  \"limit\": 0,\n  \"facet\": {\n    \"test\": {\n      \"terms\": {\n\t\"field\": \"text\"\n      }\n    }\n  }\n}"}},
  "response":{"numFound":1000000,"start":0,"numFoundExact":true,"docs":[]
  },
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.OutOfMemoryError"],
    "msg":"Exception occured during uninverting text",
    "trace":"org.apache.solr.common.SolrException: Exception occured during uninverting text\n\tat org.apache.solr.search.facet.UnInvertedField.rethrowAsSolrException(UnInvertedField.java:681)\n\tat org.apache.solr.search.facet.UnInvertedField.getUnInvertedField(UnInvertedField.java:621)\n\tat org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.findStartAndEndOrds(FacetFieldProcessorByArrayUIF.java:43)\n\tat org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:116)\n\tat org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:94)\n\tat org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:454)\n\tat org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477)\n\tat org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433)\n\tat org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:65)\n\tat org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:454)\n\tat org.apache.solr.search.facet.FacetModule.process(FacetModule.java:150)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)\n\tat 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\nCaused by: java.lang.OutOfMemoryError: Java heap space\n\tat org.apache.solr.search.facet.UnInvertedField.visitTerm(UnInvertedField.java:136)\n\tat org.apache.solr.uninverting.DocTermOrds.uninvert(DocTermOrds.java:350)\n\tat org.apache.solr.search.facet.UnInvertedField.<init>(UnInvertedField.java:205)\n\tat org.apache.solr.search.facet.UnInvertedField.lambda$getUnInvertedField$1(UnInvertedField.java:613)\n\tat org.apache.solr.search.facet.UnInvertedField$$Lambda$658/0x000000080116cc40.apply(Unknown Source)\n\tat org.apache.solr.util.ConcurrentLRUCache.lambda$computeIfAbsent$1(ConcurrentLRUCache.java:227)\n\tat org.apache.solr.util.ConcurrentLRUCache$$Lambda$659/0x000000080116c040.apply(Unknown Source)\n\tat java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)\n\tat org.apache.solr.util.ConcurrentLRUCache.computeIfAbsent(ConcurrentLRUCache.java:226)\n\tat org.apache.solr.search.FastLRUCache.computeIfAbsent(FastLRUCache.java:258)\n\tat org.apache.solr.search.facet.UnInvertedField.getUnInvertedField(UnInvertedField.java:610)\n\tat 
org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.findStartAndEndOrds(FacetFieldProcessorByArrayUIF.java:43)\n\tat org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:116)\n\tat org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:94)\n\tat org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:454)\n\tat org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477)\n\tat org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433)\n\tat org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:65)\n\tat org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:454)\n\tat org.apache.solr.search.facet.FacetModule.process(FacetModule.java:150)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n",
    "code":500}}

As you can see, a java.lang.OutOfMemoryError is the root cause. This is an extreme example, but you should be aware that Solr compares the current JVM memory usage with the configuration of the circuit breaker. The request went through because the heap usage of the JVM hadn’t crossed the defined 80% when the check was made.

Basically, at the time when the limits were checked in Solr, the JVM memory usage was reported at close to 66%, which is below 80%, and that’s why the request was allowed to execute. A request that passes the check can still allocate enough heap on its own to exhaust it.
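
To make the failure mode concrete, here is a deliberately simplified toy model in Python. It is not how Solr is implemented; it only assumes that the memory check happens once, before the request runs, and the function names and the 40% request cost are invented for the illustration.

```python
def admit(heap_used_pct, mem_threshold_pct=80):
    """Admission-time check: reject only if usage is already above the threshold."""
    return heap_used_pct <= mem_threshold_pct


def run_request(heap_used_pct, request_cost_pct, mem_threshold_pct=80):
    """Return the outcome of a request when memory is checked only on admission."""
    if not admit(heap_used_pct, mem_threshold_pct):
        return "503 circuit breaker tripped"
    # In this toy model nothing re-checks the heap while the request is running.
    if heap_used_pct + request_cost_pct > 100:
        return "500 OutOfMemoryError"
    return "200 OK"


# Heap at 66% is below the 80% threshold, so the request is admitted,
# but a request that needs another 40% of the heap cannot fit.
print(run_request(66, 40))
```

With the heap at 66% the request is admitted, and a sufficiently expensive request then fails with an out of memory error instead of a clean 503.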

Summary

Even though not everything is perfect in the Solr circuit breakers world, I think this is a step in the right direction. We got the CPU and JVM memory-based ones, with the groundwork done for other implementations to follow. Hopefully, in the not too distant future, we will see more and more circuit breaker implementations, for example ones limited and dedicated to certain features, like querying, indexing, or faceting. Would you like to see such implementations?

Solr.pl