Enabling Tracing in Solr

With the release of Solr 8.2, we got support for Open Tracing: vendor-neutral APIs for distributed tracing that let us choose whichever backend or vendor we want for storing our traces, whether we want to stay in the open-source world or go with one of the commercial vendors supporting Open Tracing. Let’s have a look at how we can enable distributed tracing in Solr.

The Setup

For this blog post, I set up a simple SolrCloud cluster built of two nodes, both running on the same machine. Nothing sophisticated, but it will be enough to show the benefits of using distributed tracing.

Open Tracing Backend

To be able to use distributed tracing we need to choose a backend. At the time of writing, the only backend supported by Solr out-of-the-box is Jaeger. Adding other backends, even commercial ones, is not very problematic, but that is not the main focus of this blog post, so I’ll skip describing it.

For my tests, I will just run the jaegertracing/all-in-one Docker container in its latest version. It is as simple as running the following command:

$ docker run -d --name jaeger -p 16686:16686 -p 6831:6831/udp -p 5775:5775/udp jaegertracing/all-in-one:latest

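We run the container under the name jaeger and we map three ports: 16686, 6831, and 5775. Port 16686 is needed to connect to the Jaeger UI and 5775 is the port we will use for shipping tracing data. Port 6831 is the agent port that Jaeger clients use by default, so it is convenient to have it mapped as well.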

You can check if everything is running correctly by running:

$ docker ps
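
If you prefer to check this from code, the following is a minimal sketch that tests whether the Jaeger UI port accepts TCP connections. The class and method names are mine and purely illustrative. Note that it only covers the UI port: the agent ports 6831 and 5775 are UDP, so a TCP connect tells us nothing about them.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Solr or Jaeger.
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isTcpPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException ex) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 16686 is the Jaeger UI port mapped in the docker run command above.
        System.out.println("Jaeger UI reachable: " + isTcpPortOpen("localhost", 16686, 1000));
    }
}
```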

Configuring Solr for Open Tracing

The next thing we need to do is set up Solr. To do that, the libraries from the contrib/jaegertracer-configurator/lib/ directory and the solr-jaegertracer-configurator-8.6.0.jar from the dist directory need to be placed on the Solr classpath. In my case, I just created the lib directory in the server/solr directory and copied the mentioned jar files there.

You also need to modify the solr.xml file that is present in the server/solr directory and include the tracer configuration there. If you are working with an example Solr instance you will already have some configuration in the solr.xml file, so you just need to add the following section there:

<tracerConfig name="tracerConfig" class="org.apache.solr.jaeger.JaegerTracerConfigurator">
  <str name="agentHost">localhost</str>
  <int name="agentPort">5775</int>
  <bool name="logSpans">true</bool>
  <int name="flushInterval">1000</int>
  <int name="maxQueueSize">10000</int>
</tracerConfig>

The above tracerConfig tag configures the Jaeger distributed tracer. We define the agent host as localhost and we set the port to 5775. We also tell it to log spans and we define the maximum flush interval and the maximum queue size.

Remember that all the configuration and libraries must be present on all of our Solr nodes.

The Test Cluster & Data

After doing everything above we can just start our Solr instances. In this case, I’ve used the following commands:

$ bin/solr start -c -f
$ bin/solr start -f -p 6883 -z localhost:9983

So I started two Solr instances: the first one with an embedded ZooKeeper running alongside Solr, and the second one connecting to that ZooKeeper. Together they form a test SolrCloud cluster.

Before creating the collection that will be used for testing, I did one more thing – I set the tracing sampling to 100%, which means that every span will be shipped to our Jaeger backend. That is done by setting the cluster property called samplePercentage and giving it a value of 100. The command that I used was as follows:

$ curl -XGET 'localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=samplePercentage&val=100'

Keep in mind that this is only done for tests. In a real production system you may want to use sampling to reduce the amount of data that you store for your distributed traces.
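
For a production-like setup, the same cluster property can be set from code. Below is a minimal sketch using only the JDK HTTP client (Java 11+). The class and method names are hypothetical, the value of 10 is just an example rather than a recommendation, and I restrict the helper to whole percentages for simplicity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Solr or SolrJ.
class SamplingConfig {

    /** Builds the CLUSTERPROP request URI that sets samplePercentage to the given whole percentage. */
    static URI samplePercentageUri(String solrBaseUrl, int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100: " + percentage);
        }
        return URI.create(solrBaseUrl
            + "/admin/collections?action=CLUSTERPROP&name=samplePercentage&val=" + percentage);
    }

    public static void main(String[] args) throws Exception {
        // Assumes Solr is listening on localhost:8983, as in the commands above.
        URI uri = samplePercentageUri("http://localhost:8983/solr", 10);
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```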

I also created a collection called test using the _default configuration. I don’t need any fancy features, just a few indexed documents and a simple query. Because of that, the following command was all that was needed:

$ curl -XPOST -H 'Content-type:application/json' 'http://localhost:8983/api/c/'  -d '{ 
  "create": { 
    "name": "test",
    "numShards": "2"
  } 
}'

I indexed data by using the following command:

$ curl -XPOST -H 'Content-type:application/json' 'localhost:8983/solr/test/update?commit=true' -d '[
 {
  "id": 1,
  "name": "Test document 1",
  "tags": [ "doc", "test" ]
 },
 {
  "id": 2,
  "name": "Test document 2",
  "tags": [ "doc", "test" ]
 },
 {
  "id": 3,
  "name": "Test document 3",
  "tags": [ "doc", "test" ]
 }
]'

After that I ran the following query:

$ curl -XGET -H 'Content-type:application/json' 'localhost:8983/solr/test/select' -d '{
  "query" : "name:document",
  "facet": {
    "tags" : {
      "terms" : {
        "field" : "tags"
      }
    }
  }
}'

Looking into Jaeger UI

After running the query we should already have something in Jaeger, because we indexed data and ran a query. After going to localhost:16686 and choosing solr as the service we can see the traces, for example one for the query.

If you need more data, the tags section of each span comes to the rescue.

All of this is available just by including additional libraries and a few lines of configuration in Solr.

Going Further – Tracing In Your Code

Of course, the power of distributed tracing is that you are not limited to traces from a single source – Solr in this case. We can also include tracing in our code.

For example, if we have a simple application that runs queries against Solr, we could add Open Tracing span creation and configure the Jaeger tracer, just like we did in Solr by modifying the solr.xml file. Example code in Java could look as follows (the full code is available on GitHub):

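import io.jaegertracing.Configuration;
import io.jaegertracing.internal.JaegerTracer;
import io.jaegertracing.internal.samplers.ConstSampler;
import io.opentracing.Span;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.MapSolrParams;

import java.util.HashMap;
import java.util.Map;

public class App {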
    private JaegerTracer tracer;
    private HttpSolrClient solrClient;

    public static void main(String[] args) throws Exception {
        App app = new App();
        app.initTracer();
        app.initSolrClient();
        app.start();
    }

    public void start() throws Exception {
        // Top-level span covering the query and the document processing.
        Span span = tracer.buildSpan("example query").start();
        try {
            final Map<String, String> query = new HashMap<>();
            query.put("q", "*:*");
            MapSolrParams queryParams = new MapSolrParams(query);

            final QueryResponse queryResponse = solrClient.query("test", queryParams);
            final SolrDocumentList documents = queryResponse.getResults();

            sleep(10);
            processDocumentsSlow(documents, span, 100);
        } finally {
            // Finish the span even if the query throws, so it is still reported.
            span.finish();
        }
    }

    private void processDocumentsSlow(SolrDocumentList documents, Span rootSpan, long sleepTime) {
        Span span = tracer
            .buildSpan("process documents")
            .asChildOf(rootSpan)
            .start();

        processDocumentsSlowNext(documents, span, 300);
        sleep(sleepTime);

        span.finish();
    }

    private void processDocumentsSlowNext(SolrDocumentList documents, Span rootSpan, long sleepTime) {
        Span span = tracer
            .buildSpan("process documents next")
            .asChildOf(rootSpan)
            .start();

        sleep(sleepTime);

        span.finish();
    }

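    private void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            // Restore the interrupt flag instead of silently swallowing it.
            Thread.currentThread().interrupt();
        }
    }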

    public void initTracer() {
        if (this.tracer == null) {
            Configuration.SamplerConfiguration samplerConfiguration = new Configuration
                .SamplerConfiguration()
                .withType(ConstSampler.TYPE)
                .withParam(1);

            Configuration.ReporterConfiguration reporterConfiguration = Configuration
                .ReporterConfiguration
                .fromEnv();

            Configuration.SenderConfiguration senderConfig = reporterConfiguration
                .getSenderConfiguration()
                .withAgentHost("localhost")
                .withAgentPort(5775);

            reporterConfiguration
                .withLogSpans(true)
                .withSender(senderConfig);

            Configuration configuration = new Configuration("Jaeger with Solr")
                .withSampler(samplerConfiguration)
                .withReporter(reporterConfiguration);

            this.tracer = configuration.getTracer();
        }
    }

    public void initSolrClient() {
        if (this.solrClient == null) {
            this.solrClient = new HttpSolrClient
                .Builder("http://localhost:8983/solr")
                .build();
        }
    }
}

Apart from the initTracer method, which shows how to configure the Jaeger tracer programmatically, the interesting piece is the start method. We create a top-level span called example query, build a query to Solr, and execute it. Next, we simulate some slowness by calling the processDocumentsSlow method, which in turn calls the processDocumentsSlowNext method. Each of those methods creates its own span and attaches it as a child of the span it receives by calling the asChildOf method. In the Jaeger UI this shows up as a trace with the three spans nested one inside another.

So now we get visibility not only into Solr itself, but also into our code.
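
Because the sleeps in the example are hard-coded, we can work out roughly what each span should show in Jaeger. The sketch below only does that arithmetic. It is not tracing code, the class name is mine, and the query time passed in is a placeholder you would replace with what you actually observe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mirrors the sleep calls in the App example.
class SpanTimeline {
    static final long SLEEP_IN_START = 10;         // sleep(10) in start()
    static final long SLEEP_IN_PROCESS = 100;      // sleepTime passed to processDocumentsSlow
    static final long SLEEP_IN_PROCESS_NEXT = 300; // sleepTime passed to processDocumentsSlowNext

    /** Returns the minimum duration in milliseconds of each span, outermost span first. */
    static Map<String, Long> minimumDurations(long queryMillis) {
        // The innermost span only sleeps.
        long next = SLEEP_IN_PROCESS_NEXT;
        // The middle span wraps the innermost one and then sleeps itself.
        long process = next + SLEEP_IN_PROCESS;
        // The root span wraps the Solr query, a short sleep, and the middle span.
        long root = queryMillis + SLEEP_IN_START + process;

        Map<String, Long> durations = new LinkedHashMap<>();
        durations.put("example query", root);
        durations.put("process documents", process);
        durations.put("process documents next", next);
        return durations;
    }

    public static void main(String[] args) {
        // 0 ms is a placeholder for the Solr query time.
        minimumDurations(0).forEach((name, millis) ->
            System.out.println(name + " >= " + millis + " ms"));
    }
}
```

Real spans will be slightly longer than these numbers because of scheduling and the query itself, but the nesting and proportions should match what the trace view shows.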

The Next Steps

The power of distributed tracing is fully visible when the whole code base produces spans, giving you full visibility into the execution and the timings related to it. Open Tracing supports not only Java, but also JavaScript, Go, Python, PHP, Objective-C, C++, C#, and Ruby. So if your application stack is developed using those, you can instrument your code without any problem.

It is also worth noting that Open Tracing is just a set of APIs and Jaeger is just one of the tracers that support that API. You can use open-source solutions as well as commercial ones, depending on your needs and on what you are already using for monitoring logs and metrics.

Solr.pl