Solr 8.5.0 – bin/postlogs tool

With the release of Solr 8.5.0 we got a tool that allows us to index the logs that Solr itself produces. Once such data is in Solr, we can search through it and analyze it using the tools that Solr provides. Let’s look at how we can use the newly introduced tool.

The Assumptions

During our simple test of the new functionality we will not use any fancy configuration dedicated to indexing logs. We will use the _default configuration that Solr ships with, which will allow us to play around with the indexed data. However, in a production environment you will want to prepare a static collection structure that indexes the data in an expected format.
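For a production setup, the expected fields can be declared up front instead of relying on schemaless field guessing. Below is a minimal sketch using the Schema API; the field names mirror what postlogs produced in this test, so treat them as assumptions and adjust types and names for your own schema:

```shell
# A sketch of declaring log fields explicitly via the Schema API.
# The field names mirror what postlogs produced in this test; treat them
# as assumptions and adapt them to your production needs.
SCHEMA_PAYLOAD='{
  "add-field": [
    {"name": "date_dt",  "type": "pdate",  "stored": true},
    {"name": "qtime_i",  "type": "pint",   "stored": true},
    {"name": "path_s",   "type": "string", "stored": true},
    {"name": "params_t", "type": "text_general", "stored": true}
  ]
}'
echo "$SCHEMA_PAYLOAD"
# To apply it against a running Solr:
#   curl -H 'Content-type:application/json' -XPOST \
#        'localhost:8983/solr/logs/schema' -d "$SCHEMA_PAYLOAD"
```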

Test Environment

Our test environment is very simple. We start Solr using the following command:

$ bin/solr start -c -f

The next step is to create two collections – one called test and the second called logs. The first will be used to index data and run a few queries, while the second will be used to index the logs. We can create these collections using the following commands:

$ bin/solr create_collection -c test
$ bin/solr create_collection -c logs

In addition, we will need a bit of data, which we can index using the following command:

$ curl -H 'Content-type:application/json' -XPOST 'localhost:8983/solr/test/update?commit=true' -d '[
 {
  "id": 1,
  "title": "Test document one"
 },
 {
  "id": 2,
  "title": "Test document two"
 }
]'

As the last step let’s run two simple queries:

$ curl -XGET 'localhost:8983/solr/test/select?q=*:*'
$ curl -XGET 'localhost:8983/solr/test/select?q=*:*&fq=id:1'

Indexing Logs

The Solr instance I’ve been using for the tests was installed in the /opt/solr directory, and the logs were available in the /opt/solr/server/logs directory. Knowing that, indexing all the logs from that directory into our logs collection is as simple as running the following command:

$ bin/postlogs http://localhost:8983/solr/logs /opt/solr/server/logs/

And almost instantly we should see a response like this:

Sending last batch ...
Committed

That means our logs are now ready to be searched and analyzed.
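A quick sanity check (assuming Solr is still running locally) is to ask the logs collection how many events were indexed; the sketch below only builds the request URL, using rows=0 so no documents are fetched:

```shell
# Count the indexed log events without fetching any documents (rows=0).
SOLR_URL='http://localhost:8983/solr/logs'
COUNT_URL="${SOLR_URL}/select?q=*:*&rows=0"
echo "$COUNT_URL"
# Against a live Solr:
#   curl -s "$COUNT_URL"
# numFound in the response is the number of indexed log events.
```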

Searching Through Logs

Let’s see what we have in the logs. The simplest way to look into that is to run the match-all query, which will return all the documents from our collection. To do that we run the following query:

http://localhost:8983/solr/logs/select?q=*:*

The response from Solr is as follows:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":23,"start":0,"docs":[
      {
        "date_dt":"2020-03-28T11:25:52.506Z",
        "qtime_i":33,
        "status_s":"0",
        "params_t":"wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/info/system",
        "type_s":"admin",
        "id":"673b31e1-4e7f-4add-8f11-21094a20a790",
        "file_s":"solr.log",
        "_version_":1662407100088713216},
      {
        "date_dt":"2020-03-28T11:25:52.542Z",
        "qtime_i":2,
        "status_s":"0",
        "params_t":"action=CLUSTERSTATUS&wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"a14d72a3-3a08-406e-bbb8-455743b770ea",
        "file_s":"solr.log",
        "_version_":1662407100099198976},
      {
        "date_dt":"2020-03-28T11:25:52.749Z",
        "qtime_i":0,
        "status_s":"0",
        "params_t":"action=list&wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"12141bf6-6a46-49f6-9cc9-72a037755771",
        "file_s":"solr.log",
        "_version_":1662407100100247552},
      {
        "date_dt":"2020-03-28T11:25:54.645Z",
        "core_s":"test_shard1_replica_n1",
        "type_s":"newSearcher",
        "line_t":"2020-03-28 11:25:54.645 INFO  (searcherExecutor-14-thread-1-processing-n:192.168.1.197:8983_solr x:test_shard1_replica_n1 c:test s:shard1 r:core_node2) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrCore [test_shard1_replica_n1] Registered new searcher Searcher@7b44e20e[test_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader())}",
        "id":"66b4a193-17de-46fb-a586-d3359c20af0b",
        "file_s":"solr.log",
        "_version_":1662407100101296128},
      {
        "date_dt":"2020-03-28T11:25:54.795Z",
        "qtime_i":1624,
        "status_s":"0",
        "params_t":"qt=/admin/cores&coreNodeName=core_node2&collection.configName=test&newCollection=true&name=test_shard1_replica_n1&action=CREATE&numShards=1&collection=test&shard=shard1&wt=javabin&version=2&replicaType=NRT",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/admin/cores",
        "type_s":"admin",
        "id":"31db40d4-9273-47d6-af6e-98665b2b0bb6",
        "file_s":"solr.log",
        "_version_":1662407100102344704},
      {
        "date_dt":"2020-03-28T11:25:54.899Z",
        "qtime_i":2146,
        "status_s":"0",
        "params_t":"replicationFactor=1&maxShardsPerNode=-1&collection.configName=test&name=test&action=CREATE&numShards=1&wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"51a735fc-fc90-49b6-ae13-2823a708c287",
        "file_s":"solr.log",
        "_version_":1662407100103393280},
      {
        "date_dt":"2020-03-28T11:26:20.675Z",
        "qtime_i":2,
        "status_s":"0",
        "params_t":"wt=javabin&version=2&key=solr.core.test.shard1.replica_n1:QUERY./select.requests&key=solr.core.test.shard1.replica_n1:UPDATE./update.requests&key=solr.core.test.shard1.replica_n1:INDEX.sizeInBytes",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/metrics",
        "type_s":"admin",
        "id":"17928b68-4a83-41e9-84b2-15533028c0c4",
        "file_s":"solr.log",
        "_version_":1662407100104441856},
      {
        "date_dt":"2020-03-28T11:26:20.683Z",
        "qtime_i":1,
        "status_s":"0",
        "params_t":"wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/metrics",
        "type_s":"admin",
        "id":"a8cd34ff-4a13-4d6e-a59a-21b882aacd19",
        "file_s":"solr.log",
        "_version_":1662407100105490432},
      {
        "date_dt":"2020-03-28T11:27:10.751Z",
        "type_s":"commit",
        "line_t":"2020-03-28 11:27:10.751 INFO  (qtp776700275-20) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1662406970049560576,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}",
        "soft_commit_s":"false",
        "open_searcher_s":"true",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "id":"11eacb14-ba19-4fcf-bf37-e13e75b58a17",
        "file_s":"solr.log",
        "_version_":1662407100106539008},
      {
        "date_dt":"2020-03-28T11:27:10.888Z",
        "core_s":"test_shard1_replica_n1",
        "type_s":"newSearcher",
        "line_t":"2020-03-28 11:27:10.888 INFO  (searcherExecutor-14-thread-1-processing-n:192.168.1.197:8983_solr x:test_shard1_replica_n1 c:test s:shard1 r:core_node2) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrCore [test_shard1_replica_n1] Registered new searcher Searcher@404ea7b6[test_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(8.5.0):C2:[diagnostics={os=Mac OS X, java.vendor=Oracle Corporation, java.version=1.8.0_191, java.vm.version=25.191-b12, lucene.version=8.5.0, os.arch=x86_64, java.runtime.version=1.8.0_191-b12, source=flush, os.version=10.15.4, timestamp=1585394830809}]:[attributes={Lucene50StoredFieldsFormat.mode=BEST_SPEED}])))}",
        "id":"6a0a6fe0-7390-49c8-921d-e269f14de12f",
        "file_s":"solr.log",
        "_version_":1662407100107587584}]
  }}

Of course, we have to remember that we are using the _default configuration. Though it is not perfect, we can already see a few things that are potentially interesting:

  • time of the event is available in the date_dt field
  • duration of the event is available in the qtime_i field
  • the query parameters are available in the params_t field
  • the collection name is stored in the collection_s field
  • the shard and the core are stored in the shard_s and the core_s fields
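With those fields in place, a natural follow-up is to look for the slowest queries. Here is a sketch of building such a request, assuming the field names observed above; note that the sort parameter has to be URL-encoded (%20 for the space in qtime_i desc):

```shell
# Build a "slowest queries first" request against the logs collection.
# Field names (type_s, qtime_i) are the ones observed in this test.
SOLR_URL='http://localhost:8983/solr/logs'
SLOWEST="${SOLR_URL}/select?q=type_s:query&sort=qtime_i%20desc&rows=10"
echo "$SLOWEST"
# Run it against a live Solr with:
#   curl -s "$SLOWEST"
```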

Knowing that, we can, for example, ask Solr for all the queries that were run using the /select request handler, see which collections were used, and check which queries executed in under 100 milliseconds and which took longer. A query that fulfills that looks as follows:

http://localhost:8983/solr/logs/select?q=path_s:\/select&facet=true&facet.field=collection_s&facet.interval=qtime_i&facet.interval.set=[0,100)&facet.interval.set=[100,*]

The response returned by Solr for the above query looks as follows:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"path_s:\\/select",
      "facet.field":"collection_s",
      "facet.interval":"qtime_i",
      "facet":"true",
      "facet.interval.set":["[0,100)",
        "[100,*]"]}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "date_dt":"2020-03-28T11:27:39.401Z",
        "qtime_i":51,
        "status_s":"0",
        "hits_l":2,
        "params_t":"q=*:*",
        "q_s":"*:*",
        "q_t":"*:*",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/select",
        "type_s":"query",
        "id":"3af60e9f-85cf-4958-89c5-b239e80af179",
        "file_s":"solr.log",
        "_version_":1662407100112830464},
      {
        "date_dt":"2020-03-28T11:27:52.597Z",
        "qtime_i":11,
        "status_s":"0",
        "hits_l":1,
        "params_t":"q=*:*&fq=id:1",
        "q_s":"*:*",
        "q_t":"*:*",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/select",
        "type_s":"query",
        "id":"c6190ce0-845b-4fe8-b646-e5fc3d69bc80",
        "file_s":"solr.log",
        "_version_":1662407100113879040}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "collection_s":[
        "test",2,
        "logs",0]},
    "facet_ranges":{},
    "facet_intervals":{
      "qtime_i":{
        "[0,100)":2,
        "[100,*]":0}},
    "facet_heatmaps":{}}}

Two queries were found; both of them were run against the test collection and executed in less than 100 milliseconds.
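Since every event carries the date_dt field, we can also facet over time to see how the log volume changes, for example with range faceting. A sketch follows; the range boundaries and gap are picked arbitrarily to match this test run, and the + in the gap value has to be URL-encoded as %2B:

```shell
# Range facet over date_dt: one bucket per minute across the test window.
# The boundaries below match the timestamps seen in this test run and are
# only an illustration; pick ones that fit your own logs.
SOLR_URL='http://localhost:8983/solr/logs'
VOLUME="${SOLR_URL}/select?q=*:*&rows=0&facet=true&facet.range=date_dt"
VOLUME="${VOLUME}&facet.range.start=2020-03-28T11:25:00Z"
VOLUME="${VOLUME}&facet.range.end=2020-03-28T11:30:00Z"
VOLUME="${VOLUME}&facet.range.gap=%2B1MINUTE"
echo "$VOLUME"
# Run it against a live Solr with:
#   curl -s "$VOLUME"
```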

The Summary

The bin/postlogs tool is, in my opinion, a step in the right direction when it comes to a quick and easy way of indexing, searching, and analyzing logs on demand. After indexing the data we not only gain the ability to run full-text searches over it, but we can also use faceting to get insights into the logs. Of course, if we wanted to use this method in a production environment, we would have to index the data periodically, index it into a separate Solr cluster, and maintain that cluster. However, this is a topic for a different blog post 🙂
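As a rough sketch of that periodic-indexing direction: one simple (if naive) way to keep the logs collection fresh is a cron entry that re-runs the tool on a schedule. The paths, target URL, and schedule below are assumptions taken from this test environment; note also that re-running the tool over the same files will index the same events again, so a real setup would rotate the logs or clear the collection first.

```shell
# Hypothetical crontab entry: re-index the Solr logs every hour.
# Paths and the target URL are assumptions from this test environment.
# 0 * * * * /opt/solr/bin/postlogs http://localhost:8983/solr/logs /opt/solr/server/logs/
```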
