<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>8.5.0 &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/8-5-0-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 15:18:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr 8.5.0 &#8211; bin/postlogs tool</title>
		<link>https://solr.pl/en/2020/03/30/solr-8-5-0-bin-postlog-tool/</link>
					<comments>https://solr.pl/en/2020/03/30/solr-8-5-0-bin-postlog-tool/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 30 Mar 2020 14:17:55 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[8.5.0]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[postlog]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=1018</guid>

					<description><![CDATA[With the release of Solr 8.5.0, we got a tool that lets us index the logs that Solr produces. Once that data is in Solr, we can search through it and analyze it using the tools that Solr provides.]]></description>
										<content:encoded><![CDATA[
<p>With the release of Solr 8.5.0, we got a tool that lets us index the logs that Solr produces. Once that data is in Solr, we can search through it and analyze it using the tools that Solr provides. Let&#8217;s look at how to use the newly introduced tool.</p>



<span id="more-1018"></span>



<h2 class="wp-block-heading">The Assumptions</h2>



<p>For this simple test of the new functionality, we will not use any configuration dedicated to indexing logs. We will use the&nbsp;<em>_default</em>&nbsp;configuration that ships with Solr, which is enough to play around with the indexed data. In a production environment, however, you will want to prepare a fixed collection schema so that the data is indexed in the format you expect.</p>



<h2 class="wp-block-heading">Test Environment</h2>



<p>Our test environment is very simple. We start Solr in SolrCloud mode (<em>-c</em>) and in the foreground (<em>-f</em>) by using the following command:</p>



<pre class="wp-block-code"><code class="">$ bin/solr start -c -f</code></pre>



<p>The next step is to create two collections &#8211; one called <em>test</em> and the second called <em>logs</em>. The first one will be used to index data and run a few queries, while the second will be used to index the logs. We can create the mentioned collections by using the following commands:</p>



<pre class="wp-block-code"><code class="">$ bin/solr create_collection -c test
$ bin/solr create_collection -c logs</code></pre>



<p>In addition to that, we need a bit of data, which we can index by using the following command:</p>



<pre class="wp-block-code"><code class="">$ curl -H 'Content-type:application/json' -XPOST 'localhost:8983/solr/test/update?commit=true' -d '[
 {
  "id": 1,
  "title": "Test document one"
 },
 {
  "id": 2,
  "title": "Test document two"
 }
]'</code></pre>



<p>As the last step, let&#8217;s run two simple queries so that the logs contain some query entries:</p>



<pre class="wp-block-code"><code class="">$ curl -XGET 'localhost:8983/solr/test/select?q=*:*'
$ curl -XGET 'localhost:8983/solr/test/select?q=*:*&amp;fq=id:1'</code></pre>



<h2 class="wp-block-heading">Indexing Logs</h2>



<p>The Solr instance I used for the tests was installed in the <em>/opt/solr</em> directory, and its logs were available in the <em>/opt/solr/server/logs</em> directory. Knowing that, indexing all the logs from that directory into our <em>logs</em> collection is as simple as running the following command:</p>



<pre class="wp-block-code"><code class="">$ bin/postlogs http://localhost:8983/solr/logs /opt/solr/server/logs/</code></pre>



<p>And almost instantly we should see a response like this:</p>



<pre class="wp-block-code"><code class="">Sending last batch ...
Committed</code></pre>



<p>That means that our logs are now ready to be searched and analyzed.</p>
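


<p>If this step needs to be scripted, the same call can be wrapped in a few lines of Python. This is only a minimal sketch: the <em>postlogs_command</em> and <em>index_logs</em> helpers are hypothetical names, and the only arguments passed to <strong>bin/postlogs</strong> are the two used above, that is, the collection URL and the log directory.</p>



<pre class="wp-block-code"><code class="">import os
import subprocess


def postlogs_command(solr_home, collection_url, log_dir):
    """Build the argument list for the bin/postlogs call shown above."""
    return [os.path.join(solr_home, "bin", "postlogs"), collection_url, log_dir]


def index_logs(solr_home, collection_url, log_dir):
    """Run bin/postlogs and raise CalledProcessError if it exits with an error."""
    subprocess.run(postlogs_command(solr_home, collection_url, log_dir), check=True)


# Example call with the paths from this post (requires a running Solr):
# index_logs("/opt/solr", "http://localhost:8983/solr/logs", "/opt/solr/server/logs/")</code></pre>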



<h2 class="wp-block-heading">Searching Through Logs</h2>



<p>Let&#8217;s see what we have in the logs. The simplest way to do that is to run the <em>match all</em> query, which matches every document in our collection (by default, Solr returns only the first 10 of them). To do that, we run the following query:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/logs/select?q=*:*</code></pre>



<p>The response from Solr is as follows:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":23,"start":0,"docs":[
      {
        "date_dt":"2020-03-28T11:25:52.506Z",
        "qtime_i":33,
        "status_s":"0",
        "params_t":"wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/info/system",
        "type_s":"admin",
        "id":"673b31e1-4e7f-4add-8f11-21094a20a790",
        "file_s":"solr.log",
        "_version_":1662407100088713216},
      {
        "date_dt":"2020-03-28T11:25:52.542Z",
        "qtime_i":2,
        "status_s":"0",
        "params_t":"action=CLUSTERSTATUS&amp;wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"a14d72a3-3a08-406e-bbb8-455743b770ea",
        "file_s":"solr.log",
        "_version_":1662407100099198976},
      {
        "date_dt":"2020-03-28T11:25:52.749Z",
        "qtime_i":0,
        "status_s":"0",
        "params_t":"action=list&amp;wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"12141bf6-6a46-49f6-9cc9-72a037755771",
        "file_s":"solr.log",
        "_version_":1662407100100247552},
      {
        "date_dt":"2020-03-28T11:25:54.645Z",
        "core_s":"test_shard1_replica_n1",
        "type_s":"newSearcher",
        "line_t":"2020-03-28 11:25:54.645 INFO  (searcherExecutor-14-thread-1-processing-n:192.168.1.197:8983_solr x:test_shard1_replica_n1 c:test s:shard1 r:core_node2) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrCore [test_shard1_replica_n1] Registered new searcher Searcher@7b44e20e[test_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader())}",
        "id":"66b4a193-17de-46fb-a586-d3359c20af0b",
        "file_s":"solr.log",
        "_version_":1662407100101296128},
      {
        "date_dt":"2020-03-28T11:25:54.795Z",
        "qtime_i":1624,
        "status_s":"0",
        "params_t":"qt=/admin/cores&amp;coreNodeName=core_node2&amp;collection.configName=test&amp;newCollection=true&amp;name=test_shard1_replica_n1&amp;action=CREATE&amp;numShards=1&amp;collection=test&amp;shard=shard1&amp;wt=javabin&amp;version=2&amp;replicaType=NRT",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/admin/cores",
        "type_s":"admin",
        "id":"31db40d4-9273-47d6-af6e-98665b2b0bb6",
        "file_s":"solr.log",
        "_version_":1662407100102344704},
      {
        "date_dt":"2020-03-28T11:25:54.899Z",
        "qtime_i":2146,
        "status_s":"0",
        "params_t":"replicationFactor=1&amp;maxShardsPerNode=-1&amp;collection.configName=test&amp;name=test&amp;action=CREATE&amp;numShards=1&amp;wt=json",
        "wt_s":"json",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/collections",
        "type_s":"admin",
        "id":"51a735fc-fc90-49b6-ae13-2823a708c287",
        "file_s":"solr.log",
        "_version_":1662407100103393280},
      {
        "date_dt":"2020-03-28T11:26:20.675Z",
        "qtime_i":2,
        "status_s":"0",
        "params_t":"wt=javabin&amp;version=2&amp;key=solr.core.test.shard1.replica_n1:QUERY./select.requests&amp;key=solr.core.test.shard1.replica_n1:UPDATE./update.requests&amp;key=solr.core.test.shard1.replica_n1:INDEX.sizeInBytes",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/metrics",
        "type_s":"admin",
        "id":"17928b68-4a83-41e9-84b2-15533028c0c4",
        "file_s":"solr.log",
        "_version_":1662407100104441856},
      {
        "date_dt":"2020-03-28T11:26:20.683Z",
        "qtime_i":1,
        "status_s":"0",
        "params_t":"wt=javabin&amp;version=2&amp;key=solr.jvm:os.processCpuLoad&amp;key=solr.node:CONTAINER.fs.coreRoot.usableSpace&amp;key=solr.jvm:os.systemLoadAverage&amp;key=solr.jvm:memory.heap.used",
        "wt_s":"javabin",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "path_s":"/admin/metrics",
        "type_s":"admin",
        "id":"a8cd34ff-4a13-4d6e-a59a-21b882aacd19",
        "file_s":"solr.log",
        "_version_":1662407100105490432},
      {
        "date_dt":"2020-03-28T11:27:10.751Z",
        "type_s":"commit",
        "line_t":"2020-03-28 11:27:10.751 INFO  (qtp776700275-20) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1662406970049560576,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}",
        "soft_commit_s":"false",
        "open_searcher_s":"true",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "id":"11eacb14-ba19-4fcf-bf37-e13e75b58a17",
        "file_s":"solr.log",
        "_version_":1662407100106539008},
      {
        "date_dt":"2020-03-28T11:27:10.888Z",
        "core_s":"test_shard1_replica_n1",
        "type_s":"newSearcher",
        "line_t":"2020-03-28 11:27:10.888 INFO  (searcherExecutor-14-thread-1-processing-n:192.168.1.197:8983_solr x:test_shard1_replica_n1 c:test s:shard1 r:core_node2) [c:test s:shard1 r:core_node2 x:test_shard1_replica_n1] o.a.s.c.SolrCore [test_shard1_replica_n1] Registered new searcher Searcher@404ea7b6[test_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(8.5.0):C2:[diagnostics={os=Mac OS X, java.vendor=Oracle Corporation, java.version=1.8.0_191, java.vm.version=25.191-b12, lucene.version=8.5.0, os.arch=x86_64, java.runtime.version=1.8.0_191-b12, source=flush, os.version=10.15.4, timestamp=1585394830809}]:[attributes={Lucene50StoredFieldsFormat.mode=BEST_SPEED}])))}",
        "id":"6a0a6fe0-7390-49c8-921d-e269f14de12f",
        "file_s":"solr.log",
        "_version_":1662407100107587584}]
  }}</code></pre>



<p>Of course, we have to remember that we are using the <em>_default</em> configuration. Though it is not perfect, we can already see a few things that are potentially interesting:</p>



<ul class="wp-block-list"><li>time of the event is available in the <strong>date_dt</strong> field</li><li>duration of the event is available in the <strong>qtime_i</strong> field</li><li>the query parameters are available in the <strong>params_t</strong> field</li><li>the collection name is stored in the <strong>collection_s</strong> field</li><li>the shard and the core are stored in the <strong>shard_s</strong> and the <strong>core_s</strong> fields</li></ul>
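


<p>The suffixes of those field names are not accidental. As far as I know, the <em>_default</em> configuration matches them with dynamic field rules, which is why no schema changes were needed before indexing. The sketch below lists only the suffixes visible in the response above, with simplified type descriptions; the <em>describe_field</em> helper is a hypothetical name used for illustration.</p>



<pre class="wp-block-code"><code class=""># Field name suffixes that appear in the indexed logs, with a simplified
# description of the field type each one maps to in the _default configuration.
SUFFIX_TYPES = {
    "_dt": "date",
    "_i": "integer",
    "_l": "long",
    "_s": "string (stored as a single token, good for faceting)",
    "_t": "text (tokenized, good for full-text search)",
}


def describe_field(name):
    """Return the kind of field a log field name maps to, or None if unknown."""
    for suffix, kind in SUFFIX_TYPES.items():
        if name.endswith(suffix):
            return kind
    return None


for field in ("date_dt", "qtime_i", "hits_l", "collection_s", "params_t"):
    print(field, "is a", describe_field(field), "field")</code></pre>



<p>That difference matters when querying: <strong>collection_s</strong> is a good candidate for faceting, while <strong>params_t</strong> is better suited for full-text search.</p>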



<p>With those fields in place, we can, for example, ask Solr for all the requests that were handled by the <strong>/select</strong> request handler, see which collections they targeted, and count how many of them took less than 100 milliseconds and how many took 100 milliseconds or longer. A query that does that looks as follows:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/logs/select?q=path_s:\/select&amp;facet=true&amp;facet.field=collection_s&amp;facet.interval=qtime_i&amp;facet.interval.set=[0,100)&amp;facet.interval.set=[100,*]</code></pre>
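


<p>Typing such a URL by hand is error-prone, because the brackets and the backslash should be URL-encoded when the request is sent from a script. Here is a minimal sketch of building the same request with the Python standard library. The <em>build_facet_url</em> function and its <em>threshold_ms</em> parameter are hypothetical names introduced for illustration.</p>



<pre class="wp-block-code"><code class="">from urllib.parse import urlencode


def build_facet_url(base_url, threshold_ms=100):
    """Build the /select request that facets log entries by collection and query time."""
    params = [
        # The backslash escapes the slash for the Solr query parser.
        ("q", "path_s:\\/select"),
        ("facet", "true"),
        ("facet.field", "collection_s"),
        ("facet.interval", "qtime_i"),
        # Two intervals: below the threshold, and the threshold or above.
        ("facet.interval.set", "[0,{})".format(threshold_ms)),
        ("facet.interval.set", "[{},*]".format(threshold_ms)),
    ]
    # urlencode percent-encodes each value and joins the pairs into a query string.
    return base_url + "?" + urlencode(params)


url = build_facet_url("http://localhost:8983/solr/logs/select")
print(url)</code></pre>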



<p>The response returned by Solr for the above query looks as follows:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"path_s:\\/select",
      "facet.field":"collection_s",
      "facet.interval":"qtime_i",
      "facet":"true",
      "facet.interval.set":["[0,100)",
        "[100,*]"]}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "date_dt":"2020-03-28T11:27:39.401Z",
        "qtime_i":51,
        "status_s":"0",
        "hits_l":2,
        "params_t":"q=*:*",
        "q_s":"*:*",
        "q_t":"*:*",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/select",
        "type_s":"query",
        "id":"3af60e9f-85cf-4958-89c5-b239e80af179",
        "file_s":"solr.log",
        "_version_":1662407100112830464},
      {
        "date_dt":"2020-03-28T11:27:52.597Z",
        "qtime_i":11,
        "status_s":"0",
        "hits_l":1,
        "params_t":"q=*:*&amp;fq=id:1",
        "q_s":"*:*",
        "q_t":"*:*",
        "distrib_s":"true",
        "shards_s":"false",
        "ids_s":"false",
        "collection_s":"test",
        "core_s":"test_shard1_replica_n1",
        "shard_s":"shard1",
        "replica_s":"core_node2",
        "path_s":"/select",
        "type_s":"query",
        "id":"c6190ce0-845b-4fe8-b646-e5fc3d69bc80",
        "file_s":"solr.log",
        "_version_":1662407100113879040}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "collection_s":[
        "test",2,
        "logs",0]},
    "facet_ranges":{},
    "facet_intervals":{
      "qtime_i":{
        "[0,100)":2,
        "[100,*]":0}},
    "facet_heatmaps":{}}}</code></pre>



<p>Two queries were found. Both of them ran against the <strong>test</strong> collection and took less than 100 milliseconds.</p>
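


<p>One detail worth noting when reading such a response from code: the field facet comes back as a flat list that alternates values and counts, while the interval facet is a regular object. Here is a minimal sketch of turning both into dictionaries, using a sample trimmed from the response above. The <em>field_facet</em> and <em>interval_facet</em> helpers are hypothetical names.</p>



<pre class="wp-block-code"><code class="">import json

# The facet section of the response above, trimmed to the parts used here.
SAMPLE = json.loads("""
{
  "facet_counts": {
    "facet_fields": {"collection_s": ["test", 2, "logs", 0]},
    "facet_intervals": {"qtime_i": {"[0,100)": 2, "[100,*]": 0}}
  }
}
""")


def field_facet(response, field):
    """Pair up the alternating value/count entries of a field facet."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def interval_facet(response, field):
    """Return the interval facet of a field as an interval-to-count mapping."""
    return dict(response["facet_counts"]["facet_intervals"][field])


print(field_facet(SAMPLE, "collection_s"))
print(interval_facet(SAMPLE, "qtime_i"))</code></pre>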



<h2 class="wp-block-heading">The Summary</h2>



<p>The&nbsp;<strong>bin/postlogs</strong>&nbsp;tool is, in my opinion, a step in the right direction when it comes to a quick and easy way of indexing, searching, and analyzing logs on demand. After indexing the data, we not only gain full-text search over the logs, but we can also use faceting to get some insights into them. Of course, to use this method in a production environment, we would have to index the data periodically, send it to a separate Solr cluster, and maintain that cluster. However, this is a topic for a different blog post 🙂</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2020/03/30/solr-8-5-0-bin-postlog-tool/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
