<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>querying &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/querying-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 14:27:47 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>SolrCloud &#8211; write and read tolerance</title>
		<link>https://solr.pl/en/2018/12/31/solrcloud-write-and-read-tolerance/</link>
					<comments>https://solr.pl/en/2018/12/31/solrcloud-write-and-read-tolerance/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 31 Dec 2018 14:27:21 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[querying]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=978</guid>

					<description><![CDATA[SolrCloud, like most distributed systems, is designed with certain rules in mind. There are also rules that every distributed system is subject to. For example, the CAP theorem states that a system can&#8217;t achieve availability, data consistency]]></description>
										<content:encoded><![CDATA[
<p>SolrCloud, like most distributed systems, is designed with certain rules in mind. There are also rules that every distributed system is subject to. For example, the <a href="https://en.wikipedia.org/wiki/CAP_theorem">CAP</a> theorem states that a system can&#8217;t achieve availability, data consistency and network partition tolerance at the same time &#8211; you can have at most two of the three. Of course, in this blog entry we will not be discussing the principles of distributed systems; instead we will focus on write and read tolerance in SolrCloud.</p>



<span id="more-978"></span>



<h2 class="wp-block-heading">Write time tolerance</h2>



<p>Write tolerance is not a simple topic. First of all, with the introduction of Solr 7.0 we got a variety of replica types. We have NRT replicas, which write data to the transaction log and index it locally on each replica. We have TLOG replicas, which write to the transaction log but, instead of indexing the data on their own, use the replication mechanism to pull it from the leader. And finally, we have PULL replicas, which do not use the transaction log at all and only use the replication mechanism to periodically pull the data from the shard leader.</p>
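<p>The differences between the three types can be summarized in a small table. The sketch below is just a restatement of the description above in code form &#8211; it is not any Solr API:</p>

```python
# Summary of Solr 7+ replica types, as described in the paragraph above.
# "tlog"           - does the replica write to the transaction log?
# "local_indexing" - does the replica index documents itself?
# "pulls_segments" - does it copy index data from the leader via replication?
REPLICA_TYPES = {
    "NRT":  {"tlog": True,  "local_indexing": True,  "pulls_segments": False},
    "TLOG": {"tlog": True,  "local_indexing": False, "pulls_segments": True},
    "PULL": {"tlog": False, "local_indexing": False, "pulls_segments": True},
}
```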



<p>However, we won&#8217;t be analyzing how each replica type works; we will focus on NRT replicas, because this type has been there since the beginning of SolrCloud and, what&#8217;s more, it is still the default replica type in SolrCloud.</p>



<p>When it comes to NRT replicas, the indexing process is as follows. The leader accepts the data, writes it to its transaction log and sends it to all its replicas (assuming all are of the NRT type). Then each replica writes the data to its own transaction log and returns an acknowledgment. At this point we know that the data is safe. Of course, somewhere in the meantime the data will also be written to the inverted index. But the question is &#8211; what will happen when not all shards are available? I would bet on the indexing not succeeding, but to be perfectly sure, let&#8217;s check by starting two Solr instances using the following commands:</p>



<pre class="wp-block-code"><code class="">$ bin/solr start -c</code></pre>



<pre class="wp-block-code"><code class="">$ bin/solr start -z localhost:9983 -p 6983</code></pre>



<p>Next, let&#8217;s create a collection built of two shards:</p>



<pre class="wp-block-code"><code class="">$ bin/solr create_collection -c test_index -shards 2 -replicationFactor 1</code></pre>



<p>Once the collection is created, let&#8217;s stop one of the instances:</p>



<pre class="wp-block-code"><code class="">$ bin/solr stop -p 6983</code></pre>



<p>And finally, let&#8217;s try indexing some data using the following command:</p>



<pre class="wp-block-code"><code class="">$ curl -XPOST -H 'Content-type:application/json' 'localhost:8983/solr/test_index/update' -d '{
 "id" : 2,
 "name" : "Test document"
}'</code></pre>



<p>As we would expect, Solr returns an error:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "status":503,
    "QTime":4011},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"No registered leader was found after waiting for 4000ms , collection: test_index slice: shard2 saw state=DocCollection(test_index//collections/test_index/state.json/8)={\n  \"pullReplicas\":\"0\",\n  \"replicationFactor\":\"1\",\n  \"shards\":{\n    \"shard1\":{\n      \"range\":\"80000000-ffffffff\",\n      \"state\":\"active\",\n      \"replicas\":{\"core_node3\":{\n          \"core\":\"test_index_shard1_replica_n1\",\n          \"base_url\":\"http://192.168.1.11:8983/solr\",\n          \"node_name\":\"192.168.1.11:8983_solr\",\n          \"state\":\"active\",\n          \"type\":\"NRT\",\n          \"force_set_state\":\"false\",\n          \"leader\":\"true\"}}},\n    \"shard2\":{\n      \"range\":\"0-7fffffff\",\n      \"state\":\"active\",\n      \"replicas\":{\"core_node4\":{\n          \"core\":\"test_index_shard2_replica_n2\",\n          \"base_url\":\"http://192.168.1.11:6983/solr\",\n          \"node_name\":\"192.168.1.11:6983_solr\",\n          \"state\":\"down\",\n          \"type\":\"NRT\",\n          \"force_set_state\":\"false\",\n          \"leader\":\"true\"}}}},\n  \"router\":{\"name\":\"compositeId\"},\n  \"maxShardsPerNode\":\"-1\",\n  \"autoAddReplicas\":\"false\",\n  \"nrtReplicas\":\"1\",\n  \"tlogReplicas\":\"0\"} with live_nodes=[192.168.1.11:8983_solr]",
    "code":503}}</code></pre>



<p>In that case we can&#8217;t really do anything. We don&#8217;t want to route the data manually, and even if we did, we would have no guarantee that the data would end up in one of the available shards. The best we can do is bring the missing shards back to life as soon as possible 😉</p>



<p>What will happen if we have multiple replicas and only some of them are missing? In that case the write should succeed, and Solr should inform us how many replicas wrote the data (at least in its newest versions) by including the <em>rf</em> parameter in the response. Let&#8217;s check that out.</p>



<p>Let&#8217;s create another collection, this time with a single shard and two replicas on our two Solr nodes:</p>



<pre class="wp-block-code"><code class="">$ bin/solr create_collection -c test_index_2 -shards 1 -replicationFactor 2</code></pre>



<p>If we tried to index exactly the same data again, Solr would return the following response (when using Solr 7.6.0):</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "rf":2,
    "status":0,
    "QTime":316}}</code></pre>



<p>As we can see, the <em>rf</em> parameter is set to <em>2</em>. This means that a replication factor of 2 was achieved. In the scope of our collection this means that the write was successful both on the leader shard and on the replica shard. If we stopped the Solr instance running on port <em>6983</em> and tried to index the same data once again, we would get the following response:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "rf":1,
    "status":0,
    "QTime":4}}</code></pre>



<p>In earlier Solr versions, in order to get the information about the achieved replication factor, we had to include the <em>min_rf</em> parameter in our indexing request and set it to a value higher than 1.</p>
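<p>On the client side, the <em>rf</em> value can be used to decide whether a write met our durability requirements. A minimal Python sketch of such a check &#8211; the helper name and the threshold are ours, not part of any Solr client library:</p>

```python
# Hedged sketch: inspect the "rf" field of a Solr update response and decide
# whether the write reached the replication factor we consider safe.
def write_is_durable(response, expected_rf=2):
    header = response.get("responseHeader", {})
    # status 0 means the update itself succeeded; "rf" tells us on how many
    # replicas the data was actually written.
    return header.get("status") == 0 and header.get("rf", 0) >= expected_rf

# The two responses shown above:
all_replicas_up = {"responseHeader": {"rf": 2, "status": 0, "QTime": 316}}
one_replica_down = {"responseHeader": {"rf": 1, "status": 0, "QTime": 4}}

print(write_is_durable(all_replicas_up))   # True
print(write_is_durable(one_replica_down))  # False
```

<p>An application could use such a check to retry the write or to alert once the missing replicas are back.</p>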



<h2 class="wp-block-heading">Read time tolerance</h2>



<p>When it comes to reads, the situation is a bit different. If not all shards are available, we lose visibility over a portion of the data. For example, having a collection with 10 shards and losing one of them means that we lost approximately 10% of the data. And during querying, by default, Solr will not show the remaining 90% of the documents, but will just throw an error. Let&#8217;s check if that is true. To do that, we will start two instances of Solr using the following commands:</p>



<pre class="wp-block-code"><code class="">$ bin/solr start -c</code></pre>



<pre class="wp-block-code"><code class="">$ bin/solr start -z localhost:9983 -p 6983</code></pre>



<p>Next, let&#8217;s create a simple collection built of two shards:</p>



<pre class="wp-block-code"><code class="">$ bin/solr create_collection -c test -shards 2 -replicationFactor 1</code></pre>



<p>And now, without indexing any data, let&#8217;s stop one instance &#8211; the one running on port <em>6983</em>:</p>



<pre class="wp-block-code"><code class="">$ bin/solr stop -p 6983</code></pre>



<p>Now, all it takes to get an error is running the following query:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*</code></pre>



<p>In response, instead of an empty result list, we will get an error similar to the following one:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "status":503,
    "QTime":6,
    "params":{
      "q":"*:*"}},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"no servers hosting shard: shard2",
    "code":503}}</code></pre>



<p>OK, the default behavior is good &#8211; we get an error, because we don&#8217;t have access to all of the data. But what if we would like to show partial results, taking the risk of not delivering the most valuable results, but also not showing errors or empty pages? To achieve that, Solr gives us two parameters: <em>shards.tolerant</em> and <em>shards.info</em>. If we would like partial results to be returned, we should set the first one to <em>true</em>; if we would like detailed information about the shards, we should set the second one to <em>true</em>. For example:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards.tolerant=true&amp;shards.info=true</code></pre>



<p>For the above query Solr will not return an error &#8211; partial results will be returned, along with information about the error on one of the shards:</p>



<pre class="wp-block-code"><code class="">{
  "responseHeader":{
    "zkConnected":true,
    "partialResults":true,
    "status":0,
    "QTime":45,
    "params":{
      "q":"*:*",
      "shards.tolerant":"true",
      "shards.info":"true"}},
  "shards.info":{
    "":{
      "error":"org.apache.solr.common.SolrException: no servers hosting shard: ",
      "trace":"org.apache.solr.common.SolrException: no servers hosting shard: \n\tat org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:165)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)\n\tat org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n",
      "time":0},
    "http://192.168.1.11:8983/solr/test_shard1_replica_n1/":{
      "numFound":0,
      "maxScore":0.0,
      "shardAddress":"http://192.168.1.11:8983/solr/test_shard1_replica_n1/",
      "time":18}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}</code></pre>



<p>As you can see, everything works as we wanted. We got the results, and we got the information that the results are partial (the <em>partialResults</em> property set to <em>true</em> in the response header), so our application knows that the results are not complete and that something went wrong. What&#8217;s more, we also got full information about which shard is to blame, because we added the <em>shards.info=true</em> parameter to our query.</p>
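<p>An application consuming such responses can use those same two properties to detect and report partial results. A minimal Python sketch over the response shown above &#8211; the function name is ours, and the response dictionary is a trimmed-down copy of the earlier example:</p>

```python
# Hedged sketch: detect partial results and collect failed shards from a
# response produced with shards.tolerant=true and shards.info=true.
def partial_result_report(response):
    partial = response.get("responseHeader", {}).get("partialResults", False)
    shards_info = response.get("shards.info", {})
    # Shards that failed carry an "error" key in their shards.info entry.
    failed = [shard for shard, info in shards_info.items() if "error" in info]
    return partial, failed

# Trimmed version of the response above; note that the failed shard is
# reported under an empty-string key, because no server was hosting it.
response = {
    "responseHeader": {"partialResults": True, "status": 0},
    "shards.info": {
        "": {"error": "no servers hosting shard: ", "time": 0},
        "http://192.168.1.11:8983/solr/test_shard1_replica_n1/": {
            "numFound": 0, "time": 18},
    },
    "response": {"numFound": 0, "start": 0, "docs": []},
}

partial, failed = partial_result_report(response)
print(partial)  # True
print(failed)   # ['']
```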
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2018/12/31/solrcloud-write-and-read-tolerance/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
