<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>solrcloud &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/solrcloud-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 14:28:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>SolrCloud and query execution control</title>
		<link>https://solr.pl/en/2019/01/14/solrcloud-and-query-execution-control/</link>
					<comments>https://solr.pl/en/2019/01/14/solrcloud-and-query-execution-control/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 14 Jan 2019 14:27:53 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[execution]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=980</guid>

					<description><![CDATA[With the release of Solr 7.0 and introduction of new replica types, in addition to the default NRT type the question appeared &#8211; can we control the queries and where they are executed? Can we tell Solr to execute the]]></description>
										<content:encoded><![CDATA[
<p>With the release of Solr 7.0 and the introduction of new replica types in addition to the default NRT type, a question appeared &#8211; can we control the queries and where they are executed? Can we tell Solr to execute the queries only on the PULL replicas, or give the TLOG replicas priority? Let&#8217;s check that out.</p>



<span id="more-980"></span>



<h2 class="wp-block-heading">Shards parameter</h2>



<p>The first control option that we have in SolrCloud is the <em>shards</em> parameter. Using it, we can directly control which shards should be used for querying. For example, we can provide a logical shard name in our query:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards=shard1</code></pre>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards=shard1,shard2,shard3</code></pre>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards=localhost:6683/solr/test</code></pre>



<p>The first of the above queries will be executed only on the shards that are grouped under the logical <em>shard1</em> name. The second query will be executed on the logical shards <em>shard1</em>, <em>shard2</em> and <em>shard3</em>, while the third query will be executed on the shards of the <em>test</em> collection that are deployed on the <em>localhost:6683</em> node.</p>



<p>It is also possible to load balance across instances, for example:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards=localhost:6683/solr/test|localhost:7783/solr/test</code></pre>



<p>The above query will be executed either on the instance running on port <em>6683</em> or on the one running on port <em>7783</em>.</p>
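<p>One practical note when trying these queries from a shell: the <em>&amp;</em> and <em>|</em> characters must be quoted, or the shell will interpret them itself. A minimal sketch, using the example hosts and ports from above:</p>

```shell
# Single quotes keep the shell from treating "&" as a background-job marker
# and "|" as a pipe; hosts and ports are the example values from this post.
URL='http://localhost:8983/solr/test/select?q=*:*&shards=localhost:6683/solr/test|localhost:7783/solr/test'
echo "$URL"
# against a live cluster one would then run:  curl "$URL"
```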



<h2 class="wp-block-heading">Shards.preference parameter</h2>



<p>While the <em>shards</em> parameter gives us some degree of control over where the query is executed, it is not exactly what we would like to have. To target a certain type of replica with it, we would have to know the physical layout of the shards, and this is not something we want to keep track of. Because of that, the <em>shards.preference</em> parameter was introduced to Solr. It allows us to tell Solr which type of replicas should have priority when executing a query.</p>



<p>For example, to tell Solr that PULL type replicas should have priority when the query is executed, one should add the <em>shards.preference</em> parameter to the query and set it to <em>replica.type:PULL</em>:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards.preference=replica.type:PULL</code></pre>



<p>The nice thing is that we can tell Solr to use PULL replicas first and, if they are not available, fall back to TLOG replicas:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards.preference=replica.type:PULL,replica.type:TLOG</code></pre>



<p>We can also define that PULL type replicas should be used first and, if they are not available, that local shards should have priority:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards.preference=replica.type:PULL,replica.location:local</code></pre>



<p>In addition to the above examples, we can also define priority based on the location of the replicas. For example, if our <em>192.168.1.1</em> Solr node is way more powerful than the others and we would like to prioritize PULL replicas first and then the mentioned Solr node, we would run the following query:</p>



<pre class="wp-block-code"><code class="">http://localhost:8983/solr/test/select?q=*:*&amp;shards.preference=replica.type:PULL,replica.location:http://192.168.1.1</code></pre>



<h2 class="wp-block-heading">Summary</h2>



<p>The discussed parameters, and <em>shards.preference</em> with its <em>replica.type</em> value in particular, can be very useful when we are using SolrCloud with different types of replicas. By telling Solr that we would like to prefer PULL or TLOG replicas, we can lower the query-based pressure on the NRT replicas and thus get better performance out of the whole cluster. What&#8217;s more &#8211; dividing the replicas can help us achieve query performance close to what the Solr master-slave architecture provides, without sacrificing all the goodies that come with SolrCloud itself.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2019/01/14/solrcloud-and-query-execution-control/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>SolrCloud &#8211; What happens when ZooKeeper fails?</title>
		<link>https://solr.pl/en/2013/12/02/solrcloud-what-happens-when-zookeeper-fails-2/</link>
					<comments>https://solr.pl/en/2013/12/02/solrcloud-what-happens-when-zookeeper-fails-2/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 02 Dec 2013 14:13:35 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>
		<category><![CDATA[zookeeper]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=620</guid>

					<description><![CDATA[One of the questions I tend to get is what happens with SolrCloud cluster when ZooKeeper fails. Of course we are not talking about a single ZooKeeper instance failure, but the whole ensemble not being accessible and so the quorum]]></description>
										<content:encoded><![CDATA[<p>One of the questions I tend to get is what happens with a SolrCloud cluster when ZooKeeper fails. Of course, we are not talking about a single ZooKeeper instance failure, but about the whole <em>ensemble</em> not being accessible and thus the <em>quorum</em> not being present. Because the answer to this question is very easy to verify, I decided to make a simple blog post showing what happens when ZooKeeper fails.</p>
<p><span id="more-620"></span></p>
<h3>Test environment</h3>
<p>The test environment was very simple:</p>
<ul>
<li>A single virtual machine running Linux</li>
<li>A single instance of ZooKeeper (which is enough for our test)</li>
<li>Two Solr instances with a single collection deployed</li>
<li>Solr <a title="Apache Lucene and Solr 4.6" href="http://solr.pl/en/2013/11/24/apache-lucene-and-solr-4-6/">4.6</a></li>
</ul>
<p>In order to create our test collection I&#8217;ve uploaded the configuration to ZooKeeper and used the following command:
</p>
<pre class="brush:bash">curl 'http://localhost:8983/solr/admin/collections?action=CREATE&amp;name=collection1&amp;numShards=2&amp;replicationFactor=1'</pre>
<p>The cloud view of the example cluster was as follows:
</p>
<p style="text-align: center;"><a href="http://solr.pl/wp-content/uploads/2013/12/cloud_view.png"><img decoding="async" class="aligncenter  wp-image-3330" alt="cloud_view" src="http://solr.pl/wp-content/uploads/2013/12/cloud_view.png" width="495" height="44"></a></p>
<h3>Test data indexing</h3>
<p>The next step in our test will be indexing. We will index a few example documents that are provided with Solr in the <em>exampledocs</em> directory. The following commands were used to index the data:
</p>
<pre class="brush:bash">curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @mem.xml -H 'Content-type:application/xml'
curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @monitor.xml -H 'Content-type:application/xml'
curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @monitor2.xml -H 'Content-type:application/xml'</pre>
<p>After executing the above commands we get the following number of documents:</p>
<ul>
<li>The whole collection holds <strong>5</strong> documents</li>
<li>The shard located on the Solr instance running on port <strong>8983</strong> hosts <strong>1</strong> document</li>
<li>The shard located on the Solr instance running on port <strong>7983</strong> hosts <strong>4</strong> documents</li>
</ul>
<h3>Querying with ZooKeeper not present</h3>
<p>Now we go to the next step &#8211; we shut down our ZooKeeper instance and try to run a simple query by sending the following command:
</p>
<pre class="brush:bash">curl 'localhost:8983/solr/collection1/select?q=*:*&amp;indent=true'</pre>
<p>As a result we get the following response:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;16&lt;/int&gt;
  &lt;lst name="params"&gt;
   &lt;str name="indent"&gt;true&lt;/str&gt;
   &lt;str name="q"&gt;*:*&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
&lt;result name="response" numFound="5" start="0" maxScore="1.0"&gt;
&lt;doc&gt;
 &lt;str name="id"&gt;TWINX2048-3200PRO&lt;/str&gt; 
 &lt;str name="name"&gt;CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail&lt;/str&gt;
 &lt;str name="manu"&gt;Corsair Microsystems Inc.&lt;/str&gt;
 &lt;str name="manu_id_s"&gt;corsair&lt;/str&gt;
 &lt;arr name="cat"&gt;
  &lt;str&gt;electronics&lt;/str&gt;
  &lt;str&gt;memory&lt;/str&gt;
 &lt;/arr&gt;
 &lt;arr name="features"&gt;
  &lt;str&gt;CAS latency 2,    2-3-3-6 timing, 2.75v, unbuffered, heat-spreader&lt;/str&gt;
 &lt;/arr&gt;
 &lt;float name="price"&gt;185.0&lt;/float&gt;
 &lt;str name="price_c"&gt;185,USD&lt;/str&gt;
 &lt;int name="popularity"&gt;5&lt;/int&gt;
 &lt;bool name="inStock"&gt;true&lt;/bool&gt;
 &lt;str name="store"&gt;37.7752,-122.4232&lt;/str&gt;
 &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
 &lt;str name="payloads"&gt;electronics|6.0 memory|3.0&lt;/str&gt;
 &lt;long name="_version_"&gt;1453219034197655552&lt;/long&gt;
&lt;/doc&gt;
&lt;doc&gt;
 &lt;str name="id"&gt;VS1GB400C3&lt;/str&gt;
 &lt;str name="name"&gt;CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail&lt;/str&gt;
 &lt;str name="manu"&gt;Corsair Microsystems Inc.&lt;/str&gt;
 &lt;str name="manu_id_s"&gt;corsair&lt;/str&gt;
 &lt;arr name="cat"&gt;
  &lt;str&gt;electronics&lt;/str&gt;
  &lt;str&gt;memory&lt;/str&gt;
 &lt;/arr&gt;
 &lt;float name="price"&gt;74.99&lt;/float&gt;
 &lt;str name="price_c"&gt;74.99,USD&lt;/str&gt;
 &lt;int name="popularity"&gt;7&lt;/int&gt;
 &lt;bool name="inStock"&gt;true&lt;/bool&gt;
 &lt;str name="store"&gt;37.7752,-100.0232&lt;/str&gt;
 &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
 &lt;str name="payloads"&gt;electronics|4.0 memory|2.0&lt;/str&gt;
 &lt;long name="_version_"&gt;1453219034252181504&lt;/long&gt;
&lt;/doc&gt;
&lt;doc&gt;
 &lt;str name="id"&gt;VDBDB1A16&lt;/str&gt;
 &lt;str name="name"&gt;A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM&lt;/str&gt;
 &lt;str name="manu"&gt;A-DATA Technology Inc.&lt;/str&gt;
 &lt;str name="manu_id_s"&gt;corsair&lt;/str&gt;
 &lt;arr name="cat"&gt;
  &lt;str&gt;electronics&lt;/str&gt;
  &lt;str&gt;memory&lt;/str&gt;
 &lt;/arr&gt;
 &lt;arr name="features"&gt;
  &lt;str&gt;CAS latency 3,     2.7v&lt;/str&gt;
 &lt;/arr&gt;
 &lt;int name="popularity"&gt;0&lt;/int&gt;
 &lt;bool name="inStock"&gt;true&lt;/bool&gt;
 &lt;str name="store"&gt;45.18414,-93.88141&lt;/str&gt;
 &lt;date name="manufacturedate_dt"&gt;2006-02-13T15:26:37Z&lt;/date&gt;
 &lt;str name="payloads"&gt;electronics|0.9 memory|0.1&lt;/str&gt;
 &lt;long name="_version_"&gt;1453219034255327232&lt;/long&gt;
&lt;/doc&gt;
&lt;doc&gt;
 &lt;str name="id"&gt;3007WFP&lt;/str&gt;
 &lt;str name="name"&gt;Dell Widescreen UltraSharp 3007WFP&lt;/str&gt;
 &lt;str name="manu"&gt;Dell, Inc.&lt;/str&gt;
 &lt;str name="manu_id_s"&gt;dell&lt;/str&gt;
 &lt;arr name="cat"&gt;
  &lt;str&gt;electronics&lt;/str&gt;
  &lt;str&gt;monitor&lt;/str&gt;
 &lt;/arr&gt;
 &lt;arr name="features"&gt;
  &lt;str&gt;30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast&lt;/str&gt;
 &lt;/arr&gt;
 &lt;str name="includes"&gt;USB cable&lt;/str&gt;
 &lt;float name="weight"&gt;401.6&lt;/float&gt;
 &lt;float name="price"&gt;2199.0&lt;/float&gt;
 &lt;str name="price_c"&gt;2199,USD&lt;/str&gt;
 &lt;int name="popularity"&gt;6&lt;/int&gt;
 &lt;bool name="inStock"&gt;true&lt;/bool&gt;
 &lt;str name="store"&gt;43.17614,-90.57341&lt;/str&gt;
 &lt;long name="_version_"&gt;1453219041357332480&lt;/long&gt;
&lt;/doc&gt;
&lt;doc&gt;
 &lt;str name="id"&gt;VA902B&lt;/str&gt;
 &lt;str name="name"&gt;ViewSonic VA902B - flat panel display - TFT - 19"&lt;/str&gt;
 &lt;str name="manu"&gt;ViewSonic Corp.&lt;/str&gt;
 &lt;str name="manu_id_s"&gt;viewsonic&lt;/str&gt;
 &lt;arr name="cat"&gt;
  &lt;str&gt;electronics&lt;/str&gt;
  &lt;str&gt;monitor&lt;/str&gt;
 &lt;/arr&gt;
 &lt;arr name="features"&gt;
  &lt;str&gt;19" TFT active matrix LCD, 8ms response time, 1280 x 1024 native resolution&lt;/str&gt;
 &lt;/arr&gt;
 &lt;float name="weight"&gt;190.4&lt;/float&gt;
 &lt;float name="price"&gt;279.95&lt;/float&gt;
 &lt;str name="price_c"&gt;279.95,USD&lt;/str&gt;
 &lt;int name="popularity"&gt;6&lt;/int&gt;
 &lt;bool name="inStock"&gt;true&lt;/bool&gt;
 &lt;str name="store"&gt;45.18814,-93.88541&lt;/str&gt;
 &lt;long name="_version_"&gt;1453219045997281280&lt;/long&gt;&lt;/doc&gt;
&lt;/result&gt;
&lt;/response&gt;</pre>
<p>As we can see, Solr responded correctly. This is because Solr already has the clusterstate.json file cached, and it doesn&#8217;t need to update that file in order to search &#8211; so searching works, as we could see.</p>
<h3>Indexing with failed ZooKeeper</h3>
<p>Without turning on our ZooKeeper instance we try to run the following command:
</p>
<pre class="brush:bash">curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @hd.xml -H 'Content-type:application/xml'</pre>
<p>The above command should result in indexing the contents of the <em>hd.xml</em> file. After a longer period of time Solr responds with the following information:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;&lt;int name="status"&gt;503&lt;/int&gt;&lt;int name="QTime"&gt;15096&lt;/int&gt;&lt;/lst&gt;&lt;lst name="error"&gt;&lt;str name="msg"&gt;Cannot talk to ZooKeeper - Updates are disabled.&lt;/str&gt;&lt;int name="code"&gt;503&lt;/int&gt;&lt;/lst&gt;
&lt;/response&gt;</pre>
<p>So, as you can see, we are not able to index data without a working ZooKeeper <em>ensemble</em>.</p>
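<p>In practice, a client can simply retry failed updates while the ensemble recovers. A hedged sketch of such a loop &#8211; here <em>false</em> stands in for the real curl update command, which is an assumption for illustration, not part of the original test:</p>

```shell
# Retry an update a few times before giving up; "false" stands in for the
# real curl update call and always fails, so the loop exhausts its attempts.
attempts=0
max_attempts=3
until false || [ "$attempts" -ge "$max_attempts" ]; do
  attempts=$((attempts + 1))
  # a real client would back off here, e.g. sleep "$attempts"
done
echo "gave up after $attempts attempts"
```

Once the ZooKeeper connection is re-established, the same command starts succeeding again, so a bounded retry like this is usually enough for short outages.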
<h3>Starting ZooKeeper again</h3>
<p>So let&#8217;s see what happens when we start our ZooKeeper instance again without restarting the Solr nodes. After starting ZooKeeper, we run the same indexing command once again:
</p>
<pre class="brush:bash">curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @hd.xml -H 'Content-type:application/xml'</pre>
<p>And this time the response is different:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
&lt;lst name="responseHeader"&gt;&lt;int name="status"&gt;0&lt;/int&gt;&lt;int name="QTime"&gt;118&lt;/int&gt;&lt;/lst&gt;
&lt;/response&gt;</pre>
<p>As we can see, the indexing request was successful this time, which allows us to assume that Solr re-established the connection to ZooKeeper. We can confirm that in the Solr and ZooKeeper logs.</p>
<h3>Short summary</h3>
<p>As you can see, our short test allowed us to see what happens when the ZooKeeper <em>ensemble</em> fails and what we can expect from Solr in such rare cases. I hope this blog entry will clear up some doubts about SolrCloud and its usefulness.</p>
<p>Please also remember that during the test the cluster state did not change &#8211; all shards were accessible and working. We will see what happens when shards or replicas fail while ZooKeeper is down in the next blog entry about SolrCloud.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2013/12/02/solrcloud-what-happens-when-zookeeper-fails-2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>SolrCloud HOWTO</title>
		<link>https://solr.pl/en/2013/03/11/solrcloud-howto/</link>
					<comments>https://solr.pl/en/2013/03/11/solrcloud-howto/#respond</comments>
		
		<dc:creator><![CDATA[Marek Rogoziński]]></dc:creator>
		<pubDate>Mon, 11 Mar 2013 12:52:53 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[solrcloud]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=538</guid>

					<description><![CDATA[What is the most important change in 4.x version of Apache Solr? I think there are many of them but Solr Cloud is definitely something that changed a lot in Solr architecture. Until now, bigger installations suffered from single point]]></description>
										<content:encoded><![CDATA[<p>What is the most important change in the 4.x version of Apache Solr? I think there are many of them, but Solr Cloud is definitely something that changed a lot in the Solr architecture. Until now, bigger installations suffered from a single point of failure (SPOF) – there was only one master server, and when this server went down, the whole cluster lost the ability to receive new data. Of course you could go for multiple masters, where a single master was responsible for indexing some part of the data, but still, there was a SPOF present in your deployment. Even when everything worked, due to the commit interval and the fact that slave instances checked for the presence of new data periodically, the solution was far from ideal – the new data appeared in the cluster minutes after the commit.</p>
<p><span id="more-538"></span></p>
<p>Solr Cloud changed this behavior. In this article we will set up a new SolrCloud cluster from scratch and see how it works.</p>
<h2>Our example cluster</h2>
<p>In our example we will use three Solr servers. Every server in the cluster is capable of handling both index and query requests. This is the main difference from the old-fashioned Solr architecture with a single master and multiple slave servers. In the new architecture one additional element is present: Zookeeper, which is responsible for holding the configuration of the cluster and for synchronizing its work. It is crucial to understand that Solr relies on the information stored in Zookeeper – if Zookeeper fails, the whole cluster is useless. It is therefore very important to have a fault tolerant Zookeeper ensemble, which is why we use three independent instances of Zookeeper that will form the ensemble.</p>
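<p>The fault tolerance of the ensemble follows from simple arithmetic – a quorum needs a strict majority of the nodes, so an ensemble of three tolerates the loss of one instance. A small sketch of that calculation:</p>

```shell
# An ensemble of N Zookeeper nodes needs floor(N/2)+1 of them alive to
# keep a quorum; with N=3 (as in this article) one node may fail.
N=3
QUORUM=$(( N / 2 + 1 ))
TOLERATED=$(( N - QUORUM ))
echo "ensemble=$N quorum=$QUORUM tolerated_failures=$TOLERATED"
# prints: ensemble=3 quorum=2 tolerated_failures=1
```

This is also why ensembles are run with an odd number of nodes: going from three to four raises the quorum from 2 to 3 without tolerating any additional failures.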
<h2>Zookeeper installation</h2>
<p>As we said previously, Zookeeper is a vital part of a SolrCloud cluster. Although we can use the embedded Zookeeper, that is only handy for testing. For production you definitely want Zookeeper installed independently from Solr and running in a different Java virtual machine process, to avoid the two interrupting each other and influencing each other&#8217;s work.</p>
<p>The installation of Apache Zookeeper is straightforward and may be described by the following steps:</p>
<ol>
<li>Download Zookeeper archive from: <a href="http://www.apache.org/dyn/closer.cgi/zookeeper/" target="_blank" rel="noopener noreferrer">http://www.apache.org/dyn/closer.cgi/zookeeper/</a></li>
<li>Unpack downloaded archive and copy <em>conf/zoo_sample.cfg</em> to <em>conf/zoo.cfg</em></li>
<li>Modify <em>zoo.cfg</em>:
<ol>
<li>Change <em>dataDir</em> to directory where you want to hold all cluster configuration data</li>
<li>Add information about all Zookeeper servers (see below)</li>
</ol>
</li>
</ol>
<p>After the mentioned changes my <em>zoo.cfg</em> looks like this:
</p>
<pre class="brush:bash">tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888</pre>
<ol>
<li>Copy this archive to all the servers where the Zookeeper service should run</li>
<li>Create the file <i>/var/zookeeper/data/myid</i> containing the server identifier. This identifier is different for each instance (for example, on <i>zk2</i> this file should contain the number <em>2</em>)</li>
<li>Start all instances using “bin/zkServer.sh start-foreground” and verify the validity of the installation</li>
<li>Add “bin/zkServer.sh start” to the startup scripts and make sure that the operating system monitors that the Zookeeper service is available.</li>
</ol>
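<p>The <em>dataDir</em> and <em>myid</em> steps above can be sketched as a couple of shell commands; a <em>/tmp</em> path is used here instead of <em>/var/zookeeper</em> only to keep the sketch runnable without root:</p>

```shell
# Create the data directory and the per-server myid file; on zk2 the file
# must contain exactly "2" (the path is shortened for this sketch).
DATA_DIR=/tmp/zk-demo/data
SERVER_ID=2
mkdir -p "$DATA_DIR"
echo "$SERVER_ID" > "$DATA_DIR/myid"
cat "$DATA_DIR/myid"
# prints: 2
```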
<h2>Solr installation</h2>
<p>The installation of Solr looks as follows:</p>
<ol>
<li>Download Solr archive from: <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/4.1.0" target="_blank" rel="noopener noreferrer">http://www.apache.org/dyn/closer.cgi/lucene/solr/4.1.0</a></li>
<li>Unpack downloaded archive</li>
<li>In this tutorial we will use the ready Solr installation from the <em>example</em> directory and all changes are made to this example installation</li>
<li>Copy archive to all servers which are the part of the cluster</li>
<li>Upload to Zookeeper the configuration data which will be used by the Solr cluster. To do this, run the first instance with:
<pre class="brush:bash">java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=solr1 -DzkHost=zk1:2181 -DnumShards=2 -jar start.jar</pre>
</li>
</ol>
<p>This should be run only once. Subsequent runs will use the configuration from the Zookeeper cluster, and the local configuration files are not needed.</p>
<ol>
<li>Run all instances using
<pre class="brush:bash">java –DzkHost=zk1:2181 –jar start.jar</pre>
</li>
</ol>
<h2>Verify the installation</h2>
<p>Go to the administration panel on any Solr instance. For our deployment the URL should be something like <i>http://solr1:8983/solr</i>. When you click on the Cloud tab and then on Graph, you should see something similar to the following screenshot:
</p>
<p style="text-align: center;"><a href="http://solr.pl/wp-content/uploads/2013/03/cloud.png"><img decoding="async" class="size-medium wp-image-2909 aligncenter" alt="cloud" src="http://solr.pl/wp-content/uploads/2013/03/cloud-300x138.png" width="300" height="138"></a></p>
<h2>Collection</h2>
<p>Our first collection &#8211; <em>collection1</em> &#8211; is divided into two shards (<em>shard1</em> and <em>shard2</em>). Each of those shards is placed on two Solr instances (OK, in the picture you see that every Solr instance is placed on the same host – I currently have only one physical server available for tests – any volunteers for a donation? ;)). The type of the dot tells us whether it is a primary shard or a replica.</p>
<h2>Summary</h2>
<p>I hope this is only the first note about SolrCloud. I know it is very short and skips details and information about shards, replicas and the architecture of this solution. Treat it as a simple checklist for a basic (but real) configuration of your cloud.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2013/03/11/solrcloud-howto/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
