Solr 4.1: SolrCloud – multiple shards on the same Solr node

We would like to discuss another new feature that will be part of the upcoming Solr 4.1 release: the ability to place more than one shard of a given collection on a single Solr instance. As you may know, this is not possible in the current release (Solr 4.0). So, let's look at how this works by comparing Solr 4.1 to 4.0.

To illustrate how this feature works, I decided to walk through the process of creating a new collection. We will use a single Solr instance, and our collection will be built of two shards.

Solr 4.0

Solr.xml

The first thing we need to do is clean the solr.xml file so that it doesn't contain any core definitions. Our solr.xml file should look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
 <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
 </cores>
</solr>
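If you want to double-check that solr.xml really declares no cores before starting Solr, a few lines of standard-library Python can parse it. This is a minimal sketch, not a Solr tool: the helper name defined_cores is hypothetical, and it assumes cores are declared as <core> elements inside <cores>, as in the Solr 4.x solr.xml format shown above.

```python
import xml.etree.ElementTree as ET

# The "clean" solr.xml shown above: a <cores> element with no <core> children.
CLEAN_SOLR_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
 <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
 </cores>
</solr>
"""


def defined_cores(xml_text: str) -> list:
    """Return the names of the <core> entries declared under <solr>/<cores>."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [core.get("name") for core in root.findall("./cores/core")]


if __name__ == "__main__":
    cores = defined_cores(CLEAN_SOLR_XML)
    print("clean" if not cores else f"still defines cores: {cores}")
```

An empty list means there is nothing left for Solr to load locally, so the collection will be created only through the Collections API.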

Starting Solr

Now we need to run a single Solr instance with embedded ZooKeeper. We do that by running the following command:

java -DzkRun -jar start.jar

Preparing configuration

Before creating our collection, we need to send all the required configuration files to ZooKeeper. Assuming that Solr is installed in the /home/solrpl/solr/ directory and that our configuration files are stored in the /home/solrpl/configs/collection1/conf directory, I run the following script, which is distributed with Solr 4.0:

/home/solrpl/solr/cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:9983 -confdir /home/solrpl/configs/collection1/conf/ -confname collection1
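The -zkhost value localhost:9983 follows from how the embedded ZooKeeper is started: with -DzkRun it listens, by default, on the Solr HTTP port plus 1000. The following Python sketch only assembles the zkcli.sh invocation from that rule; the helper upconfig_command is hypothetical, and the port offset is the default assumption stated in the code.

```python
import shlex

# Assumption: when Solr is started with -DzkRun, the embedded ZooKeeper listens
# on the Solr HTTP port + 1000 by default (8983 + 1000 = 9983).
ZK_PORT_OFFSET = 1000


def upconfig_command(zkcli, conf_dir, conf_name, host="localhost", solr_port=8983):
    """Build (but do not run) the zkcli.sh upconfig invocation as an argument list."""
    zk_host = f"{host}:{solr_port + ZK_PORT_OFFSET}"
    return [zkcli, "-cmd", "upconfig", "-zkhost", zk_host,
            "-confdir", conf_dir, "-confname", conf_name]


cmd = upconfig_command(
    "/home/solrpl/solr/cloud-scripts/zkcli.sh",
    "/home/solrpl/configs/collection1/conf/",
    "collection1",
)
# Print a shell-ready version of the command; pass `cmd` to
# subprocess.run(cmd, check=True) to actually execute it.
print(shlex.join(cmd))
```

Keeping the command as an argument list avoids quoting problems if a configuration directory ever contains spaces.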

Creating the collection

Our configuration files should now be stored in ZooKeeper, so we can use the Collections API to create our collection. To do that, we send a request to the /solr/admin/collections endpoint with the action=CREATE parameter, which tells Solr that we want to create a new collection. We also need to provide the name of the collection by adding the name=collection1 parameter. In addition, we tell Solr that we want our collection divided into two shards (numShards=2) and that we don't want any replicas (replicationFactor=0). The full request looks like this:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=0'
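If you script collection creation instead of typing curl commands, building the query string programmatically avoids typos such as a misspelled parameter name. The following is a minimal Python sketch using urllib.parse.urlencode; create_collection_url is a hypothetical helper, and it only builds the URL without sending it.

```python
from typing import Optional
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int,
                          max_shards_per_node: Optional[int] = None) -> str:
    """Build a Collections API CREATE URL (the request is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,  # required: without it Solr reports "Missing required parameter: name"
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # maxShardsPerNode is new in Solr 4.1, so add it only when explicitly given.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    return f"{base_url}/admin/collections?{urlencode(params)}"


url_40 = create_collection_url("http://localhost:8983/solr", "collection1", 2, 0)
print(url_40)
```

The printed URL is the same one used in the curl command above; urllib.request.urlopen(url_40) would send it as a GET request.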

Solr administration panel view

If you repeat the above steps and look at the Cloud view in the Solr administration panel, you will see something like this:
Screenshot: Solr 4.0 Cloud view

Comments

As you can see, Solr 4.0 didn’t place both shards on a single machine. The shard named shard1 was placed on the xubuntu-virtual node, but the one called shard2 was not assigned. Of course, that would change if we had more nodes forming the cluster, but that’s not the point of this entry.

Solr 4.1

Solr.xml

As with Solr 4.0, we start by cleaning the solr.xml file, which should look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
 <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
 </cores>
</solr>

Starting Solr

We do exactly the same when starting Solr 4.1, so we run the following command:

java -DzkRun -jar start.jar

Preparing configuration

As with Solr 4.0, we need to send our configuration to ZooKeeper. We do that by running exactly the same command as before:

/home/solrpl/solr/cloud-scripts/zkcli.sh -cmd upconfig -zkhost localhost:9983 -confdir /home/solrpl/configs/collection1/conf/ -confname collection1

Collection creation

Creating our collection will be a bit different this time. We send the same action, name and numShards parameters as before. However, we add a new parameter, maxShardsPerNode, which specifies the maximum number of shards that can be placed on a single Solr instance (it defaults to 1). In our case we want both shards on a single Solr node, so we set this parameter to 2. In addition, Solr 4.1 requires the replicationFactor parameter to be greater than 0, so we set it to 1. The whole query looks like this:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1&maxShardsPerNode=2'
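Whether a given combination of parameters will be accepted can be reasoned about with simple arithmetic. The Solr 4.1 log messages quoted in the comments below imply that numShards × replicationFactor must not exceed maxShardsPerNode × the number of live nodes, and that replicationFactor must be greater than 0. The following Python sketch models only that check; it is not Solr's implementation, and check_shard_capacity is a hypothetical name.

```python
def check_shard_capacity(num_shards: int, replication_factor: int,
                         max_shards_per_node: int, live_nodes: int) -> int:
    """Return the number of shard copies to create, or raise ValueError.

    Models the constraint implied by the Solr 4.1 log messages:
    numShards * replicationFactor <= maxShardsPerNode * live nodes.
    """
    if replication_factor <= 0:
        raise ValueError("replicationFactor must be > 0")
    required = num_shards * replication_factor
    allowed = max_shards_per_node * live_nodes
    if required > allowed:
        raise ValueError(
            f"need {required} shard copies but only {allowed} allowed "
            f"(maxShardsPerNode={max_shards_per_node}, live nodes={live_nodes})"
        )
    return required


# The request above: 2 shards x 1 copy on 1 node with maxShardsPerNode=2 fits.
print(check_shard_capacity(2, 1, 2, 1))
```

With the default maxShardsPerNode=1, the same two-shard request on one node fails this check, which matches the error reported in the comments.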

Solr administration panel view

After all the above steps, the Cloud view in the Solr administration panel looks like this:
Screenshot: Solr 4.1 Cloud view

Comments

As you can see, with Solr 4.1 we were able to create a collection built of two shards and place both of them on a single Solr node. So if you need this kind of functionality, you can wait for Solr 4.1 and expect it to work.


This entry was posted on Monday, January 7th, 2013 at 08:12 and is filed under About Solr.

13 Responses to “Solr 4.1: SolrCloud – multiple shards on the same Solr node”

  1. Fei Says:

    Hi,
    I followed your example precisely. But after running the “Preparing Configuration” section for Solr 4.1, I got the “Missing required parameter: name” error.

    Do you know what causes it?
    Thanks

  2. gr0 Says:

    Sorry, there was a mistake in the examples (somehow the script that formats the code snippets didn’t want to show one parameter) and they are updated now. Just add the name=collection1 parameter to your curl command and it should work without any problems.

  3. Fei Says:

    There is actually a typo collection=collection1 should be name=collection1.

  4. Fei Says:

    Thanks gr0, and I realized that as well.

  5. gr0 Says:

    You are right. Thanks :)

  6. Fei Says:

    This morning I tried to restart the server, but I got a “java.net.BindException: Address already in use” error while binding to 0.0.0.0/0.0.0.0:9983.

    Any advice?
    Thanks in advance.

  7. gr0 Says:

    That should only happen when some application is already listening on that port. Please check that; maybe your Solr was already running?

  8. chenlm Says:

    The param “maxShardsPerNode” doesn’t seem to work…

    I use the default maxShardsPerNode = 1, but it still creates two shards in one instance…

    Is this a bug?
    please help!! email address chenlm20042004@163.com

  9. gr0 Says:

    I think you are doing something wrong. When trying to create two shards on a single node with the following request:
    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1&maxShardsPerNode=1'

    I get the following exception in Solr logs, which is expected:
    SEVERE: Cannot create collection collection1. Value of maxShardsPerNode is 1, and the number of live nodes is 1. This allows a maximum of 1 to be created. Value of numShards is 2 and value of replicationFactor is 1. This requires 2 shards to be created (higher than the allowed number)

    When trying to set the replicationFactor parameter to 0, it also says it's not allowed:
    SEVERE: replicationFactor must be > 0

    Please check if you are doing everything right.

  10. ak Says:

    Is it possible to load balance the documents across the shards manually when I have a single instance and multiple shards? For example, by document size or document count?

  11. Chris Says:

    Hi gr0, nice writeup… Just a question. What are the benefits of having multiple shards per node? Does it improve performance?

    Also, if I had a 3 node cluster, would the create command with 6 shards automatically put two on each node? How would replication be handled in this scenario?

  12. gr0 Says:

    One of the benefits of creating more shards than there are nodes is that in the future you’ll be able to move those shards to new nodes. Imagine a situation where we know that we will need more servers in the future because the X servers we now have will not be enough, and re-indexing everything is not an option. In such a case we can create more shards per node now and move them to new servers later.

  13. Dexter Legaspi Says:

    Thanks for this write-up. I’ve been searching high and low for a Solr configuration that has multiple shards on one node/instance (most write-ups are multiple 1-shard collections in one node or multiple shards for 1 collection spanning multiple nodes)… now I just need to see if this works on 4.9. From what I’ve read, there’s some substantial difference in code creation in recent 4.9, but collection creation is pretty much the same.

    I’m creating a write-up of my own once I get everything working.