With the latest releases of Solr, 5.2 and 5.2.1, we were given a new API: a backup API based on the replication handler. Because this functionality has been anticipated by some users, we decided to give it a quick look.
In order to test the new functionality we will do a very simple test:
- We will launch Solr in the SolrCloud mode,
- We will index a few documents,
- We will make the backup using the new API,
- We will index another few documents,
- Finally, we will try to restore the backup made in step 3.
Let’s start.
Starting Solr
To start Solr in SolrCloud mode, we’ve used the bin/solr script with the following command:
bin/solr -e cloud
For the purpose of the tests we need a single SolrCloud instance, with a single, empty collection (we will use the gettingstarted one provided with Solr) and a single shard.
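The bin/solr -e cloud command walks through an interactive wizard, where we’ve chosen a single node, a single shard and a single replica. If you prefer to skip the wizard, a roughly equivalent setup can be sketched with the standard start and create commands (the option values below are our assumptions, adjust them to your environment):

bin/solr start -c
bin/solr create -c gettingstarted -shards 1 -replicationFactor 1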
Our cluster topology was as simple as it gets: a single Solr node hosting the gettingstarted collection with one shard.
Data indexation
Indexing data is as simple as starting Solr. Because we are using the example gettingstarted collection, we can send documents without a predefined structure and Solr will adjust the schema to what we need. So, for the purpose of the tests, we will index two documents using the following command:
curl 'localhost:8983/solr/gettingstarted/update?commit=true' -H 'Content-type:application/xml' --data-binary '
<add>
 <doc>
  <field name="id">1</field>
  <field name="name">Test document 1</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="name">Test document 2</field>
 </doc>
</add>'
Backup
Making a backup is again very simple. We just need to run the following command:
curl 'http://localhost:8983/solr/gettingstarted/replication?command=backup&name=test&location=/Users/gro/backup/'
The above command tells Solr that we want to make a backup of our collection. The backup will be called snapshot.test (Solr appends the value of the name parameter to the snapshot. prefix). By default the backup is created in the collection data directory, which happens when we don't provide the desired directory using the location parameter. In our example, we’ve provided that parameter and used an absolute path to tell Solr where the backup should be placed.
The response from Solr should be fast and look similar to the following one:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst><str name="status">OK</str> </response>
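Apart from the response, we can also verify that the backup actually landed in the directory passed in the location parameter. Assuming the same path as above, a quick listing should reveal a snapshot.test directory with copies of the index files:

ls -l /Users/gro/backup/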
Of course, if our collection is large, the time needed to create the backup will be significantly longer. We can check the status of the backup creation by running the following command:
curl 'http://localhost:8983/solr/gettingstarted/replication?command=details'
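The details command returns the full replication information for the core. After a backup has been run, the response should also contain a backup section describing the last snapshot. The snippet below is only an illustration of its shape (the exact element names and values may differ between Solr versions):

<lst name="backup">
  <str name="startTime">...</str>
  <int name="fileCount">...</int>
  <str name="status">success</str>
  <str name="snapshotCompletedAt">...</str>
</lst>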
Next indexation
The next step of our simple test is another indexation – this time adding two new documents using the following command:
curl 'localhost:8983/solr/gettingstarted/update?commit=true' -H 'Content-type:application/xml' --data-binary '
<add>
 <doc>
  <field name="id">3</field>
  <field name="name">Test document 3</field>
 </doc>
 <doc>
  <field name="id">4</field>
  <field name="name">Test document 4</field>
 </doc>
</add>'
After the above command, if we run a simple query like the following one:
curl 'localhost:8983/solr/gettingstarted/select?q=*:*&rows=0&indent=true'
Solr should respond and inform us that we have four documents in total:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">6</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="indent">true</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="4" start="0" maxScore="1.0">
  </result>
</response>
Restoring our backup
Now let’s try restoring our backup and see how many documents we will have after that operation. To restore the backup we’ve created, we run the following command:
curl 'http://localhost:8983/solr/gettingstarted/replication?command=restore&name=test&location=/Users/gro/backup/'
If everything went well, Solr response should be similar to the following one:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst><str name="status">OK</str> </response>
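Similarly to the backup, restoring a large collection can take a while. The same replication handler should also expose a restorestatus command that reports whether the restore succeeded (we suggest verifying this in your Solr version rather than taking it as a guarantee):

curl 'http://localhost:8983/solr/gettingstarted/replication?command=restorestatus'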
So let’s now check how many documents are present in our collection by running the following command:
curl 'localhost:8983/solr/gettingstarted/select?q=*:*&rows=0&indent=true'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="indent">true</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
  </result>
</response>
As we can see, the number of documents in the collection is 2, which means that our backup has been properly restored.
Short summary
As we can see, the Solr backup mechanism works flawlessly, and it's good to finally have fully working and easy to use backup functionality 🙂 However, there is one thing to remember: when running several Solr instances on the same physical machine, we should be careful with backups using absolute paths, because cores writing a snapshot with the same name into the same directory can end up overwriting each other's data.
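One simple way around this is to give every instance (or shard) its own sub-directory in the backup location. The ports and paths below are purely illustrative and not something Solr requires:

curl 'http://localhost:8983/solr/gettingstarted/replication?command=backup&name=test&location=/Users/gro/backup/node_8983/'
curl 'http://localhost:7574/solr/gettingstarted/replication?command=backup&name=test&location=/Users/gro/backup/node_7574/'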