<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>backup &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/backup-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Sat, 14 Nov 2020 13:54:18 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Solr 5.2: quick look on Solr backup functionality</title>
		<link>https://solr.pl/en/2015/06/22/solr-5-2-quick-look-on-solr-backup-functionality/</link>
					<comments>https://solr.pl/en/2015/06/22/solr-5-2-quick-look-on-solr-backup-functionality/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 22 Jun 2015 12:53:50 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[5.2]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[solr]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=867</guid>

					<description><![CDATA[With the latest releases of Solr &#8211; 5.2 and 5.2.1 &#8211; we were given a new API &#8211; the backup API based on the replication handler. Because this functionality has been anticipated by some users, we decided to give it]]></description>
										<content:encoded><![CDATA[<p>With the latest releases of Solr &#8211; <a href="http://solr.pl/en/2015/06/07/lucene-solr-5-2/">5.2</a> and <a href="http://solr.pl/en/2015/06/15/lucene-and-solr-5-2-1/">5.2.1</a> &#8211; we were given a new API: the backup API based on the replication handler. Because this functionality has been anticipated by some users, we decided to give it a quick look.</p>
<p><span id="more-867"></span></p>
<p>In order to test the new functionality we will do a very simple test:</p>
<ol>
<li>We will launch Solr in the SolrCloud mode,</li>
<li>We will index a few documents,</li>
<li>We will make the backup using the new API,</li>
<li>We will index another few documents,</li>
<li>Finally, we will try to restore the backup made in step 3.</li>
</ol>
<p>Let&#8217;s start.</p>
<h3>Starting Solr</h3>
<p>To start Solr in SolrCloud mode, we&#8217;ve used the <em>bin/solr</em> script with the following command:
</p>
<pre class="brush:xml">bin/solr -e cloud
</pre>
<p>For the purpose of the tests we need a single SolrCloud instance, with a single, empty collection (we will use the <em>gettingstarted</em> one provided with Solr) and a single shard.</p>
<p>Our cluster topology looked as follows:</p>
<p><a href="http://solr.pl/wp-content/uploads/2015/06/Zrzut-ekranu-2015-06-21-o-11.13.31.png"><img decoding="async" class="aligncenter  wp-image-3617" src="http://solr.pl/wp-content/uploads/2015/06/Zrzut-ekranu-2015-06-21-o-11.13.31.png" alt="Zrzut ekranu 2015-06-21 o 11.13.31" width="683" height="36"></a></p>
<h3>Indexing data</h3>
<p>Indexing data is as simple as starting Solr. Because we are using the example <em>gettingstarted</em> collection, we can send documents without a predefined structure and Solr will adjust the <em>schema.xml</em> to what we need. So, for the purpose of the tests, we will index two documents using the following command:
</p>
<pre class="brush:xml">curl 'localhost:8983/solr/gettingstarted/update?commit=true' -H 'Content-type:application/xml' --data-binary '
&lt;add&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;1&lt;/field&gt;
  &lt;field name="name"&gt;Test document 1&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;2&lt;/field&gt;
  &lt;field name="name"&gt;Test document 2&lt;/field&gt;
 &lt;/doc&gt;
&lt;/add&gt;'
</pre>
<h3>Backup</h3>
<p>Making a backup is again very simple. We just need to run the following command:
</p>
<pre class="brush:xml">curl 'http://localhost:8983/solr/gettingstarted/replication?command=backup&amp;name=test&amp;location=/Users/gro/backup/'
</pre>
<p>The above command tells Solr to make a backup of our collection; the backup will be called <em>snapshot.test</em> (Solr prepends the <em>snapshot.</em> prefix to the value of the <em>name</em> parameter). By default, the backup is created in the collection data directory &#8211; that happens when we don&#8217;t provide the desired directory using the <em>location</em> parameter. In our example, we&#8217;ve provided that parameter with an absolute path to tell Solr where the backup should be placed.</p>
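<p>As a quick sanity check, we can reproduce that naming convention ourselves &#8211; the snapshot directory name is just the <em>snapshot.</em> prefix glued to the <em>name</em> parameter value. A minimal sketch (the <em>/Users/gro/backup/</em> path comes from our example above):</p>
<pre class="brush:bash"># The backup directory name is the "snapshot." prefix plus the "name" parameter value
name="test"
snapshot_dir="snapshot.${name}"
echo "${snapshot_dir}"

# Listing the location used in the backup command should show that directory:
# ls /Users/gro/backup/</pre>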
<p>The response from Solr should be fast and look similar to the following one:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;&lt;int name="status"&gt;0&lt;/int&gt;&lt;int name="QTime"&gt;2&lt;/int&gt;&lt;/lst&gt;&lt;str name="status"&gt;OK&lt;/str&gt;
&lt;/response&gt;
</pre>
<p>Of course, if our collection is large, the time needed to create the backup will be significantly longer. We can check the status of the backup creation by running the following command:
</p>
<pre class="brush:xml">curl 'http://localhost:8983/solr/gettingstarted/replication?command=details'
</pre>
<h3>Indexing more data</h3>
<p>The next step of our simple test is another round of indexing &#8211; this time adding two new documents using the following command:
</p>
<pre class="brush:xml">curl 'localhost:8983/solr/gettingstarted/update?commit=true' -H 'Content-type:application/xml' --data-binary '
&lt;add&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;3&lt;/field&gt;
  &lt;field name="name"&gt;Test document 3&lt;/field&gt;
 &lt;/doc&gt;
 &lt;doc&gt;
  &lt;field name="id"&gt;4&lt;/field&gt;
  &lt;field name="name"&gt;Test document 4&lt;/field&gt;
 &lt;/doc&gt;
&lt;/add&gt;'
</pre>
<p>After running the above command, if we run a simple query like the following one:
</p>
<pre class="brush:xml">curl 'localhost:8983/solr/gettingstarted/select?q=*:*&amp;rows=0&amp;indent=true'
</pre>
<p>Solr should respond and inform us that we have four documents in total:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;6&lt;/int&gt;
  &lt;lst name="params"&gt;
   &lt;str name="q"&gt;*:*&lt;/str&gt;
   &lt;str name="indent"&gt;true&lt;/str&gt;
   &lt;str name="rows"&gt;0&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="4" start="0" maxScore="1.0"&gt;
 &lt;/result&gt;
&lt;/response&gt;
</pre>
<h3>Restoring our backup</h3>
<p>Now let&#8217;s try restoring our backup and see how many documents we will have after that operation. To restore the backup we&#8217;ve created, we run the following command:
</p>
<pre class="brush:xml">curl 'http://localhost:8983/solr/gettingstarted/replication?command=restore&amp;name=test&amp;location=/Users/gro/backup/'
</pre>
<p>If everything went well, Solr&#8217;s response should be similar to the following one:
</p>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;&lt;int name="status"&gt;0&lt;/int&gt;&lt;int name="QTime"&gt;2&lt;/int&gt;&lt;/lst&gt;&lt;str name="status"&gt;OK&lt;/str&gt;
&lt;/response&gt;
</pre>
<p>So let&#8217;s now check how many documents are present in our collection by running the following command:
</p>
<pre class="brush:xml">curl 'localhost:8983/solr/gettingstarted/select?q=*:*&amp;rows=0&amp;indent=true'
</pre>
<pre class="brush:xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;response&gt;
 &lt;lst name="responseHeader"&gt;
  &lt;int name="status"&gt;0&lt;/int&gt;
  &lt;int name="QTime"&gt;0&lt;/int&gt;
  &lt;lst name="params"&gt;
   &lt;str name="q"&gt;*:*&lt;/str&gt;
   &lt;str name="indent"&gt;true&lt;/str&gt;
   &lt;str name="rows"&gt;0&lt;/str&gt;
  &lt;/lst&gt;
 &lt;/lst&gt;
 &lt;result name="response" numFound="2" start="0"&gt;
 &lt;/result&gt;
&lt;/response&gt;
</pre>
<p>As we can see, the number of documents in the collection is 2, which means that our backup has been properly restored.</p>
<h3>Short summary</h3>
<p>As we can see, Solr&#8217;s backup mechanism works flawlessly; however, we should remember a few things. When running several Solr instances on the same physical machine, we should avoid making backups to the same absolute path &#8211; we could end up with shard data being overwritten. Apart from that, it&#8217;s good to finally have fully working and easy to use backup functionality <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
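<p>One way to avoid the path-collision problem is to give every core its own backup location. A hedged sketch (the core names and the <em>/backups</em> base path are made up for illustration &#8211; adjust them to your deployment):</p>
<pre class="brush:bash"># Hypothetical core names and base path -- adjust to your deployment
BASE="/backups"
for core in gettingstarted collection2; do
  # Each core backs up to its own directory, so snapshots never collide
  url="http://localhost:8983/solr/${core}/replication?command=backup&amp;name=${core}_backup&amp;location=${BASE}/${core}/"
  echo "${url}"
  # curl "${url}"   # uncomment to actually trigger the backup
done</pre>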
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2015/06/22/solr-5-2-quick-look-on-solr-backup-functionality/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Backing Up Your Index</title>
		<link>https://solr.pl/en/2012/08/13/backing-up-your-index/</link>
					<comments>https://solr.pl/en/2012/08/13/backing-up-your-index/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 13 Aug 2012 21:51:12 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[handler]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[replication]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=472</guid>

					<description><![CDATA[Did you ever wonder if you can create a backup of your index with the tools available in Solr? For example, after every commit or optimize operation? Or maybe you would like to create backups with the HTTP]]></description>
										<content:encoded><![CDATA[<p>Did you ever wonder if you can create a backup of your index with the tools available in Solr? For example, after every <em>commit</em> or <em>optimize</em> operation? Or maybe you would like to create backups with an HTTP API call? Let&#8217;s see what possibilities Solr has to offer.</p>
<p><span id="more-472"></span></p>
<h3>The Beginning</h3>
<p>We decided to write about index backups even though this functionality is fairly simple. We noticed that many people tend to forget about this functionality, and not only when it comes to Apache Solr. We hope that this blog entry will help you remember about the backup creation functionality when you need it. But now, let&#8217;s start from the beginning &#8211; before we started the tests, we looked at the directory where Solr keeps its indices and this is what we saw:
</p>
<pre class="brush:bash">drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:17 index
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:16 spellchecker</pre>
<h3>Manual Backup</h3>
<p>In order to create a backup of your index with the use of the HTTP API you need to have the replication handler configured. If you have it, then you need to send the <em>command</em> parameter with the <em>backup</em> value to the master server&#8217;s replication handler, for example like this:
</p>
<pre class="brush:xml">curl 'http://localhost:8983/solr/replication?command=backup'</pre>
<p>The above will tell Solr to create a new backup of the current index. Let&#8217;s now see what the directory where the indices live looks like after running the above command:
</p>
<pre class="brush:bash">drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:18 index
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:19 snapshot.20120812201917
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:16 spellchecker</pre>
<p>As you can see, there is a new directory created &#8211; <em>snapshot.20120812201917</em>. We can assume that we got what we wanted <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
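<p>A side note on the naming: when no <em>name</em> parameter is passed to the backup command, the snapshot directory name is the <em>snapshot.</em> prefix followed by a timestamp of the backup. A tiny sketch of that convention (the timestamp value is taken from the listing above):</p>
<pre class="brush:bash"># snapshot.20120812201917 = "snapshot." prefix + timestamp (yyyyMMddHHmmss) of the backup
stamp="20120812201917"
dir="snapshot.${stamp}"
echo "${dir}"</pre>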
<h3>Automatic Backup</h3>
<p>In addition to manual backup creation, you can also configure Solr to create backups after each <em>commit</em> or <em>optimize</em> operation. Please remember, though, that if your index is changing rapidly it is usually a bad idea to create a backup after each <em>commit</em> operation. But let&#8217;s get back to automatic backups. In order to configure Solr to create backups for us, you need to add the following line to the replication handler configuration:
</p>
<pre class="brush:xml">&lt;str name="backupAfter"&gt;commit&lt;/str&gt;</pre>
<p>So, the full replication handler configuration (on the <em>master</em> server) would look like this:
</p>
<pre class="brush:xml">&lt;requestHandler name="/replication" &gt;
 &lt;lst name="master"&gt;
  &lt;str name="replicateAfter"&gt;commit&lt;/str&gt;
  &lt;str name="replicateAfter"&gt;startup&lt;/str&gt;
  &lt;str name="confFiles"&gt;schema.xml,stopwords.txt&lt;/str&gt;
  &lt;str name="backupAfter"&gt;commit&lt;/str&gt;
 &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>After sending two <em>commit</em> operations our directory with indices looks like this:
</p>
<pre class="brush:bash">drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 21:12 index
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 21:12 snapshot.20120812211203
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 21:12 snapshot.20120812211216
drwxrwxr-x 2 gr0 gr0 4096 2012-08-12 20:16 spellchecker</pre>
<p>As you can see, Solr did what we wanted to be done.</p>
<h3>Keeping Order</h3>
<p>It is possible to control the maximum number of backups that should be stored on disk. In order to configure that number you need to add the following line to your replication handler configuration:
</p>
<pre class="brush:xml">&lt;str name="maxNumberOfBackups"&gt;10&lt;/str&gt;</pre>
<p>The above configuration value tells Solr to keep a maximum of ten backups of your index. Of course, you can delete the created backups (manually, for example) if you don&#8217;t need them anymore.</p>
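<p>Putting it all together, a replication handler configuration that creates a backup after each <em>commit</em> and keeps only the last ten of them could look like this &#8211; a sketch combining the snippets above (note that we&#8217;ve placed <em>maxNumberOfBackups</em> at the handler level, outside the <em>master</em> section, as it is an init parameter of the handler itself):</p>
<pre class="brush:xml">&lt;requestHandler name="/replication" class="solr.ReplicationHandler"&gt;
 &lt;lst name="master"&gt;
  &lt;str name="replicateAfter"&gt;commit&lt;/str&gt;
  &lt;str name="backupAfter"&gt;commit&lt;/str&gt;
 &lt;/lst&gt;
 &lt;str name="maxNumberOfBackups"&gt;10&lt;/str&gt;
&lt;/requestHandler&gt;</pre>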
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2012/08/13/backing-up-your-index/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
