Index – delete or update?

From time to time, when working with Solr, a problem comes up: how to update the structure of the Solr index. The reasons for such changes vary (new functional requirements, optimization, or anything else) and are not important here. What is important is the question that arises: should we remove the index, or simply change the structure and run a full indexing? Contrary to appearances, the answer depends on the changes we made to the structure of the index.

Personally, I am an advocate of solutions that have the smallest chance of causing problems – I just like to sleep at night. In my opinion, removing the index after updating its structure and then running a full indexing of the data is one of those solutions. I am aware, however, that this type of solution is not always acceptable. So when are we not forced to remove the index, and when does keeping it expose us to potential problems with Solr?

The answer depends on what changed in the structure of the index. Most such changes fall into one of three areas:

Adding / removing a field
Similarity modification
Field modification

Adding / removing a field

The first type of modification is quite simple: if we add a field to schema.xml or remove one from it, there is no need to remove the entire index before re-indexing. Solr handles adding a new field to the current index. Of course, you should be aware that existing documents are not updated automatically: any document that is not re-indexed after this operation keeps its old contents.
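
As a rough illustration of this case, adding a field could look like the schema.xml fragment below. The field name `author` and the type `string` are assumptions made for the example, not taken from any particular schema; the type has to be one that your own schema.xml defines.

```xml
<!-- schema.xml, inside the <fields> section -->
<!-- Hypothetical new field: the name "author" and the type "string" are
     illustrative; use a field type that your schema actually defines. -->
<field name="author" type="string" indexed="true" stored="true"/>
```

Documents indexed before the change simply have no value in such a field until they are sent to Solr again.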

Similarity modification

In the second case, changing the class responsible for Similarity also does not force us to delete the index. But unlike the previous example, if we want Solr to calculate the score correctly, and thus to sort results in the correct order, we will have to re-index all documents previously present in the index.
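
For orientation, a Similarity change is typically a one-line edit in schema.xml, sketched below. The class name `com.example.CustomSimilarity` is hypothetical and stands for whatever implementation you switch to; the exact classes available depend on your Solr and Lucene version.

```xml
<!-- schema.xml, directly under the <schema> element -->
<!-- Hypothetical class name: replace it with the Similarity
     implementation you actually want to use. -->
<similarity class="com.example.CustomSimilarity"/>
```

The index itself can stay in place after such an edit, but the documents already in it need to be indexed again for scoring to be consistent.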

Field modification

Let's stop for a minute on the third case. Let's suppose that we slightly modify a field in the index for a prosaic reason: we are no longer interested in the normalization of its length. We set omitNorms="true" (I assume that the previous setting was omitNorms="false"). If we only re-index all the documents, the Lucene index will, once its segments are merged, still contain length normalization information for the field. Something went wrong. This is precisely the case in which it is necessary to delete the index after changing its structure and before the full indexing. At first glance this seems to be a very small change, but on closer inspection it has side effects. It is worth remembering that some field properties are overridden by others, as in the case of length normalization: if one segment has length normalization and another does not, then after the segments are merged the resulting segment will have length normalization.
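
A minimal sketch of the sequence for this case, under some assumptions: the field name `title` and the type `text` are illustrative, and the index is emptied with a delete-by-query message sent to the update handler. Whether this discards every trace of the old segments may depend on the Solr version, so stopping Solr and removing the index directory by hand is the more cautious alternative.

```xml
<!-- 1. schema.xml: the modified field definition.
        The name "title" and the type "text" are illustrative. -->
<field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>

<!-- 2. First update message, sent once Solr has loaded the new schema:
        delete every document in the index. -->
<delete><query>*:*</query></delete>

<!-- 3. Second update message, sent separately:
        commit, so that the now empty index becomes visible. -->
<commit/>
```

Only after these steps does the full indexing start, so that every segment is created with the new field settings.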

Solr.pl
