CheckIndex for the rescue

When using Lucene and Solr, we are used to the very high reliability of these products. However, the day may come when Solr informs us that our index is corrupted, and we need to do something about it. Is the only way to repair the index to restore it from a backup or to do a full reindex? Not necessarily: there is hope in the form of the CheckIndex tool.

What is CheckIndex?

CheckIndex is a tool available in the Lucene library that checks the index files and, when run with the -fix option, writes a new segments file that no longer refers to the broken segments. This means that the tool can repair a broken index at the cost of losing the documents stored in those segments, and thus save us from having to restore the index from a backup (if we have one, of course) or reindex all the documents that were stored in Solr.

Where do I start?

Please note that, according to the Javadocs, this tool is experimental and may change in the future. Therefore, before we start working with it, we should make a copy of the index. It is also worth knowing that the tool analyzes the index byte by byte, so for large indexes the analysis and repair can take a long time. It is important not to run the tool with the -fix option while the index is being used by Solr or any other application based on the Lucene library. Finally, be aware that running the tool in repair mode may result in the removal of some or all of the documents stored in the index.
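Since the first precaution is to work on a copy, here is a minimal sketch of that step as a shell function. The name `backup_index` and the timestamped backup naming are illustrative choices of ours, not part of Lucene or Solr, and the sketch assumes Solr (or any other application using the index) has already been stopped, so the files do not change while they are copied.

```shell
# Hypothetical helper: copy a Lucene/Solr index directory before running CheckIndex.
# Assumes Solr (and any other Lucene-based application) has already been stopped,
# so the index files are not modified while they are being copied.
backup_index() {
  local index_dir=$1
  # Default backup location: next to the index, with a timestamp suffix.
  local backup_dir=${2:-"${index_dir%/}.backup-$(date +%Y%m%d-%H%M%S)"}

  if [ ! -d "$index_dir" ]; then
    echo "index directory not found: $index_dir" >&2
    return 1
  fi
  if [ -e "$backup_dir" ]; then
    echo "refusing to overwrite existing path: $backup_dir" >&2
    return 1
  fi

  # -a copies recursively and preserves timestamps and permissions.
  cp -a "$index_dir" "$backup_dir" || return 1
  # Print where the copy went, so callers can capture it.
  echo "$backup_dir"
}
```

For example, `backup_index /path/to/solr/data/index` prints the path of the new copy. Refusing to overwrite an existing path keeps an older, known-good backup from being silently replaced.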

How to run it?

To run the utility, go to the directory where the Lucene library files are located and run the following command:
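The command generally has the form `java -cp <lucene-core JAR> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex <index directory> [-fix]`, the same shape as the invocation in the 2017 comment at the end of this post. A minimal sketch wrapping it in a shell function follows; `run_checkindex` and the `LUCENE_JAR` variable are hypothetical names, and the JAR has to be the lucene-core JAR of the Lucene version your Solr uses. Without `-fix` the tool only reports problems; extra arguments such as `-fix` are passed through unchanged.

```shell
# Hypothetical wrapper around Lucene's CheckIndex command-line tool.
# LUCENE_JAR must point at the lucene-core JAR of the same Lucene version
# that wrote the index (i.e. the version your Solr uses).
run_checkindex() {
  local index_dir=$1
  shift
  # -ea:org.apache.lucene... enables Java assertions in the Lucene packages.
  # Remaining arguments (e.g. -fix) are passed to CheckIndex unchanged;
  # without -fix, CheckIndex only reports problems and leaves the index alone.
  java -cp "${LUCENE_JAR:?set LUCENE_JAR to the lucene-core JAR}" \
    -ea:org.apache.lucene... \
    org.apache.lucene.index.CheckIndex "$index_dir" "$@"
}
```

For example, after `export LUCENE_JAR=/path/to/lucene-core.jar`, a read-only check is `run_checkindex /path/to/solr/data/index`, and a repair is the same call with `-fix` appended.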

In my case, it looked as follows:

After a while I got the following information:

It means that the index is correct and no corrective action is needed. In addition, the output tells you some interesting things about the index 😉
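To check the result from a script rather than by eye, you can look for the success line in saved output. A minimal sketch, assuming CheckIndex prints `No problems were detected with this index.` for a healthy index (the wording in the Lucene versions we know; verify it against yours): the hypothetical `checkindex_is_clean` function succeeds only if that line is present.

```shell
# Hypothetical helper: succeed (exit status 0) only if the CheckIndex output
# contains the line reported for a healthy index.
# Reads the given log file, or standard input when no file is given.
# Assumption: the exact wording may differ between Lucene versions.
checkindex_is_clean() {
  grep -q 'No problems were detected with this index' "${1:--}"
}
```

For example, redirect the CheckIndex output to a file such as `checkindex.log` and then run `checkindex_is_clean checkindex.log && echo "index OK"`.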

Broken index

But what happens in the case of a broken index? There is only one way to find out: let’s try. I deliberately damaged one of the index files and ran the CheckIndex tool again. The following appeared on the console:
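If you want to repeat this experiment, do it only on a copy of an index you can afford to lose. The post does not say how the file was broken; one way, sketched below under that assumption, is to overwrite a few bytes in place with `dd`. The name `corrupt_file` and the default offset and length are arbitrary illustrative choices.

```shell
# Hypothetical helper for experiments ONLY: overwrite COUNT bytes of FILE,
# starting at byte OFFSET, with random data, without changing the file size.
# Never point this at a live index; use a throwaway copy.
corrupt_file() {
  local file=$1 offset=${2:-100} count=${3:-16}
  # bs=1 makes seek/count byte-based; conv=notrunc keeps the rest of the file intact.
  dd if=/dev/urandom of="$file" bs=1 seek="$offset" count="$count" conv=notrunc 2>/dev/null
}
```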

As you can see, all 19 documents that were in the index have been removed. This is an extreme case, but you should be aware that the tool can behave this way.
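Before letting a repair go ahead, it is worth checking how many documents CheckIndex expects to drop. The console log in the comments below contains a line of the form `WARNING: 0 documents will be lost`. Assuming that format (and a similar line ending in "documents would be lost, if -fix were specified" when run without `-fix`, whose wording may vary between versions), a minimal sketch that extracts the number from saved output could look like this; `checkindex_docs_lost` is a hypothetical name.

```shell
# Hypothetical helper: print the number of documents CheckIndex reports it
# will (or, without -fix, would) drop, reading CheckIndex output from stdin.
# Prints nothing if no such line is found (for example, for a clean index).
checkindex_docs_lost() {
  sed -n 's/.* \([0-9][0-9]*\) documents w[a-z]* be lost.*/\1/p' | head -n 1
}
```

Running CheckIndex without `-fix` first, reading this number, and only then deciding whether to repair avoids a surprise like the one above. With `-fix`, the tool itself waits five seconds before writing the new segments file, as the console log in the comments shows.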

The end

If you keep in mind the basic precautions associated with the CheckIndex tool, it may come in handy one day, and you will not have to ask yourself “When was the last backup made?”.

8 thoughts on “CheckIndex for the rescue”

  • 18 January 2011 at 06:49

    Hi,
    Very informative post about Lucene, but what happens if we use the RAMDirectory class to maintain an in-memory index in Lucene? Can documents be repaired in that case?
    Thanks

  • 22 June 2017 at 10:05

    I am getting below error while running it for Solr indexes-

    root@dtraflonrh752:/apps/alfresco/webapps/alfresco/WEB-INF/lib # java -cp lucene-core-2.4.1.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /apps/solr-index/workspace/SpacesStore/index/ -fix

    Opening index @ /apps/solr-index/workspace/SpacesStore/index/

    ERROR: could not read any segments file in directory
    org.apache.lucene.index.CorruptIndexException: Unknown format version: -9
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:225)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:275)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:272)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:258)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:687)
    WARNING: 0 documents will be lost

    NOTE: will write new segments file in 5 seconds; this will remove 0 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
    5…
    4…
    3…
    2…
    1…
    Writing…
    Exception in thread “main” java.lang.NullPointerException
    at org.apache.lucene.index.CheckIndex.fixIndex(CheckIndex.java:565)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:706)

    Can anyone tell me what I am doing wrong? Or are my indexes corrupted? :)

    • 17 July 2017 at 22:26

      Are you sure you are using the same Lucene version that the indices were written with?

  • 23 July 2017 at 06:05

    Any suggestions on how to fix a corrupt index? My index is about 130 GB. Thanks.

    • 23 July 2017 at 20:57

      As mentioned in the blog post, you can check the index and try to repair it using the CheckIndex tool, but that won’t work in most cases. Whether the index can be fixed at all depends on the damage. If only a single segment of a single shard is corrupted, CheckIndex may help by removing that segment, and the portion of the data stored in it. Sometimes, though, it is not possible. Keep in mind that you need to use the same version of Lucene that your Solr is using; otherwise it will not work.

      • 21 December 2018 at 06:48

        thanks. It worked …!!

