Fixing a corrupted Solr index using Lucene

A few days ago, a Solr server in our SolrCloud installation stopped unexpectedly. After examining the solr.log file, I found that it could not start again because the index in the index.20140510031827076 directory was corrupted. After some searching, I found a Lucidworks article that describes how to deal with a corrupted index. These are the steps I followed:

WARNING!!

This procedure may result in unrecoverable data loss. It is vital that you back up your index before checking and repairing it.
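
As a minimal sketch of the backup step, the index directory can be copied to a timestamped folder before CheckIndex is run. The function name `backup_index` and the `.bak-<timestamp>` naming scheme are my own assumptions for illustration, not part of Solr or Lucene. It is probably safest to stop the Solr node first so that the files do not change during the copy.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir, backup_root):
    """Copy the whole index directory into a timestamped folder under backup_root.

    Hypothetical helper: the name and the ".bak-<timestamp>" suffix are
    illustrative choices, not a Solr or Lucene convention.
    """
    src = Path(index_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"index directory not found: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}.bak-{stamp}"

    # copytree refuses to overwrite an existing destination, so an earlier
    # backup cannot be clobbered by accident.
    shutil.copytree(src, dest)
    return dest
```

For the index in this post, the call would look like `backup_index("C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076", "D:/backups")`, where `D:/backups` is a placeholder for any location with enough free space.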

  1. Find the index directory that contains the corrupted segment. For me it was:
    C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076
    
  2. Find the Lucene core .jar file. I work with Apache Solr 4.6.0, so the file is named lucene-core-4.6.0.jar. It is usually at $SOLR_HOME/example/solr-webapp/webapp/WEB-INF/lib/lucene-core-4.6.0.jar. Switch to the directory that contains it:
    cd "C:/Program Files/apache-solr-4.6.0/example/solr-webapp/webapp/WEB-INF/lib"
    
  3. Check the index in order to identify the problematic segment. Without the -fix parameter, CheckIndex only reports problems and does not modify the index. Run:
    java -cp lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076"
    

    The check results for a healthy segment look like the following:

    1 of 37: name=_48r7 docCount=3021717
       codec=Lucene46
       compound=false
       numFiles=11
       size (MB)=5,020.87
       diagnostics = {timestamp=1403095380034, os=Windows Server 2012, os.version=6
    .2, mergeFactor=10, source=merge, lucene.version=4.6.0 1543363 - simon - 2013-11
    -19 11:05:50, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java
    .vendor=Oracle Corporation}
       has deletions [delGen=170]
       test: open reader.........OK [196 deleted docs]
       test: fields..............OK [135 fields]
       test: field norms.........OK [52 fields]
       test: terms, freq, prox...OK [16644025 terms; 441992389 terms/docs pairs; 39
    0271915 tokens]
       test (ignoring deletes): terms, freq, prox...OK [16646658 terms; 442060892 t
    erms/docs pairs; 390381979 tokens]
       test: stored fields.......OK [107214499 total field count; avg 35.484 fields
    per doc]
       test: term vectors........OK [0 total vector count; avg 0 term/freq vector f
    ields per doc]
       test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SOR
    TED; 0 SORTED_SET]
    
    2 of 37: name=_3b16 docCount=2449309
       codec=Lucene46
       compound=false
       numFiles=11
       size (MB)=3,831.743
       diagnostics = {timestamp=1402370404453, os=Windows Server 2012, os.version=6
    .2, mergeFactor=10, source=merge, lucene.version=4.6.0 1543363 - simon - 2013-11
    -19 11:05:50, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java
    .vendor=Oracle Corporation}
       has deletions [delGen=426]
       test: open reader.........OK [20262 deleted docs]
       test: fields..............OK [92 fields]
       test: field norms.........OK [35 fields]
    

    When the index checker encounters a corrupted segment, the output looks like the following:

    37 of 37: name=_4gxr docCount=11
       codec=Lucene46
       compound=false
       numFiles=10
       size (MB)=75.71
       diagnostics = {timestamp=1403212995547, os=Windows Server 2012, os.version=6
    .2, source=flush, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os
    .arch=amd64, java.version=1.7.0_51, java.vendor=Oracle Corporation}
       no deletions
       test: open reader.........FAILED
       WARNING: fixIndex() would remove reference to this segment; full exception:
    org.apache.lucene.index.CorruptIndexException: invalid docCount: 48066 maxDoc: 1
    1 (resource=MMapIndexInput(path="C:\Program Files\apache-solr-4.6.0\example\solr
    \hellasever_shard1_replica4\data\index.20140510031827076\_4gxr_Lucene41_0.ti
    m"))
          at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsRe
    ader.java:166)
          at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProduc
    er(Lucene41PostingsFormat.java:437)
          at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader
    .<init>(PerFieldPostingsFormat.java:195)
          at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProduc
    er(PerFieldPostingsFormat.java:244)
          at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.
    java:115)
          at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
          at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:554)
          at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1941)
    
    WARNING: 1 broken segments (containing 11 documents) detected
    WARNING: would write new segments file, and 11 documents would be lost, if -fix
    were specified
    
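  4. Once you have identified the corrupted segment, rerun the command with the -fix parameter. This removes the reference to the broken segment from the index, so the documents it contains (11 in my case) are lost: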
    java -cp lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076" -fix
    

    When the checker fixes the segment, the output will look like the following:

    37 of 37: name=_4gxr docCount=11
       codec=Lucene46
       compound=false
       numFiles=10
       size (MB)=75.71
       diagnostics = {timestamp=1403212995547, os=Windows Server 2012, os.version=6
    .2, source=flush, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os
    .arch=amd64, java.version=1.7.0_51, java.vendor=Oracle Corporation}
       no deletions
       test: open reader.........FAILED
       WARNING: fixIndex() would remove reference to this segment; full exception:
    org.apache.lucene.index.CorruptIndexException: invalid docCount: 48066 maxDoc: 1
    1 (resource=MMapIndexInput(path="C:\Program Files\apache-solr-4.6.0\example\solr
    \hellasever_shard1_replica4\data\index.20140510031827076\_4gxr_Lucene41_0.ti
    m"))
          at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsRe
    ader.java:166)
          at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProduc
    er(Lucene41PostingsFormat.java:437)
          at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader
    .<init>(PerFieldPostingsFormat.java:195)
          at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProduc
    er(PerFieldPostingsFormat.java:244)
          at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.
    java:115)
          at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
          at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:554)
          at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1941)
    
    WARNING: 1 broken segments (containing 11 documents) detected
    WARNING: 11 documents will be lost
    
    NOTE: will write new segments file in 5 seconds; this will remove 11 docs from t
    he index. THIS IS YOUR LAST CHANCE TO CTRL+C!
    5...
    4...
    3...
    2...
    1...
    Writing...
    OK
    Wrote new segments file "segments_y5e"
    
  5. Finally, restart the Solr server so that the replica can resynchronise with the shard leader.
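
With 37 segments, the CheckIndex report is long, so it can help to build the command and scan its output programmatically. The following is a rough sketch only. The helper names `build_checkindex_command` and `find_broken_segments` are hypothetical, and the parsing assumes the report format shown in the steps above (a `N of M: name=... docCount=...` header per segment, and a line containing `FAILED` for a broken one). Other Lucene versions may print something different.

```python
import re

# Matches segment headers such as "37 of 37: name=_4gxr docCount=11".
SEGMENT_HEADER = re.compile(r"^\s*\d+ of \d+: name=(\S+) docCount=(\d+)")


def build_checkindex_command(lucene_jar, index_dir, fix=False):
    """Return the CheckIndex command from the steps above as an argument list.

    Passing a list (rather than one string) avoids quoting problems with
    paths that contain spaces, such as "C:/Program Files/...".
    """
    cmd = [
        "java",
        "-cp", str(lucene_jar),
        "-ea:org.apache.lucene...",  # enable assertions in Lucene packages
        "org.apache.lucene.index.CheckIndex",
        str(index_dir),
    ]
    if fix:
        # Destructive: drops references to broken segments.
        cmd.append("-fix")
    return cmd


def find_broken_segments(output):
    """Return (segment_name, doc_count) pairs for segments reported as FAILED.

    Assumption: a broken segment is marked by a line containing "FAILED"
    somewhere after its header, as in the sample output in this post.
    """
    broken = []
    current = None
    for line in output.splitlines():
        header = SEGMENT_HEADER.match(line)
        if header:
            current = (header.group(1), int(header.group(2)))
        elif "FAILED" in line and current is not None:
            if current not in broken:
                broken.append(current)
    return broken
```

The argument list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` and the captured `stdout` passed to `find_broken_segments`. I would only build the command with `fix=True` after reviewing the read-only report and taking a backup.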
