Fixing a corrupted Solr index using Lucene
A few days ago, a Solr server in our SolrCloud installation stopped unexpectedly. The solr.log file showed that it could not start again because the index in the index.20140510031827076 directory was corrupted. After some searching I found this Lucidworks article, which describes how to deal with a corrupted index. I followed these steps:
WARNING!!
This procedure may result in unrecoverable data loss. It is vital that you back up your index before checking and repairing it.
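One simple way to take that backup is to copy the whole index directory while Solr is stopped. The following is a minimal sketch in Python and is not part of the original procedure; the function name `backup_index` and the `.bak-<timestamp>` naming scheme are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir, backup_root=None):
    """Copy an index directory to a timestamped copy and return the copy's path.

    Run this only while Solr is stopped, so that no files change mid-copy.
    """
    src = Path(index_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"index directory not found: {src}")
    # By default the copy is placed next to the index directory
    # (hypothetical naming scheme: <name>.bak-<timestamp>).
    root = Path(backup_root) if backup_root else src.parent
    dest = root / f"{src.name}.bak-{time.strftime('%Y%m%d%H%M%S')}"
    # copytree refuses to write into an existing directory,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Calling `backup_index(...)` with the path of the index directory returns the location of the copy, which can be restored by hand if the repair goes wrong.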
- Find the index directory that contains the corrupted segment. For me it was:
C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076
- Find the Lucene core .jar file. I work with Apache Solr 4.6.0, so the file is named lucene-core-4.6.0.jar. It is usually located at
$SOLR_HOME/example/solr-webapp/webapp/WEB-INF/lib/lucene-core-4.6.0.jar
Switch to the directory that contains it (the quotes are needed because the path contains a space):
cd "C:/Program Files/apache-solr-4.6.0/example/solr-webapp/webapp/WEB-INF/lib"
- Check the index in order to identify the problematic segment. Without the -fix parameter, CheckIndex only reports what it finds and does not modify the index. Run:
java -cp lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076"
The check results for a healthy segment look like the following:
1 of 37: name=_48r7 docCount=3021717
  codec=Lucene46
  compound=false
  numFiles=11
  size (MB)=5,020.87
  diagnostics = {timestamp=1403095380034, os=Windows Server 2012, os.version=6.2, mergeFactor=10, source=merge, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java.vendor=Oracle Corporation}
  has deletions [delGen=170]
  test: open reader.........OK [196 deleted docs]
  test: fields..............OK [135 fields]
  test: field norms.........OK [52 fields]
  test: terms, freq, prox...OK [16644025 terms; 441992389 terms/docs pairs; 390271915 tokens]
  test (ignoring deletes): terms, freq, prox...OK [16646658 terms; 442060892 terms/docs pairs; 390381979 tokens]
  test: stored fields.......OK [107214499 total field count; avg 35.484 fields per doc]
  test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
  test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_SET]
2 of 37: name=_3b16 docCount=2449309
  codec=Lucene46
  compound=false
  numFiles=11
  size (MB)=3,831.743
  diagnostics = {timestamp=1402370404453, os=Windows Server 2012, os.version=6.2, mergeFactor=10, source=merge, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_51, java.vendor=Oracle Corporation}
  has deletions [delGen=426]
  test: open reader.........OK [20262 deleted docs]
  test: fields..............OK [92 fields]
  test: field norms.........OK [35 fields]
When the index checker encounters a corrupted segment, the output looks like the following:
37 of 37: name=_4gxr docCount=11
  codec=Lucene46
  compound=false
  numFiles=10
  size (MB)=75.71
  diagnostics = {timestamp=1403212995547, os=Windows Server 2012, os.version=6.2, source=flush, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os.arch=amd64, java.version=1.7.0_51, java.vendor=Oracle Corporation}
  no deletions
  test: open reader.........FAILED
  WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: invalid docCount: 48066 maxDoc: 11 (resource=MMapIndexInput(path="C:\Program Files\apache-solr-4.6.0\example\solr\hellasever_shard1_replica4\data\index.20140510031827076\_4gxr_Lucene41_0.tim"))
  at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:166)
  at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
  at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:195)
  at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
  at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:115)
  at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
  at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:554)
  at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1941)
WARNING: 1 broken segments (containing 11 documents) detected
WARNING: would write new segments file, and 11 documents would be lost, if -fix were specified
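- Once you have identified the corrupted segment, rerun the command with the -fix parameter. This removes the broken segment from the index, so the documents it contains (11 in my case) are lost: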
java -cp lucene-core-4.6.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "C:/Program Files/apache-solr-4.6.0/example/solr/hellaserver_shard1_replica4/data/index.20140510031827076" -fix
When the checker fixes the segment, the output will look like the following:
37 of 37: name=_4gxr docCount=11
  codec=Lucene46
  compound=false
  numFiles=10
  size (MB)=75.71
  diagnostics = {timestamp=1403212995547, os=Windows Server 2012, os.version=6.2, source=flush, lucene.version=4.6.0 1543363 - simon - 2013-11-19 11:05:50, os.arch=amd64, java.version=1.7.0_51, java.vendor=Oracle Corporation}
  no deletions
  test: open reader.........FAILED
  WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: invalid docCount: 48066 maxDoc: 11 (resource=MMapIndexInput(path="C:\Program Files\apache-solr-4.6.0\example\solr\hellasever_shard1_replica4\data\index.20140510031827076\_4gxr_Lucene41_0.tim"))
  at org.apache.lucene.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:166)
  at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:437)
  at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:195)
  at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
  at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:115)
  at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
  at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:554)
  at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1941)
WARNING: 1 broken segments (containing 11 documents) detected
WARNING: 11 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 11 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
  5...
  4...
  3...
  2...
  1...
Writing...
OK
Wrote new segments file "segments_y5e"
- Finally, restart the Solr server so that the replica resynchronises with the shard leader.
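The read-only check from the steps above can also be scripted, which helps when several replicas need inspecting. The following Python sketch is only an illustration: the function names (`build_command`, `broken_segments`, `check`) are hypothetical, it assumes Python 3.7 or later with `java` on the PATH, and it assumes that CheckIndex prints each segment header and each test result on its own line, in the format shown in this post for Lucene 4.6.

```python
import re
import subprocess

# Matches segment headers such as "37 of 37: name=_4gxr docCount=11".
SEGMENT_RE = re.compile(r"^\s*(\d+) of (\d+): name=(\S+) docCount=(\d+)")


def build_command(lucene_jar, index_dir, fix=False):
    """Return the CheckIndex invocation as an argument list.

    Passing a list avoids shell quoting problems with paths that
    contain spaces, such as "C:/Program Files/...".
    """
    cmd = ["java", "-cp", lucene_jar, "-ea:org.apache.lucene...",
           "org.apache.lucene.index.CheckIndex", index_dir]
    if fix:
        cmd.append("-fix")  # destructive: drops every broken segment
    return cmd


def broken_segments(output):
    """Return (name, docCount) for every segment that has a FAILED test."""
    broken = []
    current = None
    for line in output.splitlines():
        match = SEGMENT_RE.match(line)
        if match:
            # A new segment report starts here.
            current = (match.group(3), int(match.group(4)))
        elif current and "FAILED" in line and current not in broken:
            broken.append(current)
    return broken


def check(lucene_jar, index_dir):
    """Run a read-only check and return the broken segments it reports."""
    result = subprocess.run(build_command(lucene_jar, index_dir),
                            capture_output=True, text=True)
    return broken_segments(result.stdout)
```

`check` never passes -fix, so it does not modify the index; running `build_command(..., fix=True)` remains a deliberate, separate step to take only after a backup.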