Error on Export index from Anserini to Ciff #33

Open
HansiZeng opened this issue Jan 8, 2023 · 1 comment

Comments

@HansiZeng

I used the following command to build the Anserini index:

python -m pyserini.index.lucene \
  -collection JsonVectorCollection \
  -input "experiments/spladev2/out/docs_anserini" \
  -index "experiments/spladev2/out/anserini_index/" \
  -generator DefaultLuceneDocumentGenerator \
  -threads 16 -impact -pretokenized \
  -optimize

I then ran the following command to export the index from Anserini to CIFF:

./ciff/target/appassembler/bin/ExportAnseriniLuceneIndex -index "experiments/spladev2/out/anserini_index/" -output experiments/spladev2/out/anserini_index.ciff

However, the export fails with this error:

Exception in thread "main" java.lang.IllegalArgumentException: indexCreatedVersionMajor is in the future: 9
        at org.apache.lucene.index.SegmentInfos.<init>(SegmentInfos.java:169)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:327)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:64)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:61)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:720)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:84)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:64)
        at io.osirrc.ciff.lucene.ExportAnseriniLuceneIndex.main(ExportAnseriniLuceneIndex.java:136)
        Suppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (b2b4eb97). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/home/jupyter/neural-ranking/splade/experiments/cocondenser_kldiv_distil_01-05_190239/out/anserini_index/segments_2")))
                at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:466)
                at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:434)
                ... 7 more

Could you help me resolve this error? If you need more information, please let me know. Thanks so much!

@JMMackenzie
Member

Hello,

I think the problem is that the index was written by a newer Lucene version than the one the CIFF exporter is built against.

You could try changing line 29 of the CIFF pom.xml file, <lucene.version>8.11.0</lucene.version>, to the Lucene version that was used to build the index (check the pom.xml file in the Anserini directory; it is probably 9.* based on the error).

We may need to keep updating this CIFF tool to keep up with Lucene changes; if this change helps, please feel free to submit a PR.

@lintool may also have some ideas.
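
For anyone trying the pom.xml change suggested above, here is a minimal sketch of how the version could be read from one POM and written into another. It is only an illustration: the helper names `read_lucene_version` and `set_lucene_version` are hypothetical, and it assumes both POMs declare a single `<lucene.version>` property, as the line quoted above suggests for CIFF.

```python
import re

# Matches the <lucene.version> property quoted in the comment above.
# Assumption: the property appears once in the POM text.
_LUCENE_PROP = re.compile(r"(<lucene\.version>)([^<]*)(</lucene\.version>)")


def read_lucene_version(pom_text):
    """Return the value of <lucene.version>, or None if the property is absent."""
    match = _LUCENE_PROP.search(pom_text)
    return match.group(2).strip() if match else None


def set_lucene_version(pom_text, version):
    """Return pom_text with the first <lucene.version> value replaced by version."""
    new_text, count = _LUCENE_PROP.subn(
        lambda m: m.group(1) + version + m.group(3), pom_text, count=1
    )
    if count == 0:
        raise ValueError("no <lucene.version> property found")
    return new_text
```

In use, one would read Anserini's pom.xml, pass its text to `read_lucene_version`, then pass the CIFF pom.xml text and that version to `set_lucene_version` and write the result back before rebuilding the exporter. Moving across a Lucene major version may also require source changes in the exporter, so a failed build after this edit would not be surprising.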
