Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically delete corrupt indexes upon detection #10

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rnewson
Copy link
Contributor

@rnewson rnewson commented Jan 6, 2020

No description provided.

@ksnavely
Copy link
Contributor

ksnavely commented Jan 6, 2020

I'm trying to think of how this behavior might cause issues. Presumably we trust Lucene (8.1.0?) to accurately throw the CorruptIndexException when faced with an invalid index, and to not do so for valid indexes.

But is there any chance we could incidentally feed a non-Lucene index down this code path and wind up with the subspace being cleared?

@rnewson
Copy link
Contributor Author

rnewson commented Jan 6, 2020

There's three concerns;

  1. We clear a non-corrupt index (if Lucene throws this erroneously).
  2. We clear some other range (search3-java has been passed a subspace it shouldn't have been).
  3. Ops have to manually find and delete genuinely corrupted indexes in production while we work out the kinks.

I think 1) is possible but unlikely and I think we should accept the risk. 2 is also possible and would be a pretty serious bug in search3-erl or fdbcore/couch and it's bound to show up some other way, we can't be coy. The index and whatever is overlapping with it is toast anyway. I'd rather trash some data early in testing / beta than find out later. And 3 is my motivation for the patch.

@ksnavely
Copy link
Contributor

ksnavely commented Jan 6, 2020

Your thoughts make sense to me.

I was trying to conjure a scenario where Lucene would not recognize valid indexes as correct. I thought maybe an upgrade could trigger this due to inability to read the indexes (although I imagine the index format is tied to ~major versions for incompatibilities). But it looks like the index version exceptions are now unrelated to CorruptIndexException https://issues.apache.org/jira/browse/LUCENE-5972 except in parent class.

There are some references elastic/elasticsearch#28826 to hardware issues possibly surfacing as CIEs but for our purposes the hardware has been abstracted to data stored in FoundationDB, which should either be returning good data or returning well-stored invalid data. We'd see a different class of errors if there were issues with data transport from FDB.

IMO a cautious +1 here.

@tonysun83
Copy link
Contributor

tonysun83 commented Jan 6, 2020

I have only seen CorruptIndexException in the case where we upgraded the search3-java build and continued indexing in the same subspace given that the subspace is dbname+signature. So that begs the question about upgrades. We need to upgrade every so often and if index corruption happens every time, are we okay with deleting the index and possibly starting over? Shouldn't the upgrade happen seamlessly?

When we build a new index for a new DB, I have yet to see CorruptIndexException.

@ksnavely
Copy link
Contributor

ksnavely commented Jan 6, 2020

Could you provide more detail on the corruption you encountered above? That sounds like something we are doing that could be handled better?

@tonysun83
Copy link
Contributor

We would start indexing for a version of search3-java. When we replace the pod with a newer version of search3-java, that's when I would see the CorruptedIndex Exception. If we just keep letting the pod restart over and over then I would not see the corruptedindex exception.

@rnewson
Copy link
Contributor Author

rnewson commented Jan 7, 2020

@tonysun83 that sounds like actual corruption to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants