-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-5148 Introduce "soft fail" for corrupt ValueStore #5157
Conversation
Thanks for the fixes @hmottestad! I haven't had time yet to look into this, but I can contribute to the documentation if you want. Following up on another comment from your review of my PR: adding a test for an artificially corrupted store may be difficult, I don't really know how to do this reliably. Perhaps it is possible to manipulate a NativeValue instance in the ValueStore to hold an illegal state that triggers the new soft fail functionality. Any other suggestions? |
I'm actually working on it right now. I have a test that sets each byte in the values.dat file to zero, one at a time, then tries to read all the data. |
f6accf9
to
196cf9d
Compare
public void testCorruptValuesDatFileEntireValuesDatFile() throws IOException { | ||
for (int i = 4; i < 437; i++) { | ||
logger.debug("Corrupting byte at position " + i); | ||
repo.shutDown(); | ||
restoreFile(dataDir, "values.dat"); | ||
|
||
overwriteByteInFile(new File(dataDir, "values.dat"), i, 0x0); | ||
|
||
repo.init(); | ||
|
||
List<Statement> list = getStatements(); | ||
assertEquals(6, list.size()); | ||
|
||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually the main test that tests every single byte, except the header bytes, in the values.dat file.
In the unlikely event of corruption the system property `org.eclipse.rdf4j.sail.nativerdf.softFailOnCorruptData` can be set to `true` to | ||
allow the NativeStore to output CorruptValue/CorruptIRI/CorruptIRIOrBNode/CorruptLiteral objects. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation.
@ebner I've added some test coverage now. I think it might be best to also try to see what happens if the data is written to a trig or json-ld file, to test the very simple corrupt value implementations i added. |
…r when the corruption causes large amounts of data to be read
be2a482
to
b6215bb
Compare
core/sail/nativerdf/src/main/java/org/eclipse/rdf4j/sail/nativerdf/btree/BTree.java
Show resolved
Hide resolved
d8377b3
to
590e658
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am too distant from the native store implementation and its index structures to give a proper review. From what I can tell with going through the PR the logic seems fine: I can follow the idea and the new setting is opt-in.
Please see inline for minor (optional) comments on some messages
core/sail/nativerdf/src/main/java/org/eclipse/rdf4j/sail/nativerdf/ValueStore.java
Outdated
Show resolved
Hide resolved
core/sail/nativerdf/src/main/java/org/eclipse/rdf4j/sail/nativerdf/ValueStore.java
Show resolved
Hide resolved
// We have either managed to read enough data and can return the required subset of the data, or we have read | ||
// too little so we need to execute another read to get the correct data. | ||
if (dataLength <= data.length - 4) { | ||
// If the data length is larger than 750MB, we are likely reading the wrong data. Probably data corruption. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you come up with 750MB? Is this a special size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was what I experienced when testing.
I wouldn't think that a single literal would realistically be 750 MB. But I'll double check that the logic only applies if the system property is activated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more context to the comment
GitHub issue resolved: #5148
Briefly describe the changes proposed in this PR:
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)