Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-5148 Introduce "soft fail" for corrupt ValueStore #5157

Merged
merged 13 commits into from
Nov 10, 2024

Conversation

hmottestad
Copy link
Contributor

GitHub issue resolved: #5148

Briefly describe the changes proposed in this PR:


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@ebner
Copy link
Contributor

ebner commented Oct 23, 2024

Thanks for the fixes @hmottestad! I haven't had time yet to look into this, but I can contribute to the documentation if you want.

Following up on another comment from your review of my PR: adding a test for an artificially corrupted store may be difficult, I don't really know how to do this reliably. Perhaps it is possible to manipulate a NativeValue instance in the ValueStore to hold an illegal state that triggers the new soft fail functionality. Any other suggestions?

@hmottestad
Copy link
Contributor Author

I'm actually working on it right now. I have a test that sets each byte in the values.dat file to zero, one at a time, then tries to read all the data.

@hmottestad hmottestad force-pushed the GH-5148-soft-fail-native-store branch from f6accf9 to 196cf9d Compare October 23, 2024 11:26
Comment on lines 183 to 198
public void testCorruptValuesDatFileEntireValuesDatFile() throws IOException {
for (int i = 4; i < 437; i++) {
logger.debug("Corrupting byte at position " + i);
repo.shutDown();
restoreFile(dataDir, "values.dat");

overwriteByteInFile(new File(dataDir, "values.dat"), i, 0x0);

repo.init();

List<Statement> list = getStatements();
assertEquals(6, list.size());

}

}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the main test that tests every single byte, except the header bytes, in the values.dat file.

Comment on lines 101 to 103
In the unlikely event of corruption the system property `org.eclipse.rdf4j.sail.nativerdf.softFailOnCorruptData` can be set to `true` to
allow the NativeStore to output CorruptValue/CorruptIRI/CorruptIRIOrBNode/CorruptLiteral objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation.

@hmottestad
Copy link
Contributor Author

@ebner I've added some test coverage now.

I think it might be best to also try to see what happens if the data is written to a trig or json-ld file, to test the very simple corrupt value implementations i added.

@hmottestad hmottestad force-pushed the GH-5148-soft-fail-native-store branch from be2a482 to b6215bb Compare October 24, 2024 11:22
@hmottestad hmottestad force-pushed the GH-5148-soft-fail-native-store branch from d8377b3 to 590e658 Compare October 28, 2024 12:29
Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am too distant from the native store implementation and its index structures to give a proper review. From what I can tell with going through the PR the logic seems fine: I can follow the idea and the new setting is opt-in.
Please see inline for minor (optional) comments on some messages

// We have either managed to read enough data and can return the required subset of the data, or we have read
// too little so we need to execute another read to get the correct data.
if (dataLength <= data.length - 4) {
// If the data length is larger than 750MB, we are likely reading the wrong data. Probably data corruption.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you come up with 750MB? Is this a special size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was what I experienced when testing.

I wouldn't think that a single literal would realistically be 750 MB. But I'll double check that the logic only applies if the system property is activated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more context to the comment

@hmottestad hmottestad enabled auto-merge November 10, 2024 12:12
@hmottestad hmottestad merged commit 136be9f into main Nov 10, 2024
9 checks passed
@hmottestad hmottestad deleted the GH-5148-soft-fail-native-store branch November 10, 2024 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ValueStore more resilient against data corruption
3 participants