Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability of only reconstructing certain sub-chunk of certain shards #79

Open
drskalman opened this issue Aug 25, 2020 · 6 comments

Comments

@drskalman
Copy link
Collaborator

We have situations that we have recovered some data and parity shards but we are only interested in recovering a specific subset of the shards instead of the whole encoding mainly due to performance issues. As such it is useful to have reconstruct_subset_of_shards function. I'm working on it and opened this issue to discuss and track implementation detail etc.

@darrenldl
Copy link
Collaborator

I think the following notes which I thought of as I revisited the code briefly might be worth mentioning:

  • If the selected subset strictly concerns only with data shards, then the code can just reconstruct the required subset
  • But if the selected subset contains any parity shard, then it should implicitly/automatically include all missing data shards in the target set as well (as required for construction of parity shards, I think)

It might be easiest to modify code_some_slices to accept an "ignored indices" argument.

Either that, or adjust the outputs parameter (https://github.com/darrenldl/reed-solomon-erasure/blob/master/src/core.rs#L473) to allow for encoding of an index being skipped, otherwise shrinking the outputs directly would lead to code_single_slice misusing i_row (https://github.com/darrenldl/reed-solomon-erasure/blob/master/src/core.rs#L487)

And I might be way off as well - I haven't read the code base in forever.

@drskalman
Copy link
Collaborator Author

drskalman commented Aug 25, 2020 via email

@burdges
Copy link
Collaborator

burdges commented Aug 25, 2020

We're interested in the "perpendicular" situation with partial shards:

We've an k-of-n encoding of a message of size m. We're using GF(2^16) since n>256, so each R-S polynomial contains exactly 2k bytes of data. As m >> 2k we necessarily split the message up across several R-S polynomials, meaning we rounded m up to some m' such that 2k | m'.

We want the data between indices u and v. We let u' be u rounded down so that 2k | u' and let v' be v rounded up so that 2k | v'. There are now specific R-S polynomials that contain the data between indices u' and v', so we fetch k shares of each of those R-S polynomials, but only those. We then reconstruct only the data between indices u' and v', and return the requested subslice between indices u and v.

We ran into an interesting authentication hiccup with this approach however:

We authenticate each retrieved shard using Merkle tree proofs. We must hash shards using a deeper Merkle tree with leaves of size exactly 2k, which requires exposing an annoying amount from this crate. We could sign these partial u' to v' shards instead of refining the Merkle proof, and then punish invalid partial shards, but doing k signatures maybe sucks here.

It's harder but maybe more useful to implement a BCH flavor of R-S codes that actually does error correction, not sure about all the trade-offs though. Berlekamp–Massey decoder? Euclidean decoder? etc. We favor authentication via Merkle proofs over doing correction usually because error correction requires downloading twice the error rate, making Merkle proofs cheaper. We're addressing a narrower threat model here though, so maybe error correction with BCH suffices, not sure.

We'll could just reconstruct from partial shards optimistically, check the partial reconstructed data's hash against another source, and then do a full reconstruction if this fails.

Appologies for rambling there..

@drskalman
Copy link
Collaborator Author

drskalman commented Aug 26, 2020 via email

@drskalman drskalman changed the title Add the ability of only reconstructing certain shards Add the ability of only reconstructing certain sub-chunk of certain shards Aug 26, 2020
@darrenldl
Copy link
Collaborator

Okay, that's interesting.

Is the sub chunk always contiguous?

@burdges
Copy link
Collaborator

burdges commented Aug 27, 2020

Yes. It's mostly an authentication question: How much do we want to make hashing and merkrle proofs overlap with the erasure coding? I suppose a shard type could expose an iterator over data in the same polynomial, which I guess is ever even index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants