[Saffron] Add update function #3004
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3004      +/-   ##
==========================================
- Coverage   76.80%   76.72%   -0.08%
==========================================
  Files         261      261
  Lines       61882    62079     +197
==========================================
+ Hits        47530    47633     +103
- Misses      14352    14446      +94

☔ View full report in Codecov by Sentry.
saffron/e2e-test.sh (outdated)

fi
}

perturb_bytes() {
Are you sure this works?
Wow, no it wasn't. I have replaced it with a Python script instead, because Perl is like hieroglyphics to me. But this actually makes me realize that the user should keep all the commitments on their end, not just the commitment folded with powers of alpha. As of right now, they cannot update them without recomputing, which is wasteful.
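For context, the folded commitment mentioned here is presumably a linear combination of the per-chunk commitments weighted by powers of a challenge alpha. A minimal sketch of that folding, assuming arkworks-style CurveGroup points (the function name is illustrative, not the saffron API):

```rust
use ark_ec::CurveGroup;
use ark_ff::Zero;

// Illustrative only: fold per-chunk commitments C_0..C_{n-1} with powers of alpha,
//   folded = C_0 + alpha*C_1 + ... + alpha^{n-1}*C_{n-1},
// evaluated with Horner's rule from the highest index down.
fn fold_commitments<G: CurveGroup>(alpha: G::ScalarField, commitments: &[G]) -> G {
    commitments
        .iter()
        .rev()
        .fold(G::zero(), |acc, &c| acc * alpha + c)
}
```

Keeping the individual C_i around (as the comment suggests) is what makes it possible to update them from a diff instead of recomputing everything.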
saffron/e2e-test.sh (outdated)

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}"
ENCODED_DIFF_FILE="${ENCODED_FILE%.*}_diff.bin"
DECODED_PERTURBED_FILE="${PERTURBED_FILE}-decoded${INPUT_FILE##*.}"
I guess you want to remove the extension, i.e. use ${PERTURBED_FILE%.*}? Suggested change:
DECODED_PERTURBED_FILE="${PERTURBED_FILE%.*}-decoded${PERTURBED_FILE##*.}"
OK, all the file names should be fixed now.
saffron/e2e-test.sh (outdated)

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}"
Nit: unify the usage of "-" and "_".
saffron/e2e-test.sh (outdated)

perturb_bytes "$INPUT_FILE" "$PERTURBED_FILE" 0.1

echo "Calculating diff for updated $INPUT_FILE (storing updated data in $PERTURBED_FILE)"
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG
Suggested change:
- cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG
+ cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" "$SRS_ARG"

(and the others below, to make shellcheck happy)
saffron/src/blob.rs (outdated)

.map(|i| {
    d.get(&i)
        .copied()
        .unwrap_or(<G as AffineRepr>::ScalarField::zero())
Suggested change:
- .unwrap_or(<G as AffineRepr>::ScalarField::zero())
+ .unwrap_or(G::ScalarField::zero())

And then you can remove the import use ark_ec::AffineRepr; on line 7.
saffron/src/utils.rs

#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")]
pub struct Diff<F> {
    #[serde_as(as = "Vec<HashMap<_, o1_utils::serialization::SerdeAs>>")]
    pub evaluation_diffs: Vec<HashMap<usize, F>>,
Note that a vector is more efficient if you don't need to access entries by the modified index.
I'm assuming that updates can be sparse enough that using a full vector here would be overkill?
You can use a vector of type Vec<(idx, F)>, or a linear memory layout where idx and F are inlined (tuples are not, I think).
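For illustration, the two representations being contrasted might look like this (the names are made up; F stands for the field element type):

```rust
// Option A: one flat vector of (index, value) pairs per chunk. Only touched
// indices are stored, and iteration follows insertion order.
pub struct SparseDiffPairs<F> {
    pub entries: Vec<(usize, F)>,
}

// Option B: a "structure of arrays" layout where indices and values live in two
// parallel vectors, so each array is densely packed in memory.
pub struct SparseDiffSoA<F> {
    pub indices: Vec<usize>,
    pub values: Vec<F>,
}
```

Both keep sparse updates small; the HashMap only wins if random access by index is actually needed.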
saffron/src/utils.rs (outdated)

#[serde_as]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")]
pub struct Diff<F> {
I guess we can restrict it to PrimeField directly.
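A possible shape of that change, assuming PrimeField already implies the canonical (de)serialization bounds (it does in arkworks 0.4); this is a sketch, not the final code:

```rust
use ark_ff::PrimeField;
use serde::{Deserialize, Serialize};
use serde_with::serde_as;
use std::collections::HashMap;

// Sketch: bound the struct on PrimeField directly instead of the looser
// CanonicalSerialize + CanonicalDeserialize bound.
#[serde_as]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
#[serde(bound = "F: PrimeField")]
pub struct Diff<F: PrimeField> {
    #[serde_as(as = "Vec<HashMap<_, o1_utils::serialization::SerdeAs>>")]
    pub evaluation_diffs: Vec<HashMap<usize, F>>,
}
```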
saffron/src/utils.rs (outdated)

.par_iter()
.zip(old_elems)
.map(|(n, o)| {
    n.iter()
If you want to optimize further by keeping the hashmap, you can also parallelize here.
I'm not a rayon pro, but I have a bad feeling about using par_iter on something that can be really long (like thousands of elements) where the work per element is trivial (in this case, subtracting field elements), and on top of that nesting multiple par_iter calls.

I don't have a problem using par_iter over the chunks of size domain_size, since even a 500MB blob is only about 260 chunks (i.e. polynomials) and the work per chunk can be substantial. If you know a reason I shouldn't worry about this, can you post a doc link? Otherwise I would test it and time it before agreeing.
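Roughly what that per-chunk parallelism looks like (shapes and names are illustrative, not the actual saffron code): parallelize across chunks, keep the per-entry subtraction sequential.

```rust
use rayon::prelude::*;
use std::collections::HashMap;
use std::ops::Sub;

// Outer par_iter: one rayon task per chunk (a few hundred even for a large blob).
// Inner loop: plain sequential iteration, since a single field subtraction is cheap.
fn chunk_diffs<F: Copy + Sub<Output = F> + Send + Sync>(
    new_chunks: &[HashMap<usize, F>], // touched index -> new value, per chunk
    old_chunks: &[Vec<F>],            // full old evaluations, per chunk
) -> Vec<HashMap<usize, F>> {
    new_chunks
        .par_iter()
        .zip(old_chunks.par_iter())
        .map(|(n, o)| {
            n.iter()
                .map(|(&idx, &new_v)| (idx, new_v - o[idx]))
                .collect()
        })
        .collect()
}
```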
fn random_perturbation(threshold: f64, data: &[u8]) -> Vec<u8> {
    let mut rng = rand::thread_rng();
    data.iter()
par_iter also here, I guess.
See my comment above about par_iter.
saffron/src/blob.rs (outdated)

}))
{
    // start with some random user data
    let mut xs_blob = FieldBlob::<Vesta>::encode::<_, DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi>>(&*SRS, *DOMAIN, &xs);
I guess you can also add a type alias for DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi> at the top to make this cleaner.
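Something along these lines, assuming the import paths used elsewhere in the proof-systems repo (the alias name is just a suggestion):

```rust
use mina_curves::pasta::VestaParameters;
use mina_poseidon::constants::PlonkSpongeConstantsKimchi;
use mina_poseidon::sponge::DefaultFqSponge;

// One alias at the top of the test module keeps the encode call readable.
type VestaFqSponge = DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi>;

// e.g. FieldBlob::<Vesta>::encode::<_, VestaFqSponge>(&*SRS, *DOMAIN, &xs)
```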
saffron/e2e-test.sh (outdated)

@@ -11,8 +11,34 @@ SRS_ARG=""
if [ $# -eq 2 ]; then
  SRS_ARG="--srs-filepath $2"
fi

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
Also, this doesn't handle the missing dot: ${INPUT_FILE##*.} strips it, so the extension gets appended without one. And what about files without any dot?
saffron/src/main.rs (outdated)

    &srs, &domain, diff,
);
args.assert_commitment
    .into_iter()
The type is Option<T>. I wouldn't use into_iter() here, even if it works.
Yeah, this is clumsy; it's a Haskell pattern, but I forgot about if let in Rust.
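A self-contained illustration of the two styles (the Option and the check are stand-ins for the real assert_commitment argument):

```rust
fn main() {
    let assert_commitment: Option<u64> = Some(42);
    let computed: u64 = 42;

    // Iterator style over the Option: works, but obscures the intent.
    assert_commitment
        .into_iter()
        .for_each(|expected| assert_eq!(computed, expected));

    // `if let` reads as "only check when a commitment was supplied".
    if let Some(expected) = assert_commitment {
        assert_eq!(computed, expected);
    }
}
```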
saffron/src/blob.rs (outdated)

    domain: &Radix2EvaluationDomain<G::ScalarField>,
    diff: Diff<G::ScalarField>,
) {
    let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff
That's not what we want to do; it is not efficient. You only want to use the Lagrange polynomials.
Yeah, this is what I figured as well, but while working on this I saw that the Lagrange polynomials aren't part of the SRS (just their commitments). I didn't see another type in the codebase that manages this cache; am I missing something?

If we need to build something, this update needs the Lagrange polynomials in the monomial basis, since that is what the data is stored in. If it's an expensive one-time setup (it seems like it to me), can I suggest leaving a TODO and a follow-up PR which:
- creates a type that manages the cache of Lagrange polynomials for a given domain
- provides (de)serialize instances
- adds a CLI arg to load the cache from a file and applies it where necessary
- adds a script that generates and serializes it, cached for CI like we do with the SRS
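The commitment-only update the reviewer is pointing at could look roughly like this, assuming the Lagrange-basis commitments are available; names and signatures are illustrative, not the saffron or poly-commitment API:

```rust
use ark_ec::{AffineRepr, CurveGroup};
use ark_ff::PrimeField;
use std::collections::HashMap;

// Since committing is linear and the updated polynomial is
//   p_new = p_old + sum_i delta_i * L_i,
// the commitment can be refreshed without recommitting the data:
//   C_new = C_old + sum_i delta_i * [L_i].
fn update_commitment<G: AffineRepr>(
    old_commitment: G,
    diff: &HashMap<usize, G::ScalarField>, // index -> (new eval - old eval)
    lagrange_commitments: &[G],            // [L_i] for the evaluation domain
) -> G {
    let mut acc = old_commitment.into_group();
    for (&i, delta) in diff {
        acc += lagrange_commitments[i].mul_bigint(delta.into_bigint());
    }
    acc.into_affine()
}
```

Updating the stored polynomial in the monomial basis is the part that still needs the Lagrange polynomials themselves, which is what the cache discussion above is about.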
saffron/src/blob.rs (outdated)

    domain: &Radix2EvaluationDomain<G::ScalarField>,
    diff: Diff<G::ScalarField>,
) {
    let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff
Let's also add some timers, activated while in debug mode. We want to know how fast it is.
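A minimal sketch of debug-only timing with the standard library (expensive_update stands in for the real update call):

```rust
use std::time::Instant;

fn expensive_update() {
    // placeholder for the real commitment/blob update
    std::thread::sleep(std::time::Duration::from_millis(10));
}

fn main() {
    // Only start (and print) the timer in debug builds; release builds skip both.
    let start = cfg!(debug_assertions).then(Instant::now);
    expensive_update();
    if let Some(start) = start {
        eprintln!("update took {:?}", start.elapsed());
    }
}
```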
See my comments.

Thanks for the thorough review! Your comments made me realize I need to do a little work re: managing commitments on the user and server side. I would like to put this back in draft for now and submit that work as a separate PR before continuing here. In the meantime, can you address the questions about
@dannywillems I managed to correct my errors and get this working, and am currently breaking it up into smaller pieces, starting with #3005 and #3006. Going to close this as review is no longer needed. I will cherry-pick the scripts and some of the CLI stuff in more follow-ups.
Summary

The user needs to be able to update data with the storage provider. This is done by posting "diff" objects, not by simply posting the raw updated data. This PR implements this feature. (A rough, illustrative sketch of the diff flow is given after the change list below.)

Changes

- Add a Diff type and a utils function for computing a Diff over binary data.
- Add an update function to the FieldBlob type, with tests for random updates on random data.
- Add CLI commands:
  1. calculate-diff: used by the user to calculate the Diff object using the old and new files as input (6eeb13e)
  2. update: ask the storage provider to update some given data using the Diff object calculated by (1) (61bb85f)
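As a purely illustrative sketch of that flow (integers with wrapping arithmetic stand in for field elements; none of these names are the saffron API): the user computes a sparse new-minus-old diff, sends it, and the provider applies it to its stored copy.

```rust
use std::collections::HashMap;

// User side: keep only the indices that actually changed, storing new - old.
fn compute_diff(old: &[u64], new: &[u64]) -> HashMap<usize, u64> {
    old.iter()
        .zip(new)
        .enumerate()
        .filter(|(_, (o, n))| o != n)
        .map(|(i, (o, n))| (i, n.wrapping_sub(*o)))
        .collect()
}

// Storage-provider side: add each delta back onto the stored data.
fn apply_diff(stored: &mut [u64], diff: &HashMap<usize, u64>) {
    for (&i, delta) in diff {
        stored[i] = stored[i].wrapping_add(*delta);
    }
}

fn main() {
    let old = vec![1u64, 2, 3, 4];
    let new = vec![1u64, 9, 3, 7];
    let mut stored = old.clone();

    let diff = compute_diff(&old, &new); // user side
    apply_diff(&mut stored, &diff);      // provider side
    assert_eq!(stored, new);
}
```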