
[Saffron] Add update function #3004

Closed
wants to merge 13 commits into from

Conversation

@martyall (Contributor) commented Feb 5, 2025

Summary

Users need to be able to update data held by the storage provider. Updates are submitted as "diff" objects rather than by re-posting the raw updated data. This PR implements that feature.

Changes

  • add a Diff type and a utility function for computing a Diff over binary data (sketched below).
  • add an update method on the FieldBlob type, with tests for random updates on random data.
  • add two CLI commands:
    1. calculate-diff: used to compute the Diff object from the old and new files 6eeb13e
    2. update: asks the storage provider to update the given data using the Diff object produced by (1) 61bb85f
  • update the e2e test script to perform a diff operation and check that the data was updated correctly. 2ba1a3f
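
For intuition, the core of the diff computation can be sketched as follows; this is a minimal illustration assuming the data is already chunked into field elements, and the function name and layout are illustrative rather than the PR's actual API:

use std::collections::HashMap;

use ark_ff::Field;

// Record (index, new - old) only at the positions that changed, so an
// update can be posted as a small sparse object instead of the full data.
fn sparse_diff<F: Field>(old: &[F], new: &[F]) -> HashMap<usize, F> {
    old.iter()
        .zip(new.iter())
        .enumerate()
        .filter(|(_, (o, n))| o != n)
        .map(|(i, (o, n))| (i, *n - *o))
        .collect()
}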

@martyall marked this pull request as ready for review February 5, 2025 18:39

codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 44.53441% with 137 lines in your changes missing coverage. Please review.

Project coverage is 76.72%. Comparing base (27404af) to head (18371d0).
Report is 3 commits behind head on master.

Files with missing lines   Patch %   Lines
saffron/src/main.rs          0.00%   90 Missing ⚠️
saffron/src/utils.rs        29.41%   36 Missing ⚠️
saffron/src/cli.rs           0.00%   10 Missing ⚠️
saffron/src/blob.rs         98.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3004      +/-   ##
==========================================
- Coverage   76.80%   76.72%   -0.08%     
==========================================
  Files         261      261              
  Lines       61882    62079     +197     
==========================================
+ Hits        47530    47633     +103     
- Misses      14352    14446      +94     


fi
}

perturb_bytes() {
Member

Are you sure this works?

Contributor Author

wow, no it wasn't. I have replaced it with a Python script instead, because Perl is like hieroglyphics to me. But actually this makes me realize that the user should keep all the commitments on their end, not just the folded commitment with powers of alpha. As of right now, they cannot update them without recomputing, which is dumb.

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}"
ENCODED_DIFF_FILE="${ENCODED_FILE%.*}_diff.bin"
DECODED_PERTURBED_FILE="${PERTURBED_FILE}-decoded${INPUT_FILE##*.}"
Member

I guess you want to remove the extension? i.e. ${PERTURBED_FILE%.*}

DECODED_PERTURBED_FILE="${PERTURBED_FILE%.*}-decoded${PERTURBED_FILE##*.}"

Contributor Author

OK, all the file names should be fixed now.

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}"
Member

Nit: unify the usage of "-" and "_".

perturb_bytes "$INPUT_FILE" "$PERTURBED_FILE" 0.1

echo "Calculating diff for upated $INPUT_FILE (stored updated data in $PERTURBED_FILE)"
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG
Member

Suggested change
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" "$SRS_ARG"

(and the others below, to make shellcheck happy)

.map(|i| {
d.get(&i)
.copied()
.unwrap_or(<G as AffineRepr>::ScalarField::zero())
Member

Suggested change
.unwrap_or(<G as AffineRepr>::ScalarField::zero())
.unwrap_or(G::ScalarField::zero())

And you can then remove the use ark_ec::AffineRepr; import.

#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")]
pub struct Diff<F> {
#[serde_as(as = "Vec<HashMap<_, o1_utils::serialization::SerdeAs>>")]
pub evaluation_diffs: Vec<HashMap<usize, F>>,
Member

Note that a vector is more efficient if you don't need to access by the modified index.

Contributor Author

I'm assuming that updates can be sparse enough that using a full vector here would be overkill?

Member

You can use a vector of type Vec<(idx, F)>, or a linear memory layout where idx and F are inlined (tuples are not, I think).
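
For reference, a sketch of that layout with illustrative names (the PR's struct uses Vec<HashMap<usize, F>> instead):

// Sparse per-chunk diffs in a flat, contiguous layout: entries are stored
// inline and in index order, there is no hashing, and iteration is cheap;
// the trade-off is O(n) lookup by index.
pub struct SparseDiff<F> {
    pub evaluation_diffs: Vec<Vec<(usize, F)>>,
}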

#[serde_as]
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")]
pub struct Diff<F> {
Member

I guess we can restrict it to PrimeField directly.
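
A sketch of the tightened bound (serde attributes elided; in arkworks, PrimeField already implies CanonicalSerialize and CanonicalDeserialize, so the explicit serde bound becomes redundant):

use std::collections::HashMap;

use ark_ff::PrimeField;

pub struct Diff<F: PrimeField> {
    pub evaluation_diffs: Vec<HashMap<usize, F>>,
}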

.par_iter()
.zip(old_elems)
.map(|(n, o)| {
n.iter()
Member

If you want to optimize further by keeping the hashmap, you can also parallelize here.

Contributor Author (@martyall, Feb 6, 2025)

I'm not a rayon pro, but I have a bad feeling about using par_iter on something that can be really long (thousands of elements) where the per-item work is trivial (here, subtracting field elements), and on top of that, nesting multiple par_iter calls.

I don't have a problem using par_iter on the chunks of size domain_size, since even a 500MB blob is only about 260 chunks (i.e. polynomials) and the work per chunk can be substantial.

If you know a reason I shouldn't worry about this, can you post a doc link? Otherwise I would want to test and time it before agreeing.
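
A sketch of the chunk-level parallelism described above, with illustrative names; the parallelism is across the domain-sized chunks only, and the trivial per-element work stays sequential:

use ark_ff::Field;
use rayon::prelude::*;

// Apply sparse diffs chunk by chunk: few chunks, each with substantial
// work, so rayon's scheduling overhead is amortized.
fn apply_diffs<F: Field + Send + Sync>(chunks: &mut [Vec<F>], diffs: &[Vec<(usize, F)>]) {
    chunks
        .par_iter_mut()
        .zip(diffs.par_iter())
        .for_each(|(chunk, diff)| {
            for (i, delta) in diff {
                chunk[*i] += *delta; // trivial per-element work: keep it serial
            }
        });
}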


fn random_perturbation(threshold: f64, data: &[u8]) -> Vec<u8> {
let mut rng = rand::thread_rng();
data.iter()
Member

par_iter here as well, I guess.

}))
{
// start with some random user data
let mut xs_blob = FieldBlob::<Vesta>::encode::<_, DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi>>(&*SRS, *DOMAIN, &xs);
Member

I guess you can also add a type alias for DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi> at the top to make this cleaner.
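
Something like the following, mirroring the snippet above (the alias name is illustrative, and imports are as in the surrounding test):

type VestaFqSponge = DefaultFqSponge<VestaParameters, PlonkSpongeConstantsKimchi>;

let mut xs_blob = FieldBlob::<Vesta>::encode::<_, VestaFqSponge>(&*SRS, *DOMAIN, &xs);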

@@ -11,8 +11,34 @@ SRS_ARG=""
if [ $# -eq 2 ]; then
SRS_ARG="--srs-filepath $2"
fi

ENCODED_FILE="${INPUT_FILE%.*}.bin"
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}"
Member

Also, this doesn't handle a missing dot. And what about files without an extension at all?

&srs, &domain, diff,
);
args.assert_commitment
.into_iter()
Member

The type is Option<T>. I wouldn't use into_iter() here, even if it works.

Contributor Author

Yeah, this is dumb; it's a Haskell pattern, but I forgot about if let in Rust.
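
The if let version would read something like this (names taken from the snippet above; computed_commitment is illustrative):

// Only check when the user actually supplied an expected commitment.
if let Some(expected) = args.assert_commitment {
    assert_eq!(computed_commitment, expected, "commitment mismatch");
}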

domain: &Radix2EvaluationDomain<G::ScalarField>,
diff: Diff<G::ScalarField>,
) {
let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff
Member

That's not what we want to do; it is not efficient. You only want to use the Lagrange polynomials.

Contributor Author (@martyall, Feb 6, 2025)

Yeah, this is what I figured as well, but I saw while working on this that it isn't part of the SRS (just the commitments). I didn't see another type in the codebase that manages this cache; am I missing something?

If we need to make something, this update needs the Lagrange polynomials in the monomial basis, since this is what the data is stored in. If it's an expensive one-time setup (seems like it to me), can I suggest leaving a TODO and a follow-up PR which:

  • creates a type that manages the cache of Lagrange polynomials for a given domain
  • provides (de)serialize instances
  • adds a CLI arg to load the cache from a file, applied where necessary
  • adds a script that generates and serializes it, cached for CI like we do with the SRS
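
For reference, a sketch of the Lagrange-basis commitment update the reviewer is pointing at, assuming a precomputed cache of commitments [L_i] to the Lagrange polynomials (exactly what the follow-up PR described above would provide; all names are illustrative). Because the commitment is additively homomorphic, for a sparse diff {(i, δ_i)} we have com(p_new) = com(p_old) + Σ_i δ_i · [L_i], with no FFTs and no recommitment of the full polynomial:

use ark_ec::{AffineRepr, CurveGroup};

// Update a commitment from a sparse evaluation diff using precomputed
// commitments to the Lagrange polynomials of the fixed domain.
fn update_commitment<G: AffineRepr>(
    old_commitment: G,
    lagrange_commitments: &[G],       // [L_i], one per evaluation point
    diff: &[(usize, G::ScalarField)], // (index, delta) pairs
) -> G {
    let mut acc = old_commitment.into_group();
    for (i, delta) in diff {
        acc += lagrange_commitments[*i].into_group() * *delta;
    }
    acc.into_affine()
}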

domain: &Radix2EvaluationDomain<G::ScalarField>,
diff: Diff<G::ScalarField>,
) {
let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff
Member

Let's also add some timers, activated in debug mode. We want to know how fast it is.
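
A sketch of debug-only timing with std::time::Instant (the surrounding call is hypothetical):

// Time the update, but only report it in debug builds.
let start = std::time::Instant::now();
let updates = compute_updates(&srs, &domain, diff); // hypothetical call
if cfg!(debug_assertions) {
    eprintln!("diff update took {:?}", start.elapsed());
}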

@dannywillems (Member) left a comment

See my comments.

@martyall (Contributor Author) commented Feb 6, 2025

@dannywillems

Thanks for the thorough review! Your comments made me realize I need to do a little work re: managing commitments on the user and server side. I would like to put this back in draft for now and submit that work as a separate PR before continuing here.

In the meantime, can you address the questions about rayon and the Lagrange polynomials?

@martyall marked this pull request as draft February 7, 2025 03:48
@martyall (Contributor Author) commented Feb 7, 2025

@dannywillems I managed to correct my errors and get this working, and I am currently breaking it up into smaller pieces, starting with #3005 and #3006. Going to close this, as review is no longer needed. I will cherry-pick the scripts and some of the CLI stuff in further follow-ups.
