
Receipts working with Generic Struct Extraction #430

Open
wants to merge 40 commits into generic-extraction-integration-test-mapping-of-mappings from zyouell/receipt-rebase

Conversation

Zyouell
Contributor

@Zyouell Zyouell commented Jan 14, 2025

This PR closes CRY-21 upon merge.

A very sizeable PR that makes the Generic Struct Extraction Code and Receipt Extraction Code work together.

It makes reasonably large changes to both. The most noticeable is the change to ColumnData, ColumnDataGadget and MetadataGadget: since the EVM only works in bytes, I have removed the bit_offset aspects of the code and rewritten the extraction circuits to reflect this. They no longer use lookups, which should result in an overall speed-up in proving time (since you no longer have to run the lookup PIOP).

In addition, instead of just having ColumnInfo we now have ExtractedColumnInfo and InputColumnInfo. This reduces the work done when computing value digests while still keeping everything grouped together.
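For illustration, a rough sketch of the split; the field names are assumptions based on the description above, not the actual mp2 definitions:

```rust
// Sketch only: field names here are assumptions, not the real mp2 types.

/// A column whose value is extracted from a node's payload. Offsets are in
/// bytes only, since the EVM works in bytes (the bit_offset machinery is gone).
pub struct ExtractedColumnInfo {
    pub identifier: u64,
    /// Byte offset of the value within the node payload.
    pub byte_offset: usize,
    /// Length of the value in bytes.
    pub length: usize,
}

/// A column whose value is supplied as a circuit input rather than extracted,
/// so it can be skipped when computing value digests.
pub struct InputColumnInfo {
    pub identifier: u64,
    pub value: [u8; 32],
}
```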

Receipt extraction now uses the new TableMetadata struct, just like struct extraction does, and I have also added a From<EventLogInfo> implementation to easily construct columns with identifiers.
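A minimal sketch of what such a conversion can look like; EventLogInfo and TableMetadata are simplified stand-ins here, and the identifier derivation is purely illustrative:

```rust
// Simplified stand-ins, not the real mp2 types.
struct EventLogInfo<const NO_TOPICS: usize, const MAX_DATA: usize> {
    event_signature: [u8; 32],
}

struct TableMetadata {
    column_identifiers: Vec<u64>,
}

impl<const NO_TOPICS: usize, const MAX_DATA: usize> From<&EventLogInfo<NO_TOPICS, MAX_DATA>>
    for TableMetadata
{
    fn from(event: &EventLogInfo<NO_TOPICS, MAX_DATA>) -> Self {
        // One column per indexed topic and per extracted data word, each with
        // an identifier derived from the event signature hash.
        let base = u64::from_be_bytes(event.event_signature[..8].try_into().unwrap());
        let column_identifiers = (0..NO_TOPICS + MAX_DATA).map(|i| base ^ i as u64).collect();
        TableMetadata { column_identifiers }
    }
}
```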

I tried to remove some of the unnecessary generic constants (for instance those on storage leaf nodes and MPT extension nodes) that have a fixed length when viewed in a circuit. To stop the compiler complaining about unconstrained generic constants, I had to use the actual number values rather than declaring a constant and using that.
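A minimal illustration of the difference, assuming nightly's generic_const_exprs (not mp2 code; pad_len stands in for the real padding function):

```rust
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]

// Stand-in padding function, not the real one from mp2.
const fn pad_len(n: usize) -> usize {
    (n + 6) / 7 * 7
}

// Generic version: the `[(); pad_len(N)]:` bound is required to appease the
// compiler, and it propagates to every caller.
fn padded<const N: usize>(input: &[u8; N]) -> [u8; pad_len(N)]
where
    [(); pad_len(N)]:,
{
    let mut out = [0u8; pad_len(N)];
    out[..N].copy_from_slice(input);
    out
}

// Fixed-size version, as done in this PR for storage leaves and MPT extension
// nodes: writing the concrete number avoids the bound entirely.
fn padded_leaf(input: &[u8; 32]) -> [u8; 35] {
    let mut out = [0u8; 35];
    out[..32].copy_from_slice(input);
    out
}
```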

For receipt proving in the integration test we use the code in mp2_v1/src/value_extraction/planner.rs, since we don't need to keep track of previous value extraction in the receipt case.


linear bot commented Jan 14, 2025

@Zyouell Zyouell force-pushed the zyouell/receipt-rebase branch from 1ae6d3e to c5b826e on January 14, 2025 at 16:35
Collaborator

@nikkolasg nikkolasg left a comment


Very nice work! It changes a LOT of things, but overall it seems we're much better off with this PR!
I left many comments, but many of them are aesthetic. Some are a bit nitty-gritty, especially around the interface with DQ; if anything is unclear, we should probably jump on a quick call at some point, maybe with @hackaugusto.

mp2-common/src/eth.rs (outdated, resolved)
Comment on lines 93 to 111
} else if item_count == 2 {
    // The first item is the encoded path: if it begins with a 2 or 3 it is a leaf, otherwise it is an extension node.
    let first_item = rlp.at(0)?;

    // We want the first byte,
    let first_byte = first_item.as_raw()[0];

    // then we divide by 16 to get the first nibble.
    match first_byte / 16 {
        0 | 1 => Ok(NodeType::Extension),
        2 | 3 => Ok(NodeType::Leaf),
        _ => Err(anyhow!(
            "Expected compact encoding beginning with 0,1,2 or 3"
        )),
    }
} else {
    Err(anyhow!(
        "RLP encoded Node item count was {item_count}, expected either 17 or 2"
    ))
Collaborator

Works totally fine - just for the sake of sharing information: the mpt library we use already has this functionality exposed, but all good 👍

@@ -113,6 +172,104 @@ pub struct ProofQuery {
pub(crate) slot: StorageSlot,
}

/// Struct used for storing relevant data to query blocks as they come in.
/// The constant `NO_TOPICS` is the number of indexed items in the event (excluding the event signature) and
/// `MAX_DATA` is the number of 32 byte words of data we expect in addition to the topics.
Collaborator

Ah I see, it's the number of words here - ok, so can we make it clear in the name as well whether we're talking about words or bytes? RECEIPT_MAX_DATA_WORD and RECEIPT_WORD_DATA_SIZE or stg?

Collaborator

Also, it's not necessarily the number of 32-byte words we expect, but the number of 32-byte words we want to extract. There can be more data in the event, but we can only extract so much.

Contributor Author

I've renamed it to indicate it's how many words we want to extract.
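For reference, a hypothetical doc sketch of the clarified naming (identifiers here are illustrative, not the exact ones in the PR):

```rust
// Hypothetical naming sketch, not the exact identifiers used in the PR.
/// `NO_TOPICS`: number of indexed items in the event, excluding the signature.
/// `MAX_DATA_WORDS`: number of 32-byte words we *extract* from the data
/// section; the event itself may carry more data than this.
pub struct EventLogInfo<const NO_TOPICS: usize, const MAX_DATA_WORDS: usize> {
    event_signature: [u8; 32],
}
```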

Comment on lines 232 to 259
let topics_header_len = alloy::rlp::length_of_length((1 + NO_TOPICS) * ENCODED_TOPIC_SIZE);

// If we have more than one piece of data, it is RLP-encoded as a string with length greater than 55 bytes
let data_header_len = alloy::rlp::length_of_length(MAX_DATA * MAX_DATA_SIZE);

let address_size = 21;
let topics_size = (1 + NO_TOPICS) * ENCODED_TOPIC_SIZE + topics_header_len;
let data_size = MAX_DATA * MAX_DATA_SIZE + data_header_len;

let payload_size = address_size + topics_size + data_size;
let header_size = alloy::rlp::length_of_length(payload_size);

let size = header_size + payload_size;

// The address itself starts after the header plus one byte for the address header.
let add_rel_offset = header_size + 1;

// The event signature offset is after the header, the address and the topics list header.
let sig_rel_offset = header_size + address_size + topics_header_len + 1;

let topics: [usize; NO_TOPICS] =
    create_array(|i| sig_rel_offset + (i + 1) * ENCODED_TOPIC_SIZE);

let data: [usize; MAX_DATA] = create_array(|i| {
    header_size + address_size + topics_size + data_header_len + (i * MAX_DATA_SIZE)
});

let event_sig = alloy::primitives::keccak256(event_signature.as_bytes());
Collaborator

More of a remark than a request for change, but the logic to parse RLP is already available with the rlp crate we're using - that would avoid having hardcoded constants for the sizes etc., and we would just need to know the position of the address in the list. Less prone to error, but if you have tests for this, I'm all good 👍

Contributor Author

Ah yes, I made this function so that I didn't have to actually have an RLP-encoded receipt in order to build the struct. As long as you tell me the event's signature, the contract address, and the number of topics and words of additional data, we can construct everything we need to know about the table.

Comment on lines 546 to 550
pub async fn query_receipt_proofs<T: Transport + Clone>(
    &self,
    provider: &RootProvider<T>,
    block: BlockNumberOrTag,
) -> Result<Vec<ReceiptProofInfo>> {
Collaborator

I know I am the one that asked for this, but given the recent complications we've had on dev/testnet, I'm wondering if we should just use a different API.

The big issue we've had is that sometimes we need to retry the call because the node was down or wasn't completely in sync.
Here we're letting the execution happen inside the function.
I see a few potential ways to solve this:

  1. Having explicit error types: at least for the eth portion, so we gradually introduce the good practice of having our own errors. If this call returns EthError::FailedRPC(...) then on the DQ side we can retry automatically; if it's another error, then it's an application-logic error and we should surface it and panic or stg (see the sketch after this list).
  2. Having this method split in two: one method fetch_receipt_info that DQ would call with retries, and a method construct_proof on the return value of the former to get the Vec<ReceiptProofInfo>.
  3. We could also make a system of callbacks, but that looks more complicated.

Any opinions, wdyt?
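A minimal sketch of option 1, assuming the thiserror crate; the variant set is illustrative, not an existing mp2 API:

```rust
use thiserror::Error;

// Illustrative variants only.
#[derive(Debug, Error)]
pub enum EthError {
    /// Transport/RPC-level failure (node down, not fully synced):
    /// safe for DQ to retry automatically.
    #[error("RPC call failed: {0}")]
    FailedRPC(String),
    /// Application-logic failure: retrying won't help, surface it instead.
    #[error("invalid data returned: {0}")]
    InvalidData(String),
}
```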

Contributor Author

I think explicit error types would be the best solution, but that should probably be handled in its own issue: once we start doing it for one part of the code, it's quickly going to affect everything, simply because of Result<T, E> bubbling up everywhere. So for the time being I'll add the internals as two functions and then change what's there now to just call them in succession, leaving us the most options for later - roughly the shape sketched below.
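A sketch of that split; the signatures and the ReceiptQueryResponse type are assumptions, not the final API:

```rust
// Sketch only: `ReceiptQueryResponse` and these signatures are assumptions.
impl<const NO_TOPICS: usize, const MAX_DATA: usize> EventLogInfo<NO_TOPICS, MAX_DATA> {
    /// Network half: the only part that can fail for RPC reasons, so DQ can
    /// wrap just this call in its retry logic.
    pub async fn fetch_receipt_info<T: Transport + Clone>(
        &self,
        provider: &RootProvider<T>,
        block: BlockNumberOrTag,
    ) -> Result<ReceiptQueryResponse> {
        todo!("fetch the block's receipts over RPC")
    }

    /// Pure half: deterministic, no network access, never needs a retry.
    pub fn construct_proofs(&self, response: ReceiptQueryResponse) -> Result<Vec<ReceiptProofInfo>> {
        todo!("build the MPT proofs from the fetched data")
    }

    /// The existing entry point then just calls the two in succession.
    pub async fn query_receipt_proofs<T: Transport + Clone>(
        &self,
        provider: &RootProvider<T>,
        block: BlockNumberOrTag,
    ) -> Result<Vec<ReceiptProofInfo>> {
        let response = self.fetch_receipt_info(provider, block).await?;
        self.construct_proofs(response)
    }
}
```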

Comment on lines 86 to 92
async fn prove_value_extraction<const MAX_COLUMNS: usize, T: Transport + Clone>(
    &self,
    contract: Address,
    epoch: u64,
    pp: &PublicParameters<512, MAX_COLUMNS>,
    provider: &RootProvider<T, Ethereum>,
) -> Result<Vec<u8>>
Collaborator

Ok, sorry, maybe a lack of specs on my side:
A. We don't want the worker to have to do an external call to perform its tasks. The tasks that we give to a worker should be self-contained, so we reduce the surface for errors.
If a call to an external API (in this instance another JSON-RPC call) fails, it's not the fault of the worker or of our inputs, and debugging that becomes a nightmare.
B. We want to let DQ manage the different tasks it needs to do the way it wants. Here this method dictates that a single worker has to prove everything, which is not our setting. And in DQ, we may want to batch subsets of tasks together for better parallelization.

So that means:

  1. We need to have, similar to the other APIs, a method that just takes the parameters and the inputs for a given proof.
  2. We probably need to change the previous method that creates the update tree, to create an update tree of all the inputs but the children proofs - or just imitate what we have now. But I think your solution goes one step further, which is good imo: giving an update tree of "tasks to do", and DQ will make sure each task gets fully "assembled" (i.e. we give the children proofs as well) and in the order it wants.

Does that make sense?

Comment on lines 189 to 194
let input = CircuitInput::new_extension(
    proof_data.node.clone(),
    child_proof.proof.ok_or(anyhow::anyhow!(
        "Extension node child proof was a None value"
    ))?,
);
Collaborator

Maybe an update tree of proof_data would work, and then DQ can insert the children proofs when it has them, to build the CircuitInput that it would then pass down to a worker?

Comment on lines +731 to +740
let current_row_epoch = self.table.row.current_epoch();
let current_row_keys = self
    .table
    .row
    .keys_at(current_row_epoch)
    .await
    .into_iter()
    .map(TableRowUpdate::<BlockPrimaryIndex>::Deletion)
    .collect::<Vec<_>>();
[current_row_keys, table_row_updates].concat()
Collaborator

I'm not a fan of this tbh - could we rather make the test create a new row tree? In general DQ would follow what is implemented in the integration test, and this doesn't seem like a good way forward when we go to scale (400k - 1M entries).
Changing the row tree to be a fresh one is anyway something we want to do for offchain data - I haven't checked that part, so it's unclear to me how it's done now, but that doesn't remove the goal.

Contributor Author

From what I can tell, Ryhope currently doesn't have the API to do this: we need to keep the same epoch as we progress, and the only way I can wipe the row tree at the moment is to call MerkleRowTree::new(), which would forget the epoch.

mp2-v1/tests/common/cases/indexing.rs (outdated, resolved)
mp2-v1/tests/common/mod.rs (outdated, resolved)
Contributor

@nicholas-mainardi nicholas-mainardi left a comment

Nice work for a super-complex PR!
I looked only at the circuits for now, as there are already enough comments, I believe xD. Besides what's written in the comments, as a general note I would like to ask, if possible, to use named constants much more rather than hard-coded values: I believe they make the code much more readable, as they explain the semantics of the actual value being employed.

let lt = less_than_or_equal_to_unsafe(b, upper_bound, array_len, num_bits_size as usize);

let t = b._true();
b.connect(t.target, lt.target);
Contributor

Isn't this check redundant, given that the index being accessed is already checked in random_access_large_array to be within SIZE?

Contributor Author

Yes, you are correct, I'll remove it.

/// And it returns:
/// * New key with the pointer moved.
/// * The child hash / value of the node.
/// * A boolean that must be true if the given node is a leaf or an extension.
Contributor

I know it's not related to your PR but I noticed it now: shouldn't this be true if the given node is a branch one?

@@ -85,7 +85,7 @@ pub fn setup_circuit<
};

println!("[+] Circuit data built in {:?}s", now.elapsed().as_secs());

println!("FRI config: {:?}", circuit_data.common.fri_params);
Contributor

Leftover?

Contributor Author

Yep


/// Given a `PartitionWitness` that has only inputs set, populates the rest of the witness using the
/// given set of generators.
pub fn debug_generate_partial_witness<
Contributor

Thanks for adding this debugging utility. It looks to me that this method is basically a copy of Plonky2's generate_partial_witness with some print statements, is that correct? If so, wdyt about modifying the method directly in our Plonky2 fork (maybe adding a debug input parameter that specifies whether to print out this data or not)? That way we don't duplicate the logic. Wdyt?

2 => event_contract.testThreeIndexed().into_transaction_request(),
3 => event_contract.testOneData().into_transaction_request(),
4 => event_contract.testTwoData().into_transaction_request(),
_ => unreachable!(),
Contributor

Why are you generating the same type of event twice? Were you meant to call event_contract.twoEmits() here?

Contributor Author

Yeah, I don't know any more why that was doing it twice.

let row_id = Scalar::from_noncanonical_biguint(row_id);

let exp_digest = total * row_id;
assert_eq!(pi.values_digest(), exp_digest.to_weierstrass());
Contributor

We should use the methods provided as API here, to check that they are coherent with the circuit logic.

})
}

/// Create a circuit input for proving a leaf MPT node of a transaction receipt.
pub fn new_receipt_leaf<const NO_TOPICS: usize, const MAX_DATA: usize>(
last_node: &[u8],
Contributor

Could we name it just node?

LeafSingle(LeafSingleCircuit<MAX_COLUMNS>),
LeafMapping(LeafMappingCircuit<MAX_COLUMNS>),
LeafMappingOfMappings(LeafMappingOfMappingsCircuit<MAX_COLUMNS>),
LeafReceipt(ReceiptLeafCircuit<NODE_LEN, 7>),
Contributor

Can we use a constant instead of 7, also explaining why it is 7?
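Something along these lines would do; the name and doc string here are placeholders, since the thread doesn't spell out what the 7 counts:

```rust
// Placeholder name and doc: the thread doesn't say what the 7 counts.
/// Number of input columns used by the receipt leaf circuit.
const RECEIPT_LEAF_INPUT_COLUMNS: usize = 7;

// ...
LeafReceipt(ReceiptLeafCircuit<NODE_LEN, RECEIPT_LEAF_INPUT_COLUMNS>),
```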

pub(crate) struct MetadataTarget<const MAX_COLUMNS: usize, const MAX_FIELD_PER_EVM: usize> {
pub(crate) struct TableMetadataTarget<const MAX_COLUMNS: usize, const INPUT_COLUMNS: usize>
where
[(); MAX_COLUMNS - INPUT_COLUMNS]:,
Contributor

Do we need INPUT_COLUMNS to be a const generic? Can't we just use a vector, given that the number of input columns is fixed for each circuit type? It introduces a lot of const generic bounds, which might make compilation heavier, so if we can get rid of it, I think it would be beneficial.
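A sketch of the suggested refactor (field names assumed): the input columns move into a Vec whose length is fixed per circuit type at construction time, which lets the extracted-column array use the plain MAX_COLUMNS bound and drops the MAX_COLUMNS - INPUT_COLUMNS expression everywhere:

```rust
use plonky2::iop::target::Target;

// Field names are assumptions based on this thread; InputColumnInfoTarget and
// ExtractedColumnInfoTarget are the PR's types.
pub(crate) struct TableMetadataTarget<const MAX_COLUMNS: usize> {
    /// Input columns: their count is fixed by the circuit type, so a Vec
    /// avoids the `[(); MAX_COLUMNS - INPUT_COLUMNS]:` bound entirely.
    pub(crate) input_columns: Vec<InputColumnInfoTarget>,
    /// Extracted columns, padded up to the full compile-time bound.
    pub(crate) extracted_columns: [ExtractedColumnInfoTarget; MAX_COLUMNS],
    pub(crate) num_actual_columns: Target,
}
```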

Contributor Author

This is such a good point, and I'm annoyed I didn't think of it, because INPUT_COLUMNS gave me so much grief.

Contributor Author

I've just realized the same actually applies to ExtractedColumnInfo as well, since we always iterate over the maximum possible number of columns.

/// Information about all extracted columns of the table
pub(crate) extracted_columns: [ExtractedColumnInfoTarget; MAX_COLUMNS - INPUT_COLUMNS],
/// The number of actual columns
pub(crate) num_actual_columns: Target,
Contributor

I think this should be replaced by a vector of flags employed as selectors in the digest computation; the number of columns should then be computed in an ad-hoc method from these flags and the number of input columns. Otherwise, if this is provided as a witness, there might be a mismatch between how many columns we actually add to the digests and the claimed number of columns. See the sketch below.
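A sketch of that suggestion against the plonky2 builder API (names assumed): the flags act as digest selectors, and the count is derived from the same flags, so the two cannot diverge:

```rust
use plonky2::field::extension::Extendable;
use plonky2::field::types::Field;
use plonky2::hash::hash_types::RichField;
use plonky2::iop::target::{BoolTarget, Target};
use plonky2::plonk::circuit_builder::CircuitBuilder;

// Names are assumptions based on this thread; `ExtractedColumnInfoTarget` is
// the PR's type.
pub(crate) struct TableMetadataTarget<const MAX_COLUMNS: usize> {
    pub(crate) extracted_columns: [ExtractedColumnInfoTarget; MAX_COLUMNS],
    /// `is_real[i]` selects whether extracted column `i` contributes to the
    /// digests.
    pub(crate) is_real: [BoolTarget; MAX_COLUMNS],
}

impl<const MAX_COLUMNS: usize> TableMetadataTarget<MAX_COLUMNS> {
    /// Derive the column count from the very flags used as digest selectors,
    /// instead of taking it as an unconstrained witness.
    fn num_actual_columns<F: RichField + Extendable<D>, const D: usize>(
        &self,
        b: &mut CircuitBuilder<F, D>,
        num_input_columns: usize,
    ) -> Target {
        let flags = self.is_real.iter().map(|f| f.target);
        let sum = b.add_many(flags);
        b.add_const(sum, F::from_canonical_usize(num_input_columns))
    }
}
```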
