Receipts working with Generic Struct Extraction #430
base: generic-extraction-integration-test-mapping-of-mappings
Conversation
- Move the dependencies (including versions) to the workspace `Cargo.toml`.
- Update the available dependencies, e.g. update `alloy` to `0.4` (`0.5` and `0.6` of alloy did not work with the integration test).
- Keep the current versions of `ethereum-types` and `rlp`, since both are tied to our forked `eth-trie.rs`. The integration test works.
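For reference, the workspace change described above would look roughly like this (a sketch; member names and the pinned `ethereum-types`/`rlp` versions are illustrative):

```toml
# Workspace Cargo.toml (sketch): dependencies declared once, inherited by member crates.
[workspace]
members = ["mp2-common", "mp2-v1"]

[workspace.dependencies]
alloy = "0.4"           # 0.5 and 0.6 did not work with the integration test
ethereum-types = "0.14" # version illustrative; kept to match the eth-trie.rs fork
rlp = "0.5"             # version illustrative; kept to match the eth-trie.rs fork
```

Member crates then pick these up with entries like `alloy = { workspace = true }`.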
1ae6d3e to c5b826e
Very nice work! This changed a LOT of things, but overall it seems we're much better off with this PR!
Left many comments, but many of them are aesthetic. Some of them are a bit nitty-gritty, especially around the interface with DQ. If anything is unclear, we should probably jump on a quick call at some point, maybe with @hackaugusto
mp2-common/src/eth.rs
} else if item_count == 2 {
    // The first item is the encoded path; if it begins with a 2 or 3 it is a leaf, else it is an extension node
    let first_item = rlp.at(0)?;

    // We want the first byte
    let first_byte = first_item.as_raw()[0];

    // Then we divide by 16 to get the first nibble
    match first_byte / 16 {
        0 | 1 => Ok(NodeType::Extension),
        2 | 3 => Ok(NodeType::Leaf),
        _ => Err(anyhow!(
            "Expected compact encoding beginning with 0,1,2 or 3"
        )),
    }
} else {
    Err(anyhow!(
        "RLP encoded Node item count was {item_count}, expected either 17 or 2"
    ))
Works totally fine. Just for the sake of sharing information, the MPT library we use already has this functionality exposed, but all good 👍
mp2-common/src/eth.rs
@@ -113,6 +172,104 @@ pub struct ProofQuery {
    pub(crate) slot: StorageSlot,
}

/// Struct used for storing relevant data to query blocks as they come in.
/// The constant `NO_TOPICS` is the number of indexed items in the event (excluding the event signature) and
/// `MAX_DATA` is the number of 32-byte words of data we expect in addition to the topics.
Ah, I see, it's the number of words here. OK, so can we make it clear in the names as well whether you're talking about words or bytes? `RECEIPT_MAX_DATA_WORD` and `RECEIPT_WORD_DATA_SIZE` or something?
Also, it's not necessarily the number of 32-byte words we expect, but the number of 32-byte words we want to extract. There can be more data in the event, but we can only extract so much.
I've renamed it to indicate it's how many words we want to extract.
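As a sketch of that direction (both names and the word count are illustrative, not the PR's actual choices):

```rust
/// Size of one receipt data word, in bytes (unit explicit in the name,
/// per the reviewer's suggestion).
pub const RECEIPT_WORD_DATA_SIZE: usize = 32;

/// How many 32-byte data words we *extract* from an event, which may be
/// fewer than the event actually carries. Value illustrative.
pub const MAX_EXTRACTED_DATA_WORDS: usize = 2;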
mp2-common/src/eth.rs
let topics_header_len = alloy::rlp::length_of_length((1 + NO_TOPICS) * ENCODED_TOPIC_SIZE);

// If we have more than one piece of data it is rlp encoded as a string with length greater than 55 bytes
let data_header_len = alloy::rlp::length_of_length(MAX_DATA * MAX_DATA_SIZE);

let address_size = 21;
let topics_size = (1 + NO_TOPICS) * ENCODED_TOPIC_SIZE + topics_header_len;
let data_size = MAX_DATA * MAX_DATA_SIZE + data_header_len;

let payload_size = address_size + topics_size + data_size;
let header_size = alloy::rlp::length_of_length(payload_size);

let size = header_size + payload_size;

// The address itself starts after the header plus one byte for the address header.
let add_rel_offset = header_size + 1;

// The event signature offset is after the header, the address and the topics list header.
let sig_rel_offset = header_size + address_size + topics_header_len + 1;

let topics: [usize; NO_TOPICS] =
    create_array(|i| sig_rel_offset + (i + 1) * ENCODED_TOPIC_SIZE);

let data: [usize; MAX_DATA] = create_array(|i| {
    header_size + address_size + topics_size + data_header_len + (i * MAX_DATA_SIZE)
});

let event_sig = alloy::primitives::keccak256(event_signature.as_bytes());
More of a remark than a request for change, but the logic to parse RLP is already available in the rlp crate we're using. That would avoid some hardcoded constants for the sizes etc.; we would just need to know the position of the address in the list and so on. Less error-prone, but if you have tests for this, I'm all good 👍
Ah yes, I made this function so that I didn't have to actually have an RLP-encoded receipt in order to build the struct. As long as you tell me the event's signature, the contract address, and the number of topics and words of additional data, we can construct everything we need to know about the table.
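For what it's worth, the reviewer's alternative might look roughly like this (a sketch, assuming the log is RLP-encoded as the list `[address, topics, data]` and using the `rlp` crate already in the tree):

```rust
use rlp::Rlp;

/// Find the byte offset of the address by parsing an actual encoded log,
/// instead of recomputing header sizes by hand.
fn address_offset(encoded_log: &[u8]) -> Result<usize, rlp::DecoderError> {
    let log = Rlp::new(encoded_log);
    // Length of the outer list header.
    let header_len = log.payload_info()?.header_len;
    // Sanity check: the first item should be the 20-byte address.
    debug_assert_eq!(log.at(0)?.data()?.len(), 20);
    // The address payload starts after the list header plus its own 1-byte string header.
    Ok(header_len + 1)
}
```

The trade-off is exactly as discussed: this needs a real encoded receipt in hand, while the offset arithmetic above needs only the event signature and topic/data counts.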
mp2-common/src/eth.rs
pub async fn query_receipt_proofs<T: Transport + Clone>(
    &self,
    provider: &RootProvider<T>,
    block: BlockNumberOrTag,
) -> Result<Vec<ReceiptProofInfo>> {
I know I am the one that asked for this, but given the recent complications we've had on dev/testnet, I'm wondering if we shouldn't just use a different API.
The big issue we've had is that sometimes we need to retry the call because the node was down or wasn't completely in sync.
Here we're letting the execution happen inside the function.
I see potentially a few ways to solve this:
- Having explicit error types: at least for the eth portion, so we gradually introduce the good practice of having our own errors. If this call returns `EthError::FailedRPC(...)`, then on the DQ side we can retry automatically. If it's another error, then it's an application-logic error and we should surface it and panic or something.
- Having this method split in two: one method `fetch_receipt_info` that DQ would call with retries, and a method `construct_proof` on the return value of the former to get the `Vec<ReceiptProofInfo>`.
- We could also make a system of callbacks, but that looks more complicated.
Any opinions, wdyt?
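To illustrate the first option, an explicit error type might look like this (a sketch; only `FailedRPC` is named above, the second variant is hypothetical):

```rust
/// Sketch of an explicit error type for the eth layer.
#[derive(Debug)]
pub enum EthError {
    /// The JSON-RPC call failed (node down, not in sync, ...); DQ can retry these.
    FailedRPC(String),
    /// Application-logic failure; retrying will not help, so surface it instead.
    Logic(String),
}

impl std::fmt::Display for EthError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            EthError::FailedRPC(msg) => write!(f, "RPC call failed: {msg}"),
            EthError::Logic(msg) => write!(f, "logic error: {msg}"),
        }
    }
}

impl std::error::Error for EthError {}
```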
I think explicit error types would be the best solution, but that should probably be handled in its own issue: once we start doing it for one part of the code, it's quickly going to affect everything, just because of `Result<T, E>` bubbling up everywhere. So for the time being I'll add the internals as two functions, and then change what's there now to just call them in succession, leaving us the most options for later.
async fn prove_value_extraction<const MAX_COLUMNS: usize, T: Transport + Clone>(
    &self,
    contract: Address,
    epoch: u64,
    pp: &PublicParameters<512, MAX_COLUMNS>,
    provider: &RootProvider<T, Ethereum>,
) -> Result<Vec<u8>>
OK sorry, maybe a lack of specs from my side:
A. We don't want the worker to have to make an external call to do its tasks. The task that we give to a worker should be self-contained, so we're reducing the surface for errors. If the call to the external API (in this instance another JSON-RPC call) fails, it's not the worker's fault or our inputs', and debugging that becomes a nightmare.
B. We want to let DQ manage the different tasks it needs to do the way it wants. Here this method is dictating that a single worker has to prove everything, which is not our setting. And in DQ, we may want to batch subsets of tasks together for better parallelization.
So that means:
- We need to have, similar to the other APIs, a method that just takes the parameters and the inputs for a given proof.
- We probably need to change the previous method that creates the update tree, to create an update tree of all the inputs but the children proofs. Or just simply imitate what we have now, but I think your solution goes one step further, which is good imo: giving an update tree of "tasks to do", and DQ will make sure each task gets fully "assembled" (i.e. we give the children proofs as well) and in the order it wants.
Does that make sense?
let input = CircuitInput::new_extension(
    proof_data.node.clone(),
    child_proof.proof.ok_or(anyhow::anyhow!(
        "Extension node child proof was a None value"
    ))?,
);
Maybe an update tree of `proof_data` would work, and then DQ can insert the children proofs when it has them to build the `CircuitInput` that it would then pass down to a worker?
let current_row_epoch = self.table.row.current_epoch();
let current_row_keys = self
    .table
    .row
    .keys_at(current_row_epoch)
    .await
    .into_iter()
    .map(TableRowUpdate::<BlockPrimaryIndex>::Deletion)
    .collect::<Vec<_>>();
[current_row_keys, table_row_updates].concat()
I'm not a fan of this, tbh. Could we rather make the test create a new row tree? In general DQ would follow what is implemented in the integration test, and this doesn't seem like a good way forward when we go to scale (400k - 1M entries).
Just changing the row tree to be a fresh one is anyway something we want to do for offchain data. I haven't checked that part, so it's unclear to me how it's done now, but that doesn't change the goal.
From what I can tell, Ryhope currently doesn't have the API to do this, because we need to keep the same epoch as we progress, and the only way I can wipe the row tree right now is to call `MerkleRowTree::new()`, which would forget the epoch.
Nice work for a super-complex PR!
I looked only at the circuits for now, as there are already enough comments I believe xD. Besides what's written in the comments, as a general note I would like to ask, where possible, to use many more constants rather than hard-coded values: I believe they make the code much more readable, as they explain the semantics of the actual value being employed.
mp2-common/src/array.rs
let lt = less_than_or_equal_to_unsafe(b, upper_bound, array_len, num_bits_size as usize);

let t = b._true();
b.connect(t.target, lt.target);
Isn't this check redundant, given that the index being accessed is already checked in `random_access_large_array` to be within `SIZE`?
Yes, you are correct, I'll remove it.
mp2-common/src/mpt_sequential/mod.rs
/// And it returns:
/// * New key with the pointer moved.
/// * The child hash / value of the node.
/// * A boolean that must be true if the given node is a leaf or an extension.
I know it's not related to your PR but I noticed it now: shouldn't this be true if the given node is a branch one?
mp2-test/src/circuit.rs
@@ -85,7 +85,7 @@ pub fn setup_circuit<
    };

    println!("[+] Circuit data built in {:?}s", now.elapsed().as_secs());

    println!("FRI config: {:?}", circuit_data.common.fri_params);
Leftover?
Yep
/// Given a `PartitionWitness` that has only inputs set, populates the rest of the witness using the
/// given set of generators.
pub fn debug_generate_partial_witness<
Thanks for adding this debugging utility. It looks to me that this method is basically a copy of Plonky2's `generate_partial_witness` with some print information, is that correct? If so, wdyt about modifying the method directly in our Plonky2 fork (maybe adding a debug input parameter that specifies whether to print this data or not)? That way we don't duplicate the logic. Wdyt?
mp2-test/src/mpt_sequential.rs
2 => event_contract.testThreeIndexed().into_transaction_request(),
3 => event_contract.testOneData().into_transaction_request(),
4 => event_contract.testTwoData().into_transaction_request(),
_ => unreachable!(),
Why are you generating the same type of event twice? Were you meant to call `event_contract.twoEmits()` here?
Yeah, I don't know why that was doing it twice anymore.
let row_id = Scalar::from_noncanonical_biguint(row_id);

let exp_digest = total * row_id;
assert_eq!(pi.values_digest(), exp_digest.to_weierstrass());
We should use the methods provided as API here to check that they are coherent with the circuit logic.
mp2-v1/src/values_extraction/api.rs
    })
}

/// Create a circuit input for proving a leaf MPT node of a transaction receipt.
pub fn new_receipt_leaf<const NO_TOPICS: usize, const MAX_DATA: usize>(
    last_node: &[u8],
Could we name it just `node`?
mp2-v1/src/values_extraction/api.rs
LeafSingle(LeafSingleCircuit<MAX_COLUMNS>),
LeafMapping(LeafMappingCircuit<MAX_COLUMNS>),
LeafMappingOfMappings(LeafMappingOfMappingsCircuit<MAX_COLUMNS>),
LeafReceipt(ReceiptLeafCircuit<NODE_LEN, 7>),
Can we use a constant instead of `7`, also explaining why it is 7?
pub(crate) struct MetadataTarget<const MAX_COLUMNS: usize, const MAX_FIELD_PER_EVM: usize> {
pub(crate) struct TableMetadataTarget<const MAX_COLUMNS: usize, const INPUT_COLUMNS: usize>
where
    [(); MAX_COLUMNS - INPUT_COLUMNS]:,
Do we need `INPUT_COLUMNS` to be a const generic? Can't we just use a vector, given that the number of input columns is fixed for each circuit type? It introduces a lot of const generic bounds, which might make compilation heavier, so if we can get rid of it, I think it will be beneficial.
This is such a good point, and I'm annoyed I didn't think of it, because `INPUT_COLUMNS` gave me so much grief.
I've just realised the same actually applies to `ExtractedColumnInfo` as well, since we always iterate over the maximum possible number of columns.
/// Information about all extracted columns of the table
pub(crate) extracted_columns: [ExtractedColumnInfoTarget; MAX_COLUMNS - INPUT_COLUMNS],
/// The number of actual columns
pub(crate) num_actual_columns: Target,
I think this should be replaced by a vector of flags, employed as selectors in the digest computation; the number of columns should then be computed in an ad-hoc method from these flags and the number of input columns. Otherwise, if this is provided as a witness, there might be a mismatch between how many columns we actually add to the digests and the claimed number of columns.
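As a sketch of what that could look like in circuit code (assuming plonky2's `CircuitBuilder`; names are illustrative):

```rust
use plonky2::field::goldilocks_field::GoldilocksField;
use plonky2::field::types::Field;
use plonky2::iop::target::{BoolTarget, Target};
use plonky2::plonk::circuit_builder::CircuitBuilder;

type F = GoldilocksField;
const D: usize = 2;

/// Derive the number of actual columns from per-column selector flags instead
/// of taking it as a free witness. The same flags would gate each column's
/// contribution to the digest, so the two can never disagree.
fn num_actual_columns(
    b: &mut CircuitBuilder<F, D>,
    extracted_flags: &[BoolTarget],
    num_input_columns: usize,
) -> Target {
    // Sum of the selector bits for the extracted columns...
    let extracted = b.add_many(extracted_flags.iter().map(|f| f.target));
    // ...plus the fixed number of input columns for this circuit type.
    let inputs = b.constant(F::from_canonical_usize(num_input_columns));
    b.add(extracted, inputs)
}
```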
This PR closes CRY-21 upon merge.
A very sizeable PR that makes the Generic Struct Extraction code and the Receipt Extraction code work together.
It has made reasonably large differences to both. The most noticeable is the change to `ColumnData`, `ColumnDataGadget` and `MetadataGadget`. Since the EVM only works in bytes, I have removed the `bit_offset` aspects of the code and rewritten the extraction circuits to reflect this. They no longer use lookups, which should result in an overall speed-up in proving time (since you no longer have to run the lookup PIOP).
In addition, instead of just having `ColumnInfo` we now have `ExtractedColumnInfo` and `InputColumnInfo`. This reduces the work done when computing value digests while still keeping everything grouped together.
The Receipt extraction now uses the new `TableMetadata` struct, just like struct extraction does, and I have also added a `From<EventLogInfo>` implementation to easily construct columns with identifiers.
I tried to remove some of the unnecessary generic constants (for instance those on storage leaf nodes and MPT extension nodes) that have a fixed length when viewed in a circuit. In order to make the compiler stop complaining about `unconstrained generic constant`, I had to use the actual numeric values rather than declaring a constant and using that.
For Receipt proving in the integration test we use the code in `mp2_v1/src/value_extraction/planner.rs`, since we don't need to keep track of previous value extraction in the receipt case.