fix: add panic catch on operator calling FFI #1196
I tested on macOS and Go still panics. Have you rebuilt the FFIs after every change?
I might have tested it wrongly, but would you please check if it works on your machine? If it does, I'll make sure to try again.
A review from @Oppen would be nice.
This successfully caught panics for the Merkle tree verification and SP1 verification, but failed when I tested with Risc0.
Could you check again, please?
It is working for me now! 🚀
This seems to have been committed by accident.
This as well.
Works for me and the code looks good. Just remove the leftover binaries.
Caught all panics on a local testnet.
Motivation
The operator's Go code calls some external Rust functions. If any of those functions panics, the panic propagates to the Go application and it fails completely. We want to catch those panics so this doesn't happen.
Changes
- Add `recover()` on `VerifySp1Proof`, `VerifyRiscZeroReceipt` and `VerifyMerkleTreeBatch`.
- Add `catch_unwind` from the Rust side for `verify_sp1_proof_ffi`, `verify_risc_zero_receipt_ffi` and `verify_merkle_tree_batch_ffi`.
- Change `verify_sp1_proof_ffi`, `verify_risc_zero_receipt_ffi` and `verify_merkle_tree_batch_ffi` to return `i32` instead of `bool` to inform about errors.
How to test
In order to test properly, we first have to take the `staging` branch as baseline and add a `panic!()` in the following FFI Rust functions (one at a time):
- `verify_merkle_tree_batch_ffi` on `merkle_tree/lib/src/lib.rs`
- `verify_sp1_proof_ffi` on `sp1/lib/src/lib.rs`
- `verify_risc_zero_receipt_ffi` on `risc_zero/lib/src/lib.rs`
This way, we will make the Go operator crash on a call to any of these FFIs, stopping execution immediately.
Note: Explicit instructions on how to run the operator are provided in the last section of this PR. The operator must be rebuilt after modifying the FFI functions.
- For `verify_merkle_tree_batch_ffi`, we have to run the whole system and start sending transactions with: `make batcher_send_infinite_sp1`
- For `verify_sp1_proof_ffi`, we have to run the whole system and start sending transactions with: `make batcher_send_infinite_sp1`
- For `verify_risc_zero_receipt_ffi`, we have to run the whole system and start sending transactions with: `make batcher_send_risc0_burst BURST_SIZE=5`
Once you have tested the crashing behaviour, you can switch to the `fix/operator-ffi-panic-catch` branch and repeat the process, but this time the panics have to be added to the `inner` versions of the provided FFI functions, which are:
- `inner_verify_merkle_tree_batch_ffi` on `merkle_tree/lib/src/lib.rs`
- `inner_verify_sp1_proof_ffi` on `sp1/lib/src/lib.rs`
- `inner_verify_risc_zero_receipt_ffi` on `risc_zero/lib/src/lib.rs`
These should also be tested individually.
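To see why the panics go into the `inner` functions on this branch, here is a minimal sketch of the wrapper pattern, not the code from this PR. The names `inner_verify_demo` and `verify_demo_ffi` are hypothetical, and the return codes (1 = valid, 0 = invalid, -1 = panic caught) are an assumption made for illustration; the PR only states that the FFIs now return `i32` instead of `bool`.

```rust
use std::panic;

// Hypothetical stand-in for one of the `inner_*_ffi` functions.
// The real ones verify proofs; this one only inspects a byte and
// panics on demand so the wrapper can be exercised.
fn inner_verify_demo(input: u8) -> bool {
    if input == 255 {
        panic!("injected panic for testing");
    }
    input == 1
}

// Hypothetical exported wrapper. The return codes are assumed:
//   1  -> verification succeeded
//   0  -> verification failed
//  -1  -> the inner function panicked and the panic was caught
// A real exported symbol would also carry a no-mangle attribute.
pub extern "C" fn verify_demo_ffi(input: u8) -> i32 {
    match panic::catch_unwind(|| inner_verify_demo(input)) {
        Ok(valid) => valid as i32,
        Err(_) => -1,
    }
}
```

A panic placed in the outer function would sit outside `catch_unwind` and still escape across the FFI boundary, which is why the test injects it into the inner one. Note that `catch_unwind` only helps when the crate is built with unwinding panics; with `panic = "abort"` the process still terminates.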
After executing, you should see some errors being logged in the operator, providing information about the Rust code panics. If the operator hasn't crashed, then everything is working properly.
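For reference, the information about a caught panic is available on the Rust side as the payload returned by `catch_unwind`. The sketch below shows one common way to turn that payload into a message. It is illustrative only: `panic_message` and `run_and_describe` are hypothetical helpers, and this PR may surface the error differently (for example, only through Rust's default panic hook, which prints the message to stderr).

```rust
use std::any::Any;
use std::panic;

// Hypothetical helper: convert a panic payload into a printable message.
// Payloads from `panic!` are usually `&'static str` or `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

// Hypothetical demo: run a closure that may panic and report either
// its result or the panic message as an error string.
fn run_and_describe(should_panic: bool) -> Result<bool, String> {
    panic::catch_unwind(|| {
        if should_panic {
            panic!("injected panic for testing");
        }
        true
    })
    .map_err(|payload| panic_message(&*payload))
}
```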
How to run the operator
make anvil_start_with_block_time
make aggregator_start
make batcher_start_local
make build_operator
make operator_register_and_start CONFIG_FILE=config-files/config-operator-1.yaml
Closes #1174