Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add panic catch on operator calling FFI #1196

Merged
merged 11 commits into from
Oct 18, 2024

Conversation

JulianVentura
Copy link
Collaborator

@JulianVentura JulianVentura commented Oct 7, 2024

Motivation

We have some calls to external Rust functions on operator Golang code. If any of those functions panics, that error is propagated to the golang application and it fails completly. We want to catch those errors so this doesn't happen.

Changes

  • Add golang panic handling with recover() on VerifySp1Proof, VerifyRiscZeroReceipt and VerifyMerkleTreeBatch.
  • Add a function wrapper with a catch_unwind from Rust side for verify_sp1_proof_ffi, verify_risc_zero_receipt_ffi and verify_merkle_tree_batch_ffi.
  • Modify verify_sp1_proof_ffi, verify_risc_zero_receipt_ffi and verify_merkle_tree_batch_ffi to return i32 instead of bool to inform about errors.

How to test

In order to test properly, we first have to take the staging branch as baseline and add a panic!() in the following FFI Rust functions (one at a time):

  • verify_merkle_tree_batch_ffi on merkle_tree/lib/src/lib.rs
  • verify_sp1_proof_ffi on sp1/lib/src/lib.rs
  • verify_risc_zero_receipt_ffi on risc_zero/lib/src/lib.rs

This way, we will make the golang operator crash on a call to any of these FFIs, stoping execution immediately.

Note: Explicit instructions on how to run the operator are provided in the last section of this PR. The operator must be rebuilt after modifying the FFI functions

  1. To test the call to verify_merkle_tree_batch_ffi we have to run the hole system and start sending transactions with:
    • make batcher_send_infinite_sp1
  2. To test the call to verify_sp1_proof_ffi we have to run the hole system and start sending transactions with:
    • make batcher_send_infinite_sp1
  3. To test the call to verify_risc_zero_receipt_ffi we have to run the hole system and start sending transactions with:
    • make batcher_send_risc0_burst BURST_SIZE=5

Once you have tested the crashing behaviour, you can now switch to fix/operator-ffi-panic-catch branch and repeat the process, but this time the panics will have to be added to the inner versions of the provided FFI functions, which are:

  • inner_verify_merkle_tree_batch_ffi on merkle_tree/lib/src/lib.rs
  • inner_verify_sp1_proof_ffi on sp1/lib/src/lib.rs
  • inner_verify_risc_zero_receipt_ffi on risc_zero/lib/src/lib.rs

These should also be tested individually.

After executing you should see some errors being logged in the operator, providing information about the rust code panics. If the operator hasn't crashed, then everything is working properly.

How to run the operator

  1. Execute anvil:
  • make anvil_start_with_block_time
  1. Launch aggregator:
  • make aggregator_start
  1. Launch batcher:
  • make batcher_start_local
  1. Launch operator:
  • make build_operator
  • make operator_register_and_start CONFIG_FILE=config-files/config-operator-1.yaml

Closes #1174

@JulianVentura JulianVentura self-assigned this Oct 8, 2024
@JulianVentura JulianVentura marked this pull request as ready for review October 8, 2024 18:51
@JuArce JuArce linked an issue Oct 9, 2024 that may be closed by this pull request
Copy link
Collaborator

@MarcosNicolau MarcosNicolau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested on macos and go still panics. Have you rebuilt the ffis after every change?

I might have tested it wrongly, but would you please check If it works on your machine if it does, I'll make sure to try again.

Makefile Outdated Show resolved Hide resolved
@JuArce
Copy link
Collaborator

JuArce commented Oct 9, 2024

Would be nice a review from @Oppen

@PatStiles
Copy link
Contributor

This sucessfully caught panics for the merkle tree verification and sp1 verification but failed when I tested with risc0.

@JulianVentura JulianVentura marked this pull request as draft October 14, 2024 20:29
@JulianVentura
Copy link
Collaborator Author

I tested on macos and go still panics. Have you rebuilt the ffis after every change?

I might have tested it wrongly, but would you please check If it works on your machine if it does, I'll make sure to try again.

Could you check again, please?

@JulianVentura JulianVentura marked this pull request as ready for review October 16, 2024 19:59
Copy link
Collaborator

@MarcosNicolau MarcosNicolau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is working for me now! 🚀

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have been committed by accident.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This as well.

Copy link
Collaborator

@Oppen Oppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me and the code looks good. Just remove the leftover binaries.

Copy link
Contributor

@PatStiles PatStiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caught all panics on a local testnet.

@entropidelic entropidelic merged commit 84fc022 into staging Oct 18, 2024
5 checks passed
@entropidelic entropidelic deleted the fix/operator-ffi-panic-catch branch October 18, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add try catch on verifiers calls
7 participants