
bug: regressions introduced by latest main #831

Closed
enitrat opened this issue Oct 3, 2024 · 44 comments

enitrat commented Oct 3, 2024

main (commit hash to confirm again) introduced a few regressions.

To reproduce:

1. Clone https://github.com/kkrt-labs/ef-tests
2. Check out the correct branch: git checkout dev/bump-native
3. Run make
4. Run make setup-kakarot
5. Run cargo test --test tests stStack --no-fail-fast --features "v1,native,ci" -- --nocapture

^ This runs the full suite for the stStack family. The first issue is also reproducible when running a specific test, e.g.:
cargo test test_sar_2_xor_255_256_d0g0v0_Cancun --features v1,native -- --nocapture

  1. Execution regressions

The execution is no longer the same. What used to pass, now fails with:

 WARN ef_testing::models::result: stStackTests::shallowStack_d10g0v0_Cancun reverted:
Transaction execution has failed:
0: Error in the called contract (contract address: 0x0000000000000000000000000000000000000000000000000000000000000003, class hash: 0x06153ccf69fd20f832c794df36e19135f0070d0576144f0b47f75a226e4be530, selector: 0x015d40a3d6ca2ac30f4031e42be28da9b056fef9bb7357ac5e85627ee876e5ad):
Error at pc=0:25:
Cairo traceback (most recent call last):
Unknown location (pc=0:731)
Unknown location (pc=0:677)
Unknown location (pc=0:291)
Unknown location (pc=0:314)

1: Error in the called contract (contract address: 0x04c15a3ff2822ca5b75a67a2ff0226b8b18a94c1c7d396b0073269c5b074d14f, class hash: 0x05b953b2f2ed1e6a4943dc934cc8a5fb12de801745a40857f4459bb375c0f04c, selector: 0x007ec457cd7ed1630225a8328f826a29a327b19486f6b2882b4176545ebdbe3d):
Native execution error: EOA: could not decode tx

thread 'stStackTests::stStackTests::test_shallowStack_d10g0v0_Cancun' panicked at crates/ef-testing/tests/stStackTests.rs:1379:24:
Error while running the test: Other(
gas used mismatch: expected 300000, got 0
nonce mismatch for 0xa94f5374fce5edbc8e2a8697c15331677e6ebf0b: expected                              0x1, got                              0x0
revert reason:
)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test stStackTests::stStackTests::test_shallowStack_d10g0v0_Cancun ... FAILED

^ This happens for practically all tests and highlights a difference somewhere during execution, probably introduced by one of the latest commits in the last PR.

 WARN ef_testing::models::result: stStackTests::shallowStack_d53g0v0_Cancun reverted:
Transaction execution has failed:
0: Error in the called contract (contract address: 0x0000000000000000000000000000000000000000000000000000000000000003, class hash: 0x06153ccf69fd20f832c794df36e19135f0070d0576144f0b47f75a226e4be530, selector: 0x015d40a3d6ca2ac30f4031e42be28da9b056fef9bb7357ac5e85627ee876e5ad):
Error at pc=0:25:
Cairo traceback (most recent call last):
Unknown location (pc=0:731)
Unknown location (pc=0:677)
Unknown location (pc=0:291)
Unknown location (pc=0:314)

1: Error in the called contract (contract address: 0x04c15a3ff2822ca5b75a67a2ff0226b8b18a94c1c7d396b0073269c5b074d14f, class hash: 0x05b953b2f2ed1e6a4943dc934cc8a5fb12de801745a40857f4459bb375c0f04c, selector: 0x007ec457cd7ed1630225a8328f826a29a327b19486f6b2882b4176545ebdbe3d):
Native execution error: Native execution error: Invalid chain id

thread 'stStackTests::stStackTests::test_shallowStack_d54g0v0_Cancun' panicked at crates/ef-testing/tests/stStackTests.rs:1955:24:
Error while running the test: Other(
gas used mismatch: expected 300000, got 0
nonce mismatch for 0xa94f5374fce5edbc8e2a8697c15331677e6ebf0b: expected                              0x1, got                              0x0
revert reason:
)
test stStackTests::stStackTests::test_shallowStack_d54g0v0_Cancun ... FAILED
  2. A weird SIGTRAP issue that terminates the execution:
// ... logs above
error: test failed, to rerun pass `-p ef-testing --test tests`

Caused by:
  process didn't exit successfully: `/Users/msaug/kkrt-labs/ef-tests/target/debug/deps/tests-48dbd4207bfa156f stStack --nocapture` (signal: 5, SIGTRAP: trace/breakpoint trap)

^ This happens when running multiple tests at once, but I don't know exactly which one triggers it.

azteca1998 self-assigned this Oct 3, 2024
enitrat (Author) commented Oct 3, 2024

I ran another set of tests based on 1b8656c and it seems the issue was already present at that commit; I will go back further.

azteca1998 (Collaborator):

Is the write-after-free issue no longer reproducible? I can debug that one while you try to find when this one happened.

enitrat (Author) commented Oct 3, 2024

> Is the write-after-free issue no longer reproducible?

So far (on 79a8dcf) I haven't been able to reproduce it. I'm not sure what conditions lead to it; perhaps the parallel threads used during test execution.

enitrat (Author) commented Oct 3, 2024

So, I confirm that the regression was actually introduced in c866fbf. Things work fine when using commit ae17dd3.

edit: mistake on my end, it comes from 79a8dcf

azteca1998 (Collaborator):

Strange. That commit just replaces an assert with logic to handle the asserted case. Since the test used to pass, the assert wasn't being triggered, so execution entered the branch that generates exactly the same instructions as the previous commit.

There should be no difference in the generated code before and after this commit.
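As a hedged illustration (purely hypothetical, not the actual cairo-native code), the kind of change being described is:

```rust
// Hypothetical sketch: an assert is replaced with explicit handling of
// the failing case. Inputs that never triggered the assert take the same
// path in both versions, so the code generated for them should be identical.
fn before(x: Option<u32>) -> u32 {
    // old: hard assert, panics on None
    x.expect("value must be present") + 1
}

fn after(x: Option<u32>) -> u32 {
    // new: the None case is handled instead of panicking
    match x {
        Some(v) => v + 1, // same path as `before` for passing inputs
        None => 0,
    }
}
```

For inputs that previously passed, both functions behave identically, which is why a behavioral difference at this commit is surprising.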

enitrat (Author) commented Oct 3, 2024

🤔 Let me retry; perhaps a cache issue. In the meantime, I added repro guidelines based on the latest main. It's a bit tedious because I need to update several dependency repos and might have lost myself in the process 😓

azteca1998 (Collaborator):

I think the issue was already there in the commit that updates to LLVM 19... Either that, or I don't know how to run your tests.

I had to comment out the lines that disable the stack tests in blockchain-tests-skip.yml. Otherwise, all the tests would be ignored, and they did nothing when forced to run anyway.

I also tried running make runtime in Native, both at the LLVM 19 update commit and on the current main in case there was an update there, but with no luck.

enitrat (Author) commented Oct 3, 2024

> I think the issue was already there in the commit that updates to LLVM 19... Either that or I don't know how to run your tests.

I didn't have the issue on branch fix-clone-drop-for-enum-structs.

> I had to comment the lines that disable the stack tests in blockchain-tests-skip.yml.

Oh, sorry, I committed this file 🤦

> I also tried running make runtime in native at the LLVM 19 update commit and in the current main just in case there was an update there, but with no luck.

Native runs inside the blockifier dependency, which on the branch above is pinned to 79a8dcf, so even if you update the runtime locally, the executor will be compiled against the Native version used by the "sequencer" dependency.

azteca1998 (Collaborator):

I changed that dependency too, otherwise it wouldn't compile. Both blockifier and your crate pointed to my local folder.

enitrat (Author) commented Oct 3, 2024

OK, I re-ran on c866fbf and it works; I must've forgotten to clear the cached executor.

enitrat (Author) commented Oct 3, 2024

> I changed that dependency too, otherwise it wouldn't compile. Both blockifier and your crate pointed to my local folder.

I'm surprised; I don't see any compilation error. I edited the skip file for it to work properly.

enitrat (Author) commented Oct 3, 2024

Here's the LLDB trace for the write-after-free issue when running cargo test --test tests stStack --no-fail-fast --features "v1,native,ci" (runs all tests in a single target and filters for stStack).

Here, we don't see Cairo Native being active: the initial call is made to a Cairo0 account, so it is executed by the regular CairoVM.

The call flow is:
tx -> relayer (cairo0) -> account (cairo1) -> kakarot (cairo1) -> end

I guess it fails in the relayer step?
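The Cairo0 vs Cairo1 dispatch described above can be sketched as follows (names are illustrative, not the actual blockifier API):

```rust
// Hypothetical model of per-frame VM selection in the call flow
// tx -> relayer (cairo0) -> account (cairo1) -> kakarot (cairo1).
#[derive(Clone, Copy)]
enum ContractKind {
    Cairo0,
    Cairo1,
}

fn executor_for(kind: ContractKind) -> &'static str {
    match kind {
        // Cairo0 classes run on the regular CairoVM (the relayer step)...
        ContractKind::Cairo0 => "CairoVM",
        // ...while Cairo1 classes can run on Cairo Native.
        ContractKind::Cairo1 => "Cairo Native",
    }
}
```

This is why the backtrace below shows only cairo_vm frames: the crash occurs before any Native code is entered.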

* thread #18, name = 'stStackTests::stStackTests::test_stackOverflowM1DUP_d10g0v0_Can', stop reason = signal SIGABRT
  * frame #0: 0x0000000197ed4764 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000197f0bc28 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000197e19ae8 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000197d3ae28 libsystem_malloc.dylib`malloc_vreport + 908
    frame #4: 0x0000000197d515d4 libsystem_malloc.dylib`malloc_zone_error + 104
    frame #5: 0x0000000197d5128c libsystem_malloc.dylib`free_list_checksum_botch + 40
    frame #6: 0x0000000197d32f5c libsystem_malloc.dylib`tiny_free_list_remove_ptr + 452
    frame #7: 0x0000000197d32458 libsystem_malloc.dylib`tiny_free_no_lock + 436
    frame #8: 0x0000000197d32120 libsystem_malloc.dylib`free_tiny + 496
    frame #9: 0x000000010b9a052c tests-0123119940fab6c0`__rdl_realloc [inlined] std::sys::pal::unix::alloc::_$LT$impl$u20$core..alloc..global..GlobalAlloc$u20$for$u20$std..alloc..System$GT$::dealloc::hc599de7cd06b14b4 at alloc.rs:48:9 [opt]
    frame #10: 0x000000010b9a0524 tests-0123119940fab6c0`__rdl_realloc [inlined] std::sys::pal::common::alloc::realloc_fallback::hf475d51777e5c5cb at alloc.rs:56:9 [opt]
    frame #11: 0x000000010b9a04dc tests-0123119940fab6c0`__rdl_realloc [inlined] std::sys::pal::unix::alloc::_$LT$impl$u20$core..alloc..global..GlobalAlloc$u20$for$u20$std..alloc..System$GT$::realloc::he5221d6b126ec4ab at alloc.rs:56:13 [opt]
    frame #12: 0x000000010b9a04dc tests-0123119940fab6c0`__rdl_realloc at alloc.rs:424:13 [opt]
    frame #13: 0x000000010b6d8104 tests-0123119940fab6c0`alloc::alloc::Global::grow_impl::h2451d937db69a051 [inlined] alloc::alloc::realloc::h0c8df755c611e652 at alloc.rs:138:14
    frame #14: 0x000000010b6d80e8 tests-0123119940fab6c0`alloc::alloc::Global::grow_impl::h2451d937db69a051(self=0x0000000136226e28, ptr=(pointer = ""), old_layout=Layout @ 0x000000031fff42f8, new_layout=Layout @ 0x000000031fff4308, zeroed=false) at alloc.rs:215:31
    frame #15: 0x000000010b6d8454 tests-0123119940fab6c0`_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$::grow::hf3ec351522dd37f2(self=0x0000000136226e28, ptr=(pointer = ""), old_layout=Layout @ 0x000000031fff4550, new_layout=Layout @ 0x000000031fff4560) at alloc.rs:268:18
    frame #16: 0x000000010b6d5008 tests-0123119940fab6c0`alloc::raw_vec::finish_grow::h74ef05c11f3e31f1(new_layout=Result<core::alloc::layout::Layout, core::alloc::layout::LayoutError> @ 0x000000031fff4620, current_memory=Option<(core::ptr::non_null::NonNull<u8>, core::alloc::layout::Layout)> @ 0x000000031fff48a8, alloc=0x0000000136226e28) at raw_vec.rs:587:13
    frame #17: 0x000000010b1c428c tests-0123119940fab6c0`alloc::raw_vec::RawVec$LT$T$C$A$GT$::grow_amortized::h2e7b27970944a87d(self=0x0000000136226e18, len=6, additional=5) at raw_vec.rs:486:19
    frame #18: 0x000000010b1c165c tests-0123119940fab6c0`alloc::raw_vec::RawVec$LT$T$C$A$GT$::try_reserve::he07a584328cd9550(self=0x0000000136226e18, len=6, additional=5) at raw_vec.rs:372:13
    frame #19: 0x000000010b1700bc tests-0123119940fab6c0`alloc::vec::Vec$LT$T$C$A$GT$::try_reserve::hcf4171a006c4cb35(self=0x0000000136226e18, additional=5) at mod.rs:1039:9
    frame #20: 0x000000010b132290 tests-0123119940fab6c0`cairo_vm::vm::vm_memory::memory::Memory::insert::h72d4a8bef1e04c33(self=0x000000031fff68a0, key=(segment_index = 1, offset = 10), val=0x000000031fff5160) at memory.rs:209:13
    frame #21: 0x000000010b15a32c tests-0123119940fab6c0`cairo_vm::vm::vm_core::VirtualMachine::insert_deduced_operands::h069c2aaece09cf80(self=0x000000031fff6870, deduced_operands=(__0 = '\x01'), operands=0x000000031fff5160, operands_addresses=0x000000031fff5200) at vm_core.rs:395:13
    frame #22: 0x000000010b15a478 tests-0123119940fab6c0`cairo_vm::vm::vm_core::VirtualMachine::run_instruction::hb944f2c213bd0e88(self=0x000000031fff6870, instruction=0x000000011f8495a0) at vm_core.rs:407:9
    frame #23: 0x000000010b15af48 tests-0123119940fab6c0`cairo_vm::vm::vm_core::VirtualMachine::step_instruction::h5738b4ef0ffbbe0d(self=0x000000031fff6870) at vm_core.rs:527:17
    frame #24: 0x000000010b15b290 tests-0123119940fab6c0`cairo_vm::vm::vm_core::VirtualMachine::step::hb91319f27d5fe2de(self=0x000000031fff6870, hint_processor=&mut dyn cairo_vm::hint_processor::hint_processor_definition::HintProcessor @ 0x000000031fff5c60, exec_scopes=0x000000031fff6b38, hint_datas=&[alloc::boxed::Box<dyn core::any::Any, alloc::alloc::Global>] @ 0x000000031fff5c78, constants=0x000000031fff6af0) at vm_core.rs:567:9
    frame #25: 0x000000010b20c98c tests-0123119940fab6c0`cairo_vm::vm::runners::cairo_runner::CairoRunner::run_until_pc::h1501663d902ed11b(self=0x000000031fff6870, address=(segment_index = 9, offset = 0), hint_processor=&mut dyn cairo_vm::hint_processor::hint_processor_definition::HintProcessor @ 0x000000031fff5ee8) at cairo_runner.rs:655:13
    frame #26: 0x000000010b20e798 tests-0123119940fab6c0`cairo_vm::vm::runners::cairo_runner::CairoRunner::run_from_entrypoint::hd3b1809f65144c52(self=0x000000031fff6870, entrypoint=699, args=&[&cairo_vm::vm::runners::cairo_runner::CairoArg] @ 0x000000031fff6458, verify_secure=true, program_segment_size=Option<usize> @ 0x000000031fff6470, hint_processor=&mut dyn cairo_vm::hint_processor::hint_processor_definition::HintProcessor @ 0x000000031fff6480) at cairo_runner.rs:1111:9
    frame #27: 0x0000000101c984cc tests-0123119940fab6c0`blockifier::execution::deprecated_entry_point_execution::run_entry_point::hd0cd38b2c30e115b(runner=0x000000031fff6870, hint_processor=0x000000031fff6c20, entry_point_pc=699, args=Vec<cairo_vm::vm::runners::cairo_runner::CairoArg, alloc::alloc::Global> @ 0x000000031fff8130) at deprecated_entry_point_execution.rs:198:18

enitrat (Author) commented Oct 3, 2024

LLDB trace when running cargo test stStack --features v1,native (runs only the stStack tests):

* thread #25, name = 'stStackTests::test_stackOverflowM1DUP_d13g0v0_Cancun', stop reason = signal SIGABRT
  * frame #0: 0x0000000197ed4764 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000197f0bc28 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000197e19ae8 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000197d3ae28 libsystem_malloc.dylib`malloc_vreport + 908
    frame #4: 0x0000000197d515d4 libsystem_malloc.dylib`malloc_zone_error + 104
    frame #5: 0x0000000197d51354 libsystem_malloc.dylib`_tiny_check_and_zero_inline_meta_from_freelist + 188
    frame #6: 0x0000000197d2ff70 libsystem_malloc.dylib`tiny_malloc_from_free_list + 640
    frame #7: 0x0000000197d2f670 libsystem_malloc.dylib`tiny_malloc_should_clear + 216
    frame #8: 0x0000000197d2e24c libsystem_malloc.dylib`szone_malloc_should_clear + 92
    frame #9: 0x00000001001fdd80 stStackTests-9af27425979037f3`alloc::alloc::alloc::h333164c5ca5691c6(layout=Layout @ 0x000000037001adb8) at alloc.rs:100:9
    frame #10: 0x00000001001fde98 stStackTests-9af27425979037f3`alloc::alloc::Global::alloc_impl::h51d1578827c2007b(self=0x000000037001afe7, layout=Layout @ 0x000000037001ae58, zeroed=false) at alloc.rs:183:73
    frame #11: 0x00000001001fe438 stStackTests-9af27425979037f3`_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$::allocate::h5f5e0c10f2fbdaf9(self=0x000000037001afe7, layout=Layout @ 0x000000037001afa0) at alloc.rs:243:9
    frame #12: 0x00000001001cf620 stStackTests-9af27425979037f3`alloc::boxed::Box$LT$T$C$A$GT$::try_new_uninit_in::h95ed53467f54c743(alloc=Global @ 0x000000037001afe7) at boxed.rs:529:13
    frame #13: 0x00000001001cf140 stStackTests-9af27425979037f3`alloc::boxed::Box$LT$T$C$A$GT$::new_uninit_in::h3d30583466457255(alloc=Global @ 0x000000037001b0a7) at boxed.rs:491:15
    frame #14: 0x00000001001cc850 stStackTests-9af27425979037f3`alloc::collections::btree::node::LeafNode$LT$K$C$V$GT$::new::h1fcbe726819ce489(alloc=Global @ 0x000000037001b11e) at node.rs:83:28
    frame #15: 0x00000001001d5e20 stStackTests-9af27425979037f3`alloc::collections::btree::map::entry::VacantEntry$LT$K$C$V$C$A$GT$::insert::hf5380ac41efb76b9 [inlined] alloc::collections::btree::node::NodeRef$LT$alloc..collections..btree..node..marker..Owned$C$K$C$V$C$alloc..collections..btree..node..marker..Leaf$GT$::new_leaf::h7db0531265f53255 at node.rs:217:29
    frame #16: 0x00000001001d5e1c stStackTests-9af27425979037f3`alloc::collections::btree::map::entry::VacantEntry$LT$K$C$V$C$A$GT$::insert::hf5380ac41efb76b9(self=VacantEntry<ruint::Uint<256, 4>, ruint::Uint<256, 4>, alloc::alloc::Global> @ 0x000000037001b420, value=Uint<256, 4> @ 0x000000037001b460) at entry.rs:356:32
    frame #17: 0x00000001001fd9ec stStackTests-9af27425979037f3`alloc::collections::btree::map::BTreeMap$LT$K$C$V$C$A$GT$::insert::h9e0752f0049e0ed9(self=0x000000037001b500, key=<unavailable>, value=Uint<256, 4> @ 0x000000037001b610) at map.rs:991:17
    frame #18: 0x00000001001fb504 stStackTests-9af27425979037f3`_$LT$serde..de..impls..$LT$impl$u20$serde..de..Deserialize$u20$for$u20$alloc..collections..btree..map..BTreeMap$LT$K$C$V$GT$$GT$..deserialize..MapVisitor$LT$K$C$V$GT$$u20$as$u20$serde..de..Visitor$GT$::visit_map::hf24149fa3d417930((null)=(marker = core::marker::PhantomData<alloc::collections::btree::map::BTreeMap<ruint::Uint<256, 4>, ruint::Uint<256, 4>, alloc::alloc::Global> > @ 0x000000037001b687), map=MapAccess<serde_json::read::StrRead> @ 0x000000037001b4f0) at impls.rs:1554:29
    frame #19: 0x00000001001e1294 stStackTests-9af27425979037f3`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_map::h1601009c3235db83(self=0x000000037001d2c0, visitor=(marker = core::marker::PhantomData<alloc::collections::btree::map::BTreeMap<ruint::Uint<256, 4>, ruint::Uint<256, 4>, alloc::alloc::Global> > @ 0x000000037001b727)) at de.rs:1795:31
    frame #20: 0x00000001001fdfb0 stStackTests-9af27425979037f3`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$alloc..collections..btree..map..BTreeMap$LT$K$C$V$GT$$GT$::deserialize::h0b063634e9c07089(deserializer=0x000000037001d2c0) at impls.rs:1562:17
    frame #21: 0x00000001001d1a58 stStackTests-9af27425979037f3`_$LT$core..marker..PhantomData$LT$T$GT$$u20$as$u20$serde..de..DeserializeSeed$GT$::deserialize::h26523c67ea91cdc8((null)=PhantomData<alloc::collections::btree::map::BTreeMap<ruint::Uint<256, 4>, ruint::Uint<256, 4>, alloc::alloc::Global>> @ 0x000000037001b8b7, deserializer=0x000000037001d2c0) at mod.rs:800:9
    frame #22: 0x00000001001da2d8 stStackTests-9af27425979037f3`_$LT$serde_json..de..MapAccess$LT$R$GT$$u20$as$u20$serde..de..MapAccess$GT$::next_value_seed::hae6bf5f3728cadfb(self=0x000000037001b9c8, seed=PhantomData<alloc::collections::btree::map::BTreeMap<ruint::Uint<256, 4>, ruint::Uint<256, 4>, alloc::alloc::Global>> @ 0x000000037001b917) at de.rs:2011:9
    frame #23: 0x00000001001d770c stStackTests-9af27425979037f3`serde::de::MapAccess::next_value::hfba31c5fe87f3614(self=0x000000037001b9c8) at mod.rs:1873:9
    frame #24: 0x00000001001f3890 stStackTests-9af27425979037f3`_$LT$ef_tests..models.._..$LT$impl$u20$serde..de..Deserialize$u20$for$u20$ef_tests..models..Account$GT$..deserialize..__Visitor$u20$as$u20$serde..de..Visitor$GT$::visit_map::h7e8f82f91c8fde30((null)=(marker = core::marker::PhantomData<ef_tests::models::Account> @ 0x000000037001c077, lifetime = core::marker::PhantomData<void *> @ 0x000000037001c077), __map=MapAccess<serde_json::read::StrRead> @ 0x000000037001b9c8) at models.rs:198:32

-> In this one, we don't even reach the step where a tx is sent 🤔 Could there be side effects from multithreading and the execution of another test in parallel?

enitrat (Author) commented Oct 3, 2024

I believe it was introduced in 4aa739b.

A run on 483726e was fine.

The next one, on 47ffbc1, was broken.

azteca1998 (Collaborator):

You're right. I implemented an if statement backwards. Could you try with #832?
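A minimal sketch of that class of bug (purely illustrative, not the actual cairo-native code):

```rust
// An "if statement implemented backwards": the branches are swapped,
// so inputs that satisfy the condition take the failure path and
// invalid inputs are silently accepted.
fn is_within_bounds(len: usize) -> bool {
    len <= 1024
}

fn check_buggy(len: usize) -> Result<(), &'static str> {
    if is_within_bounds(len) {
        Err("out of bounds") // bug: valid input rejected
    } else {
        Ok(()) // bug: invalid input accepted
    }
}

fn check_fixed(len: usize) -> Result<(), &'static str> {
    if is_within_bounds(len) {
        Ok(())
    } else {
        Err("out of bounds")
    }
}
```

Such an inversion passes the happy-path type checks but flips runtime behavior, which matches the symptom here: previously passing transactions suddenly reverting.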

enitrat (Author) commented Oct 4, 2024

Fixed 👍

azteca1998 (Collaborator):

What else was pending?

  • Memory usage? I think it should've improved with this fix.
  • The problem where values got corrupted? Was it a different manifestation of this same bug?

enitrat (Author) commented Oct 4, 2024

Memory usage looks a lot better, though perhaps a few issues persist:

[screenshot of memory usage]

I am starting a run on CI and will report back in ~1 hr.

I think everything's fine now (the secp256 point is on the blockifier side).

enitrat (Author) commented Oct 4, 2024

Unfortunately it still fails our CI... I think it consumes too much memory:
https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11182543855/job/31089141465

There's no explicit shutdown message, so I believe we're just hitting a resource limit.

enitrat (Author) commented Oct 4, 2024

#826 (comment)

The max memory usage previously hit in a run (before the snapshot-drop fixes) was ~13 GB; now it's 25 GB, which might be too much for our runner. So we will probably need to upgrade it, or find a way to decrease memory consumption in Native mode.

azteca1998 (Collaborator):

Wait, don't upgrade yet. I know for a fact we still have some memory leaks.

I'm working on a piece of code that'll help me find them. I'll let you know once I've fixed the ones I've found.

enitrat (Author) commented Oct 4, 2024

thank you very much 🙏

azteca1998 (Collaborator):

Could you try with #833? That one fixes all the memory leaks I could find running all our JIT tests.

enitrat (Author) commented Oct 7, 2024

[screenshot] Looks good now!

enitrat (Author) commented Oct 8, 2024

Hm, actually I think there is a very specific issue that happens in certain test cases, which end up running forever: CI took 6 hrs and ended up failing due to time limits (instead of finishing in ~50 mins).

I'll investigate this manually later today.

https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11222856617/job/31196253181?pr=1011

enitrat (Author) commented Oct 8, 2024

I just hit an issue when running locally:

* thread #6, name = 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun', stop reason = EXC_BAD_ACCESS (code=2, address=0x170633e80)
    frame #0: 0x000000017280212c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::memory_operations::MemoryOperation::exec_pc(f179) + 18816
504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::memory_operations::MemoryOperation::exec_pc(f179):
->  0x17280212c <+18816>: str    x22, [x19, #0x20]
    0x172802130 <+18820>: ldr    x22, [sp, #0x1108]
    0x172802134 <+18824>: str    x22, [x19, #0x10]
    0x172802138 <+18828>: ldr    w22, [sp, #0x338]
(lldb) bt
* thread #6, name = 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun', stop reason = EXC_BAD_ACCESS (code=2, address=0x170633e80)
  * frame #0: 0x000000017280212c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::memory_operations::MemoryOperation::exec_pc(f179) + 18816
    frame #1: 0x000000017248f9c8 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_opcode(f135) + 381252
    frame #2: 0x00000001723ec248 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_code(f107) + 9600
    frame #3: 0x00000001723949a4 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_message(f78) + 24288
    frame #4: 0x0000000172ae167c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::call_helpers::CallHelpersImpl::generic_call(f255) + 47004
    frame #5: 0x00000001726d5b90 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::system_operations::SystemOperations::exec_staticcall(f148) + 103480
    frame #6: 0x00000001725bb4f8 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_opcode(f135) + 1608820
    frame #7: 0x00000001723ec248 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_code(f107) + 9600
    frame #8: 0x00000001723949a4 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_message(f78) + 24288
    frame #9: 0x0000000172ae167c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::call_helpers::CallHelpersImpl::generic_call(f255) + 47004
    frame #10: 0x00000001726d5b90 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::system_operations::SystemOperations::exec_staticcall(f148) + 103480
    frame #11: 0x00000001725bb4f8 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_opcode(f135) + 1608820
    frame #12: 0x00000001723ec248 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_code(f107) + 9600
    frame #13: 0x00000001723949a4 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_message(f78) + 24288
    frame #14: 0x0000000172ae167c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::call_helpers::CallHelpersImpl::generic_call(f255) + 47004
    frame #15: 0x0000000172787ca8 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::instructions::system_operations::SystemOperations::exec_call(f154) + 139700
    frame #16: 0x000000017259f078 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_opcode(f135) + 1492980
    frame #17: 0x00000001723ec248 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::execute_code(f107) + 9600
    frame #18: 0x00000001723949a4 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_message(f78) + 24288
    frame #19: 0x0000000172345fc0 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_message_call(f62) + 5284
    frame #20: 0x00000001722fa60c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$evm::interpreter::EVMImpl::process_transaction(f48) + 72040
    frame #21: 0x00000001722c18e0 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$contracts::kakarot_core::eth_rpc::EthRPC::<contracts::kakarot_core::kakarot::KakarotCore::ContractState, contracts::kakarot_core::kakarot::KakarotCore::_KakarotCoreState, contracts::kakarot_core::kakarot::KakarotCore::ContractStateDrop>::eth_send_transaction(f39) + 15028
    frame #22: 0x0000000172290f5c 504898746202199744673968940527932241844513528453156538068464172101734859290`impl$contracts::kakarot_core::eth_rpc::__wrapper__EthRPC__eth_send_transaction::<contracts::kakarot_core::kakarot::KakarotCore::ContractState, contracts::kakarot_core::kakarot::KakarotCore::_KakarotCoreState, contracts::kakarot_core::kakarot::KakarotCore::ContractStateDrop, contracts::kakarot_core::kakarot::KakarotCore::ContractStateEthRPC>(f23) + 4148
    frame #23: 0x00000001722921b0 504898746202199744673968940527932241844513528453156538068464172101734859290`_mlir_ciface_f23 + 84
    frame #24: 0x00000001007f2d94 stStaticCall-b270e27c5493ce0a`invoke_trampoline + 108
    frame #25: 0x00000001004c7d94 stStaticCall-b270e27c5493ce0a`blockifier::execution::native::utils::run_native_executor::hbc529b733575cb07(native_executor=0x00006000021cc010, function_id=0x0000000117015b00, call=<unavailable>, syscall_handler=<unavailable>) at utils.rs:45:28

will add instructions to repro from scratch (if I manage to)

enitrat (Author) commented Oct 8, 2024

^ So actually, the above happens after I increase the default stack size with export RUST_MIN_STACK=16777216. If I don't, it just stack-overflows. Same repro instructions as in this issue, but running cargo test test_static_ABAcalls0_d1g0v0_Cancun --features v1,native -- --nocapture
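For reference, RUST_MIN_STACK only sets the default stack size for threads spawned through the Rust standard library. An alternative (a sketch under that assumption, not what ef-tests currently does) is to run the deeply recursive work on a thread with an explicit stack size:

```rust
use std::thread;

// Recursive function standing in for the deep EVM call nesting visible
// in the backtrace above (each nested CALL adds many stack frames).
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    // std::thread::Builder lets a single worker get a larger stack
    // without changing the process-wide RUST_MIN_STACK default.
    let handle = thread::Builder::new()
        .stack_size(64 * 1024 * 1024) // 64 MiB for this thread only
        .spawn(|| depth(100_000))
        .expect("failed to spawn thread");
    assert_eq!(handle.join().unwrap(), 100_000);
}
```

This confines the large stack to the thread that actually needs it, rather than inflating every test thread.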

azteca1998 (Collaborator):

I'm having difficulties running that command. The compiler complains about reth-trie.

error[E0277]: the trait bound `reth_primitives::Account: From<revm::revm_primitives::AccountInfo>` is not satisfied
  --> /Users/esteve/.cargo/git/checkouts/reth-36d3ea1d1152b20c/75b7172/crates/trie/trie/src/state.rs:35:38
   |
35 |                 let hashed_account = account.info.clone().map(Into::into);
   |                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `From<revm::revm_primitives::AccountInfo>` is not implemented for `reth_primitives::Account`, which is required by `revm::revm_primitives::AccountInfo: Into<_>`
   |
   = help: the following other types implement trait `From<T>`:
             `reth_primitives::Account` implements `From<&GenesisAccount>`
             `reth_primitives::Account` implements `From<reth_primitives::revm_primitives::AccountInfo>`
   = note: required for `revm::revm_primitives::AccountInfo` to implement `Into<reth_primitives::Account>`

error[E0277]: the trait bound `reth_primitives::Account: From<revm::revm_primitives::AccountInfo>` is not satisfied
  --> /Users/esteve/.cargo/git/checkouts/reth-36d3ea1d1152b20c/75b7172/crates/trie/trie/src/state.rs:35:63
   |
35 |                 let hashed_account = account.info.clone().map(Into::into);
   |                                                               ^^^^^^^^^^ the trait `From<revm::revm_primitives::AccountInfo>` is not implemented for `reth_primitives::Account`, which is required by `revm::revm_primitives::AccountInfo: Into<_>`
   |
   = help: the following other types implement trait `From<T>`:
             `reth_primitives::Account` implements `From<&GenesisAccount>`
             `reth_primitives::Account` implements `From<reth_primitives::revm_primitives::AccountInfo>`
   = note: required for `revm::revm_primitives::AccountInfo` to implement `Into<reth_primitives::Account>`

error[E0277]: the trait bound `reth_primitives::Account: From<revm::revm_primitives::AccountInfo>` is not satisfied
  --> /Users/esteve/.cargo/git/checkouts/reth-36d3ea1d1152b20c/75b7172/crates/trie/trie/src/state.rs:62:86
   |
62 |                 let hashed_account = account.account.as_ref().map(|a| a.info.clone().into());
   |                                                                                      ^^^^ the trait `From<revm::revm_primitives::AccountInfo>` is not implemented for `reth_primitives::Account`, which is required by `revm::revm_primitives::AccountInfo: Into<_>`
   |
   = help: the following other types implement trait `From<T>`:
             `reth_primitives::Account` implements `From<&GenesisAccount>`
             `reth_primitives::Account` implements `From<reth_primitives::revm_primitives::AccountInfo>`
   = note: required for `revm::revm_primitives::AccountInfo` to implement `Into<reth_primitives::Account>`
help: consider removing this method call, as the receiver has type `revm::revm_primitives::AccountInfo` and `revm::revm_primitives::AccountInfo: From<revm::revm_primitives::AccountInfo>` trivially holds
   |
62 -                 let hashed_account = account.account.as_ref().map(|a| a.info.clone().into());
62 +                 let hashed_account = account.account.as_ref().map(|a| a.info.into());
   |


enitrat commented Oct 10, 2024

@tcoratger can you help here?


enitrat commented Oct 10, 2024

@azteca1998 I'm unable to repro your issue from a fresh clone.

I have:

❯ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

Can you try a `cargo clean`, and avoid running `cargo update`?

I also synced the repo with the latest cairo-native main, so do a `git pull`.

edit: after syncing with the latest cairo-native main, I can't deserialize my v1 classes to run the tests anymore. Investigating what changed.

@tcoratger (Collaborator)

Yeah, probably some versioning conflict (maybe between revm and reth, which uses revm as an underlying dependency), because I'm sure the conversion mentioned in your error message is available in reth here: https://github.com/paradigmxyz/reth/blob/75b7172cf77eb4fd65fe1a6924f75066fb09fcd1/crates/primitives-traits/src/account.rs#L165
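The type mismatch above is the classic duplicate-crate-version trap: when cargo resolves the same crate twice at different versions, the two compilations produce distinct types, and a `From` impl written against one of them does not apply to the other. A minimal sketch of the effect (all names here are hypothetical stand-ins for the revm/reth types, not the real crates):

```rust
// Two modules stand in for the same crate compiled at two versions.
// The struct definitions are identical, but to the compiler the two
// `AccountInfo` types are completely unrelated.
mod revm_v1 {
    #[derive(Clone, Debug, PartialEq)]
    pub struct AccountInfo {
        pub nonce: u64,
        pub balance: u128,
    }
}

mod revm_v2 {
    #[derive(Clone, Debug, PartialEq)]
    pub struct AccountInfo {
        pub nonce: u64,
        pub balance: u128,
    }
}

#[derive(Debug, PartialEq)]
struct Account {
    nonce: u64,
    balance: u128,
}

// `From` exists only for the v1 type, mirroring how reth implements
// `From<reth_primitives::revm_primitives::AccountInfo>` but not for a
// second copy of revm_primitives pulled in through another dependency.
impl From<revm_v1::AccountInfo> for Account {
    fn from(info: revm_v1::AccountInfo) -> Self {
        Account { nonce: info.nonce, balance: info.balance }
    }
}

fn main() {
    let v1 = revm_v1::AccountInfo { nonce: 1, balance: 100 };
    let account: Account = v1.into(); // compiles: exact type match

    // let v2 = revm_v2::AccountInfo { nonce: 1, balance: 100 };
    // let _a: Account = v2.into();
    // ^ E0277: `From<revm_v2::AccountInfo>` is not implemented for `Account`

    println!("{}", account.nonce); // prints 1
}
```

The usual fix is pinning both crates so they resolve to a single copy of the shared dependency (check `cargo tree -d` for duplicates), rather than adding a second conversion.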

@azteca1998 (Collaborator)

I'm not sure it'll be fixed, but could you try with the latest main? We just merged a fix for a bug that may or may not be related to this issue (it's difficult to tell).


enitrat commented Oct 14, 2024

Nothing changed :(

I am still stuck with this issue:

  • Default stack size: fatal runtime error: stack overflow
  • export RUST_MIN_STACK=16777216 (16 MiB) -> stack overflow
  • export RUST_MIN_STACK=134217728 (128 MiB) -> signal: 11, SIGSEGV: invalid memory reference

This used to be fine with 16 MiB before, iirc.

Did #833 significantly increase stack usage ?
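For reference when juggling these numbers: `RUST_MIN_STACK` only sets the default stack size for threads spawned through std; the main thread's stack is fixed by the OS/linker. A hedged sketch of giving one worker thread an explicit stack budget instead of a process-wide env var (the recursion and sizes here are illustrative, not the actual test harness):

```rust
use std::thread;

// Deliberately deep recursion; each frame keeps a small buffer alive so
// the stack requirement grows quickly with depth.
fn depth(n: u64) -> u64 {
    let _pad = [0u8; 512];
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    // `thread::Builder::stack_size` gives this one thread a known budget,
    // equivalent to RUST_MIN_STACK=67108864 but scoped to this spawn.
    let handle = thread::Builder::new()
        .stack_size(64 * 1024 * 1024) // 64 MiB
        .spawn(|| depth(10_000))
        .expect("failed to spawn thread");
    println!("{}", handle.join().unwrap()); // prints 10000
}
```

If a larger stack only turns the overflow into a SIGSEGV elsewhere, the consumer is usually not the thread stack at all (e.g. a wild pointer), which matches the behavior reported above.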


azteca1998 commented Oct 14, 2024

It shouldn't have changed the stack memory usage. It could be that it now gets further before crashing, and therefore needs a bit more stack.

I'll keep investigating then, thanks for checking it.

@azteca1998 (Collaborator)

Hello, I've finally managed to fix it. It took some modifications in blockifier to get it to compile and run, but it ran.

It seems the issue (or the part causing the segfault) was related to the bug I fixed the other day, but I forgot to fix it in the syscall handler as well. @edg-l noticed and fixed it, so it doesn't segfault anymore, but it still doesn't pass:

path: "../../build/v1/contracts_KakarotCore.contract_class.json"
path: "../../build/v1/contracts_AccountContract.contract_class.json"
path: "../../build/v1/contracts_UninitializedAccount.contract_class.json"
Got uninitialized account's json of length 48297
Got account's json of length 660024
Got kakarot's json of length 14061466
raw json length 48297
class def parsed
native contract
Creating new executor
thread 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun' panicked at crates/ef-testing/src/evm_sequencer/sequencer/mod.rs:247:121:
called `Result::unwrap()` on an `Err` value: "not a valid contract class"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
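Side note on that panic: the `unwrap()` on a stringly-typed `Err("not a valid contract class")` discards the actual deserialization failure. A sketch of keeping the cause in a typed error so the log shows *why* the class was rejected (all names and types here are hypothetical stand-ins, not the real blockifier/ef-testing API):

```rust
use std::fmt;

// Hypothetical error type; the real code returns a bare String.
#[derive(Debug)]
enum ClassError {
    InvalidJson(String),
    UnsupportedVersion(u8),
}

impl fmt::Display for ClassError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClassError::InvalidJson(e) => write!(f, "not a valid contract class: {e}"),
            ClassError::UnsupportedVersion(v) => write!(f, "unsupported class version: {v}"),
        }
    }
}

// Placeholder for the real deserialization: the point is that the
// underlying cause is carried in the error instead of being flattened
// into a fixed message before `unwrap()`.
fn parse_contract_class(raw: &str) -> Result<usize, ClassError> {
    if raw.trim_start().starts_with('{') {
        Ok(raw.len())
    } else {
        Err(ClassError::InvalidJson("expected a JSON object".into()))
    }
}

fn main() {
    match parse_contract_class("not json") {
        Ok(len) => println!("parsed class of length {len}"),
        Err(e) => println!("error: {e}"), // the reason survives to the log
    }
}
```

With that shape, the panic message (or a logged `Err`) would say which step of class parsing failed instead of the generic "not a valid contract class".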


enitrat commented Oct 15, 2024

@azteca1998 is your CAIRO_NATIVE_RUNTIME_LIBRARY available as an environment variable?

Also: I don't think you're using the latest branch version (can you git pull?). On the latest branch you would see the reason why the executor cannot be created, and you wouldn't need blockifier changes, I believe (rev b4d9b48a0209ded413eb31438d73a8dfd7b1cd37).

edit: bumped to the latest native main, ef-test rev is 6afd893c2c7b987a8eeeb90e302f982438f09a69, checking if it works now.


edg-l commented Oct 15, 2024

> @azteca1998 is your CAIRO_NATIVE_RUNTIME_LIBRARY available as an environment variable?
>
> Also: I don't think you're using the latest branch version (can you git pull?). On the latest branch you would see the reason why the executor cannot be created, and you wouldn't need blockifier changes, I believe (rev b4d9b48a0209ded413eb31438d73a8dfd7b1cd37).
>
> edit: bumped to the latest native main, ef-test rev is 6afd893c2c7b987a8eeeb90e302f982438f09a69, checking if it works now.

Just want to comment that native main doesn't yet have the fix in #853, so you should try with that.


enitrat commented Oct 15, 2024

Ah, OK. I will probably wait for it to land on main, as I'm mainly focused on something else atm.

@azteca1998 (Collaborator)

I thought it was already merged, but it seems that wasn't the case. Anyway, one of the PRs merged today fixed the segfault (it isn't failing on main anymore).


enitrat commented Oct 16, 2024

Checking today on 84ceaa3: no improvement.

I'm wondering if it's a side effect of increasing RUST_MIN_STACK to a high number?

@azteca1998 (Collaborator)

> no improvements

What do you mean? It still segfaults for you?


enitrat commented Oct 16, 2024

Still getting a stack overflow with default/low stack sizes, or a signal: 11, SIGSEGV: invalid memory reference with an increased stack size (see #831 (comment)).

@azteca1998 (Collaborator)

Strange. Did you cargo clean and rebuild everything (including the runtime)? The stack size shouldn't have any effect on segfaults.


enitrat commented Oct 20, 2024

Closing this to continue in #870.

@enitrat enitrat closed this as completed Oct 20, 2024