Manticore's mechanisms for computational budgets are nondeterministic #1670

bradlarsen · 2020-04-16T22:02:01Z

In pull request #1668, I made some innocuous changes, but saw a test case in the CI / tests (ethereum_bench) job fail. I re-ran the CI tests a couple of times, and eventually that test passed!
I've done some investigation of the sporadically failing EthBenchmark.test_integer_overflow_storageinvariant test case, and have sporadically been able to reproduce the failures locally, on master.

One issue is that Manticore's test suite includes some tests that hit various nondeterministic mechanisms for enforcing computational "budgets". These make the observable behavior of those tests nondeterministic (Heisenbugs!).

Some mechanisms Manticore uses to enforce computational budgets:

Time and memory constraints are passed to the Z3 SMT solver
Manticore itself implements explicit timeout mechanisms in other places, such as in the solver code
Manticore implements another timeout mechanism in Unicorn fallback emulation code
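To see why a wall-clock budget on its own makes results depend on the machine, here is a toy sketch. It is not Manticore's code, and `work_until_deadline` is a hypothetical name: it performs identical units of work until a fixed time budget runs out and reports how many units completed. Run it several times, or on a busier machine, and the count changes even though the code and input did not.

```python
import time


def work_until_deadline(budget_seconds: float) -> int:
    """Do fixed-size units of work until a wall-clock budget expires.

    Returns how many units finished. Because the budget is measured in
    wall-clock time, the count depends on CPU speed and machine load,
    not only on the input.
    """
    deadline = time.monotonic() + budget_seconds
    units = 0
    while time.monotonic() < deadline:
        sum(i * i for i in range(10_000))  # one "unit" of work
        units += 1
    return units


if __name__ == "__main__":
    # Same code, same budget: the counts usually differ from run to run.
    print([work_until_deadline(0.05) for _ in range(5)])
```

Any component that stops at a deadline like this (a solver time limit, an emulation timeout) inherits the same property: how far it gets is a function of the machine's state at that moment.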

Some possible sources of nondeterminism:

Load on the machine running Manticore (if it is too slow, a timeout may be hit)
Fluctuations in Z3 memory use (there may be small run-to-run variation here, causing the memory limit to be hit even when the queries are unchanged)
Manticore's --core.seed argument, by default, seeds Python's random number generator with a random value
Z3 itself has nondeterministic / stochastic behavior, controlled (hopefully!) via its own command-line or SMT-LIB seed option, which Manticore doesn't explicitly set
The PYTHONHASHSEED environment variable, when not explicitly set, introduces run-to-run variation in str and bytes hash values, which could cause observable differences in Manticore's behavior
Iterating over an unordered collection, such as a set in Python, could produce observable differences from run to run if the iteration has any side effects
Manticore by default uses multiple workers when exploring states; states may be explored in a different order from run to run, possibly with meaningful consequences
With Manticore's multiple workers, each worker gets its own Z3 instance
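Two of the sources above, PYTHONHASHSEED and iteration over sets, can be demonstrated without Manticore at all. This sketch runs the same one-line program in fresh interpreters with different PYTHONHASHSEED values; the helper name `set_order` and the sample strings are made up for illustration. With the seed pinned, the printed set order is identical every time; with different seeds, it can change.

```python
import os
import subprocess
import sys

# Child program: builds a set of strings and prints its iteration order.
CHILD = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order(hash_seed: str) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Pinned seed: the same order on every run.
    print(set_order("0") == set_order("0"))
    # Different seeds: the order may (and often does) change.
    for seed in ("1", "2", "3"):
        print(seed, set_order(seed))
```

If code iterates over such a set and the order matters (for example, it decides which state or constraint is handled first), the behavior can differ from run to run for exactly this reason.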

Anyway, the upshot is that Manticore's various mechanisms for enforcing computational budgets, combined with these sources of nondeterminism and with test cases that can exceed those budgets, leave us in a state where test cases sporadically fail.

Additionally, Manticore's --smt.defaultunsat command-line option, which defaults to true and causes Z3's "unknown" answers to be treated as "unsat", may be particularly good at provoking nondeterministic behavior in Manticore when timeouts are hit: a query that merely ran out of time is then treated as unsatisfiable, so a state that a faster run would keep may be dropped.
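The interaction between a time limit and --smt.defaultunsat can be modeled in a few lines. This is a toy model, not Manticore's solver code: `check_with_budget` and `state_is_feasible` are hypothetical names, and the query's cost is passed in explicitly so the effect is easy to see.

```python
from typing import Optional


def check_with_budget(solve_seconds: float, timeout_seconds: float,
                      is_sat: bool) -> str:
    """Toy stand-in for a solver query under a time limit.

    solve_seconds models how long the query takes on this machine right
    now (it varies with load); the real answer is only returned if the
    query finishes inside the budget.
    """
    if solve_seconds > timeout_seconds:
        return "unknown"
    return "sat" if is_sat else "unsat"


def state_is_feasible(answer: str, default_unsat: bool) -> Optional[bool]:
    """Toy version of the effect described for --smt.defaultunsat.

    None stands in for "undecided"; what Manticore actually does in that
    case is not modeled here.
    """
    if answer == "unknown":
        # With default_unsat, "unknown" is treated as "unsat", so the
        # state is discarded as if it were infeasible.
        return False if default_unsat else None
    return answer == "sat"


if __name__ == "__main__":
    # The same satisfiable query, on an idle machine and on a loaded one.
    print(state_is_feasible(check_with_budget(0.5, 1.0, True), True))  # True
    print(state_is_feasible(check_with_budget(2.0, 1.0, True), True))  # False
```

Nothing about the query changed between the two calls; only the time it took did, and that alone flips whether the state survives.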

These nondeterministic computational budget mechanisms may also be the underlying cause of the sporadic "Model is not Available" errors that were investigated in #1659.

bradlarsen · 2020-04-20T20:54:28Z

A bit of evidence from the last 3 runs of CI on master, which are all testing the same commit:

I see the following aggregate wall clock times: 40m12s, 65m12s, 41m22s.

“Okay,” you say. “Maybe there is noise in there from running so many different jobs and aggregating them.”

So, let’s look at just the ethereum_vm jobs. For those jobs, all from the same commit, we see the following aggregate wall clock times: 34m9s, 64m50s, 29m15s. That’s more than 2x variation in runtime, just from running the same job multiple times!
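The "more than 2x" figure is the ratio of the slowest run to the fastest one. A small helper (hypothetical, written for this note) makes that check repeatable for durations written like the ones above:

```python
import re
from typing import List


def to_seconds(duration: str) -> int:
    """Convert a duration written like '64m50s' (optionally with hours) to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", duration)
    if match is None or not duration:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes, secs = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + secs


def spread(durations: List[str]) -> float:
    """Ratio of the slowest run to the fastest run."""
    seconds = [to_seconds(d) for d in durations]
    return max(seconds) / min(seconds)


if __name__ == "__main__":
    print(round(spread(["40m12s", "65m12s", "41m22s"]), 2))  # all jobs: 1.62
    print(round(spread(["34m9s", "64m50s", "29m15s"]), 2))   # ethereum_vm jobs: 2.22
```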

Digging further, we see that the “slowest 100 tests” data bounces around between those 3 ethereum_vm jobs, too. Let’s look at just the slowest 3 test cases.

From the first run:

========================== slowest 100 test durations ==========================
1201.88s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_benign_2
624.77s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_mul
562.07s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_storageinvariant

From the second run:

========================== slowest 100 test durations ==========================
2717.12s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_storageinvariant
872.04s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_benign_2
596.54s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_mul

From the third run:

========================== slowest 100 test durations ==========================
854.36s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_benign_2
596.62s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_mul
592.99s call     tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::test_integer_overflow_storageinvariant

The test_integer_overflow_storageinvariant test varies between 562s and 2717s in these 3 CI runs, and it’s not even always the slowest test case!
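Comparing the slowest-tests sections by eye is error-prone. The sketch below assumes the per-line format shown above (`<seconds>s call <test id>`, as printed by pytest's --durations option), groups the durations by test id, and reports each test's spread across runs. `collect` and `summarize` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches "call" phase lines such as
#   562.07s call     tests/...::EthBenchmark::test_integer_overflow_storageinvariant
# (setup/teardown lines are ignored).
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+call\s+(\S+)\s*$")


def collect(runs: Iterable[str]) -> Dict[str, List[float]]:
    """Map each test id to its call durations, one entry per run."""
    per_test: Dict[str, List[float]] = defaultdict(list)
    for run_output in runs:
        for line in run_output.splitlines():
            m = LINE.match(line)
            if m:
                per_test[m.group(2)].append(float(m.group(1)))
    return dict(per_test)


def summarize(per_test: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """Return (fastest, slowest, slowest/fastest) for each test."""
    return {
        test: (min(ts), max(ts), max(ts) / min(ts))
        for test, ts in per_test.items()
    }


if __name__ == "__main__":
    PREFIX = "tests/ethereum_bench/test_consensys_benchmark.py::EthBenchmark::"
    runs = [
        f"562.07s call     {PREFIX}test_integer_overflow_storageinvariant",
        f"2717.12s call     {PREFIX}test_integer_overflow_storageinvariant",
        f"592.99s call     {PREFIX}test_integer_overflow_storageinvariant",
    ]
    for test, (fastest, slowest, ratio) in summarize(collect(runs)).items():
        print(f"{test.split('::')[-1]}: {fastest}s .. {slowest}s ({ratio:.2f}x)")
```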

Indeed, we do get lots of Manticore performance variation from run to run, even without changing anything.

bradlarsen · 2020-07-23 (reply)

Sadly, the CI logs are no longer available. Note: GitHub Actions logs are not kept forever; if you need to refer to them later, you had better save them elsewhere!

bradlarsen · 2020-04-21T17:40:21Z

Note that these issues are particularly relevant if you're trying to profile Manticore to determine whether certain changes have made a difference to performance. As things currently stand, you are very likely to be fooled by randomness in the results.

Your best bet in that situation is the following:

select a specific seed for Manticore, and use the same one for each run
set the environment variable PYTHONHASHSEED to a particular value
run Manticore with --core.mprocessing=single
run your experiment several times, to get a sense of the run-to-run variance in your setup
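The checklist above can be scripted so that every run uses the same settings and the spread is reported alongside the mean. This is a sketch under assumptions: it expects a `manticore` executable on PATH, spells the options exactly as in the checklist, and uses `contract.sol` as a hypothetical target; the helper names are made up for illustration.

```python
import os
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def pinned_env(hash_seed: str = "0") -> dict:
    """Copy of the current environment with PYTHONHASHSEED pinned."""
    return dict(os.environ, PYTHONHASHSEED=hash_seed)


def manticore_command(target: str, seed: int) -> List[str]:
    """Manticore command line with a fixed seed and a single worker.

    `target` is a placeholder for whatever you are profiling.
    """
    return ["manticore", target,
            f"--core.seed={seed}", "--core.mprocessing=single"]


def time_runs(cmd: Sequence[str], repeats: int, env: dict) -> List[float]:
    """Run cmd `repeats` times and return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def report(timings: List[float]) -> str:
    """Summarize the run-to-run spread, not just the average."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return (f"n={len(timings)} mean={mean:.1f}s stdev={stdev:.1f}s "
            f"min={min(timings):.1f}s max={max(timings):.1f}s")


if __name__ == "__main__":
    env = pinned_env("0")
    # Sanity-check the harness itself with a trivial Python command first.
    print(report(time_runs([sys.executable, "-c", "pass"], 3, env)))
    # Then time the real experiment (hypothetical target file).
    cmd = manticore_command("contract.sol", seed=1234)
    print(report(time_runs(cmd, repeats=5, env=env)))
```

If the standard deviation is large compared with the difference you are trying to measure between two versions, the comparison probably isn't telling you much yet.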

Even so, this may not be enough to eliminate run-to-run nondeterminism in Manticore! But it's a starting point.
