Manticore's mechanisms for computational budgets are nondeterministic #1670
Replies: 2 comments 1 reply
-
A bit of evidence from the last 3 runs of CI on
I see the following aggregate wall clock times: 40m12s, 65m12s, 41m22s. “Okay,” you say. “Maybe there is noise in there from running so many different jobs and aggregating them.” So, we look just the at the
Then, just for the Digging further, we see that the “slowest 100 tests” data bounces around between those 3 From the first run:
From the second run:
From the third run:
The Indeed, we do get lots of Manticore performance variation from run to run, even without changing anything. |
Beta Was this translation helpful? Give feedback.
-
Note, these issues are particularly relevant if you're trying to profile Manticore to determine if certain changes have made a difference to performance. As things are currently, you are very likely to be fooled by randomness in the results. Your best bet in that situation is the following:
Even still, this may not be enough to eliminate run-to-run nondeterminism in Manticore! But it's a starting point. |
Beta Was this translation helpful? Give feedback.
-
In pull request #1668, I had made some innocuous changes, but saw that test case in the
CI / tests (ethereum_bench)
fail. I re-ran the CI tests a couple times, and eventually, that test passed!I've done some investigation of the sporadically-failing
EthBenchmark.test_integer_overflow_storageinvariant
test case, and have been sporadically able to reproduce the failures locally, onmaster
.One issue that Manticore has is that its test suite includes some tests that hit various nondeterministic mechanisms to enforce computational "budgets". These make the observable behavior of those tests nondeterministic (Heisen-bugs!).
Some mechanisms Manticore uses to enforce computational budgets:
Some possible sources of nondeterminism:
--core.seed
argument, by default, sets Python'srandom
seed with a random valuePYTHONHASHSEED
environment variable, when not explicitly set, introduces run-to-run variation instr
andbytes
hash values, which might possibly make observable differences in Manticore's behaviorset
in Python, could have observable differences from run to run if the iteration introduces any side effectsAnyway, the summary of all this is that Manticore's various mechanisms to enforce computational budgets, combined with nondeterminism, combined with test cases that can possibly exceed those computational budgets, leaves us in a state where test cases will sporadically fail.
Additionally, Manticore's
--smt.defaultunsat
command-line option, which defaults to true and causes Z3's "unknown" answers to be treated instead as "unsat", may be particularly good at evoking nondeterministic behavior in Manticore when timeouts are hit.These nondeterministic computational budget mechanisms may also be the underlying cause of the sporadic "Model is not Available" errors that were investigated in #1659.
Beta Was this translation helpful? Give feedback.
All reactions