-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(testing
): add mutation testing support
#478
Comments
More notes:
Paper has a handy list of mutations we could implement |
Related DappTools PRs/Issues: |
potential inspiration: https://github.com/JoranHonig/vertigo has some good tips in it like knowing what test hits a particular mutation we could do this by:
Vertigo works on solidity, but we may want to do part solidity, part raw opcodes so we can be faster and not have to recompile for all mutation types |
alternatively we could sacrifice some disk space by copying the source of the project into if the command exits with status 0 the tests passed, otherwise they failed. afterwards we clean up by deleting those directories (or keep them around for diffing). if our cache works on hashes we theoretically(?) would only recompile the mutated contract anyway |
Yoyoyo, just seeing this thread now. Not sure if this is possible already but if foundry could include the solc json ast in the compilation artifacts then it'll be relatively straightforward to create a Vertigo adapter for foundry.
That is close how Vertigo works right now (except that it doesn't create all of the mutations up front), and how the integration would work 👍
There are some papers that evaluate this approach. From the top of my head there are two benefits to doing source code mutation:
|
Hey all, We have been working on a mutation generator called Gambit. It is implemented in Rust so it might be possible to integrate it with Forge easily. Gambit is a mutation generator only and it works by traversing the Solidity AST to identify where to perform the mutations, then performs the mutations directly at the source level, and dumps them in a designated directory. We are actively working on adding more functionality and better mutation generation when possible so let us know if you have any suggestions. PRs are also welcome :) |
This is great @chandrakananandi thank you for sharing! The AST modification logic makes sense - cc @mattsse we could add that a mutate fn to the Project struct in ethers solc? Which would traverse all the tests and perform a random mutation? And we would run the the tests against the mutation inside of forge mutate? |
There is also Universal Simulator, one of the engineers from Trail of Bits maintains it - here is some usage: |
Indeed, it's very cool! The difference between Gambit and the universalmutator is that they are regex based whereas Gambit is guided by the AST and also runs the compiler to filter out any invalid mutants. On the other hand, universalmutator supports many more languages since their technqiue relies on regexes and they don't need any additional AST information! There is a cool paper on it too. |
Would be supportive of a draft PR or some kind of RFC proposing exactly how we'd integrate the two together, if you'd want to give it a shot @chandrakananandi! |
Hey @chandrakananandi, congrats on the announcement today! Figured this might be a good time to revisit what the UX for what an MVP forge integration might look like. Previously we came up something like this:
Follow up versions could:
|
Hi Matt @mds1! Thanks for writing this out. The flow makes sense! One quick note: at the moment, Gambit actually already compiles the mutants to ensure that they are all valid. So the As for the benign / equivalent mutant question: this is a very interesting problem and full on equivalence checking for mutants is yet another undecidable verification task. So our current thinking is that we should avoid generating benign mutants in the first place. For now, Gambit allows the user to specify what functions and contracts to mutate which is an easy way to avoid some useless mutants because hopefully the user has knowledge about the codebase we can leverage. We are actively working on other light weight strategies as well. There is also TCE (Trivial Compiler Equivalence) which people have used for this. The Certora Prover itself can be used here too but like I said, that's yet another verification task on its own. For now, I am most interested in gathering data (lots of mutants, the result of running tests on them, some potentially manual analysis of mutants to understand how often we get equivalent mutants) and guiding our strategy for equivalence detection based on that. For the scale of data we will need for this, I think integrating with a testing framework like foundry would be really great! Is there any dataset of solidity contracts together with foundry tests that you could point me to? |
Ah interesting, wonder if there should be an option to disable that, since otherwise we'll be compiling twice (once for gambit to verify, then the forge compilation)? Then instead forge would need to discard a mutation that doesn't compile
Are you just looking for repos that use foundry for building/testing? If so, the forge integration tests run against some external repos which could be a good starting point (some of those repos use dapptools, which foundry should be backwards compatible with). Then some other repos that use foundry include seaport, maple, alchemix, uniswap universal router, optimism's bedrock contracts, and a bunch of my own repos including multicall, solidity-generators, solidity-trigonometry, umbra. There's also a bunch in the awesome-foundry repo. But let me know if I misunderstood! 😅 |
Yes this is definitely possible and in fact something we might do in an upcoming PR.
This is wonderful, and exactly what I was looking for. I wanted to gather some benchmarks so that we can see how Gambit helps them. Thanks very much! |
Hey @mds1, thought I'd jump in! I'm currently working on Gambit PR that I'm hoping to land some time next week/the following Monday. After that I can look at integrating w/ Foundry. WRT equivalent mutants: these are just hard! There are a few ways to handle them:
There is also the problem of mutant redundancy. For instance, consider the boolean condition
I've marked all cells where the mutant differs from the original value with an (X). Note that Redundancy is bad: the more redundant the mutant is (that is, the easier it is to kill), the less utility it has for things like mutation score (killed/total) or as a testing goal for a developer. As a trivial example, if a mutant always causes a crash it is very easy to detect: all tests that execute this mutant kill it! This is called a trivial mutant. So avoiding equivalent mutants is really important, but trying to minimize redundancies is also critical. All this to say, I'm hoping to improve Gambit's mutation routine to avoid many common patterns of equivalent mutants and to try to start reasoning about common mutant redundancies. This brings me to Foundry! If we are able to integrate Gambit with Foundry we can use test data to compute the kill matrix (pretty badass, no?). This is just a binary matrix of "does test T kill mutant M?". We can use this data to approximate redundancy information and help to detect equivalent mutants. This is of course very expensive to compute, so I'd be using this to inspect manually to improve Gambit for future uses. Anyway, that's my brain dump...excited to chat more once I land this PR! |
Hey all, poking around a bit in
Test Coverage Data Running tests is expensive, so avoiding test runs that don't ever touch mutated code is desirable. To do this I'd want coverage information.
I'm not sure if it's worth using test coverage in the initial PR, maybe I'll just get a proof of concept going. But I think incorporating coverage will be worth it in the medium to long term. |
Hey @BenTheKush! Here's some answers to your questions: 2: I think we might already have some conditional args like that and I believe clap supports that, so it should be ok. Deferring to @Evalir @mattsse to confirm. Note that we'll also need to support those options via the CLI and via the config file, I don't think they currently sync automatically (i.e. options need to be added in two places) 3: One downside here is that copying the whole directory might get costly (i.e. slow) because some deps can be quite large with a lot of unrelated files (e.g. when installing chainlink as a dep for their interfaces, you get tons of non-solidity code from the rest of the system). Since I don't think we need to generate mutants for deps, we might want to try only copying the src dir (i.e. set of files that may be mutated) and symlink to the rest. Not sure offhand if this will cause issues with remappings though, another @mattsse or @Evalir question. But otherwise yea I think that should be sufficient Test coverage: Coverage data is only generated when running |
Thanks @mds1, yeah this all makes sense. Having conditional args would be perfect. And I agree, copying an entire directory over for mutation is not ideal. I might start there for a proof of concept/MVP and then see if there is something clever we can do to not have to copy everything. There's probably some clever manipulation of the I'm not super familiar with the Solidity ecosystem so not sure what the stumbling blocks are...but I guess this is a great way to learn :) WRT Coverage: happy to make it conditional! And again, I probably won't be implementing any optimizations in the first pass. I'm going to do some requirements engineering this week and try to come up with a design. I think a lot of the changes will be happening on the Gambit side anyway, and I'll set up an API that |
Hi there! I think mutation testing would be very useful to enhance testing suites. Did someone make progress on this topic since May? |
@MerlinEgalite we had started investigating gambit + foundry but had other things come up. We have plans to resume this in early 2024. Will keep you posted. |
Hi @chandrakananandi, I'd like to work on integrating Gambit and Foundry |
This is the doc about integrating Foundry and Gambit. Please feel free to drop your comments https://hackmd.io/@tICezjHsSaiehIn9jbcUAA/rJXZb_JS6 @chandrakananandi Does Certora have any plans on open-sourcing the UI you demoed at your talk at SoliditySummit? |
The UI source is actually open-source: https://github.com/Certora/gambit-report-viewer but note that it was not implemented by Certora and isn't really regularly maintained at the moment. We have plenty of ideas for how to improve it but we don't have resources to invest in it right now. |
@chandrakananandi I just realized you dropped some comments on the design doc. I have responded. Are you ok with joining a Telegram group if I create one to sync on this? |
If anyone would like to join a Telegram group to find out about progress on this or give feedback on the design doc here is a link https://t.me/+a_SIusNyUMkzNDI8 The group will be deleted once this is accomplished Right now I'm at the stage of testing the generated mutants and generating a mutation test report. |
Progress updates on this task can be found here https://hackmd.io/@tICezjHsSaiehIn9jbcUAA/SkTEyvuHa We now have an alpha working implementation of integrating Gambit with Foundry. If you have any questions please feel free to join the TG group posted above and ask. @chandrakananandi If you could join the Telegram group posted above that would be awesome. Also, how open is the Certora team to allowing changes to Gambit? |
@samparsky Thanks so much for the updates! I responded to all the points in the doc: https://hackmd.io/@tICezjHsSaiehIn9jbcUAA/SkTEyvuHa I am not the best at keeping up with telegram and other apps but I will join it! I am certainly open to accepting PRs for gambit for improving the tool! |
testing
): add mutation testing support
Mutation testing is used to design new tests and evaluate the quality of an existing test suite. In other words, it can tell you about the quality of your tests, where coverage only tells you about how much code you have not yet written a test for.
It works by changing the behavior of a program by changing parts of the code ("mutating it"), running the test suite with this new mutation and checking if the test suite passed. If the test suite passed, the mutation is considered to have survived - if not, it is considered to have died.
If
dead / total
is below some threshold, then there is room for improvement in the test suite. In other words, the more dead mutations you have, the better.I think this would be a pretty cool tool to have in Foundry - and I think it is somewhat trivial to implement as well (simplified strategy):
Here's a paper on mutation testing for Solidity that I found interesting: https://arxiv.org/pdf/2105.03626.pdf
The text was updated successfully, but these errors were encountered: