docs: add testing improvements ADR (adr-011) #1197
Agreed on not artificially increasing coverage. However, I tend to disagree that usage of UTs should be minimized. Test-driven development with UTs can always reduce how much we need to rely on more complex, higher-level tests. Also, TDD generally encourages better software.
I bring this up because there are still ways we can improve UTs, with fuzzing for example.
This is mostly referring to code coverage results. You can have 100% code coverage and still not cover many of the actual code interactions going on.
Changed to
Yea, seems reasonable. My comment was just trying to communicate that, imo, unit tests are always helpful, even in complex interactions, by testing individual components of the system. Higher-level tests have their place as well, for integrations.
I disagree with some things here, similar to Shawn's comment.
My feeling is we should maximize our (reasonable) use of unit tests. Of course there are situations when they are not the right tool and it's fine to move to other testing layers, but unit tests have many advantages over integration or e2e tests.
Some reasons for unit tests are: better localizability of errors, less flakiness, and fewer unstated assumptions about implementation details, e.g. output formats.
I will update the text.
I'm not saying UTs are bad; I'm saying that the coverage metric is misleading and gives a false sense of security.
Keep in mind that we're talking about UTs in ICS, which relies on cosmos-sdk and ibc-go.
Let's go over a scenario of writing UTs for code that uses ibc-go and cosmos-sdk: even though the operation is trivial, when writing UTs you rely on mocks for the `ChannelKeeper`, which have to be generated correctly. The UTs get more convoluted the more store keepers you add. Writing this as an integration test is comparatively easier, since you get access to the "real" `ChannelKeeper` and the `ProviderKeeper` without adding extra complexity.
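For illustration, a minimal sketch of the mock setup being described, assuming gomock-generated mocks; `NewMockChannelKeeper`, the expected call, and the wiring are illustrative stand-ins, not the exact ICS test helpers:

```go
package provider_test

import (
	"testing"

	"github.com/golang/mock/gomock"
)

func TestSendPacketUnit(t *testing.T) {
	ctrl := gomock.NewController(t)
	defer ctrl.Finish()

	// One generated mock per store/keeper dependency; every call the code
	// under test makes has to be expected explicitly, which is where the
	// convolution comes from as more keepers are added.
	channelKeeper := NewMockChannelKeeper(ctrl) // hypothetical mockgen-generated mock
	channelKeeper.EXPECT().
		GetNextSequenceSend(gomock.Any(), "provider", "channel-0").
		Return(uint64(1), true).
		Times(1)

	// keeper := newKeeperWithMocks(channelKeeper, ...) // hypothetical: wire mocks into the keeper under test
	// ... call the method under test and assert on the result
}
```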
Changed to
I'd argue that in this scenario, UTs are less complex and more useful. The process of generating mocks is a single automated CLI command. Accessing the "real" external keepers requires far more manual setup, as we've seen in our integration tests. Not to mention, setting up external keepers goes against the intention of TDD: you're no longer only testing code from the local module.
Integration tests should be reserved for situations where your intention is truly to test the high-level integration of software components.
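For reference, a sketch of the kind of one-liner meant here, assuming the mocks are generated with mockgen from an expected-keepers interface file; the file paths and package name are illustrative, not the actual ICS layout:

```go
// Hypothetical paths; regenerating all mocks is then just `go generate ./...`.
//go:generate mockgen -source=types/expected_keepers.go -destination=testutil/mocks/expected_keepers_mocks.go -package=mocks
```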
The updated text is good for me, thanks! :)
Quint does seem cool and all, but it seems from @p-offtermatt's experience that it doesn't suit all our needs in its current form.
We should consider a tool like https://github.com/flyingmutant/rapid to define our model and generate high-level test cases. Sure, we wouldn't be writing things in a TLA-esque syntax, but rapid might provide the usefulness we need in the short to medium term.
You could conceivably write an executable spec in the language of the rapid framework, arguably in a form more readable to laymen like me who don't know TLA+.
I favor the simplicity of writing everything (both the model and the driver) in golang, avoiding the need for specialized knowledge in an experimental specification language, and avoiding a test trace interpreter.
My question here is: what does Quint offer that rapid does not? I'm failing to see the value props of using Quint in its current form besides the fact that it's cool to write your spec in a TLA-esque syntax and we want to support Informal projects.
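As a rough illustration of the style being proposed, a property-based test with rapid could look like the sketch below. The property and `clampPower` are made up for illustration, and the exact generator API (e.g. `Int64Range`, the `Draw` labels) may differ between rapid versions:

```go
package model_test

import (
	"testing"

	"pgregory.net/rapid"
)

// clampPower is a stand-in for a small piece of protocol logic; a real
// property would be stated against ICS code instead.
func clampPower(power, maxPower int64) int64 {
	if power > maxPower {
		return maxPower
	}
	return power
}

func TestClampPowerNeverExceedsCap(t *testing.T) {
	rapid.Check(t, func(t *rapid.T) {
		// rapid draws randomized inputs and shrinks failing cases automatically.
		power := rapid.Int64Range(0, 1_000_000).Draw(t, "power")
		maxPower := rapid.Int64Range(0, 1_000_000).Draw(t, "maxPower")
		if got := clampPower(power, maxPower); got > maxPower {
			t.Fatalf("clamped power %d exceeds cap %d", got, maxPower)
		}
	})
}
```

Recent rapid versions also support state-machine style tests over sequences of named actions, which would be closer to the model/driver split discussed here.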
Good point, Shawn. I also considered using Rapid.
My feeling is that it's probably very hard to get a specification of reasonable abstraction level with Rapid,
potentially harder than with Quint. This is just an educated guess, but part of the complexity is just that the protocol is inherently complex.
My current intuition is that if we are just interested in good test coverage, doing property-based testing with Rapid is probably better in terms of cost/benefit, but this wouldn't yield executable specs.
If we are interested in executable specs, Quint still seems reasonable to me, but I don't have a good feeling on how to get better data on this - maybe just trying the rapid approach next would be a good idea to see how it goes.
Good points, here are some follow-up questions to consider:
Why is this the case? What makes a protocol easier to describe/abstract with Quint compared to Rapid, or any language/framework for that matter? Does Rapid have any functionality/advantages that Quint does not?
Maybe we can try modeling just a subprotocol of replicated security with both Quint and Rapid, and compare side by side how challenging it is to model with each.
Regarding the challenge of creating a succinct model: the particular language or framework we use matters, but generally, well-written and encapsulated software matters just as much imo.
Yeah, good points here, Shawn. The working hypothesis is that for high-level specifications where implementation-level details are abstracted away, a language that purposely abstracts a lot of these details away is easier to use than a language that needs more fine-grained management of details like memory, concurrency, etc. We will see how it turns out, but I agree that giving both a try could be a good idea.
🙏 including mismatched consumer/provider versions
Something that is not clear to me is how we can guarantee that the driver can execute the exact same trace it was given by the model. For example, say we use docker containers, and the model-generated trace is a sequence of events like: "chain `A` sends message `m`, chain `B` sends message `m'`, chain `A` receives `m'`, ...". To run this exact trace we would need to enforce how/when the relayer relays packets, or coordinate when chains `A` and `B` take steps. Is this something we can (easily) do?
Relayers can be controlled in a relatively fine-grained manner - you can in general make them clear packets on a given channel, then stop clearing more packets. What you described is easily possible in principle - just clear the channel that `m'` was sent on first.
Coordinating when exactly certain steps are taken is more difficult; it's possible, e.g., with CometMock.
But generally, we often do not assume these very strong orderings. For example, assume we want to delegate on `A`; this triggers a message `m` to be sent from `A` to `B`; we want to first submit a `tx` to `B`, then `B` should receive `m`, then we want to submit another `tx'` to `B`. This is easily possible: we submit the delegation to `A`, wait until it is included, submit `tx` to `B`, wait until it is included, have the relayer relay `m` to `B`, wait until `m` was included, submit `tx'` to `B`, wait until it is included. Only the relative order of the events matters; we don't need precise coordination between block numbers on either chain.
Does that cover the kinds of examples you were thinking of?
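To make the ordering argument concrete, a trace driver along these lines only needs to execute the events sequentially. The sketch below assumes hypothetical harness hooks (`SubmitTx`, `WaitForInclusion`, `RelayChannel`) rather than any existing e2e API:

```go
package e2e

// Driver wires a model-generated trace to an e2e test harness. The function
// fields are hypothetical hooks; a real harness would back them with RPC
// calls to the chain binaries and the relayer.
type Driver struct {
	SubmitTx         func(chain string, tx []byte) (hash string, err error)
	WaitForInclusion func(chain, hash string) error
	RelayChannel     func(channel string) error // clear pending packets on one channel
}

// Step is one event taken from the model-generated trace.
type Step struct {
	Kind    string // "submit_tx" or "relay"
	Chain   string // which chain the event targets, e.g. "provider"
	Tx      []byte // the tx to submit, if Kind == "submit_tx"
	Channel string // the channel to clear, if Kind == "relay"
}

// RunTrace executes the steps strictly in order. Waiting for inclusion after
// each step enforces the relative ordering discussed above; no block-level
// coordination between the chains is required.
func (d Driver) RunTrace(steps []Step) error {
	for _, s := range steps {
		switch s.Kind {
		case "submit_tx":
			hash, err := d.SubmitTx(s.Chain, s.Tx)
			if err != nil {
				return err
			}
			if err := d.WaitForInclusion(s.Chain, hash); err != nil {
				return err
			}
		case "relay":
			if err := d.RelayChannel(s.Channel); err != nil {
				return err
			}
		}
	}
	return nil
}
```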
Yes. Thank you very much, @p-offtermatt!
👍