docs: add testing improvements ADR (adr-011) #1197

MSalopek · 2023-08-11T13:18:27Z

Description

Closes: #1165

Add long-term QA/testing goals ADR:

matrix tests
cometmock tests
e2e test refactoring
MBT considerations

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

included the correct docs: prefix in the PR title
targeted the correct branch (see PR Targeting)
provided a link to the relevant issue or specification
reviewed "Files changed" and left comments if necessary
confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

Confirmed the correct docs: prefix in the PR title
Confirmed all author checklist items have been addressed
Confirmed that this PR only changes documentation
Reviewed content for consistency
Reviewed content for spelling and grammar
Tested instructions (if applicable)
Checked that the documentation website can be built and deployed successfully (run make build-docs)

shaspitz

Great writeup and points to consider

docs/docs/adrs/adr-011-improving-test-confidence.md

shaspitz · 2023-08-11T18:36:43Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+
+Out of all the approaches used, unit testing has the most tools available and the coverage can simply be displayed as % of code lines tested. Although this is a very nice and very easy to understand metric, it does not speak about the quality of the test coverage.
+
+Since distributed systems testing is a lot more involved, our reliance on simple unit testing tools should be minimized due to lack of useful feedback. Of course, sometimes unit tests are still necessary and helpful, but in some cases where unit tests are not helpful, we should use e2e or integration tests as appropriate rather than taking pains to add unit tests only to artificially increase coverage.


Agreed on not artificially increasing coverage. However I tend to disagree that usage of UTs should be minimized. Test driven development with UTs can always reduce how much we need to rely on more complex higher level tests. Also TDD generally encourages better software.

I bring this up as there are still ways we can improve UTs with fuzzing for example

This is mostly referring to code coverage results. You can have 100% code coverage and still not cover much of the actual code interactions going on.

Changed to

Since distributed systems testing is a lot more involved, unit tests are oftentimes not sufficient to cover complex interactions. Unit tests are still necessary and helpful, but in cases where unit tests are not helpful e2e or integration tests should be favored.

Yea seems reasonable, my comment was just trying to communicate that imo unit tests are always helpful, even in complex interactions, by testing individual components of the system. Higher level tests have their place as well for integrations

shaspitz · 2023-08-11T19:02:25Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+Specifications written in a dedicated and executable specification language are easier to maintain than the ones written entirely entirely in text.
+Additionally, we can create models based on the specification OR make the model equivalent to a specification.
+
+Models do not care about the intricacies of implementation and neither do specifications. Since both models and specifications care about concisely and accurately describing a system (such as a finite state machine), we see a benefit of adding model based tools (such as [quint](https://github.com/informalsystems/quint)) to our testing and development workflows.


Quint does seem cool and all, but it seems from @p-offtermatt's experience that it doesn't suit all our needs in its current form.

We should consider a tool like https://github.com/flyingmutant/rapid to define our model, and generate high level test cases. Sure, we wouldn't be writing things in a TLA-esq syntax. But rapid might provide the usefulness we need in the short to medium term.

You could conceivably write an executable spec in the language of the rapid framework, arguably in a more readable form to layman like me who don't know TLA+.

I favor the simplicity of writing everything (both the model and driver) in golang, avoiding the need for specialized knowledge in an experimental specification language, and avoiding a test trace interpreter.

My question here is, what does Quint offer that rapid does not? I'm failing to see the value props of using Quint in its current form besides the fact that it's cool to write your spec in a TLA-esq syntax and we want to support Informal projects

Good point Shawn, I also considered using Rapid.

My feeling is that it's probably very hard to get a specification of reasonable abstraction level with Rapid,
potentially harder than with Quint. This is just an educated guess, but part of the complexity is just that the protocol is inherently complex.

My current intuition is that if we are just interested in good test coverage, doing property-based testing with Rapid is probably better in terms of cost/benefit, but this wouldn't yield executable specs.

If we are interested in executable specs, Quint still seems reasonable to me, but I don't have a good feeling on how to get better data on this - maybe just trying the rapid approach next would be a good idea to see how it goes.

Good points, here's some followup questions to consider

My feeling is that it's probably very hard to get a specification of reasonable abstraction level with Rapid

Why is this the case? What makes a protocol easier to describe/abstract with Quint compared to Rapid, or any language/framework for that matter? Does Rapid have any functionality/advantages that Quint does not?

Maybe we can try modeling just a subprotocol of replicated security with both Quint and Rapid, and compare side by side how challenging it is to model with each.

Regarding the challenge of creating a succinct model: The particular language or framework we use matters. But generally well written and encapsulated software matters just as much imo

Yeah, good points here Shawn. The working hypothesis is that for high-level specifications where implementation-level details are abstracted away, a language that purposely abstracts a lot of these details away is easier to use than a language that needs more fine-grained management of details like memory, concurrency, ... . We will see how it turns out, but I agree that giving both a try could be a good idea.

shaspitz · 2023-08-11T19:12:45Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+
+#### Matrix tests
+Instead of only running tests against current `main` branch we should adopt an approach where we also:
+- **run regression tests against released software versions** (`ICS v1/v2/v3`)


🙏 including mismatched consumer/provider versions

docs/docs/adrs/adr-011-improving-test-confidence.md

shaspitz · 2023-08-11T19:24:34Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+
+
+### Negative
+It might be easier to forgo the MBT tooling and instead focus on pure property based testing


p-offtermatt

Thanks, nice writeup. Left a few comments.

p-offtermatt · 2023-08-14T12:42:57Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+
+Out of all the approaches used, unit testing has the most tools available and the coverage can simply be displayed as % of code lines tested. Although this is a very nice and very easy to understand metric, it does not speak about the quality of the test coverage.
+
+Since distributed systems testing is a lot more involved, our reliance on simple unit testing tools should be minimized due to lack of useful feedback. Of course, sometimes unit tests are still necessary and helpful, but in some cases where unit tests are not helpful, we should use e2e or integration tests as appropriate rather than taking pains to add unit tests only to artificially increase coverage.


I disagree with some things here, similar to Shawns comment.
My feeling is we should maximize our (reasonable) use of unit tests. Of course there are situations when they are not the right tool and it's fine to move to other testing layers, but unit tests have many advantages over integration or e2e tests.
Some reasons for unit tests are: better localizability of errors, less flakeyness, less unstated assumptions on implementation details e.g. output formats.

I will update the text.

I'm not saying UTs are bad, I'm saying that the coverage metric is misleading and gives a false sense of security.

Have in mind that we're talking about UTs in ICS which relies on cosmos-sdk and ibc-go.

Let's go over a scenario of writing UTs for code that uses ibc-go and cosmos-sdk:

Write a test for a function that performs the following operation: 1. fetch a clientID from ibc-go ChannelKeeper 2. store the fetched clientID to blockchain state using the ProviderKeeper

Even though the operation is trivial, when writing UTs you rely on mocks for the ChannelKeeper which have to be generated correctly. The UTs get more convoluted the more store keepers you add.

Writing this as an integration test is comparatively easier, since you het access to the "real" ChannelKeeper and the ProviderKeeper without adding extra complexity.

Changed to

Since distributed systems testing is a lot more involved, unit tests are oftentimes not sufficient to cover complex interactions. Unit tests are still necessary and helpful, but in cases where unit tests are not helpful e2e or integration tests should be favored.

Let's go over a scenario of writing UTs for code that uses ibc-go and cosmos-sdk

I'd argue that in this scenario, UTs are less complex, and more useful. The process of generating mocks is a single automated CLI command. Accessing the "real" external keepers requires far more manual setup as we've seen in our integration tests. Not to mention, setting up external keepers goes against the intention of TDD, you're no longer only testing code from the local module.

Integration tests should be reserved for situations where your intention is truly to test the high level integration of software components

The updated text is good for me, thanks! :)

docs/docs/adrs/adr-011-improving-test-confidence.md

mpoke

Thanks @MSalopek

docs/docs/adrs/adr-011-improving-test-confidence.md

insumity · 2023-08-22T11:43:39Z

docs/docs/adrs/adr-011-improving-test-confidence.md

+Harness is the infrastructure layer of the pipeline that accepts inputs from the driver.
+
+There can be multiple harnesses as long as they can perform 3 things:
+* bootstrap a test execution environment (local, docker, k8s…)


Something that is not clear to me is on how can we guarantee that the driver can execute the exact same trace as it was given by the model. For example, consider we use docker containers, and the model-generated trace that we have is a sequence of events like: "chain A sends message m, chain B sends message m', chain A receives m', ..." then to run this exact trace we would need to enforce how/when the relayer relays packets or coordinate when chains A and B take steps. Is this something we can (easily) do?

Relayers can be controlled in a relatively fine-grained manner - you can in general make them clear packets on a given channel, then stop clearing more packets. What you described is easily possible in principle - just clear the channel that m' was sent on first.

Coordinating when exactly certain steps are taken is more difficult, it's e.g. possible with CometMock.

But generally, we often do not assume these very strong orderings. For example, assume we want to delegate on A, this triggers a message m to be sent from A to B, we want to first submit a tx to B, then B should receive m, then we want to submit another tx' to B. This is easily possible - we submit the delegation to A, wait until it is included, submit tx to B, wait until it is included, have the relayer relay m to B, wait until m was included, submit tx' to B, wait until it is included. Only the relative order of the events matters, we don't need precise coordination between block numbers on either chain.

Does that cover the kinds of examples you were thinking of?

Does that cover the kinds of examples you were thinking of?

Yes. Thank you very much @p-offtermatt !

docs/docs/adrs/adr-011-improving-test-confidence.md

Co-authored-by: Shawn <[email protected]>

shaspitz

I've nitpicked your PR enough ;) great job and thanks for bringing up these topics

* docs: add testing improvements ADR (adr-011) (#1197) * docs: add testing improvements ADR * docs: fix links and typos * Update docs/docs/adrs/adr-011-improving-test-confidence.md Co-authored-by: Shawn <[email protected]> * Update docs/docs/adrs/adr-011-improving-test-confidence.md Co-authored-by: Shawn <[email protected]> * docs: update from comments --------- Co-authored-by: Shawn <[email protected]> * refactor: clean protos for clarity (#1071) * consumer proto folder * provider comments, move one msg * ccv comments * moved shiz * proto changes to build * make proto gen again * no circ dep * Update wire.proto * comments * dun builds * progress save * FIN * refactors * rm improper proto ref * rm todos, issue now * lint till I die * update comments, rebuild protos * change ccv.go to wire.go, and test file * consistent comment * Update wire.proto * example alias diff --------- Co-authored-by: MSalopek <[email protected]>

docs: add testing improvements ADR

e730e61

MSalopek self-assigned this Aug 11, 2023

MSalopek requested a review from a team as a code owner August 11, 2023 13:18

docs: fix links and typos

9db9d21

MSalopek mentioned this pull request Aug 11, 2023

Outline long term QA goals cosmos/gaia#2427

Closed

16 tasks

shaspitz reviewed Aug 11, 2023

View reviewed changes

p-offtermatt reviewed Aug 14, 2023

View reviewed changes

mpoke reviewed Aug 16, 2023

View reviewed changes

docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved

mpoke approved these changes Aug 16, 2023

View reviewed changes

insumity reviewed Aug 22, 2023

View reviewed changes

docs/docs/adrs/adr-011-improving-test-confidence.md Outdated Show resolved Hide resolved

insumity reviewed Aug 22, 2023

View reviewed changes

yaruwangway reviewed Aug 23, 2023

View reviewed changes

docs/docs/adrs/adr-011-improving-test-confidence.md Show resolved Hide resolved

MSalopek and others added 2 commits August 23, 2023 12:26

Update docs/docs/adrs/adr-011-improving-test-confidence.md

87e719d

Co-authored-by: Shawn <[email protected]>

Update docs/docs/adrs/adr-011-improving-test-confidence.md

9e809e0

Co-authored-by: Shawn <[email protected]>

github-actions bot added C:Docs Assigned automatically by the PR labeler C:ADR Assigned automatically by the PR labeler labels Aug 23, 2023

docs: update from comments

cb5cd23

shaspitz approved these changes Aug 23, 2023

View reviewed changes

MSalopek changed the title ~~docs: add testing improvements ADR~~ docs: add testing improvements ADR (adr-011) Aug 24, 2023

MSalopek merged commit 2a1e3a9 into main Aug 24, 2023
13 checks passed

MSalopek deleted the masa/1165-testing-adr branch August 24, 2023 12:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add testing improvements ADR (adr-011) #1197

docs: add testing improvements ADR (adr-011) #1197

MSalopek commented Aug 11, 2023 •

edited by mpoke

Loading

shaspitz left a comment

shaspitz Aug 11, 2023

MSalopek Aug 23, 2023

MSalopek Aug 23, 2023 •

edited

Loading

shaspitz Aug 23, 2023

shaspitz Aug 11, 2023

p-offtermatt Aug 14, 2023 •

edited

Loading

shaspitz Aug 14, 2023

p-offtermatt Aug 22, 2023 •

edited

Loading

shaspitz Aug 11, 2023

shaspitz Aug 11, 2023

p-offtermatt left a comment

p-offtermatt Aug 14, 2023

MSalopek Aug 23, 2023

MSalopek Aug 23, 2023

shaspitz Aug 23, 2023

p-offtermatt Aug 24, 2023 •

edited

Loading

mpoke left a comment

insumity Aug 22, 2023 •

edited

Loading

p-offtermatt Aug 22, 2023

insumity Aug 25, 2023 •

edited

Loading

shaspitz left a comment


		Out of all the approaches used, unit testing has the most tools available and the coverage can simply be displayed as % of code lines tested. Although this is a very nice and very easy to understand metric, it does not speak about the quality of the test coverage.

		Since distributed systems testing is a lot more involved, our reliance on simple unit testing tools should be minimized due to lack of useful feedback. Of course, sometimes unit tests are still necessary and helpful, but in some cases where unit tests are not helpful, we should use e2e or integration tests as appropriate rather than taking pains to add unit tests only to artificially increase coverage.



		### Negative
		It might be easier to forgo the MBT tooling and instead focus on pure property based testing

docs: add testing improvements ADR (adr-011) #1197

docs: add testing improvements ADR (adr-011) #1197

Conversation

MSalopek commented Aug 11, 2023 • edited by mpoke Loading

Description

Author Checklist

Reviewers Checklist

shaspitz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MSalopek Aug 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

p-offtermatt Aug 14, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

p-offtermatt Aug 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

p-offtermatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

p-offtermatt Aug 24, 2023 • edited Loading

Choose a reason for hiding this comment

mpoke left a comment

Choose a reason for hiding this comment

insumity Aug 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

insumity Aug 25, 2023 • edited Loading

Choose a reason for hiding this comment

shaspitz left a comment

Choose a reason for hiding this comment

MSalopek commented Aug 11, 2023 •

edited by mpoke

Loading

MSalopek Aug 23, 2023 •

edited

Loading

p-offtermatt Aug 14, 2023 •

edited

Loading

p-offtermatt Aug 22, 2023 •

edited

Loading

p-offtermatt Aug 24, 2023 •

edited

Loading

insumity Aug 22, 2023 •

edited

Loading

insumity Aug 25, 2023 •

edited

Loading