Fuzz testing jsonchema #1499

aciba90 · 2022-06-07T16:26:41Z

Proposed Commit Message

Fuzz testing jsonchema

- Add `hypothesis` and `hypothesis-jsonschema`
- Add `hypothesis-slow` tox env to run slow tests
- Implement `JsonLocalResolver` to avoid `hypothesis-jsonschema` errors
- Add fuzz test covering `validate_cloudconfig_schema`

Additional Context

SC-829

Test Steps

pytest
tox --recreate -e hypothesis-slow
# To test the deb build process
make deb

Checklist:

My code follows the process laid out in the documentation
I have updated or added any unit tests accordingly
I have updated or added any documentation accordingly

aciba90 · 2022-06-09T11:41:04Z

This PR enables hypothesis for future unit-testing things not related to jsonschema. If we agree to do so, then:

we have to add python3-hypothesis as deb build dependency in bionic, focal, jammy, impish and dev branches
I would like to add documentation under doc/rtd/topics/testing.rst explaining how to configure / choose hypothesis profiles

holmanb

Nice, thanks for this! I'm just starting the review, but I have a quick comment:

Due to the nature of hypothesis testing, failed tests may be discovered at any time that are not related to code changed in a PR.
Additionally, running this locally it looks like this adds a full minute to the tox py3 environment.

Due to the above reasons, I think we want this separate from the py3 environment, and we may want to think about how (and whether) this works with CI, since a failed test might not correlate to a users changes.

I'm looking forward to digging more into this PR (tomorrow)!

aciba90 · 2022-06-14T08:58:15Z

1. Due to the nature of hypothesis testing, failed tests may be discovered at any time that are not related to code changed in a PR.

2. Additionally, running this locally it looks like this adds a full minute to the tox py3 environment.
Due to the above reasons, I think we want this separate from the py3 environment, and we may want to think about how (and whether) this works with CI, since a failed test might not correlate to a users changes.

We could create a separated tox and CI environment by:

Allow it to fail: jobs-that-are-allowed-to-fail.
Use the auto-applied hypothesis markers by the-hypothesis-pytest-plugin or add a custom maker to have more granular control.

I'm looking forward to digging more into this PR (tomorrow)!

Thanks for the feedback, and I am also looking forward to integrating this PR, I think hypothesis is a super powerful tool for testing and it could help us a lot!

TheRealFalcon · 2022-06-14T12:46:51Z

I agree that we shouldn't include hypothesis in CI. I think it makes more sense as a separate jenkins job and separate tox definition with the test having a mark so that is not run by default (unless it can run fast). My goal in suggesting the HAS_HYPOTHESIS_JSONSCHEMA bit was to be able to run unit tests without requiring a hypothesis dependency, but that currently doesn't work. There are a few modules that import hypothesis directly and will cause pytest to fail if hypothesis isn't installed.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected a SearchStrategy but got mapping['config']=None (type=NoneType).

aciba90 · 2022-06-14T13:11:43Z

I agree that we shouldn't include hypothesis in CI. I think it makes more sense as a separate jenkins job and separate tox definition with the test having a mark so that is not run by default (unless it can run fast).

Noted, I will improve this.

My goal in suggesting the HAS_HYPOTHESIS_JSONSCHEMA bit was to be able to run unit tests without requiring a hypothesis dependency, but that currently doesn't work. There are a few modules that import hypothesis directly and will cause pytest to fail if hypothesis isn't installed.

I added hypothesis as a required test dependency, thinking into future possible tests that could make a good use of it.
The dependency probably was not installed in your tox env, to do so run: tox --recreate -e py3.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected a SearchStrategy but got mapping['config']=None (type=NoneType).

I did not test that, let me check it.

- Implement JsonLocalResolver to avoid errors in hypothesis-jsonschema due to not resolving json references at all - Handle profile registration for hypothesis<3.47

Revert .travis as long tests are going to be run in Jenkins.

aciba90 · 2022-06-14T14:05:30Z

I created a new tox env called hypothesis-slow which runs unit tests marked with hypothesis_slow. These test are excluded from other unit test envs.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected > a SearchStrategy but got mapping['config']=None (type=NoneType).

~~I can successfully run pytest directly. Could you, @TheRealFalcon, maybe provide more info about your virtual env? Perhaps the hypothesis version that you have installed is enough.~~

aciba90 · 2022-06-14T16:58:06Z

I made hypothesis optional dependency. I think now everything should be okay to be integrated as a Jenkins job. Please, let me know if you see something else to improve.

aciba90 · 2022-06-21T17:23:55Z

I think the real reason for the need of JsonLocalResolver is that there is some kind of inconsistency with the cc_users_groups schema which causes hypothesis-jsonschema to fail in the generation of samples.

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

I am going to mark this PR as WIP and investigate why this is happening. @holmanb

aciba90 · 2022-06-22T12:24:22Z

I think the real reason for the need of JsonLocalResolver is that there is some kind of inconsistency with the cc_users_groups schema which causes hypothesis-jsonschema to fail in the generation of samples.

It seems this sub-schema is not fully jsonschema compliant, symptoms:

[lp1979491: schema: wrong validation of cc_users_groups](https://bugs.launchpad.net/cloud-init/+bug/1979491)
https://github.com/Zac-HD/hypothesis-jsonschema/issues/97

As this PR is only to enable fuzz testing, I decided to avoid the problem: tests/unittests/config/test_schema.py#L1146

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

This was my fault, as I was reusing the same schema in hypothesis-jsonschema.given and our validation, but hypothesis do not deep-copy it and as a result I was validating against a modified schema.

This PR is ready to be reviewed, @holmanb.

holmanb · 2022-06-24T13:39:09Z

More of a comment/question than a review:

In the run output I see that Draft202012Validator is used for filtering data, like this:

52.94%, Retried draw from text().filter(Draft202012Validator(schema={'type': 'string'}, format_checker=None).is_valid).filter(lambda s: s not in out) to satisfy filter

I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

holmanb · 2022-06-24T14:23:11Z

It seems this sub-schema is not fully jsonschema compliant, symptoms:

lp1979491: schema: wrong validation of cc_users_groups

Error resolving sub-schema property with type object or array python-jsonschema/hypothesis-jsonschema#97

Interesting find. Thanks for documenting that. It seems the fuzz tester found it's first bug?

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

And second bug and ....?

aciba90 · 2022-06-27T10:47:14Z

More of a comment/question than a review:

In the run output I see that Draft202012Validator is used for filtering data, like this:
52.94%, Retried draw from text().filter(Draft202012Validator(schema={'type': 'string'}, format_checker=None).is_valid).filter(lambda s: s not in out) to satisfy filter
I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

hypothesis-jsonschema does not offer how to configure / force this. I guess they select the validator depending on the $schema value.

In your example, they are generating values of a sub-schema {'type': 'string'} which seems to be a leaf of our schema and does not contain a specified $schema.
I guess they will generate and merge sub-config from leaves to root and the final validator will contain the correct Draft*Validator as we specify "$schema": "http://json-schema.org/draft-04/schema#". But this is only a guess.

If you want me to have a deeper inspection into hypothesis-jsonschema I can do it.

It seems the fuzz tester found its first bug?

Indirectly, yes, as hypothesis-jsonschema fails while generating examples from our schema when we do not remove cc_users_groups from it.

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

And second bug and ....?

This was my fault, I described it here: #1499 (comment)
I wasn't using our specified original schema for validation.

holmanb

Thanks for working through our requests and suggestions, I am happy with how this looks.

I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

hypothesis-jsonschema does not offer how to configure / force this. I guess they select the validator depending on the $schema value.

In your example, they are generating values of a sub-schema {'type': 'string'} which seems to be a leaf of our schema and does not contain a specified $schema. I guess they will generate and merge sub-config from leaves to root and the final validator will contain the correct Draft*Validator as we specify "$schema": "http://json-schema.org/draft-04/schema#". But this is only a guess.

Interesting, thanks :)

If you want me to have a deeper inspection into hypothesis-jsonschema I can do it.

I don't think that's needed.

Indirectly, yes, as hypothesis-jsonschema fails while generating examples from our schema when we do not remove cc_users_groups from it.

Nice :)

This was my fault..

No problem at all. No need for this to be blocked on an external bug it found. Once that is fixed we can circle back and drop the schema cleaning functions. Thanks for putting this together. I think I'm good with it, pending any remaining changes by other reviewers.

holmanb · 2022-07-06T22:58:45Z

@TheRealFalcon Do you have any unresolved issues with this PR? If not I can merge it. I just want to make sure there isn't anything outstanding.

TheRealFalcon · 2022-07-06T23:21:18Z

@holmanb I'm good, thanks!

aciba90 force-pushed the fuzz_testing_jsonschema branch 2 times, most recently from 0babd39 to 8100847 Compare June 8, 2022 11:47

aciba90 changed the title ~~WIP Fuzz testing jsonchema~~ Fuzz testing jsonchema Jun 9, 2022

aciba90 marked this pull request as ready for review June 9, 2022 11:42

holmanb self-assigned this Jun 13, 2022

holmanb requested changes Jun 13, 2022

View reviewed changes

aciba90 added 6 commits June 14, 2022 15:50

Set up hypothesis-jsonschema

1568e5d

Add own hypothesis_jsonschema module

32c7932

Fix travis

fd5fced

Fix hypothesis

55bcdc7

- Implement JsonLocalResolver to avoid errors in hypothesis-jsonschema due to not resolving json references at all - Handle profile registration for hypothesis<3.47

Configure hypothesis only for unit tests

36ed9d4

Fix types and pass hypothesis envvar

f61826b

aciba90 force-pushed the fuzz_testing_jsonschema branch from 5fcdc82 to 2561781 Compare June 14, 2022 13:54

Add new tox env to run long hypothesis tests

4feedc2

Revert .travis as long tests are going to be run in Jenkins.

aciba90 force-pushed the fuzz_testing_jsonschema branch from 2561781 to 4feedc2 Compare June 14, 2022 13:55

aciba90 marked this pull request as draft June 14, 2022 15:03

aciba90 added 2 commits June 14, 2022 18:34

Make hypothesis optional dep

adc8334

Remove Hypothesis profiles

4d100ba

aciba90 marked this pull request as ready for review June 14, 2022 16:56

aciba90 requested a review from holmanb June 20, 2022 19:45

aciba90 marked this pull request as draft June 21, 2022 17:24

Fix test

26bd6cc

aciba90 marked this pull request as ready for review June 22, 2022 12:24

holmanb approved these changes Jun 28, 2022

View reviewed changes

holmanb requested a review from TheRealFalcon July 6, 2022 22:57

holmanb merged commit c58ea03 into canonical:main Jul 6, 2022

aciba90 deleted the fuzz_testing_jsonschema branch July 7, 2022 11:40

ubuntu-server-builder mentioned this pull request May 12, 2023

Release 22.3 #4008

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fuzz testing jsonchema #1499

Fuzz testing jsonchema #1499

aciba90 commented Jun 7, 2022 •

edited

Loading

aciba90 commented Jun 9, 2022

holmanb left a comment

aciba90 commented Jun 14, 2022

TheRealFalcon commented Jun 14, 2022

aciba90 commented Jun 14, 2022

aciba90 commented Jun 14, 2022 •

edited

Loading

aciba90 commented Jun 14, 2022

aciba90 commented Jun 21, 2022

aciba90 commented Jun 22, 2022 •

edited by holmanb

Loading

holmanb commented Jun 24, 2022

holmanb commented Jun 24, 2022

aciba90 commented Jun 27, 2022

holmanb left a comment

holmanb commented Jul 6, 2022

TheRealFalcon commented Jul 6, 2022

Fuzz testing jsonchema #1499

Fuzz testing jsonchema #1499

Conversation

aciba90 commented Jun 7, 2022 • edited Loading

Proposed Commit Message

Additional Context

Test Steps

Checklist:

aciba90 commented Jun 9, 2022

holmanb left a comment

Choose a reason for hiding this comment

aciba90 commented Jun 14, 2022

TheRealFalcon commented Jun 14, 2022

aciba90 commented Jun 14, 2022

aciba90 commented Jun 14, 2022 • edited Loading

aciba90 commented Jun 14, 2022

aciba90 commented Jun 21, 2022

aciba90 commented Jun 22, 2022 • edited by holmanb Loading

holmanb commented Jun 24, 2022

holmanb commented Jun 24, 2022

aciba90 commented Jun 27, 2022

holmanb left a comment

Choose a reason for hiding this comment

holmanb commented Jul 6, 2022

TheRealFalcon commented Jul 6, 2022

aciba90 commented Jun 7, 2022 •

edited

Loading

aciba90 commented Jun 14, 2022 •

edited

Loading

aciba90 commented Jun 22, 2022 •

edited by holmanb

Loading