Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzz testing jsonchema #1499

Merged
merged 10 commits into from
Jul 6, 2022
Merged

Conversation

aciba90
Copy link
Contributor

@aciba90 aciba90 commented Jun 7, 2022

Proposed Commit Message

Fuzz testing jsonchema

- Add `hypothesis` and `hypothesis-jsonschema`
- Add `hypothesis-slow` tox env to run slow tests
- Implement `JsonLocalResolver` to avoid `hypothesis-jsonschema` errors
- Add fuzz test covering `validate_cloudconfig_schema`

Additional Context

SC-829

Test Steps

pytest
tox --recreate -e hypothesis-slow
# To test the deb build process
make deb

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@aciba90 aciba90 force-pushed the fuzz_testing_jsonschema branch 2 times, most recently from 0babd39 to 8100847 Compare June 8, 2022 11:47
@aciba90
Copy link
Contributor Author

aciba90 commented Jun 9, 2022

This PR enables hypothesis for future unit-testing things not related to jsonschema. If we agree to do so, then:

  • we have to add python3-hypothesis as deb build dependency in bionic, focal, jammy, impish and dev branches
  • I would like to add documentation under doc/rtd/topics/testing.rst explaining how to configure / choose hypothesis profiles

@aciba90 aciba90 changed the title WIP Fuzz testing jsonchema Fuzz testing jsonchema Jun 9, 2022
@aciba90 aciba90 marked this pull request as ready for review June 9, 2022 11:42
@holmanb holmanb self-assigned this Jun 13, 2022
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for this! I'm just starting the review, but I have a quick comment:

  1. Due to the nature of hypothesis testing, failed tests may be discovered at any time that are not related to code changed in a PR.
  2. Additionally, running this locally it looks like this adds a full minute to the tox py3 environment.

Due to the above reasons, I think we want this separate from the py3 environment, and we may want to think about how (and whether) this works with CI, since a failed test might not correlate to a users changes.

I'm looking forward to digging more into this PR (tomorrow)!

@aciba90
Copy link
Contributor Author

aciba90 commented Jun 14, 2022

1. Due to the nature of hypothesis testing, failed tests may be discovered at any time that are not related to code changed in a PR.

2. Additionally, running this locally it looks like this adds a full minute to the tox py3 environment.

Due to the above reasons, I think we want this separate from the py3 environment, and we may want to think about how (and whether) this works with CI, since a failed test might not correlate to a users changes.

We could create a separated tox and CI environment by:

  1. Allow it to fail: jobs-that-are-allowed-to-fail.
  2. Use the auto-applied hypothesis markers by the-hypothesis-pytest-plugin or add a custom maker to have more granular control.

I'm looking forward to digging more into this PR (tomorrow)!

Thanks for the feedback, and I am also looking forward to integrating this PR, I think hypothesis is a super powerful tool for testing and it could help us a lot!

@TheRealFalcon
Copy link
Member

I agree that we shouldn't include hypothesis in CI. I think it makes more sense as a separate jenkins job and separate tox definition with the test having a mark so that is not run by default (unless it can run fast). My goal in suggesting the HAS_HYPOTHESIS_JSONSCHEMA bit was to be able to run unit tests without requiring a hypothesis dependency, but that currently doesn't work. There are a few modules that import hypothesis directly and will cause pytest to fail if hypothesis isn't installed.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected a SearchStrategy but got mapping['config']=None (type=NoneType).

@aciba90
Copy link
Contributor Author

aciba90 commented Jun 14, 2022

I agree that we shouldn't include hypothesis in CI. I think it makes more sense as a separate jenkins job and separate tox definition with the test having a mark so that is not run by default (unless it can run fast).

Noted, I will improve this.

My goal in suggesting the HAS_HYPOTHESIS_JSONSCHEMA bit was to be able to run unit tests without requiring a hypothesis dependency, but that currently doesn't work. There are a few modules that import hypothesis directly and will cause pytest to fail if hypothesis isn't installed.

I added hypothesis as a required test dependency, thinking into future possible tests that could make a good use of it.
The dependency probably was not installed in your tox env, to do so run: tox --recreate -e py3.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected a SearchStrategy but got mapping['config']=None (type=NoneType).

I did not test that, let me check it.

aciba90 added 6 commits June 14, 2022 15:50
- Implement JsonLocalResolver to avoid errors in hypothesis-jsonschema
  due to not resolving json references at all
- Handle profile registration for hypothesis<3.47
@aciba90 aciba90 force-pushed the fuzz_testing_jsonschema branch from 5fcdc82 to 2561781 Compare June 14, 2022 13:54
Revert .travis as long tests are going to be run
in Jenkins.
@aciba90 aciba90 force-pushed the fuzz_testing_jsonschema branch from 2561781 to 4feedc2 Compare June 14, 2022 13:55
@aciba90
Copy link
Contributor Author

aciba90 commented Jun 14, 2022

I created a new tox env called hypothesis-slow which runs unit tests marked with hypothesis_slow. These test are excluded from other unit test envs.

Also, running unit tests by directly invoking pytest on the command line leads to E hypothesis.errors.InvalidArgument: Expected > a SearchStrategy but got mapping['config']=None (type=NoneType).

I can successfully run pytest directly. Could you, @TheRealFalcon, maybe provide more info about your virtual env? Perhaps the hypothesis version that you have installed is enough.

@aciba90 aciba90 marked this pull request as draft June 14, 2022 15:03
@aciba90 aciba90 marked this pull request as ready for review June 14, 2022 16:56
@aciba90
Copy link
Contributor Author

aciba90 commented Jun 14, 2022

I made hypothesis optional dependency. I think now everything should be okay to be integrated as a Jenkins job. Please, let me know if you see something else to improve.

@aciba90 aciba90 requested a review from holmanb June 20, 2022 19:45
@aciba90
Copy link
Contributor Author

aciba90 commented Jun 21, 2022

I think the real reason for the need of JsonLocalResolver is that there is some kind of inconsistency with the cc_users_groups schema which causes hypothesis-jsonschema to fail in the generation of samples.

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

I am going to mark this PR as WIP and investigate why this is happening. @holmanb

@aciba90 aciba90 marked this pull request as draft June 21, 2022 17:24
@aciba90
Copy link
Contributor Author

aciba90 commented Jun 22, 2022

I think the real reason for the need of JsonLocalResolver is that there is some kind of inconsistency with the cc_users_groups schema which causes hypothesis-jsonschema to fail in the generation of samples.

It seems this sub-schema is not fully jsonschema compliant, symptoms:

[lp1979491: schema: wrong validation of cc_users_groups](https://bugs.launchpad.net/cloud-init/+bug/1979491)
https://github.com/Zac-HD/hypothesis-jsonschema/issues/97

As this PR is only to enable fuzz testing, I decided to avoid the problem: tests/unittests/config/test_schema.py#L1146

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

This was my fault, as I was reusing the same schema in hypothesis-jsonschema.given and our validation, but hypothesis do not deep-copy it and as a result I was validating against a modified schema.

This PR is ready to be reviewed, @holmanb.

@aciba90 aciba90 marked this pull request as ready for review June 22, 2022 12:24
@holmanb
Copy link
Member

holmanb commented Jun 24, 2022

More of a comment/question than a review:

In the run output I see that Draft202012Validator is used for filtering data, like this:

52.94%, Retried draw from text().filter(Draft202012Validator(schema={'type': 'string'}, format_checker=None).is_valid).filter(lambda s: s not in out) to satisfy filter

I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

@holmanb
Copy link
Member

holmanb commented Jun 24, 2022

It seems this sub-schema is not fully jsonschema compliant, symptoms:

Interesting find. Thanks for documenting that. It seems the fuzz tester found it's first bug?

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

And second bug and ....?

@aciba90
Copy link
Contributor Author

aciba90 commented Jun 27, 2022

More of a comment/question than a review:

In the run output I see that Draft202012Validator is used for filtering data, like this:

52.94%, Retried draw from text().filter(Draft202012Validator(schema={'type': 'string'}, format_checker=None).is_valid).filter(lambda s: s not in out) to satisfy filter

I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

hypothesis-jsonschema does not offer how to configure / force this. I guess they select the validator depending on the $schema value.

In your example, they are generating values of a sub-schema {'type': 'string'} which seems to be a leaf of our schema and does not contain a specified $schema.
I guess they will generate and merge sub-config from leaves to root and the final validator will contain the correct Draft*Validator as we specify "$schema": "http://json-schema.org/draft-04/schema#". But this is only a guess.

If you want me to have a deeper inspection into hypothesis-jsonschema I can do it.

It seems the fuzz tester found its first bug?

Indirectly, yes, as hypothesis-jsonschema fails while generating examples from our schema when we do not remove cc_users_groups from it.

Furthermore, if I remove that schema and execute the test without JsonLocalResolver, then it finds correct samples that make the schema validation fail.

And second bug and ....?

This was my fault, I described it here: #1499 (comment)
I wasn't using our specified original schema for validation.

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through our requests and suggestions, I am happy with how this looks.

I know the docs say they support draft4. Maybe we need to configure a setting to use the correct version?

hypothesis-jsonschema does not offer how to configure / force this. I guess they select the validator depending on the $schema value.

In your example, they are generating values of a sub-schema {'type': 'string'} which seems to be a leaf of our schema and does not contain a specified $schema. I guess they will generate and merge sub-config from leaves to root and the final validator will contain the correct Draft*Validator as we specify "$schema": "http://json-schema.org/draft-04/schema#". But this is only a guess.

Interesting, thanks :)

If you want me to have a deeper inspection into hypothesis-jsonschema I can do it.

I don't think that's needed.

Indirectly, yes, as hypothesis-jsonschema fails while generating examples from our schema when we do not remove cc_users_groups from it.

Nice :)

This was my fault..

No problem at all. No need for this to be blocked on an external bug it found. Once that is fixed we can circle back and drop the schema cleaning functions. Thanks for putting this together. I think I'm good with it, pending any remaining changes by other reviewers.

@holmanb holmanb requested a review from TheRealFalcon July 6, 2022 22:57
@holmanb
Copy link
Member

holmanb commented Jul 6, 2022

@TheRealFalcon Do you have any unresolved issues with this PR? If not I can merge it. I just want to make sure there isn't anything outstanding.

@TheRealFalcon
Copy link
Member

@holmanb I'm good, thanks!

@holmanb holmanb merged commit c58ea03 into canonical:main Jul 6, 2022
@aciba90 aciba90 deleted the fuzz_testing_jsonschema branch July 7, 2022 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants