Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test change directory structure for invalid yaml tests #410

Merged
merged 24 commits into from
Dec 12, 2024

Conversation

thorehusfeldt
Copy link
Collaborator

Work in progress on unified verification of various schemas for generators

@thorehusfeldt
Copy link
Collaborator Author

Addresses #405

@thorehusfeldt
Copy link
Collaborator Author

test/identity has

https://github.com/RagnarGrootKoerkamp/BAPCtools/blame/a486075a1adda0593ff459e72bfe57564a30ee03/test/problems/identity/generators/generators.yaml#L202

which is (I believe) disallowed. Authored by @mzuenni and committed by @RagnarGrootKoerkamp , please comment. (data can have 2-5 keys with fixed names, unknown keys are only allowed at the top level.)

@RagnarGrootKoerkamp
Copy link
Owner

I don't think we depend on any unknown keys currently, not nested and not in the root.

Reason to allow them:

  • backward/forward compatibility??

Reason to forbid them:

  • Better linting, since you get more warnings for mistyped keys.

I'd say disallowing nested unknown keys is good for sure, and can also be convinced to drop root unknown keys.
So yes the identity/generators/generators.yaml file should be updated.

@thorehusfeldt
Copy link
Collaborator Author

Note to self: What do we do with this one?

visualizer: /visualizers/asy.py
data:
  invalid_inputs:
    visualizer: /foo
    data:
      foo: 
        in: "1 2 3"
  sample:
    data:
      foo: 
        in: "a b c"
  secret:
    data:
      'bar':
        in: "x y z"

This is nonsense (invalid inputs can’t be visualised, so it makes no sense to specify a visualiser for them). No schema forbids this.

I think we can ignore this. (I have it lying around as a yaml file to possibly validate against.)

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Nov 16, 2024

The shiny new script at test/generators_yaml now gives:

Processing ../../doc/generators.yaml... OK
Processing ../problems/divsort/generators/generators.yaml... OK
Processing ../problems/identity/generators/generators.yaml... OK
Processing ../problems/generatorincludes/generators/generators.yaml... OK
Processing ../problems/multipass/generators/generators.yaml... Error
Output from cue vet:
data.secret: field is required but not present:
    ../../support/schemas/generators.cue:56:3
Processing ../problems/pickanumber/generators/generators.yaml... OK
Processing ../problems/interactivemultipass/generators/generators.yaml... Error
Output from cue vet:
data.secret: field is required but not present:
    ../../support/schemas/generators.cue:56:3
Processing invalid_yaml/bad-argument.generators.yaml... OK (correctly rejected)
Processing invalid_yaml/bad.generators.yaml... OK (correctly rejected)
Processing invalid_yaml/bad_casename.generators.yaml... OK (correctly rejected)
Processing invalid_yaml/bad_generators.yaml... OK (correctly rejected)
Processing invalid_yaml/missing_sample.generators.yaml... OK (correctly rejected)
Processing invalid_yaml/missing_secret.generators.yaml... OK (correctly rejected)
One or more cue vet commands failed.

The offending problems are the multipass problems in test. Should I just give them a pathological secret key? (Edit: made it so.)

@thorehusfeldt
Copy link
Collaborator Author

OK, Step 1 done.

What’s new is that test/generator_yaml/test_schemata.sh verifies all valid-looking *generators.yaml in the entire rep against the cue schema in support/schemas. Also, it checks various invalid_yaml and makes sure that these files fail.

This is already useful and replaces with pride the CI line

- run: cue vet **/generators.yaml support/schemas/*cue -d '#Generators'

(which should now just run the bash script instead).

Next step is to also validate the same files against the JSON schema.

@thorehusfeldt
Copy link
Collaborator Author

As for

I don't think we depend on any unknown keys currently, not nested and not in the root.

a useful key I’d like to advocate for is version:, defined as

  version: =~ "[0-9]{4}-[0-9]{2}" | *"2024-11"

@thorehusfeldt thorehusfeldt marked this pull request as ready for review November 16, 2024 15:25
echo -n "Processing $file... "
output=$(cue vet "$file" $schemadir/*.cue -d "#Generators" 2>&1)
exit_code=$?
if [ $exit_code -eq 0 ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both here and below: you may want to check for 0, 1, and anything else, to catch cue crashes? (But that should be rare so maybe not worth the effort.)

@RagnarGrootKoerkamp
Copy link
Owner

  • Setting visualizer for invalid cases: yes let's just allow it even though it's useless.
  • The empty secret: { data: [] } keys: lgtm
  • Yes feel free to update the CI workflow.
  • Yes, adding a version makes sense. Feel free to add that to the skel files in the repo as well.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Nov 17, 2024

I found an inconsistency. The JSON schema has (after converting back to CUE for readability):

input_validator_flags?: string

whereas the CUE scheme has

input_validator_flags?:  *"" | string | {[string]: string}

This makes a difference, for instance for test/problems/generatorincludes/generators.yaml, which has

  secret:
    data:
      - small:
          testdata.yaml:
            output_validator_flags: space_change_sensitive
            input_validator_flags:
              connected: --small
              strongly-connected: --small

thus following the CUE schema. In problemformatpackagespeak, both legacy and draft 2023-07 versions have

String or map with the keys "name" and "flags"

and

Sequence of strings or map of strings to sequences of strings

Note that the latter (draft) version changed its names again and the key is now called input_validator_args (not _flags). It’s not easy to keep this in sync, in particular because of a lack version numbers, stringent schemas, or reference (versioned) implementations.

I think I’ll try to upgrade the JSON schema.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Nov 17, 2024

TODOs:

  • in invalid_yaml/bad_generators.yaml, make cue vet go through all of the ----separated yaml snippets (and reject them) instead of bailing out at the first by default
  • understand why cue import of a JSON schema adds a !~ "^()^$ regex to key names (escalated to CUE slack; this may be an error)
  • refactor testdata.yaml in JSON schema
  • update testdata.yaml in JSON schema to allow multiple input_validator_flags.

@thorehusfeldt
Copy link
Collaborator Author

In bad_generators.yaml, many entries are rejected for the wrong reason. For instance:

# Solution ans visualizer must have an absolute path:
solution: a

if an invalid generator, but that's clear already from the missing data.sample.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Nov 28, 2024

Here is where I register changes.

In test/problems/identity/generators/generators.yaml:80:

 - dir: dir/ 306 

I suggest to disallow the /. The first token in a generator call is the name of a file, the name of a directory, or the name of a generator.

Copy link
Owner

@RagnarGrootKoerkamp RagnarGrootKoerkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small comments, but overall things look nice :)

@@ -77,7 +77,7 @@ data:
data:
- dir: dir 300
- dir2: dir 301
- dir: dir/ 306
#- dir: dir/ 306 # TODO disallow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is done now.

sample: {data: []}
secret:
data:
1: b

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an int-type key right. Did we want to allow that?

Oh, was this the discussion that CUE can't distinguish yaml key types? Maybe add a comment as such here and in the commented out part of invalid.generators.yaml.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither CUE nor JSON can see the difference between

1: foo

and

"1": "foo"

I’m not quite sure what is best here. I want to validate as much as possible in a schema language, but for the rest I think we need a python script. Undecided so far.

@@ -0,0 +1,2 @@
# The smallest valid generator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment says valid, but it's in the invalid directory

@RagnarGrootKoerkamp RagnarGrootKoerkamp merged commit cd8988d into master Dec 12, 2024
6 checks passed
@RagnarGrootKoerkamp RagnarGrootKoerkamp deleted the feat/schematest branch December 12, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants