Capability for serializing and deserializing well-formedness definitions #130

fhackett · 2024-08-09T14:18:40Z

Like the title says, this PR allows Trieste's well-formedness definitions to be serialized to Trieste ASTs (which have a plain-text conversion) and back.

This is useful for:

- asking a tool what its input well-formedness definition is (in a hypothetical future scenario where two tools exchange Trieste ASTs directly);
- debugging what a well-formedness definition actually looks like when it has been defined as a long series of mutations of some other definition;
- finding out what a well-formedness definition for well-formedness definitions looks like.

In its current state, the code can serialize its own well-formedness definition and then re-read it. It's challenging to fuzz the code as-is because it isn't structured as a reader/writer pair: readers and writers can't take Wellformedness as input or produce it as output.

While I believe the code to be usable in its current state (you can use it with Trieste's native AST serializers, though it will crash on bad inputs), I aspire to wrap it in a reader/writer that has proper error handling.

fhackett · 2024-08-13T13:01:05Z

Based on the conversation yesterday, I incorporated the current serializer as part of the AST debug writing process.

This is not a way to read external WF definitions, nor is it that great of an output format (it's full of the long internal token names, for one), but it works and you can have a lot of fun diffing WFs between passes.

Note that I made some effort to ensure that WF output is canonical and ordered, so you can compare different WFs and a text diff will get quite close to a semantic diff, in terms of where nodes appear and disappear, mapping similarities, etc.
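To illustrate why canonical ordering makes text diffs meaningful, here is a minimal sketch. It does not use Trieste's actual types: `ToyWf` and `dump_canonical` are hypothetical names, and the toy definition only records which child tokens each token allows. Because both containers iterate in sorted order, the dump does not depend on the order in which the definition was built.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for a well-formedness definition: each token name
// maps to the set of token names allowed as its children.
// This is NOT Trieste's Wellformed type.
using ToyWf = std::map<std::string, std::set<std::string>>;

// Emit one line per token. std::map and std::set both iterate in sorted
// key order, so the text is the same however the definition was assembled.
std::string dump_canonical(const ToyWf& wf)
{
  std::ostringstream out;
  for (const auto& entry : wf)
  {
    out << entry.first << " <-";
    for (const auto& child : entry.second)
      out << ' ' << child;
    out << '\n';
  }
  return out.str();
}
```

Two toy definitions that contain the same entries, inserted in different orders, produce identical text, so any line that shows up in a diff corresponds to a real difference between the definitions.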

I also noticed that actually giving an output mode outside of "just serialize the AST" probably belongs in another file (and outside this changeset), because otherwise we'll have an interesting circular dependency between the pass machinery and, well, itself.

fhackett · 2024-08-13T13:03:35Z

All that to say, I think this is moderately useful, reasonably extensible in some ways I discussed, and ready for review in its own right (give or take inevitable style questions, etc).

Note one feature that goes beyond how Wellformed itself works: you can optionally specify a namespace, a string prefix shared by all token names (other than core Trieste ones like "top"). It might be useful in the future, and if you ignore it, it gets set to the universal "" prefix and does nothing.
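A rough sketch of the namespace idea, under loud assumptions: `is_core`, `qualify`, and `in_namespace` are hypothetical helpers written for illustration, not functions from this PR, and the set of core names contains only "top" because that is the only one named above.

```cpp
#include <set>
#include <string>

// Hypothetical: names treated as core and therefore exempt from prefixing.
// Only "top" is mentioned in the discussion; the real set may differ.
inline bool is_core(const std::string& name)
{
  static const std::set<std::string> core = {"top"};
  return core.count(name) != 0;
}

// Prepend the namespace prefix to a non-core token name.
// With the empty prefix this returns the name unchanged.
std::string qualify(const std::string& ns, const std::string& name)
{
  return is_core(name) ? name : ns + name;
}

// Check that a token name is either core or starts with the prefix.
// Every string starts with "", so the empty prefix accepts everything.
bool in_namespace(const std::string& ns, const std::string& name)
{
  return is_core(name) || name.compare(0, ns.size(), ns) == 0;
}
```

For example, `qualify("calc-", "expr")` gives `"calc-expr"`, while `qualify("calc-", "top")` and `qualify("", "expr")` leave the name untouched.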

initial wf_meta to Node and back

5c72cc6

fhackett marked this pull request as draft August 9, 2024 14:18

fhackett added 3 commits August 9, 2024 15:24

remove problematic emplace_back

ff9a6ed

move key code out of assert()

971e0a6

add wf to debug output

01a59a5

fhackett mentioned this pull request Aug 13, 2024

Fuzzer failure: segfault in yaml_fuzz_to_json, seed 1597729744 #125

Open

fhackett marked this pull request as ready for review August 13, 2024 13:01

fhackett added 3 commits August 15, 2024 16:51

more readable WF dump

dd9b517

Merge branch 'main' into fhackett-wf-rw

75a2f7f

Merge branch 'main' into fhackett-wf-rw

76b042b
