Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔄 Redefine schema in terms of @types/mdast #67

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

agoose77
Copy link

@agoose77 agoose77 commented Oct 21, 2024

This PR redefines the MyST schema in terms of types exposed by @types/mdast, and generates the MyST JSON schema from these types.

By making this change, we no longer need to vendor the CommonMark schema itself, and we can directly author the types that the rest of the MyST ecosystem consumes (rather than writing indirect JSON schema).

Tasks

  • Re-implement YML tests in TypeScript
  • Update docs
  • Publish schema in CI

Schema

Warning

This PR changes our spec slightly, to rectify breaking changes with MDAST and to make better use of types. These changes are as follows:

  • Table.align change from string | undefined to string[] | undefined | null (this comes from @types/mdast
  • crossReference, mystRole, mystDirective change from Node to Literal & Parent (these attributes already were used, but not at the required-type level)

Docs

To build the docs, run npm run docs:prepare, and then myst start in the docs directory.

With this PR, it's now the case that the code (rather than any IDL definition or schema) is the reference spec for working with MyST's AST. Consequently, this PR reworks the docs to focus on people who want to reason about e.g. MyST JSON AST, such as people writing plugins in Python.1

I chose to split each node into its own page:
image

image

Complex types (only arrays presently) are moved to footnotes:
image

This might be a controversial change, but I think on balance it does help readability.

Schema

The downside of this PR is that we have to use https://github.com/YousefED/typescript-json-schema, because of the way that mdast types are designed to be extended (via interface declaration merging). This mechanism is not handled by any AST-driven TS -> JSON Schema generators, so we need something that actually invokes the compiler (see above).

Footnotes

  1. This is not solely a pragmatic take on who needs the docs -- it's also hard to preserve all syntactic type information when exporting the spec to a non TS format (see also Schema). As we lose types like Array<BlockContent | DefinitionContent> (which is simply expanded to an unnamed literal intersection), I opted to focus exclusively on the parts of the AST that end up in JSON, i.e. node types, rather than e.g. named array types.

test/test.ts Outdated Show resolved Hide resolved
test/test.ts Outdated Show resolved Hide resolved
@@ -4,5 +4,4 @@ A basic Markdown Abstract Syntax Tree, mdast, is defined at [](https://github.co

## Deviations commonmark mdast

- According to the mdast spec [list items](https://github.com/syntax-tree/mdast#listitem) may only have [flow content](https://github.com/syntax-tree/mdast#flowcontent) children. However, according to the commonmark spec, list items may also be flow or phrasing content, such as [text](https://spec.commonmark.org/0.30/#example-255). This depends on spacing between the list items in the original document. In myst-spec, we choose to follow the commonmark spec and allow `ListItem` children to be `FlowContent` or `PhrasingContent`.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct w.r.t the CommonMark spec:

- one

two

produces this CommonMark AST:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>one</text>
      </paragraph>
    </item>
  </list>
  <paragraph>
    <text>two</text>
  </paragraph>
</document>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this saying that we default to having a paragraph instead of text directly in the list item? Which is what is created?

Copy link
Author

@agoose77 agoose77 Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our documentation states that

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <list type="bullet" tight="true">
    <item>
      <text>one</text>
    </item>
  </list>
</document>

is permitted by the CommonMark spec. I don't see any evidence that that is the case. What I did notice is that the HTML renderer doesn't show the <p>, but I think that might be a rendering difference rather than a spec difference.

I noticed this because whilst the spec page shows <li>TEXT</li> in the inline example, the actual examples file does not. EDIT: oops, off-by-one error :P

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add -- if my conclusion here is incorrect, then typing this proves quite hard because it is not easy/possible to extend an existing type and change the type of children in that manner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow... yeah, wild that the html doesn't show the <p> despite it being in the AST! Oof, good catch.

Copy link
Member

@rowanc1 rowanc1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good direction. Excited to have proper typescript types.

Docs and a downstream PR in mystmd consuming these changes are the major things to see before merging/releasing.

@agoose77 agoose77 requested a review from rowanc1 October 25, 2024 12:39
@agoose77 agoose77 marked this pull request as ready for review October 25, 2024 12:43
@agoose77
Copy link
Author

TODO: move to @types/mdast==4 in order to work with most other remark releases.

Copy link
Contributor

@fwkoch fwkoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great. It sure is nice to define actual typescript types - I think my original reasoning for JSON schema was (a) language agnostic (a bit too lofty a goal so early in the project, I suspect...) and (b) I wasn't super happy with any TS -> JSON options. Looks like you have one that works, though.

I'm approving this, but I agree with Rowan that we should probably consume it in mystmd before merging/releasing just to iron out any type oddities that may come up (I assume it will substantially improve things and allow us to basically scrub GenericNode/GenericParent from the codebase).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants