Proofdown Parser parses Proofdown (PML: Proof Markup Language) into a deterministic, typed AST (proofdown_ast) for use by validators, renderers, CLIs, and WASM targets. The language supports full Markdown (CommonMark + selected GFM: tables, strikethrough, autolink, task lists) plus a small, HTML-like component system. v2 adds purely additive semantics (captions, JSON Pointers, table column selectors, text viewer) enforced by a validator.
Highlights
- Deterministic AST JSON for golden tests and cross-platform parity
- Syntax-only parser; semantics enforced downstream by
proofdown_validate - Strict limits (depth, nodes, input size) and robust error model
- CLI (
pml) for parse/validate with JSON output and stable exit codes - WASM build for browser use; mdBook docs for language and contract
- mdBook (local artifact): book/book/index.html
- mdBook (GitHub Pages): https://veignsche.github.io/proofdown_parser/ (optional, if Pages enabled)
- Language v1 spec: .specs/01_proofdown_language_v1.md
- Additive semantics v2: .specs/03_proofdown_language_v2.md
- Artifact-first semantics: .specs/02_artifact_first_language_spec.md
- Authoring guide (LLM-friendly): .docs/proofdown-authoring-guide.md
- Testing artifacts catalog: .docs/testing-artifacts/README.md
- Table row schemas (JSON): .specs/schemas/README.md
- SSG contract (stable API/AST): .specs/10_contract_with_provenance_ssg.md
- Stable API:
parse(&str) -> Result<Document, ParseError>,parse_with_limits(&str, ParserLimits) - Determinism: enforced by golden JSON tests across v1/v2 fixtures
- Validator:
proofdown_validateenforces whitelist and bounds (v2 features) - CLI:
pml parse|validatewith JSON outputs; standard exit codes - WASM:
proofdown_wasmtarget; CI job builds; demo scaffold inexamples/wasm/
Run a local equivalent of the CI to avoid surprises before pushing:
bash scripts/ci_local.sh # full run
bash scripts/ci_local.sh --skip-wasm --skip-mdbook --skip-fuzz # faster loopThis mirrors: fmt, clippy, tests, schema checks, benches, CLI smoke over fixtures, optional wasm/mdBook builds, and fuzz smoke.
- Maturity: Production-ready parser crate with typed errors, validator, CLI, and WASM build.
- Implemented API:
parse(&str) -> Result<Document, ParseError>,parse_with_limits(&str, ParserLimits); helperfind_attr. - Validator:
proofdown_validate::validate(&Document, Option<&Limits>) -> Result<(), ValidateError>enforces whitelist and attribute bounds (v2 features included). - CLI:
pml parse|validatewith--json,--pretty, and--limits.*flags; documented exit codes. - WASM:
proofdown_wasmexportswasm_parse(input: &str) -> StringJSON payload; CI builds wasm32 target. - Next: link macro parsing (planned v1.1) and includes (planned v1.2).
Proofdown is a Markdown-like format with a small, HTML-like component system for typed, securable blocks used across Provenance. It combines:
- Markdown-style blocks: headings (
#..####), paragraphs, lists, and code fences. - Minimal component tags for structured UI blocks, e.g.
<grid cols=3> ... </grid>. - Optional interpolations of verified Index fields via
{{ field }}(text-only, escaped at render time). - A link macro
[[...]]for artifact- and repo-aware deep links.
Files use UTF-8 with \n newlines and the .pml extension (Proof Markup Language).
This submodule hosts a nested Cargo workspace to allow independent iteration of parser-related crates. Proposed layout (from .plans/00_workspace_plan.md):
proofdown_parser/
├─ Cargo.toml # [workspace] root (nested under parent repo)
├─ crates/
│ ├─ proofdown_ast/ # AST and error types, serde
│ ├─ proofdown_lexer/ # optional tokenizer (planned)
│ ├─ proofdown_parser/ # parser crate (this)
│ ├─ proofdown_validate/ # whitelist + attribute/limits
│ ├─ proofdown_wasm/ # wasm-bindgen bindings
│ └─ proofdown_cli/ # small CLI for dev/testing
└─ .specs/ and .plans/
Current workspace members (from root Cargo.toml):
[workspace]
members = [
"crates/proofdown_ast",
"crates/proofdown_parser",
"crates/proofdown_validate",
"crates/proofdown_cli",
"crates/proofdown_wasm",
"crates/schema_check",
"crates/proofdown_integration_example",
]
resolver = "2"Parent workspace depends on inner crates via path dependencies (do not include this nested workspace as a member in the parent):
# In the parent Cargo.toml of a consumer crate
[dependencies]
proofdown_parser = { path = "crates/proofdown_parser/crates/proofdown_parser" }- Deterministically parse Proofdown into a typed AST.
- Enforce a whitelisted component set and attribute schema (via validation pass).
- Provide precise, human-readable errors with line/column, kind, and context.
- Offer a zero-IO core suitable for WASM and sandboxed execution.
- Bound resource usage for safety (depth, nodes, input size).
- Rendering HTML (delegated to renderers/Worker).
- Fetching includes or artifacts (delegated to caller/Worker).
- HTML sanitization (render-time responsibility).
- Blocks: headings
#..####, paragraphs, component blocks. - Components (whitelisted):
- Structural:
grid,section,card(minimum set used by examples) - Artifact viewers:
artifact.summary,artifact.table,artifact.json,artifact.markdown,artifact.image,artifact.link
- Structural:
- Attributes: strings only; enforce known keys per component with simple bounds.
- Link macro
[[a:artifact_id | Optional Title]]: optional for 1.0; planned for 1.1. - Includes: out of scope for 1.0 (planned for 1.2 with cycle detection).
- Default limits: depth ≤ 16, nodes ≤ 50k, input ≤ 1 MiB.
- Markdown blocks: headings (ATX
#..####), paragraphs, block quotes, thematic breaks, fenced/indented code blocks, lists (unordered/ordered with start/tight), GFM task items, GFM tables (optional header, column alignment). - Markdown inlines: text, emphasis, strong, strikethrough (GFM), inline code, soft/hard breaks, links (url/title), images (url/title, alt from inline text).
- Components (block):
- Start tag:
<name key="value" ...>or<name key="value" ... /> - End tag:
</name>(required unless self-closing) - Children: zero or more blocks (full Markdown supported) until matching end tag.
- Start tag:
- Attributes:
key=value, quoted with"..."or bare until whitespace/>. Note: the parser recognizes component syntax generically. Whitelisting component names and attribute bounds is enforced by the validator (see Validation). Raw HTML blocks/inline are dropped.
The AST types live in crates/proofdown_ast/src/lib.rs and derive serde for stable JSON round-tripping:
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct Document { pub blocks: Vec<Block> }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(tag = "type")]
pub enum Block {
Heading { level: u8, inlines: Vec<Inline> },
Paragraph { inlines: Vec<Inline> },
BlockQuote { children: Vec<Block> },
ThematicBreak,
CodeBlock { info: String, text: String },
List { kind: ListKind, start: Option<u64>, tight: bool, items: Vec<ListItem> },
Table { align: Vec<TableAlign>, header: Option<TableRow>, rows: Vec<TableRow> },
Component(Component),
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub enum ListKind { Bullet, Ordered }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct ListItem { pub children: Vec<Block>, pub task: Option<bool> }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(tag = "type")]
pub enum Inline {
Text { text: String }, Emph { children: Vec<Inline> }, Strong { children: Vec<Inline> },
Strikethrough { children: Vec<Inline> }, Code { text: String }, SoftBreak, HardBreak,
Link { url: String, title: Option<String>, children: Vec<Inline> },
Image { url: String, title: Option<String>, alt: String }
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub enum TableAlign { None, Left, Center, Right }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct TableRow { pub cells: Vec<Vec<Inline>> }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct Component {
pub name: String,
pub attrs: Vec<Attr>,
pub children: Vec<Block>,
pub self_closing: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct Attr { pub key: String, pub value: String }Note: The proofdown_ast crate hosts these types and the structured error model shared across crates.
Current API (typed):
pub fn parse(input: &str) -> Result<Document, ParseError>
pub fn parse_with_limits(input: &str, limits: ParserLimits) -> Result<Document, ParseError>Helper:
pub fn find_attr<'a>(attrs: &'a [Attr], key: &str) -> Option<&'a str>Related crates/APIs:
proofdown_validate::validate(doc: &Document, limits: Option<&Limits>) -> Result<(), ValidateError>- WASM surface (via
wasm-bindgen):wasm_parse(input: &str) -> Stringreturning JSON{ ok, doc|err }
use proofdown_parser::parse;
let input = r#"
# QA Evidence for {{ commit }}
<grid cols=3 gap=16>
<card title="Tests">
<artifact.summary id="tests-summary" />
</card>
</grid>
"#;
let doc = parse(input)?; // Document with typed blocks/components# QA Evidence for {{ commit }}
<grid cols=3 gap=16>
<card title="Tests">
<artifact.summary id="tests-summary" />
[[a:coverage | Full Coverage Report]]
</card>
<card title="Coverage">
<artifact.table id="coverage" />
</card>
<card title="Key Failures">
<artifact.markdown id="failures" />
</card>
</grid>link = "[[" SP* target SP* ( "|" SP* label SP* )? "]]"
label = 1*( %x21-7E )
target = a_target / repo_target / src_target / doc_target / gh_target / ci_target / sym_target / path_shorthand
a_target = "a:" id
repo_target = "repo:" path [ linefrag ]
src_target = "src:" path [ linefrag ]
doc_target = "doc:#" anchor
gh_target = "gh:" ( "issue:" 1*DIGIT / "pr:" 1*DIGIT )
ci_target = "ci:" ( "run" / "job:" jobname )
sym_target = "sym:" path "::" symbol
path_shorthand = path [ linefrag ] ; interpreted as repo:<path>
id = 1*( ALPHA / DIGIT / "-" / "_" )
path = 1*( ALPHA / DIGIT / "/" / "." / "-" / "_" )
anchor = 1*( ALPHA / DIGIT / "-" / "_" )
jobname = 1*( ALPHA / DIGIT / "-" / "_" )
symbol = 1*( VCHAR )
linefrag = "#L" 1*DIGIT [ "-L" 1*DIGIT ]Semantics: path_shorthand validates under repository rules (commit-pinned, no traversal). Invalid targets are hard errors during parsing/validation.
Structural:
grid(cols=1..6, gap=0..64)section(title: string)card(title: string)
Artifact viewers:
artifact.summary(id)— expectssummary:testJSONartifact.table(id)— expects coverage JSONartifact.json(id, collapsed=true|false, depth=0..8)artifact.markdown(id)artifact.image(id, alt, max_height=128..2048)artifact.link(id, download=true|false, title)
Repository viewers (optional for v1 if artifacts exist to resolve):
repo.code(path, range, lang, highlight)— range line count SHOULD be ≤ 400repo.link(path, label, lines)repo.tree(path, depth=1..5, include, exclude)repo.diff(base_id, head_id, path, context=0..10)repo.symbol(path, name)
Note: These are the component names used downstream by the SSG. The parser recognizes component syntax and records Component { name, attrs, children, self_closing } but does not enforce a whitelist. Unknown component names/attributes are not parse errors; they are validated by the SSG or the proofdown_validate crate.
- Pure functions; no environment access.
- Deterministic traversal order and attribute normalization.
- Limits are a parser responsibility. Default limits (depth ≤ 16, nodes ≤ 50k, input ≤ 1 MiB). Breaches produce errors (structured error model implemented).
- Parser does not enforce a component whitelist. It recognizes component syntax only; whitelist/attribute validation happen downstream (SSG or validator crate).
- No script execution or HTML emission; output is a structured AST only.
- Text sanitization happens during render; this crate is purely structural.
- Golden fixtures derived from
.specs/00_proofdown_parser.mdexamples. - Parser error cases: malformed tags, unterminated/mismatched components, malformed attributes, and limits exceeded.
- Unknown components/attributes are not parse errors and are covered by downstream validation tests.
- Determinism: parse → serialize AST (stable JSON) → byte-for-byte compare across runs.
- 1.0:
- Minimal grammar + components to power examples.
- Whitelist + attribute validation and limits.
- Rust API + unit tests and golden tests.
- 1.1:
- Link macro parsing.
- WASM bindings (
wasm-bindgen). - Expanded components (
tabs,gallery, additionalsectionbehaviors).
- 1.2:
- Includes with cycle detection and base path policy.
- Unit tests live under each crate's
tests/directory. - The parser crate (
proofdown_parser) includes smoke tests and golden tests.
Location: crates/proofdown_parser/tests/cases_harness.rs
- The harness discovers case directories under
crates/proofdown_parser/tests/cases/. - Each case is a directory containing at least
input.pmland one of:expected.jsonif parsing should succeed, containing the AST JSON structureexpected_error.jsonif parsing should fail, containing the structured error JSON{ line, col, kind, msg }
Run all cases:
cargo test -p proofdown_parser --test cases_harnessUpdate or create goldens automatically (writes expected files):
UPDATE_GOLDEN=1 cargo test -p proofdown_parser --test cases_harnessAdd a new case:
- Create a new folder under
crates/proofdown_parser/tests/cases/<case_name>/. - Add
input.pmlwith the test document content. - Run with
UPDATE_GOLDEN=1to generateexpected.json(success) orexpected_error.json(error), or write them manually. - Re-run tests normally to verify.
Seeded examples include:
heading_basic,heading_no_spaceparagraph_multi_linecomponent_self_closing,component_nestedattrs_malformed_tokenerror_unterminated_component,error_mismatched_closelimits_depth_exceeded
Build and test the parser crate:
cargo build -p proofdown_parser
cargo test -p proofdown_parser#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub enum ErrorKind { Syntax, LimitExceeded }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct ParseError { pub line: usize, pub col: usize, pub kind: ErrorKind, pub msg: String }Parser returns ParseError with best-effort line/col and kind on syntax and limit failures.
- CLI (
proofdown_cli):pml parse <file> [--json] [--pretty] [--limits.depth=N] [--limits.nodes=N] [--limits.input-size=BYTES](exit codes: 0 OK, 1 parse error, 3 IO/usage)pml validate <file> [--json] [--pretty] [--limits.depth=N] [--limits.nodes=N] [--limits.input-size=BYTES](exit codes: 0 OK, 1 parse error, 2 validation error, 3 IO/usage)- Top-level:
--help|-h,--version|-V
- WASM (
proofdown_wasm):wasm_parse(input: &str) -> Stringreturning a JSON string{ ok, doc|err }(enablewasmfeature; build forwasm32-unknown-unknown).
- Do not include this nested workspace as a member of the parent.
- Depend via
pathon inner crates from the parent. - Gate usage with features until crates stabilize.
# crates/provenance_ssg/Cargo.toml
[dependencies]
proofdown_parser = { path = "../proofdown_parser/crates/proofdown_parser", optional = true }
[features]
external_pml = ["proofdown_parser"]In code, gate parser usage:
#[cfg(feature = "external_pml")]
{
let doc = proofdown_parser::parse(pml_text)?;
// ... render ...
}MIT OR Apache-2.0 (see Cargo.toml).
- Owner: @veighnsche