Add an XSD #273

Open: wants to merge 15 commits into main
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "xml-schemas/ebu-tt-m-xsd"]
path = xml-schemas/ebu-tt-m-xsd
url = https://github.com/ebu/ebu-tt-m-xsd.git
35 changes: 35 additions & 0 deletions schema-validator/README.md
@@ -0,0 +1,35 @@
# DAPT XSD Validator

Basic command line utility for validating DAPT documents using
the XML Schema Definition in the w3c/dapt repository.

This script uses the MIT licensed [`xmlschema`](https://github.com/sissaschool/xmlschema) library.

This script is provided as-is with no warranties of any kind.
The repository's `LICENSE.md` applies, with the contents of this folder
being considered a _code example_.

## Build

1. Install Poetry ([installation instructions](https://python-poetry.org/docs/#installing-with-the-official-installer)).
2. Ensure that Python 3.11 or later is available and selected for the project,
   for example with the command `poetry env use 3.11`.
3. Install the dependencies by running `poetry install`.

## Usage

```sh
poetry run validate -dapt_in path/to/dapt_file.ttml
```

or pass the document to validate via stdin, for example:

```sh
poetry run validate < path/to/dapt_file.ttml
```

## Tests

```sh
poetry run python -m unittest
```
88 changes: 88 additions & 0 deletions schema-validator/poetry.lock


24 changes: 24 additions & 0 deletions schema-validator/pyproject.toml
@@ -0,0 +1,24 @@
[tool.poetry]
name = "dapt-xsd-val"
version = "0.1.0"
description = "Thin wrapper around xmlschema to support XSD validation using the supplied DAPT XSD"
authors = ["Nigel Megitt <[email protected]>"]
readme = "README.md"
packages = [
    { include = "src" },
    { include = "test" },
]

[tool.poetry.dependencies]
python = ">=3.11"
xmlschema = "^3.4.3"

[tool.poetry.group.dev.dependencies]
flake8 = "^7.1.2"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.scripts]
validate = "src.validate:main"
Empty file.
68 changes: 68 additions & 0 deletions schema-validator/src/validate.py
@@ -0,0 +1,68 @@
import os
import sys
import argparse
import xmlschema
import logging

logging.getLogger().setLevel(logging.INFO)


schema_path = os.path.normpath(
    os.path.join(
        os.path.dirname(__file__),
        '../../xml-schemas/dapt.xsd')
)
metadata_items_schema_path = os.path.normpath(
    os.path.join(
        os.path.dirname(__file__),
        '../../xml-schemas/ttml2-metadata-items.xsd')
)


def validate_dapt(args):
    # xmlschema gets baffled following the import of metadata_items,
    # so make it load it explicitly instead, which seems to work.
    schema_paths = [schema_path, metadata_items_schema_path]
    logging.info('Creating schema from XSDs at {}'.format(schema_paths))
    schema = xmlschema.XMLSchema(schema_paths)
    schema.build()
    # validity is a string ('valid', 'invalid' or 'notKnown'),
    # so compare it explicitly rather than testing its truthiness.
    if schema.validity == 'valid':
        logging.info('Schemas are valid')
    else:
        logging.error('Schemas are not valid, exiting early')
        return -1

    try:
        logging.info('Validating document at {}'.format(args.dapt_in.name))
        schema.validate(args.dapt_in)
    except xmlschema.XMLSchemaValidationError as valex:
        logging.error(str(valex))
        logging.error('Document is not valid.')
        return -1

    logging.info(
        'Document is syntactically valid with respect to the '
        'DAPT XML Schema Definition; '
        'this does not check all semantic requirements of the '
        'DAPT specification.')
    return 0


def main():
    parser = argparse.ArgumentParser()

    # Read from standard input by default, using its binary buffer so that
    # the default stream matches the 'rb' mode used for named files.
    parser.add_argument(
        '-dapt_in',
        type=argparse.FileType('rb'),
        default=sys.stdin.buffer,
        help='Input DAPT file to validate (defaults to stdin)',
        action='store')
    parser.set_defaults(func=validate_dapt)

    args = parser.parse_args()
    return args.func(args)


if __name__ == "__main__":
    # execute only if run as a script
    sys.exit(main())
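When `-dapt_in` is not supplied, the parser returns its default stream unchanged: argparse only applies `type` conversion to string defaults, so a stream default is never passed through `argparse.FileType`. The minimal stdlib-only sketch below shows both paths. `build_parser` is a hypothetical helper that mirrors the `-dapt_in` option, and an `io.BytesIO` stands in for standard input so the sketch runs without a terminal.

```python
import argparse
import io
import os
import tempfile


def build_parser(default_stream):
    """Hypothetical helper mirroring the -dapt_in option in validate.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-dapt_in',
        type=argparse.FileType('rb'),
        default=default_stream,
        help='Input DAPT file to validate')
    return parser


# An in-memory binary stream stands in for sys.stdin.buffer.
fake_stdin = io.BytesIO(b'<tt xmlns="http://www.w3.org/ns/ttml"/>')

# No -dapt_in on the command line: the default object is returned as-is.
args = build_parser(fake_stdin).parse_args([])
assert args.dapt_in is fake_stdin

# With -dapt_in, FileType('rb') opens the named file in binary mode.
with tempfile.NamedTemporaryFile(suffix='.ttml', delete=False) as tmp:
    tmp.write(b'<tt xmlns="http://www.w3.org/ns/ttml"/>')
try:
    args = build_parser(fake_stdin).parse_args(['-dapt_in', tmp.name])
    with args.dapt_in as dapt_file:
        assert dapt_file.mode == 'rb'
        assert dapt_file.read().startswith(b'<tt')
finally:
    os.remove(tmp.name)
```

Because the default is not converted, it is up to the caller to choose a stream whose mode matches what `FileType('rb')` would produce for a named file.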
Empty file.
31 changes: 31 additions & 0 deletions schema-validator/test/fixtures/valid_dapt.ttml
@@ -0,0 +1,31 @@
<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:ebuttm="urn:ebu:tt:metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio"
    daptm:scriptType="originalTranscript"
    xml:lang="en">
  <head>
    <metadata>
      <ttm:agent type="person" xml:id="actor_A">
        <ttm:name type="full">Matthias Schoenaerts</ttm:name>
      </ttm:agent>
      <ttm:agent type="character" xml:id="character_2">
        <ttm:name type="alias">BOOKER</ttm:name>
        <ttm:actor agent="actor_A"/>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div xml:id="se1" begin="3s" end="10s" ttm:agent="character_2" daptm:represents="audio.dialogue" daptm:onScreen="ON">
      <ttm:desc daptm:descType="scene">high mountain valley</ttm:desc>
      <metadata></metadata>
      <p daptm:langSrc="en"><span>Look at this beautiful valley.</span></p>
    </div>
  </body>
</tt>
21 changes: 21 additions & 0 deletions schema-validator/test/test_validate.py
@@ -0,0 +1,21 @@
import os
import unittest
from argparse import Namespace
from src.validate import validate_dapt

fixture_dir = \
    os.path.join(
        os.path.dirname(__file__),
        'fixtures')


class TestValidate(unittest.TestCase):
    maxDiff = None

    def testValidFile(self):
        with open(
                os.path.join(fixture_dir, 'valid_dapt.ttml'), newline='') \
                as dapt_file:
            result = validate_dapt(args=Namespace(dapt_in=dapt_file))

        self.assertEqual(result, 0)
47 changes: 47 additions & 0 deletions xml-schemas/README.md
@@ -0,0 +1,47 @@
# DAPT XSD Readme

The [DAPT](https://www.w3.org/TR/dapt/) XML Schema is provided as an informative (i.e. non-normative) addition
to the specification as an aid to implementers.

The XML Schema is provided as an [XSD 1](https://www.w3.org/TR/xmlschema-1/) (XML Schema Definition Language) document.

## Usage

An XSD 1 conformant XML validator should be able to validate a DAPT document against the top-level [`dapt.xsd`](dapt.xsd)
schema document.

As a convenience, a Python script in the `schema-validator` folder uses these XSDs to validate
a DAPT document using only open source libraries.

## Design

The DAPT XSD is designed to be inclusive of all content permitted by the DAPT specification,
rather than defining only the DAPT-specific attributes and elements
and relying on external XSDs for vocabulary in other namespaces.
This also allows DAPT-specific constraints to be applied, such as the prohibition of the `<animation>` element
or the requirement for the `daptm:scriptRepresents` attribute on the root `<tt>` element.
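The two constraints named above can be illustrated with a minimal, stdlib-only Python sketch that checks them directly on a parsed document. This is not how the XSD expresses them, and it checks nothing else; `check_dapt_constraints` is a hypothetical helper, and the namespace URIs are taken from the `valid_dapt.ttml` test fixture in this PR.

```python
import xml.etree.ElementTree as ET

TT_NS = 'http://www.w3.org/ns/ttml'
DAPTM_NS = 'http://www.w3.org/ns/ttml/profile/dapt#metadata'


def check_dapt_constraints(xml_text):
    """Hypothetical helper: return a list of constraint violations found."""
    root = ET.fromstring(xml_text)
    problems = []
    # The root element must be tt in the TTML namespace.
    if root.tag != f'{{{TT_NS}}}tt':
        problems.append('root element is not tt')
    # daptm:scriptRepresents is required on the root element.
    if f'{{{DAPTM_NS}}}scriptRepresents' not in root.attrib:
        problems.append('missing daptm:scriptRepresents on tt')
    # The TTML animation element is prohibited anywhere in the document.
    if root.find(f'.//{{{TT_NS}}}animation') is not None:
        problems.append('animation element is prohibited')
    return problems


ok = (f'<tt xmlns="{TT_NS}" xmlns:daptm="{DAPTM_NS}" '
      'daptm:scriptRepresents="audio"><body/></tt>')
bad = f'<tt xmlns="{TT_NS}"><head><animation/></head></tt>'

print(check_dapt_constraints(ok))   # []
print(check_dapt_constraints(bad))  # both problems are reported
```

Encoding these rules in the XSD itself means any XSD 1 validator enforces them, without a separate script like this one.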

### Sources

Structurally, much of the XSD is a copy of the [TTML2 XSD](https://github.com/w3c/ttml2/tree/main/spec/xsd), though in some cases changes have been
made to represent the additional DAPT constraints.
This means that if the TTML2 XSD changes, the DAPT XSD may need to be updated to match.
However, this approach also simplifies usage, since the DAPT XSD does not depend on separately obtained TTML2 schema files.

Additionally, DAPT namespaces and DAPT-specific data types are defined in imported files whose names are prefixed with `dapt-`.

EBU-TT Metadata is imported via a git submodule that points to the XSD 1 version of the
[EBU-TT Metadata schema](https://github.com/ebu/ebu-tt-m-xsd/tree/issue-0030-schema-v1).

### Type restrictions

Two mechanisms are used to enforce DAPT-specific type restrictions:

1. DAPT-specific complex type definitions with `xs:complexContent` containing `xs:restriction` based on the TTML2 type.
This method is used where attributes permitted on the TTML2 type are prohibited in DAPT, and/or when additional
DAPT-specific attributes need to be permitted as extensions to the TTML2 element.
The relevant element definitions then point to the restricted DAPT type rather than the base TTML2 type.
2. Edits that remove DAPT-prohibited elements or attribute groups from TTML2 types.
Where this pattern is used, XML comments in the XSD file highlight the change.

23 changes: 23 additions & 0 deletions xml-schemas/dapt-datatypes.xsd
@@ -0,0 +1,23 @@
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.w3.org/ns/ttml/profile/dapt#datatype"
  xmlns:daptd="http://www.w3.org/ns/ttml/profile/dapt#datatype">
  <xs:simpleType name="contentDescriptorType">
    <xs:annotation>
      <xs:documentation>
        descriptor-token ( descriptor-delimiter descriptor-token )*

        descriptor-token
          : (descriptorTokenChar)+

        descriptorTokenChar # xs:NMTOKEN character (NameChar) without the "."
          : NameStartChar | "-" | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

        descriptor-delimiter
          : "." # FULL STOP U+002E
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

</xs:schema>
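The `contentDescriptorType` restriction is based on `xs:string` with no `xs:pattern` facet, so the grammar in its documentation is not enforced by the schema itself. As a minimal sketch of that grammar, the Python below assumes that `NameStartChar` is the XML 1.0 (Fifth Edition) production; `is_content_descriptor` and the constant names are hypothetical and not part of this PR.

```python
import re

# Assumed: XML 1.0 (Fifth Edition) NameStartChar, as character-class ranges.
NAME_START_CHAR = (
    ':A-Z_a-z'
    '\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF'
    '\u0370-\u037D\u037F-\u1FFF\u200C-\u200D'
    '\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF'
    '\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF'
)

# descriptorTokenChar: NameStartChar | "-" | [0-9] | #xB7
#                      | [#x0300-#x036F] | [#x203F-#x2040]
TOKEN_CHAR = NAME_START_CHAR + '\\-0-9\u00B7\u0300-\u036F\u203F-\u2040'

# descriptor-token ( "." descriptor-token )*
CONTENT_DESCRIPTOR = re.compile(
    f'[{TOKEN_CHAR}]+(?:\\.[{TOKEN_CHAR}]+)*')


def is_content_descriptor(value):
    """Hypothetical helper: True if value matches the documented grammar."""
    return CONTENT_DESCRIPTOR.fullmatch(value) is not None


print(is_content_descriptor('audio.dialogue'))  # True
print(is_content_descriptor('audio.'))          # False: empty final token
```

Using `fullmatch` rather than `match` ensures that trailing characters, such as a dangling delimiter, cause the whole value to be rejected.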