ASDF Design Overview
This page provides a high-level summary of the current design of the `asdf` package. It is intended to serve as a guide for future maintainers.
Any ASDF implementation is responsible for many aspects of parsing and creating ASDF files. When reading a file, it must perform the following high-level actions:
- Find and verify ASDF-specific header information
- Parse YAML tree
- Find and load core schema
- Validate the YAML content using available schemas
- Resolve schema references
- Find and parse the block index if it exists
- Find the block segment if it exists
- Parse block headers and load blocks on request
- Properly handle and close all IO resources (including memory maps) when no longer needed
`asdf` uses the `generic_io` submodule to provide an abstraction layer over various IO resources (e.g. a file on disk, a network resource, or an IO stream). Files are opened with the `generic_io.get_file` function, which returns a `GenericFile` instance. This object is then used to read the contents of the file, which are returned as `bytes` objects.
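To make the idea of a single IO abstraction concrete, here is a minimal sketch. `MiniGenericFile` and `open_source` are hypothetical names, not part of `asdf`. The sketch only handles local paths, in-memory bytes, and already-open seekable binary streams, and its `read_until` is a simplified method loosely modeled on the `read_until` call that appears later on this page.

```python
import io
import os
import re


class MiniGenericFile:
    """Hypothetical, much-simplified stand-in for asdf's GenericFile."""

    def __init__(self, fd, close_on_exit=False):
        self._fd = fd
        self._close_on_exit = close_on_exit

    def read(self, size=-1):
        return self._fd.read(size)

    def read_until(self, pattern, include=True):
        # Read small chunks until the bytes regex `pattern` matches, then
        # return everything up to (and optionally including) the match.
        regex = re.compile(pattern)
        buf = b""
        while True:
            match = regex.search(buf)
            if match:
                end = match.end() if include else match.start()
                # Rewind so the stream sits just past the returned bytes
                # (requires a seekable stream).
                self._fd.seek(end - len(buf), io.SEEK_CUR)
                return buf[:end]
            chunk = self._fd.read(64)
            if not chunk:
                raise ValueError("pattern not found before end of stream")
            buf += chunk

    def close(self):
        # Only close streams this wrapper opened itself
        if self._close_on_exit:
            self._fd.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def open_source(source):
    """Hypothetical dispatcher: path -> disk file, bytes -> memory stream."""
    if isinstance(source, (str, os.PathLike)):
        return MiniGenericFile(open(source, "rb"), close_on_exit=True)
    if isinstance(source, (bytes, bytearray)):
        return MiniGenericFile(io.BytesIO(source), close_on_exit=True)
    return MiniGenericFile(source)  # assume an open, seekable binary stream
```

Because every source is wrapped the same way, the header and YAML parsing code can be written once against this small interface.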
`asdf` checks for the header magic value (`#ASDF`) at the beginning of the file. If it cannot be found, parsing is aborted. It also attempts to determine the file format version from the first header line:
```python
header_line = fd.read_until(b'\r?\n', 2, "newline", include=True)
self._file_format_version = cls._parse_header_line(header_line)
self.version = self._file_format_version
```
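The header check can be sketched as follows. This is an illustration, not `asdf`'s `_parse_header_line`: it assumes the first line has the form `#ASDF <major>.<minor>.<patch>` (as in `#ASDF 1.0.0`) and returns the version as an integer tuple. `parse_header_line` and `_HEADER_RE` are hypothetical names.

```python
import re

ASDF_MAGIC = b"#ASDF"
# Hypothetical pattern for the first line, e.g. b"#ASDF 1.0.0\n"
_HEADER_RE = re.compile(rb"#ASDF (\d+)\.(\d+)\.(\d+)\r?\n?")


def parse_header_line(line: bytes) -> tuple:
    """Return the file format version as (major, minor, patch)."""
    if not line.startswith(ASDF_MAGIC):
        # No magic value: not an ASDF file, so parsing stops here
        raise ValueError("Does not appear to be an ASDF file")
    match = _HEADER_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"Unparseable ASDF version in header: {line!r}")
    return tuple(int(part) for part in match.groups())
```

For example, `parse_header_line(b"#ASDF 1.0.0\n")` returns `(1, 0, 0)`.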
It then looks for the beginning of the YAML content. It parses this content and creates the tree (more details on this in the next section).
```python
yaml_token = fd.read(4)
tree = {}
has_blocks = False
if yaml_token == b'%YAM':
    reader = fd.reader_until(
        constants.YAML_END_MARKER_REGEX, 7, 'End of YAML marker',
        include=True, initial_content=yaml_token)

    # For testing: just return the raw YAML content
    if _get_yaml_content:
        yaml_content = reader.read()
        fd.close()
        return yaml_content

    # We parse the YAML content into basic data structures
    # now, but we don't do anything special with it until
    # after the blocks have been read
    tree = yamlutil.load_tree(reader, self, self._ignore_version_mismatch)
    has_blocks = fd.seek_until(constants.BLOCK_MAGIC, 4, include=True)
elif yaml_token == constants.BLOCK_MAGIC:
    has_blocks = True
elif yaml_token != b'':
    raise IOError("ASDF file appears to contain garbage after header.")
```
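The same four-way branching can be sketched at the byte level, assuming the whole remainder of the file (after the header and any comment lines) is already in memory. The constants are assumptions modeled on `constants.BLOCK_MAGIC` and `constants.YAML_END_MARKER_REGEX`: a block starts with the 4-byte magic `\xd3BLK`, and the YAML document ends with a line containing only `...`. `split_after_header` is a hypothetical helper; `asdf` itself streams from the `GenericFile` rather than slicing a `bytes` object.

```python
import re

BLOCK_MAGIC = b"\xd3BLK"  # assumed 4-byte marker at the start of each block
# Assumed end-of-YAML marker: a line containing only "..."
YAML_END_RE = re.compile(rb"\r?\n\.\.\.(\r?\n|$)")


def split_after_header(rest: bytes):
    """Return (yaml_bytes, has_blocks, block_offset); offset is -1 if none."""
    token = rest[:4]
    if token == b"%YAM":
        match = YAML_END_RE.search(rest)
        if match is None:
            raise ValueError("End of YAML marker not found")
        yaml_bytes = rest[:match.end()]
        # Only search for block magic after the YAML content ends
        block_offset = rest.find(BLOCK_MAGIC, match.end())
        return yaml_bytes, block_offset != -1, block_offset
    if token == BLOCK_MAGIC:
        return b"", True, 0  # blocks but no tree
    if token == b"":
        return b"", False, -1  # header only
    raise IOError("ASDF file appears to contain garbage after header.")
```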
After parsing the YAML content, `asdf` checks whether any binary data blocks are present, and whether a block index is present. It does not load the data blocks yet, however.
```python
if has_blocks:
    self._blocks.read_internal_blocks(
        fd, past_magic=True, validate_checksums=validate_checksums)
    self._blocks.read_block_index(fd, self)
```
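Deferring the data reads can be sketched with a small handle that parses a block header immediately but reads the payload only when it is first requested. `LazyBlock` and `make_block` are hypothetical. The header layout in the comments is an assumption based on the ASDF Standard block format (big-endian: magic, a `uint16` header size, then flags, compression code, allocated/used/data sizes, and a 16-byte checksum); check the standard before relying on it. Compression and checksum validation are ignored here.

```python
import io
import struct

BLOCK_MAGIC = b"\xd3BLK"
# Assumed header layout (big-endian), following the 4-byte magic and a
# uint16 header size: flags (uint32), compression code (4 bytes),
# allocated_size, used_size, data_size (uint64 each), checksum (16 bytes).
_HEADER = struct.Struct(">I4sQQQ16s")  # 48 bytes


class LazyBlock:
    """Hypothetical block handle: header parsed now, payload read on demand."""

    def __init__(self, fd, offset):
        fd.seek(offset)
        if fd.read(4) != BLOCK_MAGIC:
            raise ValueError(f"No block magic at offset {offset}")
        (header_size,) = struct.unpack(">H", fd.read(2))
        header = fd.read(header_size)
        (self.flags, self.compression, self.allocated_size,
         self.used_size, self.data_size, self.checksum) = _HEADER.unpack_from(header)
        self._fd = fd
        # Payload starts after magic (4) + size field (2) + header
        self._data_offset = offset + 6 + header_size
        self._data = None

    @property
    def data(self):
        # The payload is read only on first access, then cached
        if self._data is None:
            self._fd.seek(self._data_offset)
            self._data = self._fd.read(self.used_size)
        return self._data


def make_block(payload: bytes) -> bytes:
    """Build an uncompressed block with a zeroed checksum, for experiments."""
    n = len(payload)
    header = _HEADER.pack(0, b"\0\0\0\0", n, n, n, b"\0" * 16)
    return BLOCK_MAGIC + struct.pack(">H", len(header)) + header + payload


if __name__ == "__main__":
    fd = io.BytesIO(b"yaml..." + make_block(b"hello"))
    block = LazyBlock(fd, offset=7)  # header is parsed, payload is not read
    print(block.data_size, block.data)
```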
`asdf` uses a standard YAML implementation (PyYAML) for reading and writing the metadata tree. However, it implements custom dumper and loader classes in order to support the tagging of custom types. Both are fairly straightforward: the dumper overrides `represent_data`, and the loader overrides `construct_object`.
```python
class AsdfDumper(_yaml_base_dumper):
    """
    A specialized YAML dumper that understands "tagged basic Python
    data types" as implemented in the `tagged` module.
    """

    def __init__(self, *args, **kwargs):
        kwargs['default_flow_style'] = None
        super().__init__(*args, **kwargs)

    def represent_data(self, data):
        node = super(AsdfDumper, self).represent_data(data)

        tag_name = getattr(data, '_tag', None)
        if tag_name is not None:
            node.tag = tag_name

        return node
```
```python
class AsdfLoader(_yaml_base_loader):
    """
    A specialized YAML loader that can construct "tagged basic Python
    data types" as implemented in the `tagged` module.
    """
    ignore_version_mismatch = False

    def construct_object(self, node, deep=False):
        tag = node.tag
        if node.tag in self.yaml_constructors:
            return super(AsdfLoader, self).construct_object(node, deep=False)
        data = _yaml_to_base_type(node, self)
        tag = self.ctx.type_index.fix_yaml_tag(
            self.ctx, tag, self.ignore_version_mismatch)
        data = tagged.tag_object(tag, data)
        return data
```
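The tag round trip through these two overrides can be reproduced with plain PyYAML. This sketch assumes PyYAML is installed and uses hypothetical classes (`TaggedDict`, `TagAwareDumper`, `TagAwareLoader`) in place of `asdf`'s own. Unlike `AsdfLoader`, it only handles custom-tagged mappings and does not consult a type index to fix up tag versions.

```python
import yaml


class TaggedDict(dict):
    """Hypothetical minimal stand-in for asdf's TaggedDict."""

    def __init__(self, data, tag):
        super().__init__(data)
        self._tag = tag


class TagAwareDumper(yaml.SafeDumper):
    def represent_data(self, data):
        node = super().represent_data(data)
        tag = getattr(data, "_tag", None)
        if tag is not None:
            node.tag = tag  # emit the custom tag instead of the default map tag
        return node


# SafeDumper dispatches on the exact type, so register the dict subclass
TagAwareDumper.add_representer(TaggedDict, TagAwareDumper.represent_dict)


class TagAwareLoader(yaml.SafeLoader):
    def construct_object(self, node, deep=False):
        if node.tag in self.yaml_constructors:
            return super().construct_object(node, deep=deep)
        # Unknown tag: build the basic mapping, then remember its tag
        # (anchor bookkeeping is skipped for these nodes in this sketch)
        if isinstance(node, yaml.MappingNode):
            return TaggedDict(self.construct_mapping(node, deep=True), node.tag)
        raise yaml.constructor.ConstructorError(
            None, None, f"unsupported tag {node.tag!r}", node.start_mark)


if __name__ == "__main__":
    tree = {"obj": TaggedDict({"a": 1}, "!custom/thing-1.0.0")}
    text = yaml.dump(tree, Dumper=TagAwareDumper)
    print(text)
    print(yaml.load(text, Loader=TagAwareLoader)["obj"]._tag)
```

As in `asdf`, the loader never converts anything into a custom type here; it only produces a basic structure that remembers its tag, leaving conversion for a later pass.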
When reading the YAML tree, custom types are not immediately converted. Instead, a node with a custom tag is first built as a basic Python type (dict, list, string, or other scalar) and then tagged. This is done by `asdf.tagged.tag_object`, which wraps dicts, lists, and strings in `Tagged` subclasses that record the tag in a `_tag` attribute:
```python
def tag_object(tag, instance, ctx=None):
    """
    Tag an object by wrapping it in a ``Tagged`` instance.
    """
    if isinstance(instance, Tagged):
        instance._tag = tag
    elif isinstance(instance, dict):
        instance = TaggedDict(instance, tag)
    elif isinstance(instance, list):
        instance = TaggedList(instance, tag)
    elif isinstance(instance, str):
        instance = TaggedString(instance)
        instance._tag = tag
    else:
        from . import AsdfFile, yamlutil
        if ctx is None:
            ctx = AsdfFile()
        try:
            instance = yamlutil.custom_tree_to_tagged_tree(instance, ctx)
        except TypeError:
            raise TypeError("Don't know how to tag a {0}".format(type(instance)))
        instance._tag = tag
    return instance
```
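The excerpt relies on the `Tagged` wrapper classes from `asdf.tagged`. The self-contained sketch below uses hypothetical minimal versions of those classes to show that tagged nodes still behave like the plain types they wrap. It drops the fallback to `yamlutil.custom_tree_to_tagged_tree` and simply raises `TypeError` for anything else; `get_tag` is a hypothetical helper.

```python
class Tagged:
    """Marker mixin: an object carrying a YAML tag in `_tag`."""
    _tag = None


class TaggedDict(dict, Tagged):
    def __init__(self, data, tag):
        super().__init__(data)
        self._tag = tag


class TaggedList(list, Tagged):
    def __init__(self, data, tag):
        super().__init__(data)
        self._tag = tag


class TaggedString(str, Tagged):
    # str is immutable: the value is fixed at construction, the tag set after
    pass


def tag_object(tag, instance):
    """Simplified sketch: wrap basic types so they remember their tag."""
    if isinstance(instance, Tagged):
        instance._tag = tag  # already wrapped: just retag in place
    elif isinstance(instance, dict):
        instance = TaggedDict(instance, tag)
    elif isinstance(instance, list):
        instance = TaggedList(instance, tag)
    elif isinstance(instance, str):
        instance = TaggedString(instance)
        instance._tag = tag
    else:
        raise TypeError(f"Don't know how to tag a {type(instance)}")
    return instance


def get_tag(obj):
    return getattr(obj, "_tag", None)
```

Since each wrapper subclasses the builtin it wraps, code that walks the tree before conversion (for example, schema validation) can treat tagged nodes as ordinary dicts, lists, and strings while still reading their tags.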
Before actually converting the tagged YAML tree to a tree containing custom types, `asdf` performs schema validation.