Markus Demleitner edited this page Feb 4, 2015 · 5 revisions

Representing Structured Metadata and Data

(msdemlei)

This is probably fairly closely related to use cases 3, 4, and 9, but I'd like to make this explicit.

In many parts of astronomy, there are sequences of complex objects (complex in the sense of not being plain integers, floats, or strings). We want to represent these. Note that sequences of atomic objects and single complex objects are covered by this as special cases.

Some examples:

Photometry Point
This is at least a value, bandpass, unit plus possibly errors, a zero point (with an error of its own, a type), all kinds of provenance-related information (waveband, aperture...), etc. (see the IVOA Photometry DM for an attempt to capture some of this); note that, for instance, multiple sets of errors (e.g., estimated in various ways) are conceivable.
A photometric time series or an SED
That's a sequence of photometry points (plus some global metadata)
A catalog of SEDs
That would be a sequence of such sequences (plus some additional global metadata)
Source-extracted photometry
When different methods of estimating magnitudes are used during source extraction (e.g., different apertures, PSF fitting, etc.), every extracted object will have several photometry points, which would be "in one row", if you will. The same structure (with somewhat different metadata) results from multi-band photometry.
Astrometric reduction
A single line in a catalog contains multiple astrometric reductions. Each of these is an object having a position, its derivative, the associated errors, mean epoch, and coordinate system metadata (reference system, reference position, epoch). Some pieces of information (e.g., the mean epoch) might be shared between instances. In case you're wondering: Here's a non-contrived table that contains such things.
Provenance
Provenance talks about how a piece of data was produced. There's W3C work on this. For what's at hand, the most important piece is the activity (for instance, stacking). This has parameters (for instance, the input images). So, to represent the provenance of a stacked image, we need to represent a sequence of activities, each of which contains a sequence of parameters, each of which is a compound of (something like) name, role, value, format (plus, for things that are not images, things like unit, semantic tags, etc.). See also the IVOA Simulation DM, which contains similar modelling, except they add something like prototypes so the actual parameter type can be simpler.
Source references
Projects integrating data from many sources, in particular, profit from machine-readable in-data declarations of what should be referenced when the data is used. In VO Resource, such references are modelled as pairs of a string and a format tag (bibcode, free text, doi) in vr:Source. In practice it turned out that one source per dataset was too restrictive, so again a dataset would have a sequence of complex objects.
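
To make the nesting in some of the examples above concrete, here is a minimal Python sketch of the photometry cases (point, SED, catalog of SEDs) and the provenance case. All class names, field names, and sample values are hypothetical illustrations; they are not taken from the IVOA Photometry DM, the W3C provenance work, or any other standard, and a real model would carry far more metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhotometryPoint:
    # Hypothetical minimal field set for one complex object.
    value: float
    unit: str
    bandpass: str
    # Several sets of errors (e.g., estimated in various ways) may coexist.
    errors: List[float] = field(default_factory=list)
    zero_point: Optional[float] = None


@dataclass
class SED:
    # A sequence of complex objects plus some global metadata.
    target: str
    points: List[PhotometryPoint] = field(default_factory=list)


@dataclass
class SEDCatalog:
    # A sequence of such sequences plus additional global metadata.
    title: str
    seds: List[SED] = field(default_factory=list)


@dataclass
class Parameter:
    # Compound of (something like) name, role, value, format.
    name: str
    role: str
    value: str
    format: str


@dataclass
class Activity:
    # One provenance step, holding a sequence of parameters.
    name: str
    parameters: List[Parameter] = field(default_factory=list)


# Made-up sample data, for illustration only.
sed = SED(target="example-source", points=[
    PhotometryPoint(12.3, "mag", "V", errors=[0.02, 0.05]),
    PhotometryPoint(12.9, "mag", "B", errors=[0.03]),
])
catalog = SEDCatalog(title="example catalog", seds=[sed])

provenance = [
    Activity("stacking", [
        Parameter("input", "input image", "img1.fits", "image/fits"),
        Parameter("input", "input image", "img2.fits", "image/fits"),
    ]),
]
```

The point of the sketch is only the shape: every case reduces to sequences of compound objects, possibly nested, with metadata attached at each level.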

As tables may have billions of rows, it'd be great if the per-instance overhead of representing (presumably annotating) this kind of structure could be low to negligible.
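
One conceivable way to keep that overhead low is to store the structure once per table as an annotation over columns, and to build the complex objects only on demand. The following Python sketch illustrates the idea under that assumption. The annotation class, the column names, and the helper function are all hypothetical and are not part of any existing proposal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhotometryAnnotation:
    # Stored once per table, not once per row (hypothetical structure).
    value_column: str
    error_column: str
    bandpass: str
    unit: str


# Plain columnar data: rows carry only atomic values (made-up numbers).
columns: Dict[str, List[float]] = {
    "mag_ap1": [12.3, 14.1, 13.7],
    "e_mag_ap1": [0.02, 0.05, 0.03],
    "mag_psf": [12.4, 14.0, 13.8],
    "e_mag_psf": [0.01, 0.04, 0.02],
}

# Two photometry points "in one row": aperture and PSF-fit magnitudes.
annotations = [
    PhotometryAnnotation("mag_ap1", "e_mag_ap1", bandpass="V", unit="mag"),
    PhotometryAnnotation("mag_psf", "e_mag_psf", bandpass="V", unit="mag"),
]


def photometry_for_row(row: int) -> List[dict]:
    """Build the complex objects for one row on demand from the annotations."""
    return [
        {
            "value": columns[a.value_column][row],
            "error": columns[a.error_column][row],
            "bandpass": a.bandpass,
            "unit": a.unit,
        }
        for a in annotations
    ]
```

With this layout the cost of the structure is proportional to the number of annotated column groups rather than to the number of rows, so a table with billions of rows pays nothing extra per instance.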
