Miguel de Val-Borro edited this page Oct 17, 2014 · 7 revisions

Machine understanding of data format

(brian)

Code that receives and uses large amounts of data from sources that may not be known when the software is written (as is the case for any publicly distributed or service application) must be able to handle problems and route data to internal or downstream handlers without special-case programming or human intervention. Such problems include being passed incomplete or incorrect data. The data may also contain more information than expected while being otherwise complete.

Robust handling of the data stream therefore requires "machine understanding": the software must know what kind of data an instance holds (its data models) and whether the instance is "valid". Validity can be taken to mean that the instance is "complete" (all mandatory fields are populated), "syntactically valid" (all populated fields have the right data types), and "semantically valid" (all of the data models declared by the instance are complete).
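The three validity checks can be sketched roughly as follows. This is only an illustration under stated assumptions: the `REGISTRY` contents, the `FieldSpec` and `Report` classes, and the layout of an instance (a dict mapping each declared model name to its field values) are all hypothetical and not taken from any existing library. Treating an unrecognised model as a semantic failure is also an assumption made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field in a data model."""
    name: str
    dtype: type
    mandatory: bool = True


@dataclass
class Report:
    """Outcome of the three validity checks, plus readable problem notes."""
    complete: bool = True
    syntactically_valid: bool = True
    semantically_valid: bool = True
    problems: list = field(default_factory=list)


# Hypothetical registry of the data models this software understands.
REGISTRY = {
    "Position": [FieldSpec("ra", float), FieldSpec("dec", float)],
    "Photometry": [FieldSpec("flux", float),
                   FieldSpec("band", str, mandatory=False)],
}


def validate(instance, registry=REGISTRY):
    """Check an instance of the form {model_name: {field_name: value}}."""
    report = Report()
    for model, values in instance.items():
        specs = registry.get(model)
        if specs is None:
            # Assumption: a model that cannot be checked counts as a
            # semantic failure in this sketch.
            report.semantically_valid = False
            report.problems.append(f"{model}: unknown model")
            continue
        for spec in specs:
            value = values.get(spec.name)
            if value is None:
                if spec.mandatory:
                    # A missing mandatory field makes the instance incomplete
                    # and the declared model incomplete.
                    report.complete = False
                    report.semantically_valid = False
                    report.problems.append(f"{model}.{spec.name}: missing")
            elif not isinstance(value, spec.dtype):
                # Populated, but with the wrong data type.
                report.syntactically_valid = False
                report.problems.append(
                    f"{model}.{spec.name}: expected {spec.dtype.__name__}")
    return report
```

For example, `validate({"Position": {"ra": "10.5", "dec": -3.2}, "Photometry": {}})` would flag `ra` as a syntactic problem and the missing `flux` as both a completeness and a semantic problem, without any model-specific code in the caller.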

Awareness of data models also includes the ability to "know" that some models are unknown to, or not usable by, the software. This covers whole data models that are unknown as well as unknown structures within known models. In either case, this awareness lets the software adequately analyze the data it receives and "know" how to proceed with processing.
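One possible way to act on this awareness is to separate what the software recognises from what it does not, and then choose a processing route from the result. The sketch below is hypothetical: `KNOWN_MODELS`, `classify`, `route`, and the route names are invented for illustration, and an instance is again assumed to be a dict mapping each declared model name to its field values.

```python
# Hypothetical table of known models and the field names each one defines.
KNOWN_MODELS = {
    "Position": {"ra", "dec"},
    "Photometry": {"flux", "band"},
}


def classify(instance, known=KNOWN_MODELS):
    """Split an instance into usable data, unknown models, unknown fields."""
    usable, unknown_models, unknown_fields = {}, [], {}
    for model, values in instance.items():
        if model not in known:
            # A whole data model the software does not recognise.
            unknown_models.append(model)
            continue
        # Keep only the fields the known model defines.
        usable[model] = {k: v for k, v in values.items() if k in known[model]}
        # Record unknown structures found inside a known model.
        extra = sorted(set(values) - known[model])
        if extra:
            unknown_fields[model] = extra
    return usable, unknown_models, unknown_fields


def route(instance, known=KNOWN_MODELS):
    """Pick a processing route; the route names are illustrative only."""
    usable, unknown_models, unknown_fields = classify(instance, known)
    if not usable:
        return "forward_unprocessed"   # nothing here is understood
    if unknown_models or unknown_fields:
        return "process_known_parts"   # use what is understood, keep the rest
    return "process_fully"
```

The point of the design is that the decision is driven by the model table rather than by special cases: adding a new model to `KNOWN_MODELS` changes how instances are routed without touching `route` itself.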
