This project contains source code in Python that implements the WCON data format for tracking data.
When the Python specification and the behavior of the Scala implementation differ, and it is not obvious which one is correct, the behavior of the Scala implementation should be presumed to be authoritative.
wcon
is registered with the Python Package Index, so just enter this command from any shell:
pip install wcon
Any problems? Visit the more detailed installation guide.
import wcon
# Open a worm file, convert it to canonical form, and save it
# (actually it's automatically converted to canon before
# saving, but here we do so explicitly)
w = wcon.WCONWorms.load_from_file('tests/minimax.wcon')
canon_w = w.to_canon
canon_w.save_to_file('test.wcon', pretty_print=True)
# From a string literal:
from io import StringIO
w2 = WCONWorms.load(StringIO('{"units":{"t":"s","x":"mm","y":"mm"}, '
'"data":[]}'))
# WCONWorms.load_from_file accepts any valid WCON, but .save_to_file
# output is always "canonical" WCON, which makes specific choices about
# how to arrange and format the WCON file. This way the functional
# equality of any two WCON files can be tested by this:
w1 = WCONWorms.load_from_file('file1.wcon')
w2 = WCONWorms.load_from_file('file2.wcon')
assert(w1 == w2)
# or:
w1.save_to_file('file1.wcon')
w2.save_to_file('file2.wcon')
import filecmp
assert(filecmp.cmp('file1.wcon', file2.wcon'))
w3 = w1 + w2 # Merge the two. An exception is raised if the data clashes
- Class
WCONWorms
- methods
load_from_file
- [class method]
- parameters:
JSON_path
, astr
, the path of the file
save_to_file
- parameters:
JSON_path
, astr
, the path of the filecompress_file
, a boolean, whether to compress the filepretty_print
, a boolean, whether to render the output on multiple lines
- parameters:
to_canon
- [property]
- returns: a copy of this object but in canonical form
__add__
- [use
+
] - Merges WCONWorms objects together. If the worm IDs or time periods are disjoint, or if the data agrees, this method works. If not, an exception is thrown.
- [use
__eq__
- [use
==
] - Return a boolean indicating whether two WCONWorms objects are the same, after conversion of all quantities to canonical units. Compares both data and metadata.
- [use
- attributes
units
: dict- May be empty, but is never None since 'units' is required to be specified.
metadata
: dict- If 'metadata' was not specified, metadata is None.
- The values in this dict might be nested into further dicts or other data types.
data
: Pandas DataFrame or None- If 'data' was not specified, data is None.
- All the worms are merged into one DataFrame
- Expensive and memory-intensive for objects containing many worms, so try to avoid using this. Use
data_as_odict
instead.
- [Note:
files
, if present in the input, is not persisted unless the.load
factory method is used.] num_worms
: intworm_ids
: listdata_as_odict
: OrderedDict of pandas DataFrames, keyed by worm ID- This is the native representation of the data in this object, and thus the fastest to load. Try to use this instead of
data
.
- This is the native representation of the data in this object, and thus the fastest to load. Try to use this instead of
- methods
- Class
MeasurementUnit
- Note: this class does not need to be used publicly, but it can be if desired.
- consequently it can be omitted from a public API
- methods
create
- [class method]
- Factory method
- parameter:
unit_string
, astr
, the unit expression (e.g."mm"
or"cm/s"
or"C"
) - returns: an instance of this class
to_canon
- transforms
v
from original units to canonical units - parameter:
v
(afloat
) - returns: float
- transforms
from_canon
- the inverse of
to_canon
- the inverse of
- attributes
unit_string
: str- The original string (e.g.
"m/s^2"
)
- The original string (e.g.
canonical_unit_string
: str- The canonical form for all units within the original string (e.g.
"mm/s^2"
)
- The canonical form for all units within the original string (e.g.
- Note: this class does not need to be used publicly, but it can be if desired.
Any top-level key other than the basic:
- files
- units
- metadata
- data
...are ignored. It is convenient, but not required, to follow the convention of beginning custom fields with the prefix "@"
. Handling custom objects requires subclassing WCONWorms
.
Thanks to the Python libraries json
and jsonschema
, it is relatively trivial to parse and validate a WCON file. Here's an example of how one might accomplish this, without even using the wcon
package:
import json, jsonschema
# The WCON schema
with open("wcon/wcon_schema.json", "r") as wcon_schema_file:
schema = json.loads(wcon_schema_file.read())
# Our example WCON file
JSON_path = '../../tests/minimax.wcon'
with open(JSON_path, 'r') as infile:
serialized_data = infile.read()
# Load the whole JSON file into a nested dict.
w = json.loads(serialized_data)
# Validate the raw file against the WCON schema
jsonschema.validate(w, schema)
With the above code we end up with a nested dictionary w
containing everything that was serialized in the minimax.wcon
file.
Using this wcon
Python package, something similar can be accomplished:
import wcon
w = wcon.WCONWorms.load_from_file('../../tests/minimax.wcon')
Here, instead of being a nested dictionary, w
is a WCONWorms
object that is more powerful. Here are some of the additional things that can be accomplished with the WCONWorms
object:
- The WCON file is validated not just against the WCON schema, but also to ensure units are valid, that every data key has a corresponding unit, and that every data segment has "aspects" of the same length. (e.g. if a skeleton at time
1.3
has 45x
-coordinates, it should also have 45y
-coordinates. This condition is not expressible in a JSON schema but it is validated programatically by the WCONWorms initializer. - Units are expressed as
MeasurementUnit
objects, which can be compared with other such objects, to verify that "mm" and "millimetres" refer to the same units, for instance. (see the below section for more details) - WCONWorms objects can have their data be converted into canonical units, and then saved again.
- WCONWorms objects can be loaded from multiple files and combined together, via the
"files"
object. - Worm data recorded in multiple "tracks", or elements, in the
"data"
object, can have such tracks merged. - Worm data can be extracted in a Pandas DataFrame format for easier downstream processing, since the dimensions of the data have been placed into one two-dimensional array, rather than in a nested array.
- WCONWorms can be subclassed by labs implementing "special features", in two places: ("type 1") top-level objects starting with
"@"
or ("type 2") objects within individual"data"
array items starting with"@"
.
The WCON format requires a "units"
object, where you specify in what units your quantities are being measured. WCONParser
represents these units internally as MeasurementUnit
objects. With MeasurementUnit
, you can convert from any supported unit expression to the canonical one:
>>> MeasurementUnit.create('m')
MeasurementUnit, original form: 'm' canonical form: 'mm'
>>> u = MeasurementUnit.create('m')
>>> u.to_canon(1)
1000.0
>>> u.from_canon(100)
0.1
>>> u = MeasurementUnit.create('F')
>>> u.to_canon(72)
22.222222222222221
>>> u = MeasurementUnit.create('m/min')
>>> u.canonical_unit_string
'mm/s'
>>> u.to_canon(5)
83.33333333333334
>>> u = MeasurementUnit.create('m^2')
>>> u.to_canon(1)
1000000.0
You can also check the equality of various unit expressions. For example, all of these expressions will evaluate to True
:
MeasurementUnit.create('mm') == MeasurementUnit.create('millimetre')
MeasurementUnit.create('Mm') == MeasurementUnit.create('megametre')
MeasurementUnit.create('mm') != MeasurementUnit.create('Mm')
SI and non-SI units not in the WCON specification are also permitted; they must be prefixed by "@". No further processing is performed on such units ("to_canon" always returns the same unit string).
MeasurementUnit.create('@counts')
MeasurementUnit.create('@intensity')
MeasurementUnit.create('@degrees')
MeasurementUnit.create('@radians')