Importing data into Exetera
clyyuanzi-london edited this page Jun 28, 2021 · 8 revisions
ExeTera allows you to import data from CSV sources into HDF5, a columnar data format better suited to performing analytics.
Example:
exetera import \
-s path/to/covid_schema.json \
-i "patients:path/to/patient_data.csv, assessments:path/to/assessmentdata.csv, tests:path/to/covid_test_data.csv, diet:path/to/diet_study_data.csv" \
-o /path/to/output_dataset_name.hdf5 \
--include "patients:(id,country_code,blood_group), assessments:(id,patient_id,chest_pain)" \
--exclude "tests:(country_code)"
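When the import is driven from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of ExeTera: `build_import_command` is a hypothetical helper, it uses only the options documented on this page, and it assumes that `-w` is a plain on/off flag.

```python
import shlex


def build_import_command(schema, inputs, output, include=None, exclude=None,
                         overwrite=False):
    """Return the argument list for an `exetera import` call.

    Hypothetical helper for illustration; it only formats the options
    documented on this page and does not run anything.
    """
    def fields_spec(table_fields):
        # {"patients": ["id", "country_code"]} -> "patients:(id,country_code)"
        return ", ".join(
            f"{table}:({','.join(fields)})"
            for table, fields in table_fields.items()
        )

    args = ["exetera", "import", "-s", schema]
    # 'name:file' pairs, comma separated, passed as a single argument
    args += ["-i", ", ".join(f"{name}:{path}" for name, path in inputs.items())]
    args += ["-o", output]
    if include:
        args += ["--include", fields_spec(include)]
    if exclude:
        args += ["--exclude", fields_spec(exclude)]
    if overwrite:
        # Assumption: -w/--overwrite takes no value
        args.append("-w")
    return args


command = build_import_command(
    schema="path/to/covid_schema.json",
    inputs={
        "patients": "path/to/patient_data.csv",
        "tests": "path/to/covid_test_data.csv",
    },
    output="/path/to/output_dataset_name.hdf5",
    include={"patients": ["id", "country_code", "blood_group"]},
    exclude={"tests": ["country_code"]},
)

# Print a shell-quoted version of the command for inspection
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Because each option value is a separate list element, no extra shell quoting is needed for values that contain whitespace.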
- -s/--schema: The location and name of the schema file.
- -te/--territories: If set, only the listed territories are imported. If left unset, all territories are imported.
- -i/--inputs: A comma-separated list of 'name:file' pairs. This should be put in quotes if it contains any whitespace. See the example above.
- -o/--output_hdf5: The path and name of the file to which the resulting HDF5 dataset should be written.
- -ts/--timestamp: An override for the timestamp to be written (defaults to datetime.now(timezone.utc)).
- -w/--overwrite: If set, overwrites any existing dataset with the same name; otherwise, appends to the existing dataset.
- -n/--include: If set, filters out all fields apart from those in the list.
- -x/--exclude: If set, filters out the fields in this list.
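To make the string formats of `--inputs`, `--include` and `--exclude` concrete, here is a minimal sketch of how such values could be split apart and applied. It is not ExeTera's own parser: the function names are hypothetical, and the rule that include is applied before exclude when both are given is an assumption.

```python
import re


def parse_inputs(spec):
    """Split 'name:file, name:file' into a {name: file} dict (illustrative)."""
    pairs = {}
    for item in spec.split(","):
        # Split on the first colon only, so the path may itself contain colons
        name, _, path = item.strip().partition(":")
        if not name or not path:
            raise ValueError(f"expected 'name:file', got {item!r}")
        pairs[name] = path
    return pairs


def parse_field_lists(spec):
    """Split 'table:(a,b), other:(c)' into a {table: [fields]} dict."""
    return {
        table: [f.strip() for f in fields.split(",") if f.strip()]
        for table, fields in re.findall(r"(\w+)\s*:\s*\(([^)]*)\)", spec)
    }


def select_fields(all_fields, include=None, exclude=None):
    """Keep only included fields (if given), then drop excluded ones.

    Assumption: this ordering is for illustration only.
    """
    kept = [f for f in all_fields if include is None or f in include]
    return [f for f in kept if exclude is None or f not in exclude]


inputs = parse_inputs(
    "patients:path/to/patient_data.csv, tests:path/to/covid_test_data.csv"
)
include = parse_field_lists(
    "patients:(id,country_code,blood_group), assessments:(id,patient_id,chest_pain)"
)
exclude = parse_field_lists("tests:(country_code)")

# Hypothetical column names for the 'tests' table
print(select_fields(["id", "country_code", "result"],
                    exclude=exclude.get("tests")))  # ['id', 'result']
```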
Example:
importer.import_with_schema(timestamp, output_hdf5_name, schema, tokens, args.overwrite, include_fields, exclude_fields)
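The `timestamp` argument in the call above presumably corresponds to the `-ts/--timestamp` option, which defaults to `datetime.now(timezone.utc)`. The sketch below shows one way such a value might be resolved before the call. `resolve_timestamp` is a hypothetical helper, and both the ISO 8601 override format and the choice to treat a timezone-less override as UTC are assumptions, not documented ExeTera behaviour.

```python
from datetime import datetime, timezone


def resolve_timestamp(override=None):
    """Return the override as an aware datetime, or the current UTC time.

    Hypothetical helper; the override format accepted by ExeTera itself
    is not specified on this page.
    """
    if override is None:
        # Documented default: the current time in UTC
        return datetime.now(timezone.utc)
    parsed = datetime.fromisoformat(override)
    if parsed.tzinfo is None:
        # Assumption: an override without a timezone is taken to be UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


default_ts = resolve_timestamp()
fixed_ts = resolve_timestamp("2021-06-28T12:00:00")
print(default_ts.isoformat())
print(fixed_ts.isoformat())
```

Pinning the timestamp with an override can be useful when an import needs to be reproducible, since the default changes on every run.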
See the wiki for detailed examples of how to interact with the HDF5 datastore.