Skip to content

otvam/scisave

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SciSave - Serialization for Scientific Data

Summary

SciSave is a Python serialization/deserialization module:

  • Specially targeted for scientific applications.
  • Load and write JSON/MessagePack/Pickle data files.
  • Load JSON/YAML configuration files.
  • Validate data with JSON schemas.

For YAML files, the following custom extensions are used:

  • Parse relative paths with respect to the YAML file (!path).
  • Include other YAML files into the YAML file (!include).
  • Evaluate a Python literal using literal_eval (!eval).
  • Substitute YAML strings with values from environment variables (!env).
  • Substitute YAML strings with values from a provided dictionary (!sub).
  • Merge a list of dicts (!merge_dict).
  • Merge a list of lists (!merge_list).

For JSON files, the following custom extensions are used:

  • Allows the serialization/deserialization of complex numbers (__complex__).
  • Allows the serialization/deserialization of NumPy arrays (__numpy__).
  • Allows the serialization/deserialization as/from text and gzip files

For MessagePack files, the following custom extensions are used:

  • Allows the serialization/deserialization of complex numbers (__complex__).
  • Allows the serialization/deserialization of NumPy arrays (__numpy__).

For JSON schemas, the following custom extensions are used:

  • Handling NumPy types (integer, floating, and complex).
  • Handling NumPy multidimensional arrays.

The following file extensions are used:

  • .yaml, .yml - for YAML files
  • .json, .js - for JSON text files
  • .gz, .gzip - for JSON gzip files
  • .mpk, .msg", .msgpack - for MessagePack files
  • .pck, .pkl, .pickle - for Pickle files

The JSON/YAML/MessagePack files with the custom extensions are still valid files. Pickle/MessagePack is typically faster than JSON for very large data files.

SciSave is written in Python (NumPy, PyYAML, msgpack, and jsonschema are the only dependencies). SciSave is respecting semantic versioning (starting from version 1.4).

Warning

  • Pickling data is not secure.
  • Only load pickle files that you trust.

Example

An example is located in the example folder of the repository:

  • run_data.py contains an example file for the loader/dumper
  • run_bench.py contains a simple benchmark for the different formats
  • config_main.yaml YAML configuration file with custom extensions
  • config_include.yaml YAML configuration file for include extension
  • config_schema.yaml YAML file containing the JSON schema definition
  • dump.json JSON text file for testing data dumping/loading
  • dump.gz JSON gzip file for testing data dumping/loading
  • dump.pickle Pickle file for testing data dumping/loading

Project Links

Author

Copyright

(c) 2023 - Thomas Guillod

BSD 2-Clause "Simplified" License

About

SciSave - Serialization for Scientific Data

Resources

License

Stars

Watchers

Forks

Packages

No packages published