This repository includes pipelines to transform FHIR data from a FHIR server (such as HAPI, a GCP FHIR store, or OpenMRS) into a data warehouse based on Apache Parquet files, or into another FHIR server. It also includes a Python query library that makes working with FHIR-based data warehouses simpler.
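Because the warehouse output is plain Parquet, it can be explored with standard Python tooling even without the query library. The sketch below is illustrative only and is not part of this repository's API; the output path and column names are assumptions that depend on your pipeline configuration and the generated SQL-on-FHIR schema.

```python
# Illustrative only: reading the pipelines' Parquet output with pandas/pyarrow.
# The path "dwh/Patient/" and any column names are hypothetical; actual names
# depend on your pipeline configuration and the generated SQL-on-FHIR schema.
import pandas as pd

# Each FHIR resource type is typically written as its own set of Parquet files.
patients = pd.read_parquet("dwh/Patient/")  # hypothetical output directory

print(patients.columns)  # inspect the flattened, schema-derived columns
print(len(patients))     # number of Patient rows in the warehouse
```

Any engine that reads Parquet (e.g. Spark, pandas, or DuckDB) can query the same files; the query library adds FHIR-aware conveniences on top.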
These tools are intended to be generic and to eventually work with any FHIR-based data source and data warehouse. The main directories are listed below with a brief description of their contents:
- pipelines/ *START HERE*: Batch and streaming pipelines to transform data from a FHIR-based source into an analytics-friendly data warehouse or another FHIR store.
- docker/: Docker configurations for various servers and pipelines.
- doc/: Documentation for project contributors; see the pipelines README and the wiki for usage documentation.
- utils/: Various artifacts for setting up an initial database, running pipelines, etc.
- dwh/: Query library for working with distributed FHIR-based data warehouses.
- bunsen/: A fork of a subset of the Bunsen project, used to transform FHIR JSON resources into Avro records with a SQL-on-FHIR schema.
- e2e-tests/: Scripts for testing the pipelines end-to-end.
NOTE: This was originally started as a collaboration between Google and the OpenMRS community.