The FHIR Crawler is a Node.js package that allows you to crawl and extract data from FHIR servers. It is designed as a flexible and configurable alternative to FHIR Bulk Data exports, and can handle large volumes of data.
You will need either Docker, or Node.js 16+ and Git. If you plan to use the Docker version you don't need to install anything. Otherwise do:
```sh
git clone https://github.com/smart-on-fhir/fhir-crawler.git
cd fhir-crawler
npm i
```
First you will need to choose a folder to work with. It will contain your configuration, as well as the downloaded data. We call that the "volume" folder. You will pass the path to this folder as an argument when you run the script.
Warning: The volume folder will contain configuration secrets as well as downloaded PHI! Please make sure it is protected. Do not create it within the project folder or any other git-controlled directory, so that PHI cannot end up in git history or be pushed to a remote repository.
Before using this tool, a configuration file named `config.js` must be created in the "volume" folder described above. An easy way to start is to make a copy of the provided example:

```sh
cp example-config.js my/folder/config.js
```

Then edit that config file and enter your settings. Read the comments in the file for further details about each option.
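As a rough illustration, a v2 config is a plain CommonJS module. The option names in the sketch below are placeholders, not the crawler's actual options; the real options are documented in the comments of `example-config.js`:

```js
// config.js - a minimal sketch, NOT a complete reference. The option
// names below are illustrative placeholders; the real options and their
// meanings are documented in the comments of example-config.js.
module.exports = {
    // Base URL of the FHIR server to crawl (placeholder name)
    fhirUrl: "https://fhir.example.com/r4",

    // ID of the patient Group to export (placeholder name)
    groupId: "my-group-id",

    // Resource types to download for each patient (placeholder name)
    resources: ["Observation", "Condition", "MedicationRequest"],
};
```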
- In v1 the config file could have any name, and its path was passed to the script via the `-c` parameter. In v2 the file must be called `config.js`, so start by renaming it.
- The config file is now `.js` instead of `.ts`. To switch (see the snippet after this list):
  - Remove type imports like `import { Config } from "../src/types"`
  - Switch to CommonJS exports. For example, use `module.exports = { ... }` instead of `export default { ... }`
- The example config file is now converted to JS so you can see the difference.
- Pick (or create) a "volume" folder. The script will load the config from there, and will also write output files to it.
- Place/move your `config.js` file into that "volume" folder.
- That should be it. Run it with:
  - Direct: `npm start -- -p /path/to/volume/`
  - Docker: `docker run -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler`
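For example, the export change above boils down to this (your settings themselves carry over unchanged):

```js
// v1 (config.ts):
//   import { Config } from "../src/types"
//   export default { /* ...your settings... */ }

// v2 (config.js) - drop the type import and switch to CommonJS:
module.exports = {
    /* ...the same settings, unchanged... */
};
```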
Running it directly: `cd` into the `fhir-crawler` folder and run:

```sh
# /path/to/volume/ is the folder containing your config file.
# It will also receive the downloaded data
npm start -- -p /path/to/volume/
```

Running it with Docker:

```sh
# /path/to/volume/ is the folder containing your config file.
# It will also receive the downloaded data
docker run -it -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler
```
This script does two major things. First, it downloads all the patients in a given group using a Bulk Data group export; then it downloads the specified resources associated with those patients using standard FHIR API calls (a rough sketch of this two-phase flow appears after the list below). In some cases you may need to re-run the crawler but skip the patient download part. To achieve this, do the following:
- After a successful export, locate the patient files. There should be one or more files with names like `1.Patient.ndjson` at `/path/to/volume/output/`.
- Copy those patient files one level up, outside of the `output` folder (because everything in `output` will be deleted before every run).
- On the next run, pass the patient file names via the `--patients` argument. Example:

```sh
npm start -- -p /path/to/volume/ --patients 1.Patient.ndjson --patients 2.Patient.ndjson
```

You can do the same using Docker:

```sh
docker run -it -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler --patients 1.Patient.ndjson
```
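For reference, here is a rough sketch of the two-phase flow described above. This is not the crawler's actual code: the endpoints follow the FHIR Bulk Data and REST specifications, authentication is omitted, and `baseUrl` and the group ID are assumptions.

```js
// A rough sketch of the crawler's two phases (illustrative only).
// Requires Node 18+ for the built-in fetch; auth is omitted, and
// baseUrl / the group ID are placeholder assumptions.
const baseUrl = "https://fhir.example.com/r4";

async function run() {
    // Phase 1: Bulk Data group export kick-off (per the Bulk Data spec)
    const kickOff = await fetch(`${baseUrl}/Group/my-group-id/$export?_type=Patient`, {
        headers: { Accept: "application/fhir+json", Prefer: "respond-async" }
    });
    const statusUrl = kickOff.headers.get("content-location");

    // Poll the status endpoint until the export completes (HTTP 200)
    let manifest;
    for (;;) {
        const res = await fetch(statusUrl);
        if (res.status === 200) { manifest = await res.json(); break; }
        await new Promise(resolve => setTimeout(resolve, 5000));
    }

    // Download the exported Patient NDJSON file(s) listed in the manifest
    const patients = [];
    for (const file of manifest.output) {
        const ndjson = await (await fetch(file.url)).text();
        for (const line of ndjson.split("\n")) {
            if (line.trim()) patients.push(JSON.parse(line));
        }
    }

    // Phase 2: standard FHIR REST calls for each patient's resources
    for (const patient of patients) {
        const bundle = await (await fetch(
            `${baseUrl}/Observation?patient=${patient.id}`,
            { headers: { Accept: "application/fhir+json" } }
        )).json();
        console.log(patient.id, bundle.entry ? bundle.entry.length : 0, "observations");
    }
}

run().catch(console.error);
```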
The script will display some basic stats in the terminal, and will also generate two log files within the output folder (where the NDJSON files are downloaded):
- `error_log.txt` contains any errors encountered while exporting data. Those errors are more or less unpredictable, thus the log is in plain text format.
- `request_log.tsv` contains information about HTTP requests and responses. These logs have a predictable structure, so the TSV format was chosen to make them easier to consume by both humans and spreadsheet apps.
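Because the request log is plain TSV, it is easy to post-process. Below is a minimal sketch that reads it into objects, assuming the first line is a header row (the actual column names are whatever the crawler writes):

```js
// Print request_log.tsv as objects keyed by the header row.
// Assumes a header line; the column names depend on the crawler.
const fs = require("fs");

const [header, ...rows] = fs
    .readFileSync("/path/to/volume/output/request_log.tsv", "utf8")
    .trim()
    .split("\n")
    .map(line => line.split("\t"));

for (const row of rows) {
    console.log(Object.fromEntries(header.map((col, i) => [col, row[i]])));
}
```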
Contributions to the FHIR Crawler project are welcome and encouraged! If you find a bug or have a feature request, please open an issue on the project's GitHub page. If you want to contribute code, please fork the project, create a new branch for your changes, and submit a pull request when you're ready.
Before submitting a pull request, please make sure that your code passes the project's tests by running `npm test`.
The FHIR Crawler is licensed under the Apache License, Version 2.0. See the LICENSE file for more information.