Welcome to the repository dedicated to training deep learning models on variable-length time series data extracted from electronic health records, with a focus on predicting clinical outcomes. If you are interested in the details of this work, you can explore the accompanying paper (link TBD).
- Python 3
- For Python dependencies, see requirements.txt
- R libraries:
  - rms
  - pROC
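The R packages are not covered by requirements.txt; if they are not already installed, they can typically be installed from an R session with:
install.packages(c("rms", "pROC"))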
Follow these steps to run the code on your data:
1. Clone the Repository
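If you have not cloned the repository yet, a typical command looks like the following (the URL placeholder stands in for this repository's address):
git clone <repository-url>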
After cloning the repository, navigate to the project folder using the command line:
cd path/to/project/directory
2. Install Dependencies
Ensure you have all the required dependencies by executing the following command:
pip install -r requirements.txt
3. Dataset Formatting
Convert your dataset into a format compatible with the general pipeline:
python Code/Checkup_routines/examine_datasets.py
For detailed information and options, refer to the Checkup Routines README.
4. Run the Pipeline
Execute the pipeline by running the following command:
python Code/VL020_train.py -m Model -t path/to/train/csv/file -e path/to/evaluation/csv/file -v path/to/validation/csv/file
See the Important Notes section below for additional options and a fuller description of the available functionality.
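For example, to train the LSTM/GRU model (see the Models note below), an invocation might look like this, with hypothetical file paths:
python Code/VL020_train.py -m lstm -t Data/train.csv -e Data/eval.csv -v Data/valid.csv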
Here are important notes to consider before running the code:
- For detailed information about input arguments, type the following command in the command line:
  python Code/VL020_train.py -h
- Before running the code, review and edit the config dictionary defined in VL099_utils.py as needed (an illustrative sketch of such a dictionary appears after this list).
- Training and Test Datasets: Provide the datasets in one of two ways:
  - Direct the code to [.csv] files using the -t path/to/train/csv/file and -e path/to/evaluation/csv/file arguments.
  - Direct the code to [.pkl] objects containing the derivation and evaluation data objects. Ensure that config['save_pkl'] was set to True when those objects were saved.
- Validation Dataset: Provide the validation dataset in one of two ways:
  - Specify a [.csv] file using the -v path/to/valid/csv/file input argument.
  - Leave the -v input argument empty to let the code split the training dataset into training and validation sets based on the validation_p key in the config dictionary.
- Data Pre-processing: Select the data pre-processing method in the config dictionary from the list below:
  - "norm" -> Normalization: transforming data to the range [0, 1].
  - "std" -> Standardization: adjusting data to have zero mean and unit variance.
  - "ple-dt" -> Piece-wise Linear Encoding with Decision Trees (PLE-DT): a specialized encoding technique.
- Models: Select the model architecture you want to train with the -m input argument from the list below:
  - "lstm" -> LSTM/GRU
  - "tdcnn" -> TDW-CNN
  - "tcn" -> TCN
- Resuming Training: Resume an unfinished training pipeline from the last stored checkpoint by directing the code to the log directory using the config['log_path'] and config['log_folder'] keys.
- RTDL Remote Repository: The RTDL remote repository is required for piecewise linear encoding of predictor variables. Download it here.
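Below is a minimal, purely illustrative sketch of what such a config dictionary could look like; the actual dictionary lives in VL099_utils.py and may use different key names and contain additional entries. Only save_pkl, validation_p, log_path, and log_folder are taken from the notes above; 'preprocess' is a hypothetical stand-in for whichever key selects the pre-processing method.

```python
# Illustrative sketch only -- not the actual dictionary from Code/VL099_utils.py.
# 'preprocess' is a hypothetical key name; the real key may differ.
config = {
    'preprocess': 'std',      # pre-processing method: 'norm', 'std', or 'ple-dt'
    'validation_p': 0.2,      # fraction of the training set held out for validation
                              # when the -v argument is left empty
    'save_pkl': True,         # save derivation/evaluation data objects as [.pkl]
                              # so later runs can point to them instead of [.csv] files
    'log_path': 'Logs',       # parent directory for training logs and checkpoints
    'log_folder': 'run_001',  # run-specific folder; point these two keys at an
                              # existing run to resume training from its last checkpoint
}
```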
Feel free to reach out if you have any questions or need assistance. Happy coding!
Our code is licensed under the GPL version 3 license (see the license file for details).
Please view our publication in JAMIA:
If you find our project useful, please consider citing our work:
@article{
}