This is the official PyTorch implementation of Lead-agnostic Self-supervised Learning for Local and Global Representations of Electrocardiogram.
Before training the model, please follow these instructions to install fairseq-signals and prepare the required datasets.
Before training, prepare the training data manifest required to train the CMSC model.
$ python /path/to/fairseq_signals/data/ecg/preprocess/convert_to_cmsc_manifest.py \
/path/to/pretrain/train.tsv \
--dest /path/to/manifest
The resulting directory structure should look like this:
/path/to/manifest
├─ cmsc
│ └─ train.tsv
├─ cmlc
│ └─ train.tsv
└─ cmsmlc
  └─ train.tsv
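As a quick sanity check after running the conversion script, the layout above can be verified programmatically. The following is a minimal sketch, not part of fairseq-signals: the helper name `missing_manifests` and the `EXPECTED_SUBDIRS` constant are hypothetical, and the sketch assumes only the three subdirectories shown in the tree above.

```python
from pathlib import Path

# Subdirectories shown in the expected layout above (assumed to be exhaustive).
EXPECTED_SUBDIRS = ("cmsc", "cmlc", "cmsmlc")


def missing_manifests(dest, split="train"):
    """Return the expected manifest files that do not exist under `dest`.

    Hypothetical helper for illustration; an empty list means the
    directory matches the layout described above.
    """
    root = Path(dest)
    expected = [root / name / f"{split}.tsv" for name in EXPECTED_SUBDIRS]
    return [str(path) for path in expected if not path.is_file()]
```

Calling `missing_manifests("/path/to/manifest")` returns an empty list when all three `train.tsv` files are present, and otherwise lists the paths that still need to be generated.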
This configuration was used for the W2V+CMSC+RLM model pre-trained on the PhysioNet2021 dataset in the original paper.
$ fairseq-hydra-train \
task.data=/path/to/manifest/cmsc \
--config-dir examples/w2v_cmsc/config/pretraining \
--config-name w2v_cmsc_rlm
$ fairseq-hydra-train \
task.data=/path/to/manifest/finetune \
model.model_path=/path/to/checkpoint.pt \
--config-dir examples/w2v_cmsc/config/finetuning \
--config-name diagnosis
To use the CinC score as an evaluation metric, add the following command-line parameters (before --config-dir):
criterion.report_cinc_score=True criterion.weights_file=/path/to/weights.csv
Note that you can download the weights.csv file from here.
$ fairseq-hydra-train \
task.data=/path/to/manifest/identify \
model.model_path=/path/to/checkpoint.pt \
model.num_labels=$N \
--config-dir examples/w2v_cmsc/config/finetuning \
--config-name identification
$N should be set to the number of unique patients in the training dataset. To find it, open the /path/to/manifest/identify/train.tsv file and check its last line. For example, if the last line looks like *.mat 2500 69977, then $N should be set to 69978 (the last patient index plus one).
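The manual check described above can also be scripted. The following is a minimal sketch, assuming the manifest is whitespace- or tab-separated, that the last field of the last non-empty line is a zero-based patient index, and that this line carries the largest index, as in the example. The function name `num_labels_from_manifest` is hypothetical and not part of fairseq-signals.

```python
from pathlib import Path


def num_labels_from_manifest(manifest_path):
    """Return (last patient index + 1) read from the manifest's last line.

    Assumes the final field on the last non-empty line is the largest
    zero-based patient index, as in the `*.mat 2500 69977` example.
    """
    text = Path(manifest_path).read_text()
    lines = [line for line in text.splitlines() if line.strip()]
    last_fields = lines[-1].split()
    return int(last_fields[-1]) + 1
```

The returned value can then be passed on the command line as `model.num_labels`, for example by exporting it to the shell variable `$N` used in the command above.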
Note that if you want to train on the PhysioNet2021 dataset and test on the PTB-XL dataset, prepare the data manifests separately, for PhysioNet2021 with $valid=0 and for PTB-XL with $valid=1.0, and place them in the same manifest directory like this:
/path/to/manifest/identify
├─ train.tsv
├─ valid_gallery.tsv
└─ valid_probe.tsv
Note: valid_*.tsv should come from the PTB-XL dataset, while train.tsv should come from the PhysioNet2021 dataset.
Please cite as:
@inproceedings{oh2022lead,
title={Lead-agnostic Self-supervised Learning for Local and Global Representations of Electrocardiogram},
author={Oh, Jungwoo and Chung, Hyunseung and Kwon, Joon-myoung and Hong, Dong-gyun and Choi, Edward},
booktitle={Conference on Health, Inference, and Learning},
pages={338--353},
year={2022},
organization={PMLR}
}