https://transformer-models.s3.amazonaws.com/2019n2c2_tack1_roberta_pt_stsc_6b_16b_3c_8c.zip
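For example, the model can be downloaded and unpacked as follows (the destination directory is arbitrary):
wget https://transformer-models.s3.amazonaws.com/2019n2c2_tack1_roberta_pt_stsc_6b_16b_3c_8c.zip
unzip 2019n2c2_tack1_roberta_pt_stsc_6b_16b_3c_8c.zip -d pretrained_model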
- Python 3.7.3
- PyTorch 1.1.0
- Transformers 2.5.1
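The pinned versions can be installed with pip, for example (assuming a Python 3.7 environment is already active):
pip install torch==1.1.0 transformers==2.5.1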
General corpus: the Semantic Textual Similarity Benchmark (STS-B) dataset, available from the GLUE Benchmark download page
Clinical corpus: the clinical STS dataset, available from the 2019 N2C2 Challenge Track 1 website
- Preprocess the clinical dataset
python preprocess/prepro.py \
--data_dir=path/to/clinical_sts_dataset \
--output_dir=dir/to/output/clinical_sts_dataset
- Generate datasets for five-fold cross-validation (a conceptual sketch of the splitting follows below)
python preprocess/cross_valid_generate.py \
--data_dir=path/to/processed_dataset \
--output_dir=dir/to/output
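For reference, the fold generation is conceptually similar to the sketch below (hypothetical code, assuming scikit-learn is available; the actual preprocess/cross_valid_generate.py may read and write files differently):

# Hypothetical sketch of five-fold splitting; not the repo's exact code
from sklearn.model_selection import KFold

def make_folds(examples, n_splits=5, seed=42):
    # Yield one (train, dev) pair of example lists per fold
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, dev_idx in kf.split(examples):
        yield ([examples[i] for i in train_idx],
               [examples[i] for i in dev_idx])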
Training and prediction are provided by the following scripts (example invocations below):
- single.sh: uses a single model
- ensemble.sh: uses a multi-model ensemble
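For example (check each script for configurable paths and hyperparameters before running):
bash single.sh     # training and prediction with a single model
bash ensemble.sh   # training and prediction with a multi-model ensemble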
Use the script cv_eval.sh to select the best hyperparameters (batch size and number of epochs) based on the five-fold cross-validation results.
--input_dir: path to the directory containing the five-fold cross-validation results
--output_dir: path to the directory where the evaluation result is written
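For example (directory paths are placeholders):
bash cv_eval.sh \
--input_dir=dir/of/cross_validation_results \
--output_dir=dir/to/output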
In principle, all models listed at https://huggingface.co/transformers/pretrained_models.html are supported; however, we only used BERT, RoBERTa, and XLNet in this task.
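As a minimal sketch (not the repo's exact code), swapping the backbone model only requires changing the model name; STS is treated as single-value regression:

# Minimal sketch; assumes Transformers 2.5.1 with an STS regression head
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-base"  # e.g. "bert-base-uncased" or "xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 yields a single regression output for the similarity score
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)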
- Please cite our paper:
Yang X, He X, Zhang H, Ma Y, Bian J, Wu Y. Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models. JMIR Med Inform 2020;8(11):e19735. DOI: 10.2196/19735