Prediction of long COVID from proteomic and clinical data

BorgwardtLab/LongCOVID

LongCOVID

This repository shares code for predicting long COVID with a random forest classifier, together with a univariate association analysis of proteomic features against long COVID labels.

Requirements

  • python 3.7.4
  • scikit-learn 1.1.3
  • pandas 1.5.2
  • scipy 1.9.3
  • numpy 1.23.5
  • shap 0.41.0
  • statsmodels 0.13.5
  • openpyxl 3.0.10

Required input data

The input data required to execute these scripts can be obtained from image. Place the files in a folder named Data, which should contain:

  • Proteomics_Clinical_Data_220902_Acute_plus_healthy_v5.xlsx
  • Proteomics_Clinical_Data_220902_6M_timepoint_v4.xlsx
  • Proteomics_Clinical_Data_220902_Labels_v2.xlsx
  • Table S2 Biological protein cluster compositions.xlsx
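Before running the scripts, it can help to confirm that all four files are in place. The following is a minimal sketch, not part of the repository: the helper name missing_input_files is hypothetical, and it assumes the Data folder sits in the current working directory.

```python
from pathlib import Path

# File names as listed above; the folder name "Data" is taken from this README.
REQUIRED_FILES = [
    "Proteomics_Clinical_Data_220902_Acute_plus_healthy_v5.xlsx",
    "Proteomics_Clinical_Data_220902_6M_timepoint_v4.xlsx",
    "Proteomics_Clinical_Data_220902_Labels_v2.xlsx",
    "Table S2 Biological protein cluster compositions.xlsx",
]


def missing_input_files(data_dir="Data"):
    """Return the required file names that are not present in data_dir.

    Hypothetical helper for illustration; the repository scripts do not
    necessarily perform this check themselves.
    """
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_input_files()
    if missing:
        print("Missing input files in Data/:")
        for name in missing:
            print("  " + name)
    else:
        print("All required input files found.")
```

An empty list means every expected file was found; any names returned are the files that still need to be copied into Data.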

Execution

The data splits we used are provided in partitions. The relevant label dictionaries need to be generated from the label data file listed above. Run prediction_RF.py to generate model predictions. Association analyses for individual proteomic features and for clusters thereof can be obtained with associationAnalysis.py and associationClusters.py, respectively. combineInterpreations.py combines the SHAP analysis results of multiple cross-validation folds.
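To illustrate what combining SHAP results across cross-validation folds can look like, here is a minimal sketch. It is an assumption about one reasonable approach, not a description of what combineInterpreations.py actually does: it takes one SHAP matrix per fold, computes the mean absolute SHAP value per feature within each fold, and averages those over folds with equal weight. The function name combine_fold_shap, the feature names, and the toy input are all hypothetical.

```python
import numpy as np


def combine_fold_shap(fold_shap_values, feature_names):
    """Average mean |SHAP| per feature over folds and rank the features.

    fold_shap_values: list with one array per fold, each of shape
        (n_samples_in_fold, n_features).
    Returns (feature, mean importance, std across folds) tuples,
    sorted from most to least important.
    """
    # Per-fold importance: mean absolute SHAP value of each feature.
    per_fold = np.stack(
        [np.abs(np.asarray(values)).mean(axis=0) for values in fold_shap_values]
    )  # shape: (n_folds, n_features)
    mean_importance = per_fold.mean(axis=0)  # each fold weighted equally
    std_importance = per_fold.std(axis=0)    # spread across folds
    order = np.argsort(-mean_importance)
    return [
        (feature_names[i], float(mean_importance[i]), float(std_importance[i]))
        for i in order
    ]


# Toy input standing in for per-fold SHAP matrices (random numbers, not real results).
rng = np.random.default_rng(0)
toy_folds = [rng.normal(scale=[0.1, 1.0, 0.5], size=(20, 3)) for _ in range(5)]
ranking = combine_fold_shap(toy_folds, ["protein_a", "protein_b", "protein_c"])
for name, mean_abs, spread in ranking:
    print(f"{name}: mean |SHAP| = {mean_abs:.3f} (std across folds {spread:.3f})")
```

Reporting the spread across folds alongside the mean makes it easier to see which features are ranked consistently and which depend on the particular split.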

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 813533 (K.B.).
