Toward a privacy-preserving predictive foundation model of single-cell transcriptomics with federated learning and tabular modeling
Jiayuan Ding, Jianhui Lin, Shiyu Jiang, Yixin Wang, Ziyang Miao, Zhaoyu Fang, Jiliang Tang, Min Li, Xiaojie Qiu
https://www.biorxiv.org/content/10.1101/2025.01.06.631427v1
A privacy-preserving predictive foundation model for single-cell transcriptomics, leveraging federated learning and tabular modeling.
Tabula/
├── resource/
│ ├── dataset/ # Processed pretraining datasets
│ ├── finetune_framework_x.yaml # Configuration for downstream tasks
│ ├── vocab.json # Gene vocabulary
├── tabula/
│ ├── downstream/ # Downstream task implementations
│ ├── model/
│ │ ├── encoding/ # Single-cell data embedding
│ │ ├── transformer/ # Transformer backbone
│ ├── loss.py # Training loss
│ ├── training/ # Pre-training
│ │ ├── config.py # Configuration
│ │ ├── data_loader.py # Multi-client data loader
│ │ ├── federater.py # Federated framework
│ │ ├── pretrainer.py # PyTorch Lightning training framework
├── tests/ # Unit tests
├── tutorials/ # Usage examples for downstream tasks
├── requirements.txt # Python dependencies
├── README.md # Project description file
└── LICENSE
- CUDA >= 11.7
- Python >= 3.9
- flash-attn >= 2.3.5
- mpi4py >= 3.1.4
- Required dependencies are listed in requirements.txt
Clone the repository:
$ git clone this-repo-url
$ cd tabula
Create and activate your conda environment:
$ conda create -n tabula python=3.9
$ conda activate tabula
Install PyTorch:
$ pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Install dependencies:
$ pip install -r requirements.txt
Install mpi4py:
$ conda install mpi4py==3.1.4
Install FlashAttention-2 (for more information, see the flash-attention project):
$ MAX_JOBS=4 pip install flash-attn==2.3.5 --no-build-isolation
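Once the steps above finish, a quick check can confirm that the installed versions meet the minimums listed in the requirements. The script below is a minimal sketch and is not part of the repository: the helper names (`parse_version`, `installed_version`, `check_requirements`) and the `MINIMUMS` table are illustrative, and the distribution names `flash-attn` and `mpi4py` are assumed to match the package names used in the install commands.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
# The dictionary keys are assumed to be the installed distribution names.
MINIMUMS = {"flash-attn": "2.3.5", "mpi4py": "3.1.4"}
MIN_PYTHON = (3, 9)


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into the tuple (1, 13, 1)."""
    core = text.split("+")[0]  # drop local build tags such as '+cu117'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at non-numeric segments such as 'post1'
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(lookup=installed_version, python_version=sys.version_info[:2]):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python %d.%d found, need >= 3.9" % tuple(python_version))
    for name, minimum in MINIMUMS.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name} {found} found, need >= {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All version checks passed.")
```

The script only compares version numbers; it does not verify that flash-attn was compiled against the right CUDA toolkit. The CUDA version that PyTorch was built with can be inspected separately through `torch.version.cuda`.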
See the example code in the tutorials/ directory.
- Follow the feature-staging-main review process
- Create a dedicated branch for each new feature
- Implement and test on your branch; add unit tests
- Create a pull request
- Discuss with lab members and merge into the main branch once all checks pass
- Follow the Google Python style guide
- Write file and function docstrings in Google style
- We use black to automatically format code in a standardized way. To ensure that any code changes are up to standard, use pre-commit as follows.
# Run the following two lines ONCE.
$ pip install pre-commit
$ pre-commit install
# Optionally, run the hooks against all existing files.
$ pre-commit run --all-files
Then, all future commits will call black automatically to format the code. Any code that does not follow the standard will cause a check to fail.
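To illustrate the style conventions above, the snippet below shows a small function with a Google-style docstring, laid out so that it fits black's default line length. The function (`normalize_counts`) is a hypothetical example written for this purpose and does not come from the Tabula codebase.

```python
from typing import List


def normalize_counts(counts: List[float], target_sum: float = 1e4) -> List[float]:
    """Scale raw counts so that they sum to a target value.

    Args:
        counts: Raw expression counts for a single cell.
        target_sum: Total that the scaled counts should add up to.

    Returns:
        A new list with each count multiplied by ``target_sum / sum(counts)``.

    Raises:
        ValueError: If the counts sum to zero.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    scale = target_sum / total
    return [value * scale for value in counts]
```

The `Args`, `Returns`, and `Raises` sections follow the Google docstring layout, with one indented entry per parameter or exception.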
For questions, feedback, or collaboration opportunities, please contact Xiaojie Qiu at [email protected] and Jiayuan Ding at [email protected].