Scooby


Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.

This repository contains model and data loading code and a train script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.

Hardware requirements

  • NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which can be installed with pip. Because its numpy version requirements conflict with those of scooby, it is best installed in a separate environment.

  • pip install snapatac2-scooby

Scooby package installation

  • pip install git+https://github.com/gagneurlab/scooby.git
  • Download file contents from the Zenodo repo
  • Use examples from the scooby reproducibility repository

Training

We offer one train script for modeling scRNA-seq only and one for multiome modeling. Both require AnnData objects and embeddings preprocessed with SnapATAC2. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB RAM and 32 CPU cores.

Model architecture

Currently, the model has only been tested with a batch size of 1.

[Figure: scooby model architecture]
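
The idea of a decoder conditioned on the cell's position in a precomputed embedding can be illustrated with a small NumPy sketch. This is not the scooby implementation: the function name `cell_conditioned_decoder`, the layer sizes, the two-layer MLP, and the softplus output are all assumptions made for illustration, and random arrays stand in for the Borzoi sequence embedding and the single-cell embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_bins, seq_dim = 16, 8     # genomic bins and sequence-embedding width
n_cells, cell_dim = 4, 3    # cells and single-cell embedding width
hidden_dim = 5              # width of the assumed MLP hidden layer


def softplus(x):
    # Numerically stable softplus; keeps the predicted profile non-negative.
    return np.logaddexp(0.0, x)


def cell_conditioned_decoder(seq_emb, cell_emb, w1, b1, w2, b2):
    """Turn each cell embedding into decoder weights and apply them per bin."""
    hidden = np.maximum(cell_emb @ w1 + b1, 0.0)  # (n_cells, hidden_dim), ReLU
    cell_weights = hidden @ w2 + b2               # (n_cells, seq_dim)
    logits = seq_emb @ cell_weights.T             # (n_bins, n_cells)
    return softplus(logits)


# Random stand-ins for the sequence embedding and the cell embedding.
seq_emb = rng.normal(size=(n_bins, seq_dim))
cell_emb = rng.normal(size=(n_cells, cell_dim))

# Randomly initialised MLP parameters (these would be learned in training).
w1 = rng.normal(size=(cell_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, seq_dim))
b2 = np.zeros(seq_dim)

profiles = cell_conditioned_decoder(seq_emb, cell_emb, w1, b1, w2, b2)
print(profiles.shape)  # (16, 4): one profile value per genomic bin and cell
```

In this sketch the sequence embedding is computed once per genomic window and shared across cells, while only the small per-cell weight vector changes, so cells that lie close together in the embedding receive similar predicted profiles.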