This repository contains all code to reproduce the results from the 4CG translocation preprint.
All data are available at ArrayExpress under the accession numbers E-MTAB-13585 (whole genome sequencing data), E-MTAB-14291 (whole genome long read sequencing data), E-MTAB-13586 (scRNA-Seq of splenocytes) and E-MTAB-13700 (snRNA-Seq data of liver). The submissions include raw read files, per-cell expression quantifications, and filtered count matrices with cell type annotations.
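As a convenience, the accessions listed above can be collected in a small lookup table. The following is a minimal sketch, not part of the repository: the helper name `study_url` is hypothetical, and the URL pattern is an assumption about how ArrayExpress study pages are currently addressed through BioStudies, so check the links before relying on them.

```python
# Accessions and descriptions are copied from the data availability paragraph.
ACCESSIONS = {
    "E-MTAB-13585": "whole genome sequencing data",
    "E-MTAB-14291": "whole genome long read sequencing data",
    "E-MTAB-13586": "scRNA-Seq of splenocytes",
    "E-MTAB-13700": "snRNA-Seq data of liver",
}

# Assumption: ArrayExpress study pages follow this BioStudies URL pattern.
BASE_URL = "https://www.ebi.ac.uk/biostudies/arrayexpress/studies"


def study_url(accession: str) -> str:
    """Return the (assumed) study page URL for one of the listed accessions."""
    if accession not in ACCESSIONS:
        raise KeyError(f"Unknown accession: {accession}")
    return f"{BASE_URL}/{accession}"


if __name__ == "__main__":
    # Print one line per dataset: accession, description, and study page.
    for accession, description in ACCESSIONS.items():
        print(f"{accession}\t{description}\t{study_url(accession)}")
```

Keeping the accessions in one place avoids retyping them when downloading or citing the individual datasets.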
The whole_genome_seq folder contains a Snakemake workflow for alignment and deduplication. It also contains a script that generates coverage quantifications, including allele-specific signals, and code to generate all figures. The whole_genome_seq_ont folder contains a Snakemake workflow to align the nanopore data.
The sc_rna_seq folder contains a Snakemake workflow for cellranger-based alignment, and R scripts to assemble, filter and annotate the datasets. The analysis folder contains scripts to perform differential expression analysis and generate the plots in the main figures, as well as code to generate all supplementary figures.
The sci-RNA-seq data from liver cells were preprocessed using the sci-rocket pipeline (https://github.com/odomlab2/sci-rocket/tree/main). The sn_rna_seq folder contains the scripts used to process the data and generate the main and supplementary figures.
Jasper Panten ([email protected])