This repository contains the custom code used to produce the simulation that is shown in our manuscript "Single-cell RNAseq for the study of isoforms: how is that possible?", by Ángeles Arzalluz-Luque and Ana Conesa [1].
The prefixes sr and lr are short for short and long reads, respectively. They designate the simulation_function files in the R folder, which contain custom functions that implement some of the main steps of the simulation, and the vignette files.
The data folder includes the data needed to run the pipeline from the beginning, in the form of .RData files. These files also carry the appropriate prefix where needed.
- sr_transcriptome.rda: contains the transcript sequences from Tardaguila et al. [2], upon which we based our simulation, and their fasta headers.
- gene.isoform_table.rda: contains the isoform names and the genes they belong to, also obtained from the Tardaguila et al. [2] transcriptome.
- transcript_expression.rda: contains the isoform expression table (bulk RNAseq data) produced by Tardaguila et al. [2], for the two samples (neural stem cells and oligodendrocytes) and their two replicates.
- sr_sim.isoform.results.NSC.rda: contains the output of RSEM+STAR, and can be loaded into R to avoid running this part of the pipeline (see next section).
- The rest of the files containing the sr prefix include intermediate step data. What they correspond to and how they can be loaded is specified in the short read simulation vignette.
Found in the transcriptome folder:
- transcriptome.fasta: the transcriptome fasta file from Tardaguila et al. [2], necessary to simulate reads from full-length transcripts using the polyester package.
- annotation.gtf: the annotation generated by Tardaguila et al. [2] for this transcriptome. Although it is not necessary to run the R code, it is required to run RSEM+STAR.
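Before simulating reads, it can be useful to check which transcript identifiers the fasta file actually contains. The following is a minimal base R sketch of such a check. The helper read_fasta_headers is hypothetical (it is not part of this repository), and the demo runs on a small temporary file rather than on the real transcriptome.

```r
# Illustrative helper, NOT part of this repository:
# returns the header lines of a fasta file without the leading ">".
read_fasta_headers <- function(path) {
  lines <- readLines(path)
  headers <- lines[startsWith(lines, ">")]
  sub("^>", "", headers)
}

# Demo on a small temporary file so the sketch runs without the repository data.
demo_fasta <- tempfile(fileext = ".fasta")
writeLines(c(">tx1", "ACGT", "GGCC", ">tx2", "TTAA"), demo_fasta)

demo_headers <- read_fasta_headers(demo_fasta)
print(demo_headers)
```

For the real data, the same helper could be pointed at the fasta file in the transcriptome folder, assuming the working directory is the repository root.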
Download this repository, and change your working directory in R to the corresponding folder. Then, source the .R files containing the custom simulation functions and load the data. For instance, to start running the short-read simulation, execute the following in the R terminal:
source("sr_simulation_functions.R")
load("data/sr_transcriptome.rda")
as specified in the first lines of the short read simulation vignette. Then, follow the vignettes.
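Because load() places objects directly into the workspace, it is not always obvious which objects a given .rda file provides. One possible way to inspect a file first is to load it into a separate environment, as sketched below. The helper load_rda and the example table are hypothetical, and the demo uses a temporary file instead of the files in the data folder.

```r
# Illustrative helper, NOT part of this repository:
# loads an .rda file into its own environment and returns
# its objects as a named list, leaving the workspace untouched.
load_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)
}

# Demo with a made-up table saved to a temporary .rda file.
demo_rda <- tempfile(fileext = ".rda")
example_table <- data.frame(isoform = c("iso1", "iso2"),
                            gene = c("geneA", "geneA"))
save(example_table, file = demo_rda)

loaded <- load_rda(demo_rda)
print(names(loaded))          # names of the objects stored in the file
str(loaded$example_table)     # structure of one of them
```

The same call could be applied to any file in the data folder, for example load_rda("data/sr_transcriptome.rda"), to see what it contains before following the vignette.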
[1] Arzalluz-Luque A, Conesa A. Single-cell RNAseq for the study of isoforms: how is that possible? Genome Biol. 2018;19:1–19. https://doi.org/10.1186/s13059-018-1496-z
[2] Tardaguila M, de la Fuente L, Marti C, Pereira C, Pardo-Palacios FJ, del Risco H, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018. https://doi.org/10.1101/gr.222976.117