Single-cell RNAseq for the study of isoforms: simulation code

This repository contains the custom code used to produce the simulation that is shown in our manuscript "Single-cell RNAseq for the study of isoforms: how is that possible?", by Ángeles Arzalluz-Luque and Ana Conesa [1].

.R files

The prefixes sr and lr stand for short reads and long reads, respectively. They mark the simulation_function files in the R folder, which contain custom functions implementing some of the main steps of the simulation, as well as the vignette files.

Data

The data folder contains the data needed to run the pipeline from the beginning, in the form of .RData files. These files also carry the appropriate prefix where needed.

sr_transcriptome.rda: contains the transcript sequences from Tardaguila et al. [2], upon which we based our simulation, and their fasta headers.
gene.isoform_table.rda: contains each isoform name and the gene it belongs to, also obtained from the Tardaguila et al. [2] transcriptome.
transcript_expression.rda: contains the isoform expression table (bulk RNAseq data) produced by Tardaguila et al. [2], for the two samples (neural stem cells and oligodendrocytes) and their two replicates.
sr_sim.isoform.results.NSC.rda: contains the output of RSEM+STAR, and can be loaded into R to avoid running this part of the pipeline (see next section).
The remaining files with the sr prefix hold data from intermediate steps. What they correspond to and how to load them is specified in the short-read simulation vignette.
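
Since an .rda file restores objects under the names they were saved with, it can help to check what a file contains before using it. The following is a minimal sketch, not part of the repository code: load_rda is a hypothetical helper name, and the example path assumes the working directory is the repository root.

```r
# Load an .rda file into its own environment and return its contents
# as a named list, so nothing in the global workspace is overwritten.
# NOTE: `load_rda` is a hypothetical helper, not a repository function.
load_rda <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the restored names
  mget(object_names, envir = env)          # named list of the restored objects
}

# Example usage (assumes the repository root is the working directory):
# contents <- load_rda("data/gene.isoform_table.rda")
# names(contents)      # names of the objects stored in the file
# str(contents[[1]])   # structure of the first object
```

Loading into a separate environment is a design choice for inspection only; the vignettes load the files directly into the workspace.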

Transcriptome files

Found in the transcriptome folder:

transcriptome.fasta is the transcriptome fasta file from Tardaguila et al. [2], necessary to simulate reads from full-length transcripts using the polyester package.
annotation.gtf is the annotation generated by Tardaguila et al. [2] for this transcriptome. Although it is not necessary to run the R code, it is required to run RSEM + STAR.
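
Before starting a long simulation, a quick check that the fasta file is readable can save time. The sketch below uses base R only and is an illustration under assumptions: fasta_headers is a hypothetical helper name, and the path assumes the working directory is the repository root.

```r
# Return the header lines of a FASTA file (without the leading ">").
# NOTE: `fasta_headers` is a hypothetical helper, not a repository function.
fasta_headers <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  lines <- readLines(path)
  headers <- lines[startsWith(lines, ">")]  # FASTA records start with ">"
  sub("^>", "", headers)
}

# Example usage (assumes the repository root is the working directory):
# headers <- fasta_headers("transcriptome/transcriptome.fasta")
# length(headers)   # number of transcript sequences in the file
# head(headers)     # first few transcript identifiers
```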

Instructions for running the simulations

Download this repository and set your R working directory to the corresponding folder. Then, source the .R files containing the custom simulation functions and load the data. For instance, to start running the short-read simulation, execute the following in the R console:

source("sr_simulation_functions.R")
load("data/sr_transcriptome.rda")

as specified in the first lines of the short-read simulation vignette. Then, follow the vignettes.
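
If either file cannot be found, the most likely cause is that the working directory is not the repository folder. The setup step can be wrapped so that it fails early with a clear message. This is a minimal sketch under assumptions: setup_simulation is a hypothetical helper name, and the paths passed to it must match where the files sit in your copy of the repository.

```r
# Source a functions file and load a data file, stopping with a clear
# message if either path does not exist relative to the working directory.
# NOTE: `setup_simulation` is a hypothetical helper, not a repository function.
setup_simulation <- function(functions_file, data_file, envir = globalenv()) {
  paths <- c(functions_file, data_file)
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Not found from ", getwd(), ": ", paste(not_found, collapse = ", "))
  }
  source(functions_file, local = envir)      # define the functions in `envir`
  invisible(load(data_file, envir = envir))  # returns the loaded object names
}

# Example usage, with the same paths as the commands above:
# loaded <- setup_simulation("sr_simulation_functions.R",
#                            "data/sr_transcriptome.rda")
# loaded   # names of the objects restored from the data file
```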

References

[1] Arzalluz-Luque A, Conesa A. Single-cell RNAseq for the study of isoforms-how is that possible? Genome Biol. 2018;19:1–19. https://doi.org/10.1186/s13059-018-1496-z
[2] Tardaguila M, de la Fuente L, Marti C, Pereira C, Pardo-Palacios FJ, del Risco H, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018. https://doi.org/10.1101/gr.222976.117
