This repository contains the custom code necessary to recreate the simulation in the manuscript by A. Arzalluz-Luque and A. Conesa, "Single-cell RNAseq for the study of isoforms: how is that possible?", which can be found here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1496-z

aarzalluz/singlecell-isoform-simulation

Single-cell RNAseq for the study of isoforms: simulation code

This repository contains the custom code used to produce the simulation that is shown in our manuscript "Single-cell RNAseq for the study of isoforms: how is that possible?", by Ángeles Arzalluz-Luque and Ana Conesa [1].

.R files

The prefixes sr and lr stand for short reads and long reads, respectively. They mark the simulation_functions files in the R folder (for example, sr_simulation_functions.R), which contain custom functions implementing some of the main steps of the simulation, as well as the vignette files.

Data

The data folder includes the data needed to run the pipeline from the beginning, in the form of .RData files. These files also carry the appropriate prefixes where needed.

  • sr_transcriptome.rda: contains the transcript sequences from Tardaguila et al. [2], upon which we based our simulation, and their fasta headers.
  • gene.isoform_table.rda: contains each isoform name and the gene it belongs to, also obtained from the Tardaguila et al. [2] transcriptome.
  • transcript_expression.rda: contains the isoform expression table (bulk RNAseq data) produced by Tardaguila et al. [2], for the two samples (neural stem cells and oligodendrocytes) and their two replicates.
  • sr_sim.isoform.results.NSC.rda: contains the output of RSEM+STAR, and can be loaded into R to avoid running this part of the pipeline (see next section).
  • The rest of the files containing the sr prefix include intermediate step data. What they correspond to and how they can be loaded is specified in the short read simulation vignette.
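
Because load() restores objects under the names they were saved with, it can help to load an .rda file into a separate environment and list its contents before using it. The following is a minimal base R sketch. The helper name inspect_rda is hypothetical and not part of this repository, and the demo uses a temporary file as a stand-in for the files in the data folder.

```r
# Hypothetical helper (not part of this repository): load an .rda file into
# its own environment and report the objects it contains.
inspect_rda <- function(path) {
  stopifnot(file.exists(path))
  env <- new.env()
  # load() returns (invisibly) the names of the restored objects
  obj_names <- load(path, envir = env)
  # Summarise each object by its class and its dimensions (or length)
  info <- lapply(obj_names, function(nm) {
    obj <- get(nm, envir = env)
    size <- if (is.null(dim(obj))) length(obj) else paste(dim(obj), collapse = " x ")
    data.frame(object = nm, class = class(obj)[1], size = as.character(size),
               stringsAsFactors = FALSE)
  })
  list(env = env, summary = do.call(rbind, info))
}

# Self-contained demo: a temporary file stands in for a data/*.rda file
demo_table <- data.frame(isoform = c("iso1", "iso2"), gene = c("geneA", "geneA"))
demo_path <- tempfile(fileext = ".rda")
save(demo_table, file = demo_path)
res <- inspect_rda(demo_path)
print(res$summary)
```

To apply this to the repository data, the temporary path would be replaced with one of the files listed above, for example data/gene.isoform_table.rda.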

Transcriptome files

Found in the transcriptome folder:

  • transcriptome.fasta is the transcriptome fasta file from Tardaguila et al. [2], necessary to simulate reads from full-length transcripts using the polyester package.
  • annotation.gtf is the annotation generated by Tardaguila et al. [2] for this transcriptome. Although it is not necessary to run the R code, it is required to run RSEM + STAR.

Instructions for running the simulations

Download this repository and set your R working directory to its folder. Then, source the .R files containing the custom simulation functions and load the data. For instance, to start running the short-read simulation, execute the following in the R console:

source("sr_simulation_functions.R")
load("data/sr_transcriptome.rda")

as specified in the first lines of the short-read simulation vignette. Then, follow the vignettes.
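
If source() or load() fails with a "cannot open file" error, the working directory is the usual cause. A quick check along these lines may help. It is only a sketch: the helper missing_files is hypothetical and not part of this repository, and the two paths are simply those used in the commands above.

```r
# Hypothetical pre-flight check (not part of this repository): report which
# of the expected files are missing from a given directory.
missing_files <- function(files, root = getwd()) {
  paths <- file.path(root, files)
  files[!file.exists(paths)]
}

# Paths taken from the two commands above
expected <- c("sr_simulation_functions.R", "data/sr_transcriptome.rda")
absent <- missing_files(expected)
if (length(absent) > 0) {
  message("Not found in ", getwd(), ": ", paste(absent, collapse = ", "))
} else {
  message("All expected files found; ready to source() and load().")
}
```

If any file is reported as missing, setwd() can be used to point R at the downloaded repository folder before retrying.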

References

  1. Arzalluz-Luque A, Conesa A. Single-cell RNAseq for the study of isoforms-how is that possible? Genome Biol. 2018;19:1–19. https://doi.org/10.1186/s13059-018-1496-z

  2. Tardaguila M, de la Fuente L, Marti C, Pereira C, Pardo-Palacios FJ, del Risco H, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018. https://doi.org/10.1101/gr.222976.117
