-
Hi Bina, it's best to ask the authors of that pipeline directly. I am not familiar with it, unfortunately.
-
Hi,
I am trying to follow the pipeline from this article:
Yu, H., Li, M., Sandhu, J. et al. Pervasive misannotation of microexons that are evolutionarily conserved and crucial for gene function in plants. Nat Commun 13, 820 (2022). https://doi.org/10.1038/s41467-022-28449-8
which uses junctions collected separately from the STAR first-pass mapping and from OLego mapping (for junction detection using very small seeds) to perform the second-pass mapping.
The SJ.out.tab files from STAR are supposed to be combined with the .sam files from OLego, but I'm not sure how to go about doing this. Any ideas would be helpful, or maybe I am misunderstanding something?
Here is the excerpt from the methods section of the paper:
"First, STAR genome indexes were generated from the reference genome fasta file and the GTF file of annotated transcripts. The clean RNA-seq fastq files were used to perform first pass mapping with STAR (parameters: --alignIntronMin 20 --alignIntronMax 20000 --outSAMtype None --outSJfilterReads Unique --outSJfilterCountUniqueMin 10 3 3 3 --outSJfilterCountTotalMin 10 3 3 3). Independently, OLego was used to map the same set of RNA-seq data to reference genome without transcript annotation provided (parameters: -e 3 -I 20000 --max-multi 5). The junctions were separately collected from the STAR first mapping and OLego mapping. These junctions were added to the STAR genome index for second pass mapping (parameters: --alignIntronMin 20 --alignIntronMax 20000 --limitBAMsortRAM 5000000000 --outSAMstrandField intronMotif --alignSJoverhangMin 20 --outSAMtype BAM SortedByCoordinate). Transcripts in each sample were assembled using StringTie with the reference annotation as a guide (parameters: -j 5 -c 5 -g 10 -G annotation.gtf) and merged together from all samples using transcript merge mode in StringTie (--merge), and the read coverage tables of introns, exons and transcripts (parameters: -e -B -G merged.gtf) were loaded into R package ballgown. For internal exon identification, only the exons with at least 5 junction reads on both sides of flanking introns in at least one sample were considered (Supplementary Fig. 1a). If alternative splicing occurs for one intron around the exon, the intron that has the most average junction reads in a population was used. Only introns with canonical splice sites (GT-AG or GC-AG) were considered in this study. The smallest microexons (1–15 nt) were identified in each species separately."
Yu, H. et al., 2022.
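One possible reading of "the junctions were separately collected ... and added to the STAR genome index" is to pull introns out of the OLego SAM records, pool them with the STAR SJ.out.tab entries, and write a single junction list. The sketch below is not the authors' script. It assumes that OLego reports each intron as an `N` operation in the CIGAR string and, where available, the junction strand in an `XS:A` tag. The function names and the `min_olego_reads` threshold are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# STAR SJ.out.tab column 4: 0 = undefined, 1 = "+", 2 = "-"
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}


def sam_junctions(line):
    """Return (chrom, intron_start, intron_end, strand) for each N in the CIGAR.

    Coordinates are 1-based and inclusive: first and last base of the intron.
    """
    if line.startswith("@"):
        return []  # SAM header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11 or int(fields[1]) & 4 or fields[5] == "*":
        return []  # malformed or unmapped record
    chrom, ref_pos, cigar = fields[2], int(fields[3]), fields[5]
    strand = "."
    for tag in fields[11:]:
        if tag.startswith("XS:A:"):  # assumed strand tag
            strand = tag[5:]
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((chrom, ref_pos, ref_pos + n - 1, strand))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += n
    return junctions


def star_junction(line):
    """Parse one SJ.out.tab line into the same tuple layout."""
    f = line.rstrip("\n").split("\t")
    return (f[0], int(f[1]), int(f[2]), STAR_STRAND.get(f[3], "."))


def merge_junctions(star_lines, sam_lines, min_olego_reads=1):
    """Union of STAR junctions and OLego junctions with enough read support."""
    merged = {star_junction(l) for l in star_lines if l.strip()}
    support = Counter()
    for l in sam_lines:
        support.update(sam_junctions(l))
    merged.update(j for j, n in support.items() if n >= min_olego_reads)
    return sorted(merged)


def to_sjdb_lines(junctions):
    """Format as chr<TAB>start<TAB>end<TAB>strand, one junction per line."""
    return ["\t".join(map(str, j)) for j in junctions]
```

As far as I understand the STAR manual, a file in this four-column layout can be passed with `--sjdbFileChrStartEnd`, either when regenerating the genome index or at the second-pass mapping step. Note that a junction seen with an undefined strand in one tool and a defined strand in the other would appear twice in this simple union.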
Thank you!
Bina