Merging junctions from 1st pass with output from OLego #1671

binasu · 2022-10-05T13:28:40Z

Hi,

I am trying to follow the pipeline from this article:
Yu, H., Li, M., Sandhu, J. et al. Pervasive misannotation of microexons that are evolutionarily conserved and crucial for gene function in plants. Nat Commun 13, 820 (2022). https://doi.org/10.1038/s41467-022-28449-8

which uses junctions collected separately from the STAR first-pass mapping and from OLego mapping (for junction detection using very small seeds) to perform the second-pass mapping.

The SJ.out.tab files from STAR are supposed to be combined with the .sam files from OLego, but I'm not sure how to go about doing this. Any ideas would be helpful. Or am I misunderstanding something?

Here is the excerpt from the methods section of the paper:

"First, STAR genome indexes were generated from the reference genome fasta file and the GTF file of annotated transcripts. The clean RNA-seq fastq files were used to perform first pass mapping with STAR (parameters: --alignIntronMin 20 --alignIntronMax 20000 --outSAMtype None --outSJfilterReads Unique --outSJfilterCountUniqueMin 10 3 3 3 --outSJfilterCountTotalMin 10 3 3 3). Independently, OLego was used to map the same set of RNA-seq data to reference genome without transcript annotation provided (parameters: -e 3 -I 20000 --max-multi 5). The junctions were separately collected from the STAR first mapping and OLego mapping. These junctions were added to the STAR genome index for second pass mapping (parameters: --alignIntronMin 20 --alignIntronMax 20000 --limitBAMsortRAM 5000000000 --outSAMstrandField intronMotif --alignSJoverhangMin 20 --outSAMtype BAM SortedByCoordinate). Transcripts in each sample were assembled using StringTie with the reference annotation as a guide (parameters: -j 5 -c 5 -g 10 -G annotation.gtf) and merged together from all samples using transcript merge mode in StringTie (--merge), and the read coverage tables of introns, exons and transcripts (parameters: -e -B -G merged.gtf) were loaded into R package ballgown. For internal exon identification, only the exons with at least 5 junction reads on both sides of flanking introns in at least one sample were considered (Supplementary Fig. 1a). If alternative splicing occurs for one intron around the exon, the intron that has the most average junction reads in a population was used. Only introns with canonical splice sites (GT-AG or GC-AG) were considered in this study. The smallest microexons (1–15 nt) were identified in each species separately."

Yu, H. et al., 2022.

Thank you!

Bina

alexdobin (Maintainer) · 2022-10-11T15:34:16Z

Hi Bina,

It's best to ask the authors of that pipeline directly. Unfortunately, I am not familiar with it.
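
Pending an answer from the paper's authors, here is a minimal sketch of one way the merge asked about above could be done. It is not the published pipeline's method. It assumes that OLego's SAM output encodes introns as `N` operations in the CIGAR string and carries an `XS:A:` strand tag (if the tag is missing, the strand is written as `.`), and that the merged list is passed to STAR through `--sjdbFileChrStartEnd`, which takes tab-separated chr, start, end, strand lines with 1-based intron coordinates. The function names are hypothetical, and the read-count filters from the methods excerpt are not applied.

```python
import re
from typing import Iterable, List, Set, Tuple

# (chrom, intron_start, intron_end, strand); coordinates are 1-based, inclusive
Junction = Tuple[str, int, int, str]

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")                   # CIGAR ops that advance along the reference
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}   # SJ.out.tab column 4 encoding


def junctions_from_sam(lines: Iterable[str]) -> Set[Junction]:
    """Collect introns (CIGAR 'N' operations) from SAM alignment lines."""
    junctions: Set[Junction] = set()
    for line in lines:
        if line.startswith("@"):               # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":           # skip unmapped reads
            continue
        strand = "."
        for tag in fields[11:]:
            # Assumption: strand is reported in an XS:A: tag
            if tag.startswith("XS:A:") and tag[5:] in ("+", "-"):
                strand = tag[5:]
        ref_pos = pos
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op == "N":                      # intron spans ref_pos .. ref_pos + n - 1
                junctions.add((chrom, ref_pos, ref_pos + n - 1, strand))
            if op in REF_CONSUMING:
                ref_pos += n
    return junctions


def junctions_from_star(lines: Iterable[str]) -> Set[Junction]:
    """Read the first four columns of STAR SJ.out.tab lines."""
    junctions: Set[Junction] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        chrom, start, end, strand = fields[:4]
        junctions.add((chrom, int(start), int(end), STAR_STRAND.get(strand, ".")))
    return junctions


def format_sjdb(junctions: Set[Junction]) -> List[str]:
    """Format junctions as chr<TAB>start<TAB>end<TAB>strand lines, sorted."""
    return [f"{c}\t{s}\t{e}\t{st}" for c, s, e, st in sorted(junctions)]


if __name__ == "__main__":
    # Toy inputs; real use would read SJ.out.tab and the OLego .sam from disk.
    star = ["chr1\t110\t159\t1\t1\t0\t12\t0\t30"]
    olego = ["r1\t0\tchr1\t100\t255\t10M50N10M\t*\t0\t0\tACGT\tIIII\tXS:A:+",
             "r2\t0\tchr1\t200\t255\t5M9N5M\t*\t0\t0\tACGT\tIIII\tXS:A:-"]
    merged = junctions_from_star(star) | junctions_from_sam(olego)
    print("\n".join(format_sjdb(merged)))
```

The union is taken on the full (chr, start, end, strand) tuple, so a junction reported with an undefined strand by one tool and a defined strand by the other would appear twice and may need reconciling before use.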
