Skip to content

SQANTI3 isoform classification: categories and subcategories

Ángeles Arzalluz-Luque edited this page Apr 26, 2022 · 5 revisions

SQANTI isoform classification: categories and subcategories

SQANTI3 classifies each isoform by finding the best matching reference transcript, in the following order:

  • FSM (Full Splice Match): meaning the reference and query isoform have the same number of exons and each internal junction agree. The exact 5' start and 3' end can differ by any amount.

  • ISM (Incomplete Splice Match): the query isoform has fewer 5' exons than the reference, but each internal junction agree. The exact 5' start and 3' end can differ by any amount.

  • NIC (*Novel In Catalog): the query isoform does not have a FSM or ISM match, but is using a combination of known donor/acceptor sites.

  • NNC (*Novel Not in Catalog): the query isoform does not have a FSM or ISM match, and has at least one donor or acceptor site that is not annotated.

  • Antisense: the query isoform does not have overlap a same-strand reference gene but is anti-sense to an annotated gene.

  • Genic Intron: the query isoform is completely contained within an annotated intron.

  • Genic Genomic: the query isoform overlaps with introns and exons.

  • Intergenic: the query isoform is in the intergenic region.

Some of these categories are further divided into subcategories (specified in the subcategory field in SQANTI3 classification output). These will be explained in the sections below.

sqanti_explain

FSM subcategories

ISM subcategories

ISM_subtype

NIC subcategories

Novel isoforms are subtyped based on whether they use a combination of known junctions (junctions are pairs of , sites), a combination of known splice sites (the individual donor and acceptor sites are known, but at least combination is novel), or at least one splice site (donor or acceptor) is novel.

NIC_subtype