@semenko semenko commented Jul 14, 2023

This PR updates and expands the list of serial numbers of Illumina 2-color SBS sequencers for poly-g trimming.

These values come from: https://knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000003880

This expands the previous 2-color list by adding:

  • NovaSeq 1000/2000 (@VL @VH)
  • NovaSeq X Plus (@LH)

This also broadens the NovaSeq 6000 serial from (@A0 --> @A) per Illumina's doc.

(I do not see @NDX documented by Illumina, but this might be their NextSeq 550Dx FDA-regulated sequencer.)

semenko added a commit to semenko/liquid-cell-atlas that referenced this pull request Jul 14, 2023
OK for our data so far, and I submitted a PR to fastp:
OpenGene/fastp#508
@dlaehnemann

I would like to rely on the automatic setting of --trim_poly_g, and it would be great to see this pull request, as well as #598, merged. Is anything holding this back?

Also, the Illumina page linked above is gone, which happens regularly with their docs. 🤦
But here's a page that at least mentions which models should be 2-channel (and also the 1-channel iSeq):
https://web.archive.org/web/20250701072529/https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/2-channel-sbs.html

And it seems like 10X put in quite some effort to collect all of Illumina's machine codes right here, although this seems to be 8 years old, so there will also be stuff missing:
https://github.com/10XGenomics/supernova/blame/b82c3d8efa68bda2d95f30621cd6d91308ce11a2/tenkit/lib/python/tenkit/illumina_instrument.py#L12-L45

So maybe these pull requests could be merged into one and amended if any of the other machine codes are missing.

@dlaehnemann

Ah, and I finally also found this, where the author actually got info from Illumina support (this seems to be the only way of getting useful and somewhat structured info from them):
https://github.com/nickp60/fcid/blob/04bd2e6aab1979a6902a4470cc82c574991242a0/fcid/run.py#L5-L123

@dlaehnemann

OK, I couldn't help myself and went to figure out which Illumina machine models will have the polyG issue. The list is (TL;DR):

  • iSeq 100
  • MiniSeq
  • MiSeq i100 Series
  • NextSeq 500/550
  • NextSeq 1000/2000
  • NovaSeq 6000
  • NovaSeq X

And here is the full story, with receipts:

Illumina Instrument Imaging Channel Systems

| Instrument Model | Imaging Technology | Channel System Details | Documentation Link | Checked manually |
|---|---|---|---|---|
| iSeq 100 | One-channel | one dye, compact system | Chemistry and Imaging on iSeq 100 | Done. |
| MiniSeq | Two-channel | red and green | Chemistry and Imaging on MiniSeq | Done. |
| NextSeq 500/550 | Two-channel | red and green | Chemistry and imaging on NextSeq 500/550 | Done. |
| NextSeq 1000/2000 | Two-channel | blue and green, standard or XLEAP (A blue) | Chemistry and imaging on the NextSeq 1000/2000 | Done. |
| MiSeq | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | Chemistry and imaging on MiSeq | Done. |
| MiSeq i100 Series | Two-channel | blue and green, XLEAP (C blue) | Two Channel Chemistry and Imaging on the MiSeq i100 Series | Done. |
| HiSeq 1000/2500 | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | Chemistry and imaging on MiSeq (also mentions HiSeq series) | Done. |
| HiSeq X | Four-channel | oldest SBS (Sequencing by Synthesis) chemistry | HiSeq X System Guide (15050091 v07) (mentions "the four color channels") | Done. |
| NovaSeq 6000 | Two-channel | red and green | Chemistry and Imaging on NovaSeq 6000 | Done. |
| NovaSeq X Series | Two-channel | blue and green, XLEAP (A blue) | Chemistry and Imaging on the NovaSeq X Series Instruments | Done. |

Channel System Summary

one-channel

Good overview: https://web.archive.org/web/20250701121553/https://knowledge.illumina.com/instrumentation/iseq-100/instrumentation-iseq-100-reference_material-list/000008434

Machine series:

  • iSeq 100

General setup:

  • one dye
  • each sequencing cycle has two rounds of chemistry + imaging

Color scheme:

  • adenine: first image only
  • cytosine: second image only
  • thymine: both images
  • guanine: permanently dark
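
To make the one-channel scheme above concrete, here is a minimal Python sketch of how one cycle could be decoded from its two images. The names `ONE_CHANNEL_CALLS` and `call_base_one_channel` are made up for illustration and are not taken from fastp or from Illumina software; the mapping itself is just the color scheme listed above.

```python
# Hypothetical illustration of the one-channel color scheme listed above.
# Key: (signal in first image, signal in second image) -> called base.
ONE_CHANNEL_CALLS = {
    (True, False): "A",   # adenine: first image only
    (False, True): "C",   # cytosine: second image only
    (True, True): "T",    # thymine: both images
    (False, False): "G",  # guanine: permanently dark
}


def call_base_one_channel(first_image: bool, second_image: bool) -> str:
    """Return the base implied by the presence or absence of signal in each image."""
    return ONE_CHANNEL_CALLS[(first_image, second_image)]


# A cluster that no longer emits any signal looks like G in every cycle,
# which is where polyG tails come from.
dark_cycles = "".join(call_base_one_channel(False, False) for _ in range(5))
print(dark_cycles)
```

Because "no signal" and "guanine" are indistinguishable in this scheme, a read that runs out of signal ends in a run of G calls.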

two-channel

Good overview: https://web.archive.org/save/https://knowledge.illumina.com/instrumentation/novaseq-x-x-plus/instrumentation-novaseq-x-x-plus-reference_material-list/000007970

General setup:

  • two dyes (different colors, different base associations)
  • each sequencing cycle has two rounds of chemistry + imaging

two-channel: red and green

Machine series:

  • MiniSeq
  • NextSeq 500/550
  • NovaSeq 6000

Color scheme:

  • thymine: green
  • cytosine: red
  • adenine: both
  • guanine: dark

two-channel: blue and green

Standard reagents

Machine series:

  • NextSeq 1000/2000

Color scheme:

  • thymine: green
  • cytosine: blue
  • adenine: both
  • guanine: dark

XLEAP Reagents (A blue)

Machine series:

  • NextSeq 1000/2000
  • NovaSeq X

Color scheme:

  • thymine: green
  • adenine: blue
  • cytosine: both
  • guanine: dark

XLEAP Reagents (C blue)

Good overview: https://web.archive.org/web/20250701122436/https://knowledge.illumina.com/instrumentation/miseq-i100-series/instrumentation-miseq-i100-series-reference_material-list/000009348

Machine series:

  • MiSeq i100 Series

Color scheme:

  • thymine: green
  • cytosine: blue
  • adenine: both
  • guanine: dark
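
The two-channel color schemes listed above differ in which dye goes with which base, but they share the property that matters for polyG trimming. The following Python sketch writes the four schemes down as data and checks that property; the dictionary name, the scheme labels, and the function `call_base` are hypothetical and only mirror the lists in this comment.

```python
# Hypothetical encoding of the two-channel color schemes listed above.
# Each base maps to the set of channels in which it shows a signal.
TWO_CHANNEL_SCHEMES = {
    "red/green": {
        "T": frozenset({"green"}),
        "C": frozenset({"red"}),
        "A": frozenset({"red", "green"}),
        "G": frozenset(),
    },
    "blue/green, standard reagents": {
        "T": frozenset({"green"}),
        "C": frozenset({"blue"}),
        "A": frozenset({"blue", "green"}),
        "G": frozenset(),
    },
    "blue/green, XLEAP (A blue)": {
        "T": frozenset({"green"}),
        "A": frozenset({"blue"}),
        "C": frozenset({"blue", "green"}),
        "G": frozenset(),
    },
    "blue/green, XLEAP (C blue)": {
        "T": frozenset({"green"}),
        "C": frozenset({"blue"}),
        "A": frozenset({"blue", "green"}),
        "G": frozenset(),
    },
}


def call_base(scheme: str, lit_channels: set) -> str:
    """Return the base whose channel set equals the observed lit channels."""
    observed = frozenset(lit_channels)
    for base, channels in TWO_CHANNEL_SCHEMES[scheme].items():
        if channels == observed:
            return base
    raise ValueError(f"{sorted(observed)} is not a valid signal for scheme {scheme!r}")


# In every scheme, a cluster with no signal in either channel is called as G.
for scheme_name in TWO_CHANNEL_SCHEMES:
    assert call_base(scheme_name, set()) == "G"
```

So whichever two-channel chemistry a machine uses, loss of signal shows up as G calls, which is why all of these models are candidates for automatic polyG trimming.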

four-channel

Good overview: https://web.archive.org/web/20250701122141/https://knowledge.illumina.com/instrumentation/miseq/instrumentation-miseq-reference_material-list/000003757

Machine series:

  • MiSeq (except the i100 series)
  • HiSeq Series (docs mentioning HiSeq 1000/2500 and HiSeq X, but not other HiSeqs)

General setup:

  • four dyes
  • each sequencing cycle has four rounds of chemistry + imaging

Color scheme:

  • thymine: green
  • cytosine: yellow
  • adenine: red
  • guanine: blue

Data compiled from Illumina Knowledge Base documentation as of July 1st, 2025. The initial table was created by asking Claude Sonnet 4 to aggregate the relevant info scattered across Illumina Knowledge Base pages. All entries and linkouts were then checked manually; in particular, the linkouts for the HiSeq series were adjusted to point to pages with a useful citation, and all pages were archived on the Wayback Machine (as Illumina often changes their links). Finally, I made the table much more concise by giving the more detailed channel system descriptions, which I compiled during cross-checking, in the summary below the table.

@dlaehnemann

dlaehnemann commented Jul 4, 2025

I tried to compile a list of identifiers for Illumina machine models with two- or one-channel imaging (the ones with the polyG tail issue).
Theoretically, one should be able to identify them from the Illumina Serial Number (ISN) in the fastq headers, and most of those seem to be known from Illumina or other sources (I didn't find anything for the recent MiSeq i100 Series machines).
But it seems like this instrument name that contains the ISN can be changed in the machine setup.
So this is probably actually not the most reliable way to determine the machine model.
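
For illustration, prefix-based detection could look roughly like the Python sketch below. It assumes the usual Illumina FASTQ header layout, in which the instrument name is the first colon-separated field after the `@`, and it uses the instrument ID prefixes from the table further down in this comment. The names `POLY_G_PREFIXES` and `guess_models_from_header` and the example headers are made up; this is not how fastp implements it.

```python
# Instrument ID prefixes of one- and two-channel machines, taken from the
# table in this comment (MiSeq i100 Series is missing because its prefix is unknown).
POLY_G_PREFIXES = {
    "iSeq 100": ("FS",),
    "MiniSeq": ("MN",),
    "NextSeq 500/550": ("NS", "NB"),
    "NextSeq 1000/2000": ("VH", "VL"),
    "NovaSeq 6000": ("A", "NA"),
    "NovaSeq X": ("LH",),
}


def guess_models_from_header(header: str) -> list[str]:
    """Return all models whose prefix matches the instrument name in a FASTQ header."""
    # Assumed layout: @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> ...
    instrument = header.lstrip("@").split(":", 1)[0]
    return [
        model
        for model, prefixes in POLY_G_PREFIXES.items()
        if instrument.startswith(prefixes)
    ]


# Made-up example headers, for illustration only.
print(guess_models_from_header("@A00123:45:H7YKLDRXX:1:1101:1000:2000 1:N:0:ACGT"))
print(guess_models_from_header("@M01234:10:000000000-ABCDE:1:1101:1000:2000"))
```

The single-letter `A` prefix shows the weakness described above: any instrument that was renamed to something starting with `A` would also match.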

Instead, it probably makes more sense to use the flowcell ID, which also contains codes for the machine models.
This supposedly gets generated automatically by the machines and cannot be altered.
And people have obtained the flowcell ID patterns by emailing Illumina support and have documented them for most models.
But again, information on the MiSeq i100 Series is missing.
I guess one would have to email them again for a current list, and maybe suggest that they put a comprehensive list somewhere and keep it up to date...

Here's what I could find.
In the flowcell ID patterns, each x stands for one alphanumeric character, i.e. it would be represented by the regex [A-Za-z0-9] for pattern matching.

| Illumina machine model | Instrument ID | Sources | Flowcell ID pattern | alt codes | Sources |
|---|---|---|---|---|---|
| iSeq 100 | @FS | ISN, iSeq PR | BRBxxxxx-xxxx | BPC, BPG, BPA, BPL, BNT, BTR | fcid |
| MiniSeq | @MN | ISN, 10X, fcid | 000Hxxxxx | | fcid |
| MiSeq i100 Series | ? | | ? | | |
| NextSeq 500/550 | @NS, @NB | ISN, PR #508, 10X, fcid | xxxxxAFxx | BG, AG | fcid |
| NextSeq 1000/2000 | @VH, @VL | ISN, PR #508 | xxxxxxxM5 | HV | fcid, 10X |
| NovaSeq 6000 | @A, @NA | ISN, PR #508, fcid | xxxxxDRxx | DM, DS | fcid, 10X, ICTN62, ICTN63 |
| NovaSeq X | @LH | ISN, PR #508 | xxxxxxLTx | | fcid |
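
As a rough sketch of how the flowcell ID patterns in the table above could be turned into regexes, the following Python snippet replaces every lowercase `x` with `[A-Za-z0-9]` and treats all other characters as literals. The names `FLOWCELL_PATTERNS`, `pattern_to_regex` and `guess_models_from_flowcell` are hypothetical, the example flowcell IDs are made up, and the alt codes column is left out for brevity.

```python
import re

# Primary flowcell ID patterns from the table above (alt codes not included).
FLOWCELL_PATTERNS = {
    "iSeq 100": "BRBxxxxx-xxxx",
    "MiniSeq": "000Hxxxxx",
    "NextSeq 500/550": "xxxxxAFxx",
    "NextSeq 1000/2000": "xxxxxxxM5",
    "NovaSeq 6000": "xxxxxDRxx",
    "NovaSeq X": "xxxxxxLTx",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a table pattern into a compiled regex.

    Assumption: a lowercase 'x' is a placeholder for one alphanumeric
    character; every other character must match literally.
    """
    parts = ["[A-Za-z0-9]" if ch == "x" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


FLOWCELL_REGEXES = {
    model: pattern_to_regex(pattern) for model, pattern in FLOWCELL_PATTERNS.items()
}


def guess_models_from_flowcell(flowcell_id: str) -> list[str]:
    """Return all models whose pattern matches the whole flowcell ID."""
    return [
        model
        for model, regex in FLOWCELL_REGEXES.items()
        if regex.fullmatch(flowcell_id)
    ]


# Made-up flowcell IDs, for illustration only.
print(guess_models_from_flowcell("H7YKLDRXX"))
print(guess_models_from_flowcell("000H2ABCD"))
```

Using `fullmatch` rather than `search` keeps a pattern from matching inside a longer or differently structured ID. Supporting the alt codes would mean adding one more pattern per code, once it is clear which positions of the pattern they replace.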
