Wednesday, September 30, 2020

Keeping an Index on a Subtle Difference in Illumina Chemistries

I like to pretend in this space that I catch all the little details of the different sequencing platforms. Well, at least over time I try to do that. But ego aside, that is a mark I often miss. A bit over a year ago I discovered that there's a small difference across the Illumina family that is completely separate from how clusters are generated (Bridge Amplification randomly arrayed or Exclusion Amplification in nanowells), the wavelengths of light used in the fluorescence microscopy (now blue on the newest NextSeqs, with superresolution microscopy coming soon), or 4-color vs. 2-color vs. 1-color (well, really staged 2-color) chemistry for the reversible terminators. There's a subtle difference in how the second index is read. I'm not spilling a deep secret: it's right out in plain sight within an Illumina technical document.


One of the excitements to me about joining the NGS team at Ginkgo was the opportunity to get closer to the sequencers.  At Warp we did enormous amounts of Illumina sequencing, but it was via various outsource providers.  Most times, I was perfectly happy to send them samples and then get back one pair of FASTQ files per sample.  I didn't really think about how the dual indexing had worked to actually perform the feat of multiplexing my samples on a single flowcell.  But particularly with our SARS-CoV-2 diagnostic work, I've gotten intimately involved in designing and testing very large barcode sets to enable very deep multiplexing.  We also have MiSeqs, NextSeqs and NovaSeqs, and it turns out they don't all read the barcode in the same way.

In all of the chemistries, the Read 1 primer is introduced to the clustered flowcell and used to read the library inserts from one direction. That strand is melted off and a new primer is added to read Index 1, also known as i7, which sits on the P7 anchoring sequence side of the library fragment. Then things diverge.

On platforms using Forward Strand sequencing of index 2, a primer is now annealed to read the index 2 sequence. So this primer anneals on the insert side of the index and reads across it towards the P5 flowcell anchoring sequence. Then the second strand synthesis -- which I always mentally pair with a swimmer's flip turn -- occurs and finally Read 2 of the insert is generated with a fourth primer. As pointed out by a reader via Twitter, I summarized this in the wrong order -- so trying again. For forward strand reading the resynthesis occurs -- the swimmer's flip turn in my mind -- and then the Index 2 primer is annealed to read from the adaptor towards the P5 binding sequence. If I had drawn this out -- or paid closer attention to the diagrams by Illumina -- getting the strandedness right would have pointed out my error.

But in the Reverse Strand chemistries the flip turn occurs and then the P5 flowcell graft is used to generate the Index 2 read, after first running some dark (not imaged) cycles to advance the 3' end of the graft to the Index 2 sequence. After that strand is melted off, the Read 2 primer is used.

So a key effect of this is that, depending on which platform you are on, Index 2 -- i5 -- will be read in either forward or reverse orientation. Back at Warp I didn't care as long as the provider got it right, but now that I'm looking at these in detail and digging into things like the reads that didn't demultiplex and may represent index hopping events, it's painfully important to keep track of which strand the index 2 was read on, particularly if you are trying to compare results between instruments which do this differently, or to compare observed indexes against the way I designed them. It's also apparently important for the operators to keep straight: the sample sheets specifying how to demultiplex the reads require Index 2 to be specified based on what the sequencer actually reads, not some canonical orientation.
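To make the orientation issue concrete, here is a minimal Python sketch of converting an i5 index between the two orientations. The function names are hypothetical, and the sketch assumes the index was designed (recorded) in the orientation a forward-reading instrument reports; if your design records use the other convention, the flag flips.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def i5_as_read(designed_i5: str, reverse_workflow: bool) -> str:
    """Return the i5 index as the instrument would report it.

    Assumption: designed_i5 is stored in the orientation seen on a
    forward-reading instrument, so only the reverse workflow needs
    the reverse complement.
    """
    if reverse_workflow:
        return reverse_complement(designed_i5)
    return designed_i5


if __name__ == "__main__":
    designed = "ATCACGTT"  # made-up example index, not from a real kit
    print(i5_as_read(designed, reverse_workflow=False))
    print(i5_as_read(designed, reverse_workflow=True))
```

The same helper is handy when sifting through undetermined reads: an index 2 that matches nothing as read, but matches a designed index after reverse complementing, points to an orientation mix-up rather than index hopping.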

So which is which? It turns out that isn't always consistent within a particular instrument line. MiSeq just operates with the Forward Strand workflow and iSeq, MiniSeq and NextSeq use the Reverse Strand workflow, but in the HiSeq family the 2000 and 2500 were forward workflow while the X, 3000 and 4000 used reverse workflow. And with the new NovaSeq v1.5 reagent kits announced last month by Illumina, NovaSeq goes from forward (v1.0 reagents) to reverse (v1.5 reagents).
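The platform list above can be captured as a small lookup table, sketched here in Python. The dictionary keys are labels made up for this example (not Illumina's official instrument identifiers), and the table only covers the instruments and reagent versions named in this post.

```python
# Index 2 workflow by platform, as listed in the paragraph above.
# Keys are illustrative labels, not official instrument identifiers.
INDEX2_WORKFLOW = {
    "MiSeq": "forward",
    "HiSeq 2000": "forward",
    "HiSeq 2500": "forward",
    "NovaSeq v1.0": "forward",
    "iSeq": "reverse",
    "MiniSeq": "reverse",
    "NextSeq": "reverse",
    "HiSeq X": "reverse",
    "HiSeq 3000": "reverse",
    "HiSeq 4000": "reverse",
    "NovaSeq v1.5": "reverse",
}


def index2_workflow(platform: str) -> str:
    """Return 'forward' or 'reverse' for a known platform label."""
    try:
        return INDEX2_WORKFLOW[platform]
    except KeyError:
        # Fail loudly rather than guess an orientation.
        raise ValueError(f"unknown platform label: {platform!r}") from None


if __name__ == "__main__":
    for name in ("MiSeq", "NovaSeq v1.0", "NovaSeq v1.5"):
        print(name, index2_workflow(name))
```

Raising on an unknown label is deliberate: silently defaulting to one orientation is exactly the kind of assumption that produces a run full of undetermined reads.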

Is there a real advantage of one over the other? I suspect there is, given that newer Illumina systems seem to all be getting the Reverse chemistry and the new NovaSeq kits change that instrument to it, but other than a seemingly trivial elimination of one primer I can't see what it would be. And the Reverse workflow also means those dark cycles, which you might think would throw a touch of dephasing in before the actual index reading even starts.

But it does illustrate the importance of never assuming you know everything -- there could always be an important twist lurking right in plain sight and if you fail to keep open the possibility of that occurring, you might never see that critical detail.

1 comment:

James@cancer said...

I made a Gif for the clustering and sequencing process. Check it out at https://twitter.com/coregenomics/status/732594604474785797