One of the things that excited me about joining the NGS team at Ginkgo was the opportunity to get closer to the sequencers. At Warp we did enormous amounts of Illumina sequencing, but it was via various outsourced providers. Most of the time, I was perfectly happy to send them samples and then get back one pair of FASTQ files per sample. I didn't really think about how the dual indexing actually worked to multiplex my samples on a single flowcell. But particularly with our SARS-CoV-2 diagnostic work, I've gotten intimately involved in designing and testing very large barcode sets to enable very deep multiplexing. We also have MiSeqs, NextSeqs and NovaSeqs, and it turns out they don't all read the barcodes in the same way.
In all of the chemistries, the Read 1 primer is introduced to the clustered flowcell and used to read the library inserts from one direction. That strand is melted off and a new primer is added to read Index 1, also known as i7, which sits on the P7 anchoring sequence side of the library fragment. Then things diverge.
In the Reverse Strand chemistries, the flip turn occurs and then the P5 flowcell graft is used to generate the Index 2 read, after first running some dark (not imaged) cycles to advance the 3' end of the graft to the Index 2 sequence. After that is melted off, the Read 2 primer is used.
So a key effect of this is that, depending on which platform you are on, Index 2 -- i5 -- will be read in either forward or reverse orientation. Now back at Warp I didn't care as long as the provider got it right, but now that I'm looking at these in detail and digging into things like the reads that didn't demultiplex and may represent index hopping events, it's painfully important to keep track of which strand Index 2 was read on, particularly if you are trying to compare results between instruments which do this differently, or to compare the observed indexes against the way I designed them. It's also apparently important for the operators to keep straight: the samplesheets specifying how to demultiplex the reads require Index 2 to be specified based on what the sequencer actually reads, not some canonical orientation.
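To make the orientation issue concrete, here is a minimal Python sketch. It is not taken from any Illumina software: the function names and the example sequences are made up for illustration. It just shows how one might check whether an observed Index 2 read matches a designed i5 sequence as written, as its reverse complement, or not at all.

```python
# Hypothetical helpers for illustration only; not part of any Illumina tool.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_i5(observed: str, designed: str) -> str:
    """Say how an observed Index 2 read relates to the designed i5 sequence."""
    observed = observed.upper()
    designed = designed.upper()
    if observed == designed:
        return "as_designed"
    if observed == reverse_complement(designed):
        return "reverse_complement"
    return "no_match"


if __name__ == "__main__":
    designed = "AAGGCT"  # made-up index, far shorter than a real one
    for read in ("AAGGCT", "AGCCTT", "TTTTTT"):
        print(read, classify_i5(read, designed))
```

Running a check like this over the undetermined reads from two instruments is one way to notice that an apparent pile of "unknown" indexes is really just the expected set read off the other strand.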
So which is which? It turns out that isn't always consistent within a particular instrument line. MiSeq operates only with the Forward Strand workflow, and iSeq, MiniSeq and NextSeq use the Reverse Strand workflow, but in the HiSeq family the 2000 and 2500 used the forward workflow while the X, 3000 and 4000 used the reverse workflow. And with the new NovaSeq v1.5 reagent kits announced last month by Illumina, NovaSeq goes from forward (v1.0 reagents) to reverse (v1.5 reagents).
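The instrument list above can be captured as a small lookup table. The sketch below is only an illustration under stated assumptions: the platform labels are my own shorthand, the function name is hypothetical, and it assumes your designed i5 sequences are recorded in the orientation the forward workflow reads. Check your own demultiplexing software's conventions before trusting anything like this.

```python
FORWARD = "forward"
REVERSE = "reverse"

# Workflow per platform, transcribed from the paragraph above.
# The keys are informal labels, not official product identifiers.
I5_WORKFLOW = {
    "MiSeq": FORWARD,
    "HiSeq 2000": FORWARD,
    "HiSeq 2500": FORWARD,
    "NovaSeq v1.0": FORWARD,
    "iSeq": REVERSE,
    "MiniSeq": REVERSE,
    "NextSeq": REVERSE,
    "HiSeq X": REVERSE,
    "HiSeq 3000": REVERSE,
    "HiSeq 4000": REVERSE,
    "NovaSeq v1.5": REVERSE,
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def samplesheet_i5(designed_i5: str, platform: str) -> str:
    """Return the i5 sequence as the given platform would read it.

    Assumes designed_i5 is written in forward-workflow orientation
    (an assumption of this sketch, not a universal convention).
    Raises KeyError for a platform not in the table.
    """
    seq = designed_i5.upper()
    if I5_WORKFLOW[platform] == REVERSE:
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    for platform in ("MiSeq", "NextSeq", "NovaSeq v1.5"):
        print(platform, samplesheet_i5("AAGGCT", platform))
```

Keying NovaSeq by reagent version rather than by instrument reflects the point above: the same machine can switch orientation depending on the kit.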
Is there a real advantage of one over the other? I suspect there is, given that newer Illumina systems all seem to be getting the Reverse chemistry and the new NovaSeq kits change that instrument to it, but other than a seemingly trivial elimination of one primer, I can't see what it is. And it also means those dark cycles, which you might think would throw a touch of dephasing in before the actual index reading even starts.
But it does illustrate the importance of never assuming you know everything -- there could always be an important twist lurking right in plain sight and if you fail to keep open the possibility of that occurring, you might never see that critical detail.
I made a Gif for the clustering and sequencing process. Check it out at https://twitter.com/coregenomics/status/732594604474785797