Wednesday, February 20, 2019

Beyond Generations: My Vocabulary for Sequencing Tech

Many writers have attempted to divide Next Generation Sequencing into Second Generation Sequencing and Third Generation Sequencing.  Personally, I think it isn't helpful and just confuses matters.  I'm not the biggest fan of "Next Generation Sequencing" (NGS) as a term to start with because, like "post-modern architecture" (or heck, "modern architecture"), it isn't future-proof.  Not that I wouldn't take a job with NGS in the title, but it's still not a favorite.  High Throughput Sequencing feels a little better, but again doesn't leave room for distinguishing growth -- and HTS as an abbreviation is already going to confuse anyone in Biopharma who thinks about High Throughput Screening.  Massively Parallel Sequencing sort of works, but my late father had a real pedantic objection to using "massive" for anything that lacked mass, and while I don't subscribe to that view such uses just don't sit well with me.  Worse, as I'll explain, trying to divide sequencer technologies into Second and Third generations creates more heat and smoke than light.  On a number of Twitter threads I've tried to launch my own terminology, but probably haven't been terribly consistent.  So here is an attempt to set it down in one place.

NGS got the "next" because it followed the highly successful fluorescent Sanger as an automated sequencing technology.  All the electrophoretic technologies get lumped into First Generation: Maxam-Gilbert and Sanger.  I'll again repeat the claim that I am the King (or at least High Steward) of Maxam-Gilbert generated sequences in Genbank, having over 1.5 megabases (generated by George Church's multiplex variation) -- which was really impressive in the mid-90s but now is the warm-up period on a MinION.

The first NGS methods were George's ligation-based polony method and 454.  Both of those eliminated electrophoresis and cloning fragments in bacteria, replacing them with defined clonal populations of DNA arrayed on a solid surface.  So these were Second Generation.

It's Third Generation that really is problematic, and I'll point out the two cases that really underscore why it's a bad concept.  Some authors make Third Generation synonymous with single molecule methods.  The problem is that in the timeline Helicos (now seqLL and Direct Genomics -- just Helicos for the rest of this essay), a single molecule method, preceded Ion Torrent, a polony method.  If all goes well, Genapsys will launch (at least in beta) this year with a non-single molecule method, seven years after Pacific Biosciences.  So trying to group these into generations is a poor metaphor, as human generations don't skip around like that.

So instead of trying to group into odd categories that just generate arguments over how to define them, let us instead focus on key attributes that distinguish sequencing platforms.  Now, that can go on for many levels, but I'm going to focus on a small number that work pretty well in the NGS space.  Some work for prior technologies, but often not very well.  I'll focus on commercially launched technologies, with some comments about widely-known technologies that are either incubating or were never quite launched.

Single Molecule vs. Clonal

This is the first really important split, if for no other reason than clonal methods require a template generation step to create the polonies or clusters, whereas single molecule methods don't.

Three single molecule methods have launched: Helicos, Pacific Biosciences and Oxford Nanopore.  Genia is believed to still wait in the wings.  Many proposals have failed, including Starlight, ZS Genetics, Lightspeed and Seq Ltd.  Lots of single molecule methods are being attempted at various startups.  Single molecule methods have a hard road for signal-to-noise, because you don't get the amplified signal of a clonal population.

Many clonal methods have launched: Polonator, 454, SOLiD, Illumina, Ion Torrent, Complete Genomics, BGI/MGI and QIAGEN.  Genapsys, and probably OmniOme, would join this list if they launch.  In the failed category goes GnuBio.

We could attempt to break down clonal methods further by type of template preparation: PCR or RCA (I think that covers it).  PCR could be further broken down by whether 1 or 2 primers are fixed to a solid support -- emulsion PCR or Genapsys' proposed methods or exclusion amplification represent 1-fixed primer whereas bridge PCR represents two (I think I have that right on ExAmp -- please correct me in the comments if that is incorrect).

Beads, Wells, Surfaces, or Membranes

Where are the molecules to be sequenced?  Some methods have them on beads, which in turn can be just randomly packed or attached to some sort of microwell or other microstructure.  Or we can have microstructures or microwells but no beads.  Some just bind to a prepared surface, which may have a pattern to guide where the active sequencing elements will form.  Nanopore methods use membranes.  ONT is a special case relative to Genia -- and anybody else -- in that the sequencing sites can "reload" a new DNA molecule.  This is critical to high throughput on ONT: an iSeq sequences one molecule per active location on a given flowcell, whereas each active pore on a MinION can sequence thousands of different DNA molecules.

Most platforms stick to one scheme, but that's not strictly necessary.  In particular, some Illumina models use attachment to a flat surface whereas others use microwells.

Cyclic vs. Continuous

This is another big split.  Cyclic chemistries require valves and pumps to deliver reagents.  To date all clonal methods are cyclic (and probably must be) -- but so is Helicos!  Another example of a cyclic single molecule sequencer is NanoString's proposed Hyb & Seq.

A key qualifier for cyclic methods is whether the signal is stable or transient.  Stable signals can be scanned patiently; transient ones must be read immediately.  454 and Ion Torrent are the two cases of cyclic chemistries that can't wait.

Starlight didn't have a cycle to the main chemistry, but the system could go through rounds of restarting.  A hard edge case for this classification.  PacBio's discontinued strobe sequencing meant cycling the light source between off-and-on, but that's a detection issue not a chemistry one.

Continuous methods deliver longer reads -- PacBio, Oxford Nanopore.  They also don't require valving or pumps.  Data comes in constantly and with no sense of being under the control of a metronome.  Genia's technology would be continuous as well. 

Another form that has been proposed, but never executed successfully, is the snapshot method -- the chemistry all happens and you simply image.  Electron microscopy methods would fall into this category.  GnuBio doesn't get imaged all at once, but there are no cycles to the processing -- each droplet would have been interrogated once.  But their scheme did require moving liquids, albeit as microfluidic droplets.

Signal: Optical vs. Electrical 

This is another big split.  Optical implies some degree of lenses and filters and perhaps light sources -- though the latter is unnecessary if the sequencing chemistry itself provides the light as in 454.  To date only two non-optical NGS methods have hit the market: Ion Torrent and Oxford Nanopore.  If we wanted to subdivide optical, then we have a single chemiluminescent method (454) and everything else is fluorescence.  Fluorescence platforms have been launched with 1 (Helicos & Illumina iSeq), 2 (Illumina NextSeq, MiniSeq and NovaSeq) and 4 colors (everybody else, including most Illumina boxes).  Genapsys hopes to be the third electronic detection sequencer to launch.

Other modalities are certainly possible.  A number of electron microscopy approaches -- transmission, Auger transmission, scanning tunneling and atomic force -- have been proposed.  Mass spectroscopic detection of labels is another never-quite-launched idea, though Sequenom did use it for more of a genotyping than sequencing application.

Synthesis vs. Ligation vs. Pore Passing vs. Digestion vs. Hybridization vs. ....

Now things really blow out.  There are a large number of systems that rely on DNA polymerase to synthesize DNA (or to be frozen in the act of trying to do so): 454, Illumina, Helicos, Ion Torrent, BGI/MGI, QIAGEN, PacBio and maybe soon Genapsys.  This is often known as SBS, for Sequencing-By-Synthesis.  Genia's technology is based on single molecule sequencing-by-synthesis but with nanopore detection.  Note that this is one place we really see why something with more resolution than two numbers is needed: on this score PacBio is very different than Helicos, but each has other technologies that are close to it.

Synthesis has an important qualifier that splits these further: terminated (Illumina, Helicos, BGI/MGI, QIAGEN) vs. non-terminated (454, Ion Torrent, PacBio and probably Genapsys).  Terminated sequencing-by-synthesis chemistries tend to deal with homopolymers better than unterminated ones, but using native nucleotides can lead to higher accuracy because you aren't feeding weird nucleotides to polymerases.
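The homopolymer point is easier to see with a toy model.  What follows is only a sketch under simplifying assumptions, not any vendor's actual basecalling: `flowgram` and `terminated_cycles` are hypothetical helper names, the flow order "TACG" is an arbitrary choice, and the input is treated as the strand being synthesized with no noise at all.  In the unterminated case a run of identical bases is incorporated in a single flow, so its length has to be inferred from how big the signal is; in the terminated case every base gets its own cycle.

```python
from itertools import cycle


def flowgram(read, flow_order="TACG"):
    """Toy unterminated chemistry: one nucleotide is flowed at a time and
    every consecutive matching base is incorporated in that same flow.

    Returns a list of (flowed_base, number_incorporated) pairs.
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows, pos = [], 0
    for base in cycle(flow_order):
        if pos >= len(read):
            break
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


def terminated_cycles(read):
    """Toy terminated chemistry: exactly one base is added and read per cycle."""
    return [(base, 1) for base in read]


if __name__ == "__main__":
    read = "TTTTAC"
    # The TTTT run is a single flow whose signal must be judged as "4".
    print(flowgram(read))
    # With terminators the same run is four separate one-base calls.
    print(terminated_cycles(read))
```

In a real instrument the "4" is an analog measurement, which is where the homopolymer errors creep in.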

I'll point out that one quirk of Helicos relative to other terminated chemistries is that it is single color, and so only one nucleotide is presented at a time -- which is the worst case for high accuracy -- though the Genapsys patents have some interesting strategies around that such as including terminator versions of the other three nucleotides.

Ligation has faded from favor, but did power the original Church publication (which became Polonator chemistry), SOLiD and the original Complete Genomics platform.  Ligation was once thought to have an inherent accuracy advantage, but that has been eclipsed by the longer read lengths and less expensive chemistries for SBS methods.  I'll note also -- perhaps to stimulate someone to make a quixotic attempt -- that as-far-as-I-know nobody has created a single molecule sequencing-by-ligation scheme.

Nobody has quite launched a sequencing by digestion method -- using exonuclease to cleave DNA.  That is what Seq Ltd proposed along with an academic competitor and at one point Oxford Nanopore worked on it.

Oxford Nanopore's "strand sequencing" technology really needs a different name -- everybody sequences strands.  Pore passing is my current favorite, but I'm open to suggestion.  Now, Genia's technology also relies on pores, but the signal is really generated by synthesis (tags are released by polymerization and captured by the pores) and in many ways it has more in common with those methods.  But that is a reminder to not take these distinctions too seriously -- or to give up on making the classifications mutually exclusive.  Note also that because ONT is single molecule and not a synthesis method, we could infer it is reading native DNA -- which it is.   But perhaps that should be an explicit flag as well.

Hyb & Seq is strictly a hybridization method.  Of course, anything with DNA is probably using hybridization somewhere, but this method uses only probe hybridization to interrogate the sequences.  GnuBio's proposed method was effectively sequencing-by-hybridization, as each signal simply indicated whether a given fragment did or did not contain a certain k-mer.  An interesting historical footnote is that a lot of work went into trying to develop sequencing-by-hybridization methods in the 1990s and these amounted to very little -- other than driving some of the early informatics work that yielded the de Bruijn type assemblers that were needed for early NGS data.
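For anyone who hasn't seen the informatics side, here is a rough sketch of the idea.  It is a textbook-style illustration, not GnuBio's or anyone else's actual software, and the function names are my own.  Hybridization only tells you which k-mers are present; each k-mer then becomes an edge between its (k-1)-base prefix and suffix, and a path using every edge once spells out a candidate sequence.  The sketch assumes an error-free spectrum in which no k-mer occurs twice.

```python
from collections import defaultdict


def kmer_spectrum(seq, k):
    """Presence/absence set of k-mers, which is all hybridization reports."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(spectrum):
    """Spell one sequence that uses every k-mer in the spectrum exactly once.

    Builds a de Bruijn graph and walks an Eulerian path through it
    (Hierholzer's algorithm).  Assumes such a path exists.
    """
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for kmer in sorted(spectrum):
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    original = "ATGGCGTGCA"
    spectrum = kmer_spectrum(original, 3)
    candidate = reconstruct(spectrum)
    # The candidate always explains the data, but need not be the original.
    print(candidate, kmer_spectrum(candidate, 3) == spectrum)
```

Note that for this little example ATGCGTGGCA has exactly the same 3-mer spectrum as ATGGCGTGCA, so the spectrum alone cannot tell the two apart.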

Could more modalities show up?  I think so.  I'm betting that if I carefully reviewed all the startups out there, I'd find a few more general principles.


Here's what the different technologies I've discussed look like in this nomenclature -- I've included all launched technologies and select unlaunched ones.

Polonator: Clonal(PCR-1), Beads(?), Cyclic(Stable), Optical(Fluor), Ligation

454: Clonal(PCR-1), Beads in wells, Cyclic(Transient), Optical(Chemi), Synthesis(Unterminated)

Illumina: Clonal(PCR-2 or PCR-1), Surface or microwells, Cyclic(Stable), Optical(Fluor), Synthesis(Terminated)

SOLiD: Clonal(PCR-1), Random packed beads, Cyclic(Stable), Optical(Fluor), Ligation

Complete Genomics: Clonal(RCA), Random packed beads(?), Cyclic(Stable), Optical(Fluor), Ligation

Helicos: Single molecule, Surface, Cyclic(Stable), Optical(Fluor), Synthesis(Terminated)

Ion Torrent: Clonal(PCR-1), Beads in wells,  Cyclic(Transient), Electrical, Synthesis(Unterminated)

Pacific Biosciences: Single molecule, Microwells, Continuous, Optical(Fluor), Synthesis(Unterminated)

GnuBio: Clonal(PCR), Droplets, Single interrogation, Optical(Fluor), Hybridization

Oxford Nanopore: Single molecule, Membrane(Reloading), Continuous, Electrical, Pore passing

Genia: Single molecule, Membrane, Continuous, Electrical, Synthesis(Unterminated)

Hyb & Seq: Single molecule, Surface, Cyclic, Optical(Fluor), Hybridization

Genapsys: Clonal(PCR-1), Beads in microwells, Cyclic(Transient or Stable), Electrical, Synthesis(Unterminated?)

QIAGEN: Clonal(RCA), Beads, Cyclic(Stable), Optical(Fluor), Synthesis(Terminated)

MGI/BGI: Clonal(RCA), Surface(Patterned), Cyclic(Stable), Optical(Fluor), Synthesis(Terminated)
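If you wanted to play with this scheme programmatically, one way to hold it is as plain records you can filter on any axis.  This is just a minimal sketch: the `Platform` class, its field names and the `matching` helper are my own inventions for illustration, and only a handful of the launched platforms from the list above are entered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    template: str   # single molecule vs. clonal
    substrate: str  # beads, wells, surfaces or membranes
    timing: str     # cyclic vs. continuous
    signal: str     # optical vs. electrical
    chemistry: str  # synthesis, ligation, pore passing, ...


# A subset of the rows above, transcribed by hand.
PLATFORMS = [
    Platform("454", "clonal", "beads in wells", "cyclic (transient)",
             "optical (chemi)", "synthesis (unterminated)"),
    Platform("Illumina", "clonal", "surface or microwells", "cyclic (stable)",
             "optical (fluor)", "synthesis (terminated)"),
    Platform("Helicos", "single molecule", "surface", "cyclic (stable)",
             "optical (fluor)", "synthesis (terminated)"),
    Platform("Ion Torrent", "clonal", "beads in wells", "cyclic (transient)",
             "electrical", "synthesis (unterminated)"),
    Platform("Pacific Biosciences", "single molecule", "microwells",
             "continuous", "optical (fluor)", "synthesis (unterminated)"),
    Platform("Oxford Nanopore", "single molecule", "membrane (reloading)",
             "continuous", "electrical", "pore passing"),
]


def matching(platforms, **attrs):
    """Names of platforms whose fields start with the requested values."""
    return [p.name for p in platforms
            if all(getattr(p, key).startswith(value)
                   for key, value in attrs.items())]


if __name__ == "__main__":
    print(matching(PLATFORMS, template="single molecule"))
    print(matching(PLATFORMS, timing="cyclic"))
    print(matching(PLATFORMS, signal="electrical"))
```

Filtering on template puts Helicos with PacBio and Oxford Nanopore, while filtering on timing puts it with 454, Illumina and Ion Torrent, which is exactly the kind of cross-cutting that a single generation number cannot express.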

Well, that's my wordy scheme for classifying sequencing technologies.  It isn't simple and catchy like "N-th Generation" -- but instead it delivers a lot of information.  It's a crude ontology of sequencer space.  It's also, as I've tried to indicate, imperfect and a work-in-progress.  Suggestions welcome!


Liang Zong said...

Thank you Keith, very comprehensive descriptions, informative and also inspiring.

Duarte Molha said...

Good attempt... but I seriously doubt it will catch on. :)

As for what comes after NGS... I think one technology (maybe 2) will emerge as clearly dominant and all others will fall by the wayside. So ... maybe we don't really have to worry about calling it something like Ultra Generation Sequencing to distinguish it...

It seems to me that if the current weaknesses of ONT can be addressed, there is no future for the current alternatives, as direct reading of native DNA without much (or any) preprocessing is clearly the winning formula.

gasstationwithoutpumps said...

You forgot to include direct RNA sequencing (only on ONT, I think).

Keith Bradnam said...

This reminds me of a series of blog posts I wrote about the problems that have arisen from the various attempts to define next-generation sequencing (as well as next-next generation, 3rd, 4th generation etc.).