While I was not quite getting around to writing this, Nava Whiteford did and wrote a nice piece -- well, it looks good but I only skimmed it so I wouldn't be grossly contaminated (or worse, decide there was no ground left to cover). Also, I do want to laud Genia for publishing the paper with PNAS's Open Access option, which lets me use figures from the paper. It always surprises me how rare this is -- if you are publishing to get publicity, don't you want to eliminate barriers to your publication?
I've written extensively about Oxford Nanopore in this space, which serves as a useful reference on a number of points. There are also conceptual similarities in the Genia work to Pacific Biosciences and Illumina.
Oxford performs strand sequencing with nanopores: single stranded DNA is driven through a pore and signals are generated from interactions with the pore which enable determining the sequence. As Oxford (and its users) have found, this is a challenging task to do accurately, though Oxford (and its users) continue to make substantial progress.
Genia takes a different tack. As bases are incorporated, the pore captures a phosphate-linked tag from the nucleotide that generates a characteristic signal. Hence it is a sequencing-by-synthesis method a la Illumina, QIAGEN or PacBio, but with electronic rather than optical detection, and like PacBio it is a single molecule method. As shown in the Figure below, a modified (Exo- plus other, unspecified mutations) BstI DNA polymerase is fused to one of the two pore subunits. These pores are monitored electronically by a custom chip, which in the paper's iteration has the potential to track 264 such reactions, though only data from single pores are shown. As with Oxford, there are no valves, pumps or optical components.
A previous paper demonstrated the concept with tags composed of the polymer PEG (polyethylene glycol), but that system required high salt conditions which are undesirable. The new tags are modified oligonucleotides (diagrammed below). Because the alpha hemolysin pore can accept single-stranded DNA, the templates used here were artificially synthesized to contain hairpins, with one hairpin providing the primer for synthesis and the other an unpaired tail. The extent to which the pores would take up double-stranded DNA lacking hairpins is not explored.
This hints at the sort of library preparation that will be required: adding such hairpins. One property of library schemes is whether the adaptors have polarity and, if so, what combination of polarities is needed. For example, my understanding is that PacBio simply adds a hairpin at each end, and the two are indistinguishable; these adaptors do not have polarity.
In contrast, Illumina, 454 and Ion Torrent (don't know for QIAGEN, but probably) require two different adaptors, and the only molecules which will be successfully sequenced have one of each type. Libraries are often strongly biased toward such molecules by using Y-adaptors, which cleverly add only one type of adaptor to 5' ends and the other to 3' ends. Oxford 1D sequencing simply requires a 5' tag, introduced by the transposase for that prep. Oxford 2D sequencing requires two tags (one which has a 5' overhang and the other a hairpin), and 2D reads can arise only from templates that receive one of each. Molecules that receive two hairpins won't sequence with Oxford, whereas those with an overhang at each end will generate only a 1D read.
Conversely, simple linear adaptors can be incorporated during PCR, by including these in the amplification primers, obviating the need for a separate library preparation step. It isn't clear whether this will be possible for Genia; if the pores will somehow take in DNA lacking hairpins, that could be a problem. Perhaps a bulky tag on the 5' end of the oligo could work, and perhaps the priming could be created by a break-apart primer scheme (in which a specific portion of the primer can be degraded, such as by making it with ribonucleotides and applying the proper ribonuclease).
The small amount of data presented makes estimating accuracy difficult. The figures suggest that signals do not occur metronomically in this system, even with simple templates. Spikes in the signal are spaced at varying intervals. That doesn't create optimism for the sorts of kinetic measurements possible with PacBio that have enabled methylation detection. The distribution of signal intensities would also appear to overlap, based on Figure 6 (below) where the last G and the following A have very similar amplitudes. Improving signal discrimination is mentioned in the text as an area for further development.
The authors also note that they observed stuttering, in which the signal from a single tag repeatedly goes between a positive value and baseline; I believe this can be seen in the top of Figure 5 (below), particularly for the C in that case. They state that using strontium chloride in the reaction can suppress this, and suggest that mutants may be identified in the future that suppress stuttering without the use of SrCl2. This leads me to wonder how sensitive the system might be to input sample impurities. Stuttering might be anticipated to degrade accuracy on homopolymers, but the longest homopolymers tested are only of length 4.
What might one hope for in the next publication? I'd really like to see a larger dataset, ideally from "real" DNA (prepared from a biological sample), but at a minimum a diverse set of sequences. Longer homopolymers would be a must; at least in my de novo sequencing work octameric runs are not uncommon. Such a dataset would enable starting to understand the real error rates, as well as offer the opportunity to identify new error modes. Understanding the real read length would be very interesting also. In theory reads should only be limited by the template length, but as Oxford has found there may well be other factors at play that cause these nanomachines to fail.
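As a rough illustration of the homopolymer point, here is a minimal Python sketch for measuring homopolymer run lengths in a candidate test template. The function names and the example sequence are hypothetical (nothing here comes from the paper); the idea is simply to check whether a test set contains runs longer than the length-4 ones tested so far.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str) -> List[Tuple[str, int]]:
    """Collapse a sequence into (base, run_length) pairs, in order."""
    return [(base, sum(1 for _ in group)) for base, group in groupby(seq.upper())]


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer run (0 for an empty sequence)."""
    return max((length for _, length in homopolymer_runs(seq)), default=0)


def run_length_histogram(seq: str, min_len: int = 2) -> Counter:
    """Count how many runs of each length >= min_len occur in the sequence."""
    return Counter(
        length for _, length in homopolymer_runs(seq) if length >= min_len
    )


if __name__ == "__main__":
    # Made-up template: one run of 4 T's and one octameric run of A's.
    example = "ACG" + "T" * 4 + "GC" + "A" * 8 + "GT"
    print(longest_run(example))
    print(sorted(run_length_histogram(example).items()))
```

Running something like this over a real assembly would give a quick sense of how many runs exceed whatever length a platform has been validated on.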
One other area unexplored in this paper is whether this system will have a duty cycle similar to Oxford, in which polymerases which have completed one template can be reloaded with another template and sequence that. This is particularly important for understanding overall productivity. The current system appears to be sequencing at a bit faster than 1 base per second; this speed will be a key determinant of throughput, particularly if the polymerases can recycle. Whether this speed is conservative or is largely constrained by the dynamics of reading the tags would be something else I would hope a future paper would explore.
Since Genia is now owned by Roche (purchased just under 2 years ago), they can be a bit patient in pushing the system forward, since there is no longer a concern about raising capital. Conversely, the field continues to move forward, and there is always a risk of waiting for a perfect system that is obsolete by the time it arrives. Roche/Genia has said their initial target is clinical sequencing, but that term is getting to be rather expansive. Perhaps they mean amplicon panels (which a rough guess at the throughput of a 200ish channel instrument running at 1 base per second per channel would be suited for), in which case really long reads aren't needed. However, I would hope they would test amplicons up to the 5-10kb or so limit of really robust PCR, since there may be amplicons enabled by such long reads that aren't out there yet as tests (or perhaps are being developed on PacBio or MinION).
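To put a number on that rough throughput guess, here is a back-of-envelope sketch in Python. The 264-channel figure and the roughly 1 base per second rate come from the discussion above; the run duration and the `active_fraction` parameter (the share of channels actually producing sequence) are purely illustrative assumptions, not anything reported in the paper.

```python
def bases_per_run(
    channels: int,
    bases_per_second: float,
    hours: float,
    active_fraction: float = 1.0,
) -> float:
    """Idealized yield: every active channel sequences continuously.

    active_fraction is a hypothetical knob for channels that hold exactly
    one working pore and polymerase; 1.0 is the best case.
    """
    seconds = hours * 3600
    return channels * active_fraction * bases_per_second * seconds


if __name__ == "__main__":
    chip_channels = 264  # chip capacity mentioned in the paper
    rate = 1.0           # roughly 1 base per second per channel

    # Run lengths and the 50% active fraction below are assumed values.
    for hours in (1, 24):
        best_case = bases_per_run(chip_channels, rate, hours)
        half_active = bases_per_run(chip_channels, rate, hours, active_fraction=0.5)
        print(f"{hours:>2} h: {best_case:,.0f} bases (all channels), "
              f"{half_active:,.0f} bases (half active)")
```

Even in the best case this works out to just under a megabase per hour, which is consistent with the idea that small amplicon panels, rather than genomes, would be the natural first fit.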
Beyond the science, there is the question of intellectual property -- the patent space around nanopores is crowded and confusing. Illumina is currently suing Oxford Nanopore (which Oxford is fighting vocally). Oxford often touts a belief they hold a very strong patent portfolio across nanopore space. As a scientist and customer I'd prefer that the companies duke it out only with performance, but the reality is investors wouldn't pour money into these companies if they didn't think the companies could mark out exclusive territories.
Your write-up is characteristically better than mine. :) I thought the data shown in figure 5 was quite nice, the data seems very clean, much more so than the convolved signal you get from strand systems. So that's nice, and gives some hope that the system could provide better data quality. But this is a really basic proof of concept, no doubt many other issues before it could be productionized. The take home message for me was "this basic approach works, but it's too early to say anything about data quality, however it has single base resolution which is nice".
Figure 6 uses a different set of conditions to get round stuttering, I also didn't find this very convincing, particularly without access to the raw data (grumble). There obviously isn't much "scale" in this work as they say they are manually base calling everything.
One other statement I didn't understand was this: "The applied voltage is adjusted to ensure that, in a majority of cases, one and only one pore is inserted into the membranes of each well." which seems to imply that they have a way of breaking the Poisson limit associated with other nanopore systems and stopping more than one pore inserting. I didn't think this was a known/common technique, so I'd be interested in literature on that.
Anyway, overall I thought it was interesting incremental progress. The system seems somewhat complex compared to other nanopore systems, and I would have thought that would make building a commercial platform challenging, but who knows? I'd guess a lot of it depends on the execution.
Keith, I'd be careful in comparing this to the "Genia" sequencer. This is work done at Columbia with Genia's support. Genia licensed the Columbia patent and this work was done on Columbia's proof of concept system. What the actual Genia sequencer is capable of given the time they've sat with Roche's money and the Columbia patent is anyone's guess.