Thursday, March 02, 2017

Catching Up On Oxford Nanopore News: More, Better, Meth & Huge

Oxford Nanopore and its collaborators have shown at least three interesting advances in the last few months which I haven't yet covered; the most astounding was announced this week.  I'll take these three in an order which works logically for me, though it isn't strictly chronological, plus I'll touch on some parts of their platform which have not seen the advances that were perhaps expected.

(Morning after: Ugh, ugh, ugh -- I misread an axis, inserting an extra 0, so major crossouts in one section; this is why I shouldn't post late at night during pauses in day job work.)

MinKNOW Upgrade Doubles Performance

The first bit of news was simply a software update, yet one which roughly doubled the performance of MinION flowcells.  Apparently a significant drag on pore lifetime was adapter complexes which would jam in the pore and choke it, preventing that pore from being used again.  In effect MinKNOW now performs a nano-Heimlich maneuver: by detecting this situation and reversing the voltage (à la read-until), the problem complexes can be ejected, allowing the pore to continue being productive.

Immediately some of the most experienced labs started reporting flowcell yields over 10Gb (I think 12Gb is the public record), whereas Clive Brown of ONT has shown yields in excess of 20Gb.  Clive's plots also show flowcell lifespans going out to 72 hours, rather than the currently programmed 48, although the yield flattens significantly by about 60 hours.

Consider that throughput range, either the "low" end of 12 gigabases or the high end of 24 gigabases.  MiSeq is rated for 13-15Gb in 2x300 mode, its top output mode (when the kits are good).  Ion Proton is rated for 10Gb with the P1 chip.  So the highest-output desktop sequencer is now MinION!  PacBio Sequel, which also delivers long reads, is rated at 8Gb per flowcell.  Of course, there are vast differences in data quality between these instruments, but there are certainly a number of applications in which the MinION data from the current basecallers is sufficient. I'm also avoiding an explicit cost comparison, as that brings in a lot of complex variables (such as volume discounts from ONT), but these are all in the same rough range -- or perhaps MinION is ahead.  If you buy in bulk, then 1 library on 1 flowcell works out to about $600.
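For a rough sense of where that lands per gigabase, here's a back-of-envelope sketch using only the numbers above (illustrative only; it ignores library prep costs, failed runs, and the volume-discount variables just mentioned):

```python
# Back-of-envelope cost per gigabase at bulk flowcell pricing.
# Numbers from the post: ~$600 per flowcell, 12-24 Gb yields.
flowcell_cost = 600              # USD, bulk pricing
for yield_gb in (12, 24):        # "low" and high ends of the range
    cost_per_gb = flowcell_cost / yield_gb
    print(f"{yield_gb} Gb/flowcell -> ${cost_per_gb:.0f}/Gb")
# 12 Gb -> $50/Gb; 24 Gb -> $25/Gb
```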

Also just consider the amount of data here: 10Gb is enough for de novo assembly of Drosophila! 20Gb would be enough for Fugu!  An awful lot of genomes can now be sequenced de novo on a single MinION flowcell, albeit with the caveat on final data quality.

All of these improvements should carry over to the PromethION, but as far as I can tell no field site has yet generated data.  Oxford remains bullish on that instrument (as opposed to my less optimistic viewpoint).  If PromethION can really launch with 4X the performance of MinION, then resequencing a human genome should just about squeeze onto a single flowcell, and Clive has suggested on Twitter that Oxford can flatten coverage variation with read-until and therefore require less average coverage to fully call all variants.
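A quick sanity check of that "just about squeeze" claim, assuming (my numbers, not ONT's) a ~3.1 Gb haploid genome and a conventional 30x resequencing depth:

```python
# Does a human resequencing fit on one PromethION flowcell at 4x MinION output?
# Assumptions (mine): ~3.1 Gb haploid genome, 30x target depth.
human_genome_gb = 3.1
target_coverage = 30
minion_yield_gb = 20                          # Clive's demonstrated MinION yield
promethion_flowcell_gb = 4 * minion_yield_gb  # the hoped-for 4X

needed_gb = human_genome_gb * target_coverage
print(f"Need ~{needed_gb:.0f} Gb; one flowcell could give ~{promethion_flowcell_gb} Gb")
# ~93 Gb needed vs ~80 Gb per flowcell: close enough that read-until
# coverage flattening could plausibly close the gap.
```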

The increased performance, as Clive has pointed out, also has some interesting side effects.  If someone was happy with flowcell performance for their application before these changes, then MinION is now overshooting their needs.  That means these applications can tolerate more variability in input yield.  It also means that aging flowcells can be better tolerated.  I don't know of any flowcell stability data out there, but my anecdotal observation from some we have (the rest of the experiment didn't gel in sync with deliveries) is that performance does degrade over time.

Improved performance also suggests that the Flongle and SmidgION could fit an awful lot of applications.  Since these devices are slated to have about half the number of pores as MinION, the software update means that SmidgION should perform about as well as MinION did before the update!

First Peek At Scrappie and 1D^2 Basecalls

At the December users' meeting in New York City, Clive mentioned that yet another basecaller, perhaps the third in a year, was under development.  Called Scrappie, this caller aims to improve on ONT's persistent problem with homopolymers.  Scrappie, in combination with a new double-stranded scheme called 1D^2, appears to be capable of driving error rates down to about 1 in 500.

In Jared Simpson's AGBT presentation he showed an example of Scrappie correctly calling a homopolymer.  Oxford will likely show more such data at a mid-March Clive Brown web update and at May's London Calling meeting.  There are no compute performance estimates for Scrappie yet, nor a detailed analysis of how much it can improve reads, but it bears watching.

Methylation Update

I didn't include this in my count of news items, but Jared's talk did show methylation calling in action, including haplotype-aware methylation calling to show differential methylation by allele. Two methylation detection papers were just published in Nature Methods, one from Simpson, Winston Timp and colleagues and the other from the UC Santa Cruz group.

The Huge News from Josh Quick

Okay, now to the stunner.  I've related before how our first, generally unimpressive MinION run stunned me by spitting out a complete 48Kb read covering the entire lambda phage genome.  Clive Brown has long claimed that read lengths are limited only by library fragment lengths.  The longest reads to date were a bit short of half a megabase, and quite rare.

Josh Quick (aka @Scalene) tweeted this week an image showing a significant yield of reads in the 50-100Kb range (the rescaled plot by Matt Loose shows this better).  [Crossed out: reads out to half a megabase, with the longest just shy of 3/4 of a megabase!  Just remember, the second bacterial genome ever sequenced is shorter than many of the reads on that plot!]  AM correction: Okay, here's the big correction -- I somehow read an extra 0 on the X-axis.  I have terrible eyesight (my glasses go to 11), but no excuses.  So Josh has a great number of long reads, but no monsters.

The yields aren't great, but this plot should raise the hairs on the backs of necks at mapping companies such as BioNano Genomics, which just released a new instrument, Saphyr.  The yields from this MinION run aren't high enough to threaten Saphyr, which can generate a human genome map in a day, but they point at that possibility.  Sequencing always yields richer data than mapping alone, and if that data can come for the same price or less (particularly considering the difference in equipment prices), all the better.  AM: Okay, this isn't really encroaching on BioNano (at this time).  I can't help but wonder if even longer tagged fragments exist in this sample, but getting them off the rod might not have been successful.

How did Josh do it?  The challenge with really long DNA is that it tends to shear in solution, particularly during pipetting -- at the molecular scale even the gentlest pipetting generates significant shear forces.  So Josh ran a standard phenol-chloroform prep of E. coli, which ends with spooling the goopy DNA onto a rod to get it out of an ethanol precipitation.  That's great fun -- I did this back at Delaware once or twice, though it's also a long and tedious procedure once the novelty wears off.  Josh then took the spooled DNA on the rod and ran the rapid 1D (transposase) prep on the rod.  At the end of the prep, the still highly viscous DNA could be dripped into the flowcell.

It's interesting to contemplate these results in the context of the sample/library preparation automation which Oxford has proposed.  VolTRAX might be particularly suitable for manipulating ultra-high molecular weight DNA, since very little shear should be imposed by short movements on the electrowetting surface.  VolTRAX videos resembling games have been popping up on Twitter, but as far as I've heard no actual libraries have yet been prepared.  The Zumbador integrated sample+library prep device might also work well for preparing high molecular weight DNA, though the extreme viscosity of such DNA (nearly universally described as "like snot") could present interesting challenges for such a device.

Higher yields. Insane read lengths.  Better data quality likely in the near future.  And that doesn't even include ONT's periodic discussion of increasing pore speeds by another factor of up to 2X.  It's now five years since Oxford wowed AGBT, which was followed by two years of quiet and then a bumpy initial launch.  That has led many cynics to ignore Oxford's rapid progress over the last two years.  While the pace is uneven and sometimes dizzying and disorienting, the reality is that MinION is a very powerful platform for genomic analysis.  Undoubtedly the March update and London Calling will bring more vision -- but also more progress -- on expanding the capabilities of this pocket-sized sequencer.

14 comments:

said...

3/4th of a megabase? For a moment, I thought I had to take my forecast back -

Speaking of throughput, Sequel will catch up by end of this year, if not earlier. It is just that Oxford does R&D in public and Pacbio releases after doing R&D.

Anonymous said...

To the user above me.

It's hard to compare throughput because a MinION takes about 60 hours to get ~15GB while Sequel takes 6 hours to get ~8GB. I'm also skeptical of the throughput numbers from the MinION. What do they mean by estimated throughput? Why not plot the real throughput number? ONT also has a history of handpicking the best data points, which customers rarely can reproduce. There have been a handful of 10-15GB Sequel cell runs out there, but PacBio doesn't handpick those and claim that's what the throughput is.

The 1D accuracy of the MinION is still lower than PacBio's. ONT has a lot of work to do to catch up to the data quality PacBio can provide. And will they be able to improve fast enough to counter the new PacBio chip that provides 150GB/cell and a runtime of 10-20 hours?

Clive Brown said...

I'll answer this question, coz it's reasonable.

" What do they mean by estimated throughput? Why not plot the real throughput number?"

The control software estimates the base count based on a quick analysis of the read in 'event space'. The estimate is very accurate. The full basecaller can be calling reads and catches up later. So a post-run-completion basecall count can also be provided (it does also depend on which basecaller is used), but as the run proceeds the online estimate is usually considered accurate.

David Eccles said...

The longest mapped read that I've seen from that run was actually over 750kb; it's just that the graph displayed in MinKNOW doesn't go up that far.

MinION will [probably] not get to 150GB per cell unless there's a change in the flow cell structure. It has 512 channels, so it needs about 2000 bases/second throughput to get to that amount in 72 hours. That's not completely out of the realm of possibility, but I expect that solid-state pore flow cells or de-muxed flow cells will appear before that throughput is achieved. The current electrical sensor that ONT uses on their flow cells runs at 4kHz, but it is rated at something closer to 40kHz, so there's still plenty of room at the bottom.

The PromethION flow cells with their 4096 channels (and no MUX) should have fewer problems getting to 100GB+ per flow cell, even at 500 bases/second and a 48-hour run (at 28% of the theoretical maximum yield).
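Those figures can be checked with a few lines of arithmetic (a quick sanity check using only the numbers in this thread; the duty-cycle interpretation at the end is my guess):

```python
# Check the theoretical-yield arithmetic in this thread.
SECONDS_PER_HOUR = 3600

def max_yield_gb(channels, bases_per_sec, hours):
    """Yield if every channel sequenced nonstop for the whole run."""
    return channels * bases_per_sec * hours * SECONDS_PER_HOUR / 1e9

# PromethION flow cell: 4096 channels at 500 b/s over 48 h
promethion_max = max_yield_gb(4096, 500, 48)
print(f"PromethION ceiling: {promethion_max:.0f} Gb")   # ~354 Gb
print(f"100 Gb is {100 / promethion_max:.0%} of that")  # 28%, as stated

# MinION: per-channel rate for 150 Gb in 72 h on 512 channels, nonstop
needed = 150e9 / (512 * 72 * SECONDS_PER_HOUR)
print(f"MinION needs ~{needed:.0f} b/s per channel at 100% duty cycle")
# ~1130 b/s at a perfect duty cycle; pores idle between molecules,
# which presumably accounts for the ~2000 b/s figure above.
```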

I think the main problem will be in sorting out how to store the data at the other end, not in the limitations of ONT's technology.

Anonymous said...

So it looks like the battle will be between an 8M ZMW Sequel chip and the PromethION. I believe PacBio has the advantage in data accuracy and a broader set of proven applications. PacBio also has much shorter run times while producing the same amount of data.

Fundamentally, scaling ZMWs is much easier than scaling nanopores. That will change once people can fabricate solid-state pores, but that is years out.

Clive Brown said...

"so it looks"

sadly your analysis just isn't correct. A MinION is currently outperforming a Sequel, and more than one run in parallel easily trounces it, even if the sample prep isn't good. A PromethION will outperform a NovaSeq. What's often overlooked isn't the number of features on the sensing chip, but the number of bases measured per feature per unit time. Large optical arrays tend to measure more slowly -- albeit on more features -- but rapid sampling of smaller nanopore arrays can measure 1000 bases per second per feature; contrast that with 1 base per 20 seconds, or even one every 3 minutes, on the best optical arrays.

1000 * 512 * 60 * 60 * 24 : MinION bases per day, max
1000 * 3000 * 48 * 60 * 60 * 24 : PromethION bases per day, max
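Working those expressions out (my evaluation; I read the 48 in the PromethION line as its 48 flow cells):

```python
# Evaluate the maximum-throughput expressions above.
minion_per_day = 1000 * 512 * 60 * 60 * 24            # b/s * channels * s/day
promethion_per_day = 1000 * 3000 * 48 * 60 * 60 * 24  # with 48 flow cells
print(f"MinION:     {minion_per_day / 1e9:.1f} Gb/day max")       # ~44.2 Gb
print(f"PromethION: {promethion_per_day / 1e12:.1f} Tb/day max")  # ~12.4 Tb
```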

Clive Brown said...

"so it looks"

further, the better Sequel runs appeared to take 10hrs per cell, not 6. Most of that is hidden from end-users by service providers, who also hide the failed runs, the filtering, and the replacement of failed reagents with assistance from the machine vendor. However, all of those tend to get rolled into the noticeably long turnaround times for customer samples, where a sample can take several weeks to be run and returned to the user, regardless of the per-sample run time.

David Eccles said...

> Fundamentally the scalability of ZMW's is much easier to do than nanopores.

... how?

Currently ONT is "scaling" their nanopores by making software and chemistry changes; the hardware component remains the same. You can't get much easier than a software fix.

ONT have already created a few other sequencing matrix patterns (e.g. SmidgION, PromethION flow cells), but are putting effort into getting the best performance out of the flow cells that are being purchased and used.

Anonymous said...


In terms of raw bases, then yes, the MinION can produce more data than a Sequel cell. But these runs are 60 hours long. Is your average customer getting >15GB per MinION, or are these preliminary results?

Aside from pure raw data, the MinION data quality doesn't match PacBio's, hence people use PacBio sequencing cells for their projects over MinIONs. So I don't think your statement that the MinION is outperforming a Sequel is fair.

The Sequel chip took all the complex optics from the RS and integrated them onto the sequencing chip. The sequencing chip is built on a camera with millions of pixels. To scale the Sequel chip, all they need to do is fabricate more ZMWs on a new camera sensor. This is why PacBio can go from a 1M ZMW chip to an 8M ZMW chip in 2 years (as opposed to the 6 years it took to go from 150k to 1M). In addition to increasing the ZMW count, PacBio also doubles the read length every year on their platform. Hence the 32x throughput by 2018.

Like you said, most of the throughput increases are coming from software and chemistry upgrades and not from increasing the number of nanopores. That is difficult to do. We sure won't see a 1M nanopore chip anytime soon.

Clive Brown said...

"In terms of raw"

I don't think you've taken on board the points I've made previously, Anonymous -- and whilst internal runs beat customer runs for now, typically customer runs catch up. Remember these are diverse users, diverse sample types, not people running core facility pipelines. You overlook the point that for most people sending a sample to a core, on PacB, has a multi-week turnaround time, regardless of the instrument time, and you're ignoring the point that it's still cheaper to run multiple MinIONs in parallel. Also ignoring that MinION throughput is likely to at least double (so 20G/day).

Again, you ignore the point that the number of nanopores isn't key, as each nanopore can sequence a molecule at up to 1000 bases per second. Then a fresh molecule is loaded, and so on. On the PacB system 1 molecule is bound to 1 feature, and when that molecule is done, the run is done. So number of molecules == number of features, so you need a lot of features and longer molecules to up the numbers. No such constraint on nanopore. Equally, I think it's 1-3 bases per second on the optical systems, so far, far slower and, yes, less real time. So simply comparing the number of features, whilst ignoring sensing speed per feature and the ability to reload features (sensors), shows you haven't grasped the fundamentals of how these systems work.

Actually, we showed a while back our work on a 1 million sensor FET nanopore system, which would sequence a gigabase per second.

Hope that is helpful. There's a wealth of online information, talks and slides that explore these points in much more detail.

Anonymous said...

Is Pacbio now also going after 1D^2 reads with their newly added patent?

U.S. Patent No. 9,542,527

They added this patent to their lawsuit case with ONT. I'm no expert in this area so I can't tell if this patent is broad enough to conflict with the 1D^2 method.

Anonymous said...

-- I'm no expert in this area so I can't tell if this patent is broad enough to conflict with the 1D^2 method.

No, it isn't. It's broadly similar to their previous IP.

David Eccles said...

That patent has a hairpin loop in its stated claims. It therefore doesn't apply to the 1D² basecalling.

It could possibly be argued that, regardless of whether or not the hairpin is included, ONT's method doesn't overlap with this patent because what the MinION sequencing device produces is an electrical trace, rather than a nucleotide sequence. This particular patent doesn't discuss an intermediate step involving data in a form dissimilar from the nucleotide model. Even if it did, the conversion from the electrical signal to the base call for the MinION is always carried out in software on a general-purpose computer.