Friday, December 18, 2020

Nanopore Community Meeting: Progress Despite the Pandemic

I realized, a few Oxford Nanopore announcements too late, that I should have logged each of their predictions along with the date it was made, so I could carefully track any delays or quiet disappearances from the new feature lineup. Had I done that, this year would have presented an even worse conundrum: how do you score progress in a year of constant disruptions? As at many companies in the sequencing field, at least some of that disruption has been a diversion of attention and resources to fighting the pandemic. For ONT that has largely meant supporting ARTIC viral genome sequencing and developing LamPORE diagnostics.

Two weeks ago Oxford Nanopore held their Community Meeting online. I'm not going to try to cover every detail and announcement -- indeed, just putting this together got trapped in a cycle of small increments and long procrastinations. But there were some notable developments in the continuing evolution of this platform which I would be remiss not to report and opine upon. I've extracted most of this from the talks by Sissel Juul, James Clarke, Stuart Reid and Rosemary Dokos, but I've rearranged things a bit to fit my purposes. I made the mistake of not paying enough attention to the main Q&A session following the Clarke/Reid/Dokos presentation, and it doesn't seem to be in the recording.

Accuracy

Stu Reid spent a good chunk of his section on further developments in base accuracy. The Bonito basecaller announced a year ago will become standard in MinKNOW in February and now has an R10.3 model. On R9 chemistry it is hitting 98.3% modal accuracy, a statistic ONT favors and not a terrible one since it defines the peak of the distribution, but as you can see from the graph below (from the talk) there's still a lot of variation and no shortage of reads that are far, far worse. If I had a student, I'd probably have them check how well Bonito's Q-scores predict where a read falls on this curve; maybe something to play with during the doldrums around Christmas.
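To make the modal-versus-mean distinction concrete, here is a minimal sketch, assuming per-read accuracies (in percent) have already been computed by aligning reads to a reference. The accuracy values are invented for illustration, not taken from the talk, and the 0.5-point bin width is an arbitrary choice:

```python
from collections import Counter
from statistics import mean, median

def modal_accuracy(accuracies, bin_width=0.5):
    """Centre of the most populated histogram bin, in percent.

    Ties go to the bin encountered first, because Counter.most_common
    keeps insertion order among equal counts.
    """
    bins = Counter(int(a // bin_width) for a in accuracies)
    peak_bin, _count = bins.most_common(1)[0]
    return (peak_bin + 0.5) * bin_width

# Hypothetical per-read accuracies: a tight peak near 98% plus a tail of bad reads.
reads = [98.2, 98.3, 98.4, 98.1, 98.3, 97.9, 95.0, 90.0, 85.0, 70.0]

print(f"modal:  {modal_accuracy(reads):.2f}")  # modal:  98.25
print(f"median: {median(reads):.2f}")          # median: 98.00
print(f"mean:   {mean(reads):.2f}")            # mean:   92.92 (dragged down by the tail)
```

A handful of very poor reads barely moves the mode but pulls the mean down by several points, which is why the choice of statistic matters when comparing headline numbers.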

The R10.3 model is comparable, though it looks a little worse here.

A very interesting plot that Reid showed compared Bonito accuracy distributions against the complexity of the neural network used for basecalling. So to get basecalling that runs faster on less beefy GPUs, one can give up a known degree of accuracy. I'll circle back to this later.
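That trade-off can be framed as a simple selection problem: pick the most accurate network that still keeps up with the incoming data. A minimal sketch, in which the model tiers, their accuracies and their throughputs are all hypothetical (not ONT's figures), and demand assumes about 450 bases per second per pore with 512 channels for a MinION and 3,000 for a PromethION flowcell:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class BasecallModel:
    name: str
    modal_accuracy: float    # percent (hypothetical)
    bases_per_second: float  # sustained throughput on one GPU (hypothetical)

def pick_model(models: Sequence[BasecallModel],
               required_bases_per_second: float) -> Optional[BasecallModel]:
    """Return the most accurate model that can keep up, or None if none can."""
    fast_enough = [m for m in models if m.bases_per_second >= required_bases_per_second]
    return max(fast_enough, key=lambda m: m.modal_accuracy, default=None)

# Hypothetical tiers: bigger networks are more accurate but slower.
MODELS = [
    BasecallModel("small", 95.0, 5_000_000),
    BasecallModel("medium", 97.5, 1_500_000),
    BasecallModel("large", 98.3, 300_000),
]

minion_demand = 512 * 450        # 230,400 bases/s (assumed)
promethion_demand = 3_000 * 450  # 1,350,000 bases/s per flowcell (assumed)
print(pick_model(MODELS, minion_demand).name)      # large
print(pick_model(MODELS, promethion_demand).name)  # medium
```

Returning `None` rather than the fastest model makes the "nothing keeps up" case explicit instead of silently falling behind.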



For methylation, ONT is sticking with the flip-flop caller. Interestingly, R10 is reported to be even better than R9 for calling methylation.


Reid also pointed out that the system can differentiate 5-methylcytosine from 5-hydroxymethylcytosine.

Reid also discussed work on consensus accuracy.  The paired model they had developed for 1D^2 can be generalized to more copies and so can be used to generate highly accurate consensus from schemes using Rolling Circle Amplification or Unique Molecular Identifiers to generate many copies of the same insert.
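ONT's paired model works on signal-level probabilities, which is well beyond a blog sketch. As a much cruder stand-in that still shows why many copies help, here is a column-wise majority vote over copies of one insert, assuming they have already been aligned to equal length with `-` for gaps. The alignment step and the example copies are assumptions for illustration:

```python
from collections import Counter
from typing import Sequence

def majority_consensus(aligned_copies: Sequence[str]) -> str:
    """Column-wise majority vote; columns won by a gap are dropped.

    Ties go to the symbol seen first in the column (Counter ordering).
    """
    if len({len(c) for c in aligned_copies}) != 1:
        raise ValueError("need one or more copies aligned to the same length")
    consensus = []
    for column in zip(*aligned_copies):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)

# Four hypothetical copies of one insert (e.g. from RCA); three carry one error each.
copies = ["ACGT-ACGT", "ACGTTACGT", "ACCT-ACGT", "ACGT-ACGA"]
print(majority_consensus(copies))  # ACGTACGT
```

Because the errors fall in different columns, every column still has a correct majority, so the consensus is error-free even though most individual copies are not.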

Reid ended this section with a teaser: ONT has a new chemistry and basecaller that will hit Q20 -- 1% error. Early access is expected to roll out early in the new year.
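Q20 is on the Phred scale, where Q = -10 log10(error probability). A small sketch converting between the two, applied to Q20 and to the 98.3% modal accuracy quoted above:

```python
import math

def q_to_error(q: float) -> float:
    """Phred scale: error probability = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def accuracy_to_q(accuracy_percent: float) -> float:
    """Convert a percent accuracy to its Phred-scaled quality."""
    error = 1 - accuracy_percent / 100
    return -10 * math.log10(error)

print(f"Q20 -> {q_to_error(20):.2%} error")             # Q20 -> 1.00% error
print(f"98.3% accurate -> Q{accuracy_to_q(98.3):.1f}")  # 98.3% accurate -> Q17.7
```

On this scale, going from 98.3% to 99% looks modest but is roughly a 40% cut in the error rate (1.7% to 1.0%).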

Since the meeting, Clive Brown has been tweeting out plots suggesting that the best basecallers with R10.3 can achieve better than 1% error rates.


Adaptive Sampling

A pair of publications on Adaptive Sampling (long available as preprints) showed up just before the conference. Adaptive Sampling is the ability of nanopore sequencing to "taste" a sequence and then reject it. UNCALLED looks at the signal directly, whereas ReadFish leverages fast nanopore basecalling; both allow defining target regions to either enrich or deplete from the data.
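Whichever way the read prefix gets placed on the reference (signal matching in UNCALLED, basecalling plus alignment in ReadFish), the final enrich/deplete decision is simple interval logic. A minimal sketch; the `Interval`/`Hit` types, the `decide` function and its `"sequence"`/`"eject"`/`"wait"` outcomes are hypothetical and are not either tool's actual API or configuration format:

```python
from typing import List, NamedTuple, Optional

class Interval(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

class Hit(NamedTuple):
    contig: str
    position: int  # where the read prefix maps

def decide(hit: Optional[Hit], targets: List[Interval], mode: str) -> str:
    """Return 'sequence', 'eject', or 'wait' for one read prefix.

    'wait' means the prefix did not map yet: keep sequencing and
    re-check on the next chunk of signal.
    """
    if mode not in ("enrich", "deplete"):
        raise ValueError(f"unknown mode: {mode!r}")
    if hit is None:
        return "wait"
    on_target = any(
        t.contig == hit.contig and t.start <= hit.position < t.end for t in targets
    )
    if mode == "enrich":
        return "sequence" if on_target else "eject"
    return "eject" if on_target else "sequence"

targets = [Interval("chr1", 1_000_000, 2_000_000)]
print(decide(Hit("chr1", 1_500_000), targets, "enrich"))   # sequence
print(decide(Hit("chr7", 42), targets, "enrich"))          # eject
print(decide(Hit("chr1", 1_500_000), targets, "deplete"))  # eject
```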

Adaptive Sampling for target enrichment / depletion is now fully integrated with MinKNOW for GridION and will soon be available on the MinION Mk1C; those of us with the "classic" Mk1Bs can perform Adaptive Sampling on Linux and soon Windows, with Mac under development. Adaptive Sampling on PromethION is also in development.

There wasn't much talk of Adaptive Sampling for barcode normalization, and when I pinged Matt Loose about this (his group developed ReadFish) he pointed out that it isn't so simple. It's not the Adaptive Sampling part that's hard, it's deciding what the rules are! I was thinking of a couple of recent experiments in which some barcodes came up short -- but how early in the process would I want the system to intervene, and how aggressively? Should it go through phases where it accepts only underrepresented barcodes? There's also the payoff question: one of these libraries had amplicon inserts that took only about 4-5 seconds to traverse a pore, so there was little to gain from ejecting them, but the other set was a beautiful run of genomic libraries with read N50s mostly well above 30 kilobases and stretching beyond 100 kilobases, so it had plenty of inserts that would take over two minutes to traverse a pore -- that run could have benefited from balancing. Since that was de novo assembly, perhaps a good rule would be that once a barcode reached 50X coverage of its target genome, further reads from it would be rejected -- well, until every barcode had 50X coverage, and then probably a new target would be set. That sort of rule writing isn't simple!
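Even the simplest version of that 50X rule needs a few decisions spelled out. A minimal sketch of it; `balance_decision`, the barcode names, genome sizes and yields are all hypothetical, and the sketch deliberately ignores the harder questions above of how early and how aggressively to intervene:

```python
from typing import Dict

def balance_decision(barcode: str,
                     yield_bases: Dict[str, int],
                     genome_sizes: Dict[str, int],
                     target_coverage: float = 50.0) -> str:
    """Hypothetical rule: eject reads from barcodes already at target coverage,
    until every barcode has reached it."""
    if barcode not in genome_sizes:
        return "sequence"  # unclassified or unexpected barcode: leave it alone
    coverage = {bc: yield_bases.get(bc, 0) / size for bc, size in genome_sizes.items()}
    if all(c >= target_coverage for c in coverage.values()):
        return "sequence"  # everyone is at target; time to pick a new one
    return "eject" if coverage[barcode] >= target_coverage else "sequence"

genomes = {"bc01": 5_000_000, "bc02": 5_000_000}
so_far = {"bc01": 300_000_000, "bc02": 100_000_000}  # 60X and 20X
print(balance_decision("bc01", so_far, genomes))  # eject
print(balance_decision("bc02", so_far, genomes))  # sequence
```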

Kit 10

The LSK110 kits hit the market a number of weeks ago. These use low-idle motor proteins, so they burn little if any ATP fuel when not sequencing. A cDNA version is in the works; I don't recall hearing anything about a rapid version of this kit and don't see one in the store.

Ultralong Kit

ONT has released an Ultralong kit which carefully tunes the buffers for the transposase kit and uses a number of Circulomics products to deliver crazy long libraries. A new world record read of over 4 megabases has been obtained this way -- over 2% of the human chromosome it maps to! -- and the claim is that (with appropriate input DNA, of course) libraries with read N50s of over 100 kilobases are now routine and tens to hundreds of megabase-plus reads can be seen. Combining this with Cas9 capture enabled pulling down a 533 kilobase read!
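Read N50 comes up repeatedly in these claims. It is the read length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of the computation; the read lengths are made up for illustration:

```python
from typing import Iterable

def read_n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no reads")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    return ordered[-1]  # not reached: the loop always returns by the last read

print(read_n50([100_000, 80_000, 50_000, 30_000, 20_000, 10_000]))  # 80000
```

Because N50 is weighted by bases rather than by read count, a few very long reads can push it well above the typical read length.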

One catch with long reads is that they reduce yield, particularly if you wish to run the flowcells without intervention. Daily nuclease flushes, which require reloading library, are recommended even for libraries over about 20 kb, and more frequent flushes for ultralong libraries.

PromethION Flowcell Improvements

Clarke discussed a number of improvements in PromethION flowcell manufacturing that have been put in place or will be in the near future. Some of these would appear to translate to the other devices, though he focused on PromethION.

A key difference is a change in the membrane-forming chemistry, which nearly eliminates failed membrane spots. A new pore insertion process gives 20-25% more pores. Clarke asserted that pore failure is now almost entirely due to blocking of the pore or depletion of mediator. With the new devices ONT has improved yield on PromethION by 25% versus runs this past summer! Ten terabytes of data came from 48 PromethION flowcells in 80-hour runs, though most of the data piles up in the first 60 hours.

Combining the new flowcells and the ultralong kit has yielded 90 gigabases of data -- 30X human genomes with read N50s of 100 kilobases!
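The "30X" is simple division of total bases by genome size. A sketch, assuming a haploid human genome of about 3.1 gigabases (with a round 3.0 Gb it comes out at exactly 30X):

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size

print(round(fold_coverage(90e9, 3.1e9), 1))  # 29.0
print(round(fold_coverage(90e9, 3.0e9), 1))  # 30.0
```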

Further performance improvements are promised. The gasket sealing the current devices blocks channels on the edge; repositioning it may free as many as 10% more channels for use. Increasing the mediator supply, either through higher concentrations or by engineering larger spaces for it, is planned as well, which would extend maximum run time.

Voltage Sensing

Clarke discussed progress ONT has made on switching from current sensing to voltage sensing. Because the voltage signal drops off more rapidly with distance, ONT can envision packing the channels into a much tighter array. Eye-popping possibilities were put on the screen (as they have been in the past): Flongle-footprint flowcells with the channel count of the current PromethION, MinION-footprint flowcells churning out 3.9 terabases per day, and PromethION-footprint devices with tenfold higher output.

Thinking back to the basecalling work that Reid discussed, as well as a comment in the Q&A that PromethION can't run high-accuracy basecalling on all cells at once (indeed, it can apparently keep up with only a few in that mode), one can sense a challenge that may have ONT wincing in the future as though they were a victim of orbisculation.

The tricky part is clearly going to be basecalling, assuming that just getting the torrents of ionic data off the chips doesn't thwart them the way it frustrated the scaling of Ion Torrent. Will you need an expensive mongo GPU in your MinION -- maybe the GPU alone won't fit in your pocket -- to keep up with that mega-dense flowcell? Would a PromethION with 48 of these monsters be able to keep up with anything more than very low accuracy basecalling? That wouldn't necessarily kill the viability -- there are definitely many applications out there for lots of noisy data. But it would be a potential brake on many other applications where high quality is essential. This is the double-edged sword of outpacing Moore's Law -- it's exciting and amazing, but at some point you truly get out in front of computing technology. Of course, there might be other solutions: specialized hardware such as Field Programmable Gate Arrays, or custom chips like Google's Tensor Processing Units, might replace GPUs, giving up flexibility for more compute power even more so than GPUs do versus CPUs.
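The scale of the problem is easy to put in rough numbers. A back-of-envelope sketch, assuming the 3.9 terabases/day projection from the talk, a translocation speed of about 450 bases per second per pore (the usual R9.4.1 speed), and a purely hypothetical basecaller throughput of 2 million bases per second per GPU:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def bases_per_second(bases_per_day: float) -> float:
    """Convert a daily output into a sustained data rate."""
    return bases_per_day / SECONDS_PER_DAY

def gpus_needed(bases_per_day: float, gpu_bases_per_second: float) -> int:
    """GPUs required to basecall in real time (rounded up)."""
    return math.ceil(bases_per_second(bases_per_day) / gpu_bases_per_second)

dense_flowcell = 3.9e12           # projected bases/day for one MinION-footprint flowcell
promethion_48 = 48 * dense_flowcell
gpu_rate = 2e6                    # hypothetical bases/s per GPU

rate = bases_per_second(dense_flowcell)
print(f"{rate / 1e6:.1f} million bases/s")         # 45.1 million bases/s
print(f"~{rate / 450:,.0f} pores at 450 bases/s")  # ~100,309 pores at 450 bases/s
print(gpus_needed(dense_flowcell, gpu_rate))       # 23
print(gpus_needed(promethion_48, gpu_rate))        # 1084
```

The absolute GPU counts hinge entirely on the assumed per-GPU rate, but the ratio between data rate and compute is the point: a 48-flowcell box would need roughly 48 times whatever one dense flowcell needs.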

Flongle Production

ONT says they are meeting demand for R9.4.1 Flongle flowcells now and believe they can continue to match growth there. Clarke showed stills from a new process that replaces silicon manufacturing techniques such as lithography with a molding-based process, which will enable higher throughput and much lower costs. R10.3 Flongles will ship imminently, closing a gap in the product line.

Automation

Dokos mentioned two ONT forays into running upstream preparation on liquid handling robots. Kit 9 ligation chemistry and the ARTIC SARS-CoV-2 PCR-into-ligation protocol have now both been automated on Hamilton instruments, and they are working with Opentrons to get LamPORE onto that low-cost, entry-level platform.

VolTRAX continues to develop, with magnetic beads and PCR now available and soon to be accessible via an API.  ONT has enabled VolTRAX for the complete ARTIC SARS-CoV-2 sample-to-library protocol for 3 samples plus a control per run.

Miscellaneous


London Calling 2021 will be virtual, a prudent choice given that full vaccination of the population who can and will take it will probably still be very much underway in May. 



Rosemary Dokos also emphasized the overall ONT value proposition: low capital investment devices and flat, transparent consumable pricing with the performance of the platform improving at a constant clip. That is in stark contrast to the typical equipment sales model: expensive boxes (with service contracts to match in later years) and regular price lifts in the consumables, with most performance improvement requiring new purchases to start the cycle over again.

Some Final Punditry

So a lot of steady movement on a number of fronts and promise of some huge changes in the future.  The overhang of the pandemic, even with vaccines actually being administered now, will probably extend through the London Calling online event.  Some of the bold promises may still have large asterisks; the basecalling accuracy improvements are impressive but I doubt, for example, that anyone is going to stop polishing their de novo Nanopore assemblies with short reads.  Disagree?  Prove me wrong! -- I'd be happy to eat crow publicly.

It still feels downright strange to talk about megabase reads, let alone four megabase reads.  The increasing ability to capture huge fragments, phase them and score them for methylation will undoubtedly see growing use and it is hard to believe that some fascinating biology won't result from this.  

For London Calling I might be tempted to write a preview piece, which might well include a tally of what we haven't heard about this time or at the previous London Calling.

2 comments:

Anonymous said...

Interesting summary. Since you recently did a post on pacbio, who do you think out of the two is winning?

Hugh Mayne said...

Interesting shift from “the reads are bad” to “there are still bad reads in the tail”. My ILMN experience has always been to use the PF (pass filter) reads on which headline error rates are calculated; a good % is discarded. Nanopore don't discard, but could filter away the tail, losing a % of reads, and then the mean would tend towards the mode. I understand PacBio also filter reads on the platform.