Omics! Omics!: ONT Community Meeting 2021

Oxford Nanopore held their annual Community Meeting online at the beginning of this month. As is typical for this stage of the ONT news cycle, most topics were confirmations and updates of earlier projections, with little brand new material. There was one surprise: a new concept for running nanopore sequencing with little to no auxiliary lab equipment. Oh, and perhaps also in the surprise category: Oxford appears to be finally moving away from the R9 pore, which has been their mainstay for many years now.

I think the most online buzz is around ONT delivering on their promises to upgrade accuracy, with further progression still in the future. The Q20+ simplex kits are now available to everyone, and this chemistry is getting rave reviews from academic thought leaders who had early access. Oxford claims that the residual error in these reads is primarily either random or in homopolymers longer than 10 bases, and that Q50 (1 error in 100K bases) consensus bacterial genomes are being routinely assembled. I haven't had a chance to independently confirm these claims, and I would suggest that a great project for an energetic undergraduate would be to go hunting for systematic errors. Some places to start, based on past history or just because they look like potential trouble, would be the immediate neighborhood of homopolymers and other classes of simple repeats, such as dinucleotide repeats or, say, tri- or higher repeats of the form XY* (e.g. CAACAA or TGGGTGGG).
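For anyone taking up that project, a first step is simply listing where the suspect repeats sit in a reference, so that errors near them can be tallied separately from errors elsewhere. Below is a minimal Python sketch of that idea. The function name `find_simple_repeats`, its thresholds, and the toy sequence are illustrative assumptions, not anything from ONT, and it reports only exact tandem repeats.

```python
import re

def find_simple_repeats(seq, max_unit=4, min_copies=2, min_span=6):
    """List exact tandem repeats as (start, end, unit, copies), end exclusive.

    Hypothetical helper for illustration: unit length 1 gives homopolymers,
    2 gives dinucleotide repeats, and so on up to max_unit.
    """
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are just a shorter unit repeated (e.g. "AA", "CACA");
            # those stretches are reported under the shorter unit instead.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            span = m.end() - m.start()
            if span >= min_span:
                hits.append((m.start(), m.end(), unit, span // unit_len))
    return sorted(hits)

# Toy sequence (made up): an 8-base homopolymer, a CA repeat and a CAA repeat.
toy = "GATTAC" + "A" * 8 + "GT" + "CA" * 4 + "CGT" + "CAA" * 3 + "GT"
for start, end, unit, copies in find_simple_repeats(toy):
    print(start, end, unit, copies)
```

With the coordinates in hand, one could pad each interval by a few bases on either side and compare the mismatch rate of aligned reads inside versus outside those windows; that comparison, not the repeat finder, is where the real work lies.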

In the future are the Q30 duplex kits, in which both strands of a double stranded molecule are read. Right now field users are getting about 10% of their reads as duplex reads, but Oxford is promising a near future update that will raise this to 40%. The biggest advantage over their long read competition is that both the Q20 and Q30 chemistry are fragment length independent, so visions of 100 kilobase reads with a few hundred random errors are quite enticing. The duplex scheme Clive describes looks like the "1D squared" effect of just catching the second strand without tethering it directly to the first strand; I'd need to dig through notes, but I thought his London Calling talk described some sort of tethering (though perhaps the two amount to about the same thing, with a non-covalent tether between the 5' and 3' adapters?).
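The arithmetic behind those Q values is easy to check. A Phred score Q corresponds to a per-base error probability of 10^(-Q/10), so a quick Python sketch (assuming, unrealistically, that errors are uniform and independent along the read; the function names are illustrative) gives the expected error count for a read of a given length:

```python
def phred_to_error_rate(q):
    """Per-base error probability for Phred quality q (standard definition)."""
    return 10 ** (-q / 10)

def expected_errors(read_length, q):
    """Expected number of errors, assuming a uniform, independent error rate."""
    return read_length * phred_to_error_rate(q)

for q in (20, 30, 50):
    print(f"Q{q}: error rate {phred_to_error_rate(q):.0e}, "
          f"about {expected_errors(100_000, q):g} errors per 100 kb")
```

Under that simple model a 100 kilobase read carries on the order of 1,000 errors at Q20 and 100 at Q30, and Q50 works out to one error per 100 kilobases.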

An interesting tweak on their Bonito basecaller is a version running in WebAssembly, so in theory you can run Bonito in your browser with all the computation occurring locally. I only say in theory because my attempts to do this on my PC have consistently yielded a frozen browser window stuck at "Configuring" and causing some degree of January molasses behavior to my other browser windows. I would love for this to work, though a pre-configured AWS machine image for Bonito and the other tools would be even better for large-scale re-calling.

Clive Brown also gave a bit of an update on the "outy" sequencing scheme under development, with the alluring promise of being able to select what to sequence based on both size and content, and then repeatedly reading the same DNA to achieve a desired accuracy. Curiously, he didn't use the typical term for this back-and-forth pulling of a strand through a pore, which is "flossing" (though the "shuttle" that Clive offered does have a nice weaving image). Outy is currently running at 200 bases per second. A new "re-reading" enzyme is in use that doesn't actually require "destalling" at the end of each shuttling -- though Clive was vague (probably deliberately) on some key details. Clive showed summary data for short fragments, with single molecule accuracies of Q27 after 25 re-reads of the same fragment. Clive also made a plea to switch -- quite reasonably -- from talking about mean accuracies to variant calling accuracies.

ONT also highlighted advances in their methylation calling. Their new Remora caller, currently working only for 5-methylcytosine and 5-hydroxymethylcytosine, runs as a pass after basecalling instead of being its own basecaller. This is said to generate higher quality basecalls as well as higher quality methylation calls -- which ONT believes are actually more reliable than bisulfite calls. They also told an interesting story of a plant genome giving poor basecalls on native DNA but good ones on PCR amplified DNA -- retraining the basecaller for that genome proved that accuracy was possible and that methylation of the DNA had been fouling the calling initially. Not much was said about how context-independent these calls are -- of course the biggest interest is CpG calling and this anecdote suggests that there is a context dependence.

Clive Brown also announced that short reads will be directly supported, though he did this with all the enthusiasm of someone preparing their tax return. Mostly the tweaks are in software -- a bit of different signal processing and chunking the reads differently to deal with the large number of reads possible -- 250 million from PromethION. But his lukewarm endorsement of this new feature came via (and go to 10:00 in the talk if you doubt my transcription) "All it means is if you wanted to shear the hell out of the library, and look at short bits of DNA rather than long bits of DNA, that will be fully enabled in the New Year". Shortly thereafter he has a curious pair of statements: "I'm going to talk about short fragment applications later" but "There probably are applications where that is important, counting applications for example". And indeed, he does later talk about a short read application without ever saying it needs short reads, which is most curious.

It is pretty damn surprising that there is such apathy from the top around short reads, given that most users aren't "shredding the hell out" of their DNA for amusement but because somebody already did this for them. For example, Formalin Fixed, Paraffin Embedded (FFPE) samples are the standard for pathology samples because FFPE preserves morphology and molecular content at ambient temperature for decades -- but by preserving molecular content I don't mean without damage; FFPE is notorious for giving fragments only a few hundred bases long, and people design their PCR amplicons for such material to be even shorter. Or I could point out microRNAs, which as the name suggests are quite tiny, on the order of 22 bases. Or forensic (or anthropological) samples, also shredded, since murderers tend not to dunk their DNA into a preservative before leaving the crime scene. Or analyses such as ChIP, which gain resolution with shorter fragments. Or ATAC-Seq and its cousins such as Cut-and-Tag or Cut-and-Run, where fragmenting the hell out of your sample is an inherent part of the method. Or of course cell-free DNA, which Clive did acknowledge, in the talk on Outy, is very short.

It does appear that ONT is enamored enough with the latest R10 chemistry -- R10.4 -- that they will be looking to phase out the R9 chemistry. The huge upside for ONT would be simplifying their lineup and their logistics -- they've never excelled at clearly communicating which chemistry is for what, and have been even worse at consistently delivering whichever is the less preferred chemistry. I'm not directly involved in ordering consumables the way I was at Starbase, but I hear enough from my lab-side colleagues to be convinced little has improved. The online purchasing still has the strange "order on one site, track your orders and history on another" architecture, nobody tries to order just-in-time because lead times are so stochastic, and one of my co-workers just acidly remarked about a customer service telephone number that is never answered.

Anyway, to step away from the kvetching, there is one downside with R10.4 -- it runs only at 250 bases per second, about half the speed of R9.4.1 (450 bases per second). This cuts into yield and of course time-to-result. ONT is proposing that they may start playing with the running temperature -- something I had on my list of things to try back at the MAP launch in June 2014 -- with a trade between speed/yield and accuracy. So if you want to pile up the data and can trade away some accuracy, go for it -- but if you want the highest accuracy, run cool and slow. That to me seems like a wise path -- Oxford does much better with software than inventory management.
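To put rough numbers on the speed difference, here is a back-of-the-envelope sketch in Python. It assumes a pore threaded by a strand for the whole interval (the hypothetical `occupancy` parameter lets you relax that) and ignores pore loss, strand capture time and everything else that matters in a real run, so treat it as a per-pore ceiling and not a yield prediction.

```python
def bases_per_pore(speed_bases_per_s, hours, occupancy=1.0):
    """Upper bound on bases read by one pore; occupancy is an assumed fraction of time sequencing."""
    return speed_bases_per_s * 3600 * hours * occupancy

r10 = bases_per_pore(250, hours=1)   # R10.4 speed quoted above
r9 = bases_per_pore(450, hours=1)    # R9.4.1 speed quoted above
print(f"R10.4:  {r10:,.0f} bases per pore-hour")
print(f"R9.4.1: {r9:,.0f} bases per pore-hour")
print(f"ratio:  {r10 / r9:.2f}")
```

All else being equal, the slower chemistry tops out at a bit over half the bases per pore-hour, which is the gap that raising the running temperature would be trying to win back.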

There were of course many other announcements. The two smaller PromethION instruments are being targeted for late Spring. The P2 runs two flowcells and is self-contained; the P2 solo lacks the compute (GPU) hardware but is ready to be docked to a GridION. PromethION is also getting newer "Marathon" chips with longer runtimes thanks to deeper wells holding more of the precious mediator compound.

Oxford is also moving more analysis into their own software. Remember, this is the company whose original MinKNOW launch did not include any tool to convert their proprietary FAST5 content to industry-standard FASTQ! There has long been the WIMP (What's In My Pot) metagenomic software, but now they are proposing that a complete solution for variant calling of barcoded samples will be part of a future release of their platform.

Another new product announcement is a plan to replace the integrated MinION Mk1C unit, which has compute, networking and cellular capabilities, with a new version built around the iPad Pro. Not nearly as compact, but with two slots for laboratory gear. This Mk1D is slated for late next year.

I've skimmed over a lot -- there's development of pores with even longer tunnels to handle longer homopolymers for example. If I was really energetic I'd try to catalog all the things from previous LC and NCM that were invisible here -- but I'm not and I won't (sorry!).

Monday, December 13, 2021
