Tuesday, June 06, 2023

London Calling 2023

It was indeed wonderful to return to London Calling this year; I enjoyed the conference immensely.  The only real issue is that it is over so quickly; there were many people I wanted to connect with and failed to.  Maybe next year I should propose a meetup at a pub on Wednesday night (though unless I win the lottery - and I never play the lottery - I won't be crazy enough to offer the first round of drinks).

I'm going to focus on what ONT presented; I'm still behind on watching the presentations I missed due to simultaneous sessions or the fact there's always something interesting going on and one needs time to chat and eat.    Mostly this will focus on Clive Brown's talk - ONT went back to having Clive give a marathon presentation rather than split it amongst three senior executives.  Clive kept things lively -- first by listing the three things he wouldn't talk about (protein sequencing, Field Effect Transistor sensing and Outie chemistry) and later by making a claim and then remarking "Keith will correct me in his blog".  

Don't Wave A Red Flag and Expect Not To Get a Response!

The Direct Challenge

Okay, let's get the demanded -- as well as some additional -- nit picking squared away.  Clive's call-out was after claiming that ONT is the only sequencing vendor with complete control over their sensing; Clive parenthetically referenced others having to buy third-party cameras.  At one level this is mostly some corporate puffery which I might have ignored, but since the glove was thrown in my direction I must oblige.  And it is a bit of an interesting space historically.

PacBio in the Sequel/Revio generation has embedded CMOS light sensors in the flowcells; to me that is as much control over sensing as ONT has.  It certainly isn't a third-party camera.  That's the only real competitor for which this is definitely true.

Illumina's iSeq also uses optical sensing embedded in the flowcell.  But iSeq is a lonely, neglected branch of Illumina's product family, having received no updates since its original launch.  Commercially this probably makes sense, as iSeq is a low end device that probably makes Illumina very little money when it is used, and to be honest it is rare to see much evidence of it being used.

Another example would be Ion Torrent, which does actually still sell sequencing devices and consumables, though that is easy to forget since it has been years since they innovated on the actual sequencing sensing or chemistry.

Finally, there was Genapsys, another non-optical electronic sensing (like Ion Torrent) platform but one that never got any traction due to low throughput and to be honest poor execution -- it still blows me away that they never seemed to generate any SARS-CoV-2 sequence data when this would have been an easy way to generate good press.

While I'm Kibitzing...

I might as well get my other two hair splitting exercises out of the way now.

Clive stated "we're the only ones to read native DNA" and "everybody else reads copies".  This is at best leveraging some semantic trickery and to me just incorrect -- while most systems do indeed read copies, PacBio generates the data in the process of making the copy - which is why it too can directly pick up base modifications.  And indeed, it potentially has an advantage over ONT in that it can read over a given modified base multiple times.

The other case was when Clive omitted some history.  In introducing duplex sequencing, he gave a historical overview and talked about the original duplex scheme that involved a hairpin adapter at one end of the sequence library fragment.  He said that ONT dropped this scheme because the second strand had poorer quality than the first strand due to effects of the hairpin -- but neglected to mention that there was also a PacBio patent lawsuit over this.  On the other hand, he also omitted any mention of the "1D Squared" duplex scheme -- I think many would like to understand how (or if) the current duplex chemistry differs from the 1D squared scheme -- but the truth is ONT is playing the biochemical details of their duplexing very close to the chest.

Okay, enough picking fights with ONT.  At least for now - I did actually make a figuratively incendiary comment in front of one ONT employee and there is another idea for a post that might not be received well either.  But no more of this poking today - let's dive into the big announcements.

Core Chemistry & Informatics

The changeover to Kit 14 dominated the schedule of rollouts for this year (which was titled "the next 12 months" -- perhaps 7 months feels like 12 with the pace of change at ONT!).  In addition to various kits going out (and later this year, finally 384 barcodes for native ligation barcoding!) there were big announcements around duplexing and the key support software.

High Duplex Flowcell Launch

The biggest announcement around Duplex chemistry was that the "high duplex" flowcells would be available "today" under an early access mode -- so less warranty for users but for those eager to try them a boon.  Previously they were only available under a developer license.  These flowcells can deliver duplex rates of over 80%, and as Clive reminded us, the non-duplexed simplex data should still be dominated by Q20+ reads - with 100Gb duplex and 50Gb simplex per PromethION flowcell being hit internally (though that would seem to imply 66% duplexing, not 80+%).  Interestingly, ONT had a number of "naïve" users with minimal molecular biology experience try the kits and found that they too could get high duplex rates.  It wasn't clear from Clive's announcement which of the form factors (PromethION, MinION, Flongle) high duplex would be available in, though since "Flongle" was rarely said I think we can assume it won't be there.
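The arithmetic behind that parenthetical can be sketched in a few lines.  This assumes "duplex rate" means the share of total yield (in gigabases) that is duplex, which is how I computed it above; ONT may well define it differently (for example per strand rather than per base), and the function names here are my own.

```python
def duplex_fraction(duplex_gb: float, simplex_gb: float) -> float:
    """Share of total yield that is duplex, measured in gigabases."""
    total = duplex_gb + simplex_gb
    if total <= 0:
        raise ValueError("total yield must be positive")
    return duplex_gb / total


def duplex_needed(simplex_gb: float, target_fraction: float) -> float:
    """Duplex yield required to reach a target fraction at a fixed simplex yield."""
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    return simplex_gb * target_fraction / (1 - target_fraction)


if __name__ == "__main__":
    # Internal figures quoted in the talk: 100 Gb duplex, 50 Gb simplex.
    print(f"{duplex_fraction(100, 50):.1%}")
    # Duplex yield that would go with 50 Gb simplex at an 80% rate.
    print(f"{duplex_needed(50, 0.80):.0f} Gb")
```

Under this by-yield definition, 50Gb of simplex would need to be paired with roughly 200Gb of duplex to hit 80%, which is why the quoted numbers look inconsistent to me.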

Duplexing does reduce unique molecular yield, so it won't be ideal for all applications.  Clive did state "we do have a way to turn duplex off – but not releasing now". He also mentioned that some samples will have lower duplex yields due to blocking - basically pores being jammed - and ONT is working on a fix for this that may involve tweaking the buffer on either or both of the cis and trans side. One hypothesis is that secondary structure formation on the trans side, particularly G4-quadruplex structures, is the issue. I wasn't the only one a bit jarred by Clive saying that if a buffer fix is found ONT would release it "without telling anyone" - for anyone with a production mindset unannounced changes are unwelcome.

In terms of software, with the High Accuracy Model (HAC) on PromethION the duplex base-calling can keep up with data production with 24 flowcells running.  This was another major announcement (more details below) - the current multi-step duplex informatics workflow will be replaced with a fully automated one.

One other interesting tidbit revealed by Clive - photons hitting the sensor may be causing undesirable effects.  Many of the ONT devices have very transparent, open designs - we may see more numerous opaque lids in the future (MinION, of course, has had one from the start). 

Ultralong duplexing at high yield remains "an unsolved research problem" - the challenge is getting high efficiency of adapting both fragment ends while not handling the DNA excessively.

Dorado / MinKNOW

One more kvetch (can I ever stop?): the version numbers for MinKNOW given on the release slides and in the Data For Breakfast talk on Dorado are completely incompatible.  I thought for a moment that the breakfast numbers were Dorado version numbers, but no, that can't be because there was a discussion in a Q&A about when Dorado would have a version numbered 1.0.

Anyways, there are two MinKNOW releases slated -- one in the middle of summer and one in the autumn -- which will fold in increasing amounts of Dorado which in turn will displace Guppy.  Dorado basecalling will be in the first release and all the remaining Guppy functionality (such as adapter trimming and demultiplexing) will be replaced in the second one.  5kHz sampling will become the norm, generating Q32 data for duplexes.

Run-until will be enabled seemingly across-the-board - this is setting a yield limit for a run and having the sequencer automatically stop when that many gigabases have accumulated.  

Adaptive sampling will be enabled on the P2 instruments.  As an aside, finding information on ONT's website on items such as this is a descent into madness -- my incendiary comment was "the Community: burn it down!" to which a high use customer across the table nodded vigorous assent. 

Another very welcome feature upgrade is that all context 5-methylcytosine and 6-methyladenine modified base calling will become available in MinKNOW and also (I think) CpG-context 5-methylcytosine and 5-hydroxymethylcytosine calling.  So this is productionizing what had been very thinly documented research workflows.

As far as I can tell, there is not a separate stereo duplex modification calling -- of course anything like that gives up detecting hemimethylated sites but might boost accuracy.  On the other hand, ONT is reporting modified base accuracies in the mid to high 90 percent range, so how much more accuracy do you need?

Opening Up EPI2ME

Users will be able to run their Nextflow workflows on EPI2ME for free on a new version called EPI2ME ONE, free at least in the near term.  ONT is betting the cloud charges they incur can be made up later.

Skim-Seq

ONT unveiled a very clever use of adaptive sampling called Skim-Seq to generate short read sequences by just setting the adaptive sequencing to reject everything.  Since it takes about 400-500 bases to decide whether to reject, this generates short read coverage across a sample.  So it is first clever by not requiring a different library prep to get such short reads.  Skim-Seq is intended for applications such as genotyping-by-sequencing.  It was pointed out that here it is more important to get many short reads rather than the same total sequence data in long reads, as the odds of sampling both (for a diploid) alleles is greater.  It wasn't clear from the presentation whether there is any yield penalty for this scheme; whether adaptive costs yield is always a question on my mind  -- particularly since we saw a precipitous drop in our one adaptive experiment.  
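To put a number on the "both alleles" point, here is a minimal sketch.  It assumes a heterozygous diploid site where each read covering the site is an independent draw of either allele with equal probability; that is my simplification, not something from the presentation, and it ignores mapping bias and sequencing error.

```python
def prob_both_alleles(n_reads: int) -> float:
    """Probability that n independent reads at a heterozygous site
    include at least one read from each of the two alleles."""
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    if n_reads == 0:
        return 0.0
    # All reads come from allele A with probability 0.5**n, likewise allele B.
    return 1.0 - 2.0 * (0.5 ** n_reads)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, prob_both_alleles(n))
```

The curve climbs quickly, so for genotyping what matters is how many independent reads land on each site of interest rather than how long each read is.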

But another clever bit is that one can create some regions to enable long read sequencing  -- and again, the library is really a long insert library.  So if there are some regions of particular interest, such as some SNPs you definitely want to see both alleles or a suspected structural variant, then those can be programmed to not be rejected.  And of course (more cleverness) if no amplification went into library preparation, then you can get methylation information too.  So methylation islands of interest might be another region to hammer a section by not rejecting that.

One can imagine even more complex patterns -- for example by layering on adaptive barcode balancing atop the specific sequence region selections.  Indeed, I've suspected for a while that what will be needed is a complete Domain Specific Language (DSL) for programming adaptive sequencing schemes.  Such a language should be tightly coupled to the ONT API - at the moment, for example, it isn't clear (and I talked to perhaps the top expert on this) whether ONT supports using sequence read directionality -- e.g. I want reads originating from region X only in one orientation - as a criterion.
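As a thought experiment, the core of such a rule language might look like the sketch below.  Everything in it is hypothetical: `Rule`, `decide` and the field names are invented for illustration and are not part of MinKNOW or any ONT API, and whether strand can actually be used as a criterion is exactly the open question above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical adaptive sampling rule (not an ONT data structure)."""
    contig: str
    start: int                     # 0-based, inclusive
    end: int                       # exclusive
    strand: Optional[str] = None   # "+", "-", or None for either orientation
    action: str = "accept"         # what to do when the rule matches


def decide(rules: Iterable[Rule], contig: str, pos: int, strand: str,
           default: str = "reject") -> str:
    """Return the action of the first matching rule, else the default.

    A default of "reject" gives Skim-Seq style behaviour: everything is
    ejected after the first few hundred bases unless a rule says otherwise.
    """
    for rule in rules:
        in_region = rule.contig == contig and rule.start <= pos < rule.end
        strand_ok = rule.strand is None or rule.strand == strand
        if in_region and strand_ok:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [Rule("chr1", 1_000, 2_000, strand="+")]
    print(decide(rules, "chr1", 1_500, "+"))
    print(decide(rules, "chr1", 1_500, "-"))
```

First match wins here purely to keep the sketch short; a real language would need priorities, barcode-aware rules and some notion of running yield per target.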

Homopolymer Bashing

Clive noted that fixing the homopolymer accuracy issue is something he would like to solve by year's end, and so after a bit of parking the idea of using base analogs to solve this ONT is working on it again.  The idea is to synthesize a new strand with analogs of the four bases doped in at some level so that long homopolymers would nearly inevitably have one or more analogs in them.  This should work because an appropriately trained basecaller can pick up all sorts of chemical structure differences (I am wondering whether it could tell fully deuterated bases apart from typical ones).  I do feel a certain compulsion to point out that when this idea was first publicly fronted, it was in this space and there was great pooh-poohing (but not by ONT!) that it could ever work.  Sadly, I don't see any way to extract a royalty on it - silly me for not patenting it! 
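How "nearly inevitable" depends on the doping level, which wasn't given.  A back-of-envelope sketch, assuming each position in the resynthesized strand independently receives an analog with probability `doping_rate` (my assumption; real polymerases will have sequence preferences):

```python
def p_at_least_one_analog(homopolymer_len: int, doping_rate: float) -> float:
    """Chance a homopolymer run contains one or more analog bases,
    assuming independent incorporation at each position."""
    if homopolymer_len < 0:
        raise ValueError("homopolymer_len must be non-negative")
    if not 0.0 <= doping_rate <= 1.0:
        raise ValueError("doping_rate must be between 0 and 1")
    return 1.0 - (1.0 - doping_rate) ** homopolymer_len


if __name__ == "__main__":
    # Illustrative doping rates only; none of these come from the talk.
    for rate in (0.05, 0.10, 0.25):
        row = [f"{p_at_least_one_analog(n, rate):.2f}" for n in (5, 10, 20)]
        print(rate, row)
```

The trade-off is that higher doping breaks up more homopolymers but also puts more analogs into ordinary sequence, where the basecaller has to handle them too.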

In the currently envisioned form as diagrammed in Clive's slides, the resynthesized DNA would be sequenced in simplex mode.  Perhaps even more powerful algorithms could leverage duplex sequencing where one strand is native DNA and one resynthesized, with the adapter sequences marking which is which.



Direct RNA

Direct RNA sequencing, which remains an ONT exclusive in the marketplace, is getting a major makeover.  Whereas in the past the direct RNA workflow used a different motor protein but the same flowcells, a dedicated RNA flowcell will be released with a pore tuned for this molecule.  Higher accuracy and a faster translocation speed (and therefore more data) of 125 bases per second (79% faster!) will be part of this planned August/September rollout.  RNA flowcells will be available in the MinION and PromethION formats.

ONT is training their RNA modification models on a variety of modifications, including pseudouridine and N1-methylpseudouridine, which are used in many mRNA therapeutics to reduce immune response to the mRNA itself.

Hardware

P2

A lot of people have been excited to receive the P2 Solo instruments, but many teething pains have arisen.  

First, there's the terrible name - the Solo is the instrument that can't fly solo but requires a supporting computer; it's the "i" model that is "independent".  So solo isn't independent but independent isn't solo!  I have yet to run into a customer who likes the branding.

More critically, ONT confessed that early P2s shipped with a bad computer board, which they say has been replaced.  But I also heard from a customer that they had serious MinKNOW versioning conflicts when trying to run a P2 Solo off of a GridION.

On the plus side, many like the stylish skins that Oxford has been offering for the P2.  ONT doesn't lack for fashion sense.

MinION Mk1C

The MinION Mk1C trades the boxy MinION MkI form for something more to the profile of a smartphone -- or perhaps a hand phaser?  With the iPad Pro case, Oxford plans to make this the leading edge of field sequencing -- and from what I hear the earlier Mk1B unit will not be missed (a key opinion leader described it forcefully in scatological terms). 


TurBOT

Oxford gave more details on TurBOT, a sample and library prep instrument they had started soliciting interest in previously.  While the design hasn't been nailed down yet -- only images were presented and no hardware was on site -- it would be some sort of conventional liquid handler capable of sample extraction and library preparation, and would finally load a sequencer on the robot's deck.  This would be a MinION in early incarnations but with a P2 in the ultimate version.  



Response seems mixed. One tweeter noted that it is not a good fit in throughput (what the engineering geeks call an "impedance mismatch") - the liquid handler will often be sitting idle while waiting for the sequencer to finish.  But ONT is showing a strong interest in industrial production markets such as fermentation or biologics QC and betting that in these areas a "fire-and-forget" mentality will rule and keeping everything constantly humming will not be a priority.  One other bit of support for less sophisticated users is coming - a choice on MinKNOW to run with a much simplified UI that doesn't offer as many options and will therefore greatly simplify training and documentation requirements in non-academic settings.


2024: Year of the New ASIC? 

Even before the pandemic, ONT was touting a new Application Specific Integrated Circuit (ASIC) design which would lower costs for making flowcells and enable new applications.  ASICs are the electronic gadgets that convert the tiny raw analog electrical signals into digital data to be processed further downstream.  After many years of development, next year ONT expects to finally start rolling out these ASICs.  They even had a sample -- somewhere -- but I never could track it down myself. 


MinION Mk2 

One device ONT unveiled is the MinION Mk2, which will be a small device that plugs into a USB-C socket and has a vertical loading flowcell.   With low power and 10Gb output, this will be ideal for field applications.  Clive also claimed ONT can make these very cheaply - "Gordon is going to kill me for saying this".

It will be interesting to see how ONT ends up pricing these.  Could the Mk2 end up killing off Flongle, for example, by offering higher throughput at too close a pricepoint per gigabase?  With ONT's ability to stop and reuse flowcells, that could be a serious constraint.  Or perhaps the Mk2 flowcell won't allow reuse?

Grongle

Just as MinION begat GridION -- same flowcell but more per instrument -- Mk2 is projected to have a companion called Grongle.  Except instead of a big box, it looks more like a USB hub and will sport eight rather than five flowcell slots.

TraxION

Clive unveiled a different concept for a fully integrated "load-biomass-and-go" instrument besides TurBOT.  TraxION would essentially be a VolTRAX with an embedded MinION Mk2.  ONT also envisions putting the reagents on a card, so in theory a person with very little training could run this device.  While that's a great vision, there is a real risk in pulling it off -- VolTRAX has taken over half a decade to develop to its current state and it's hard to detect much in the way of paying customer enthusiasm for it.




MinION Mk2 family. Top: Flowcell L-R: TraxION, MinION Mk2, Grongle (Photo from an Oxford Nanopore tweet)


MinION Flowcell Redux

MinION flowcells will be made-over with the new ASIC.  Current MinION flowcells have four muxes - each electrical sensor has four different pores it could monitor and during a run there are periodic changes as to which mux echelon is being used.  The new design will have only two muxes, but just as many total pores per flowcell, so the data will arrive twice as quickly.  The cheaper ASIC can mean either ONT gets a higher margin on each flowcell or they can price them even lower (or some split between these).  
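The mux arithmetic can be sketched as follows.  The total pore count of 2,048 is a figure I am plugging in for illustration, as no number was given in the talk, and the function name is mine; the sketch also assumes throughput scales with the number of pores being read at once, which is the reasoning behind "twice as quickly".

```python
def channels_needed(total_pores: int, mux_depth: int) -> int:
    """Number of sensor channels when each channel can switch
    between `mux_depth` pores and every pore belongs to one channel."""
    if mux_depth <= 0 or total_pores % mux_depth != 0:
        raise ValueError("total_pores must be a positive multiple of mux_depth")
    return total_pores // mux_depth


if __name__ == "__main__":
    total = 2048  # assumed pore count, for illustration only
    current = channels_needed(total, 4)
    new = channels_needed(total, 2)
    # Only one pore per channel is read at any moment, so the number of
    # channels sets how many pores produce data simultaneously.
    print(current, new, new / current)
```

The flip side is fewer spare pores per channel to fall back on when one dies, so total yield per flowcell need not change even if the data arrives sooner.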

New ASIC No Shows

There was no mention of the new ASICs coming to PromethION format cells.  Nor were 96-well format flowcells (aka Plongle) discussed.  That doesn't mean these won't happen -- just that they are clearly not in the first wave.

More Speed in 2024?

While 2023 was devoted to consolidating the wetware on Kit 14 (albeit now with direct RNA going off on its own), don't get too comfortable with it.  ONT has an aggressive protein engineering effort on pores and motors and Clive suggested that higher translocation speeds might be around the corner.  Of course, this has been an expressed desire for a long time, but with the many foundational platform improvements -- 5kHz sampling, POD5 data format and a very well oiled experimental and computational machine for training basecalling models - perhaps in the next year or so ONT can make higher speeds very real.  Clive spoke of 1Kb/s which could yield 2X the data per flowcell, but even a 50% improvement could be very attractive -- though as always a chemistry change will be disruptive for a growing number of large studies, diagnostic users and industrial users.

Singapore & Houston

ONT is expanding their regular set of meetings with a September meeting in Singapore.  I am sure Nanopore fans in Asia and Oceania will be welcoming having a more geographically and time-friendly meeting and I'm happy for them -- but dreading the effects on me if ONT schedules a big announcement late in the day local time -- Singapore is exactly half a day shifted from my local time zone.

This year's US meeting in December will be in Houston.  You'll need a car or to arrange a tour, but if you have time Johnson Space Center is a serious technogeek destination and Galveston is also very interesting, with sidewalks you must ascend to on steps. 


[2023-06-07 10:29 EDT -- corrected the two Mk1C when I meant Mk2 errors -- thanks Eyesgack for catching this]

12 comments:

Eyesgack said...

Excellent write-up as always. Just a heads up, I think you might have typed Mk1C where you meant Mk1D or Mk2 in a couple of places.

Anonymous said...

When it comes to puffery there was also some mumbling about the ASIC costing them $200M to develop... or at least that is what it sounded like on the recording. Take a zero off, cut in half and you are starting to get closer to the truth.

Anonymous said...

I wonder if the other sensors you mention are designed from the ground up or adapted off the shelf chips. I’m not sure its the same.

Anonymous said...

I think they have several ASICS in or product and/or under development.

Anonymous said...

ONT have invalidated all of pacbs prosecuted patents, the ones you are referring to, and won their cases against PacBio patent claims including at the ITC. He didn’t mention all that either.

Keith Robison said...

WRT to the PacBio patents, the situation is more complex than the commenter has suggested.

In Europe the two companies agreed to a settlement, with ONT agreeing to not sell the "2D" duplex method until after 2023 (GenomeWeb article on European settlement)

In the US ONT did prevail -- but not until 2019 (GenomeWeb) -- and ONT discontinued the kits in 2017 while the outcome was still in doubt (see my post then)

So I stand by my comment - the PacBio patent situation certainly contributed to ONT discontinuing the 2D kits.

Anonymous said...

https://nanoporetech.com/about-us/news/jury-invalidates-pacbios-patents-us-patent-trial-against-oxford-nanopore

And yet they haven’t brought it back, perhaps it never performed. I also see they have their own hairpin patents now granted.

Anonymous said...

Arguing PacBio isn’t measuring a copy is a stretch to say the least.

Anonymous said...

I think, from memory, Keith, their current ‘simplex’ data is looking better than their old ‘2D’ data no ? Something they’ve accomplished since 2017.

Keith Robison said...

Yes, I believe the current Kit14 Q20+ simplex data is much better than the old 2D duplex data - ONT's ability to improve the basecalling via protein engineering and model training cannot be described as anything except amazing

PacBio's signal originates in the copying process itself - they are not measuring from copied DNA and it is silly to talk about it in that way. And most importantly for balancing ONT's hype on this area, that means they can also measure DNA modifications

Anonymous said...

I must admit I haven't watched his talk but was it in relation to RNA? As with RNA it is certainly true, as all other is just cDNA sequencing. Why people will bother with cDNA soon (and now) is a mystery to me when real RNA sequencing is now here?

Keith Robison said...

While Direct RNA sequencing is exciting, don't throw out your cDNA protocols just yet!

1. Direct RNA doesn't yet have barcoding; 1 sample per flowcell (or even washing flowcells) just is impractical for big projects
2. Direct RNA can't use amplification; unsuitable for small input amounts
3. Direct RNA is unsuitable for single cell and spatial sequencing
4. Direct RNA has about 1/4 the translocation speed and therefore 1/4 the throughput.