Now to run through the long-term technical projections from Oxford Nanopore's Tech Talk. Well, projection is probably not the correct word - hints and teasers might be closer to the mark. None of these pronouncements came with any hint of when the community might be able to access them - indeed, CSO Lakmal Jayasinghe made it very clear that this section would have no timing information. Here is a link to the presentation: what is covered in this piece starts around the 44:40 mark.
For new flowcells, there was a quick mention that ONT is looking at serving both lower and greater data requirements than current flowcells do. The lower-requirement slide did have something that looked like a MinION flowcell but not the current one; perhaps AI prompted by "make it look the same but different"? A nominal X-axis suggested that small devices might have 100s of channels (though MinION being plotted at thousands must be multiplying by the mux factor) and that future high-end devices might have 100s of thousands of channels. Jayasinghe declared that all of these future devices would be designed with "scalability, manufacturability, and automation in mind". I'm of course particularly happy about that last bullet point, since my desire to have a sequencer fully integrated into our autonomous laboratory platform is quickly sailing in the Moby Dick direction.
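As a quick sanity check on the mux explanation: a current MinION flowcell has 512 sensing channels, each of which can be switched among 4 wells, so counting wells rather than channels lands in the thousands. A trivial sketch of that arithmetic, using those published MinION figures (the function name is just for illustration):

```python
# Current MinION flowcell figures: 512 channels, each switchable ("muxed")
# among 4 wells, for 2,048 wells in total.
MINION_CHANNELS = 512
MINION_MUX_FACTOR = 4


def wells(channels: int, mux_factor: int) -> int:
    """Total addressable wells when each channel can switch among mux_factor wells.

    Only one well per channel is read at any moment, so `channels` is the
    number of simultaneous measurements and this is the larger count.
    """
    return channels * mux_factor


print(wells(MINION_CHANNELS, MINION_MUX_FACTOR))
```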
There was no direct talk, nor any slide images, of the previously highly touted next-generation, low-cost ASICs or the devices that had been envisioned using them, such as sequencers that looked more like USB photo card readers than laboratory devices. Are these in long-term hibernation or in a permanent sleep?
Engineering New Pores & Motors
Much of the presentation had familiar looking plots of performance characteristics of new pores emerging from ONT's protein engineering group.
One new pore offers higher accuracy, enabling better Q-scores and lower indel rates, which in turn enable higher-accuracy assembly and variant calls at lower coverage. This is attributed to lower noise from the pore, which makes the basecalling model's job easier. Mark Bruce spoke of ongoing efforts to engineer in both better run performance and better shelf life.
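Q-scores are on the Phred scale, so every 10-point gain is a tenfold drop in per-base error probability. A minimal sketch of the standard conversion (Q = -10 log10 p), nothing ONT-specific:

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: {p:.4%} error, about 1 error in {1 / p:,.0f} bases")
```

Indels are one component of that error rate, which is why cutting them pays off so directly in assembly and variant calling.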
Faster motor proteins would mean greater output and faster time-to-result in time-critical applications. Bruce spoke of R&D DNA motors that improve on the current 400 bases per second, running at around 800, 900 or even more than 1000 bases per second. One candidate motor generated 250 gigabases of data from a single PromethION flowcell, with 100 gigabases coming in the first 24 hours.
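A rough back-of-envelope on the 100 gigabases in the first 24 hours. Assuming (my simplification, not anything ONT stated) that every active pore runs at a constant speed with no dead time between reads, dividing output by time and per-pore speed gives the average number of pores that must be sequencing at any given moment:

```python
SECONDS_PER_HOUR = 3600


def implied_active_pores(bases: float, hours: float, bases_per_second: float) -> float:
    """Average number of pores that must be sequencing simultaneously to yield
    `bases` in `hours` if each translocates at a constant `bases_per_second`.

    Ignores idle time, blocks and gaps between reads, so real runs need more
    pores than this; treat it as a lower bound.
    """
    return bases / (hours * SECONDS_PER_HOUR * bases_per_second)


# 100 gigabases in 24 hours, as reported in the talk, at current and R&D speeds
for speed in (400, 800, 1000):
    print(speed, round(implied_active_pores(100e9, 24, speed)))
```

The point is simply that doubling motor speed halves the pore-hours needed for the same output, which matters when pores are the scarce resource over a run.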
I suppose the big challenge is combining all of these into one optimally running package. A pore excelling on one metric might lag on another. Bruce emphasized that overall performance comes from a complex orchestration of sample prep, motor protein, pore, membrane tether, running buffer, and other components, which must not only provide speed and accuracy but also minimize blocking events.
Direct RNA has an even starker need for faster chemistry; the slow translocation speed with the current pore severely crimps output - something I mentioned in the prior post in the context of the new direct RNA barcoding kit. RNA motors with much faster speeds have also been developed, some five times faster than the one in current direct RNA kits.
Improved Sensitivity
After a long pause, ONT is again talking about increasing the sensitivity of the device. The grand vision would be that every library molecule applied to a flowcell would generate data. That's an audacious goal, but one that could end any discussion of amplification by enabling even the most dilute samples to be sequenced deeply.
For direct RNA, Bruce showed plots demonstrating that R&D conditions can generate as much output from several nanograms of input RNA as the current kits yield from 100 nanograms, and 10s of thousands of reads from picogram levels of RNA. Gemini tells me that 1 picogram of UHRR poly-A+ RNA, which is the standard ONT uses for testing, contains around 1.5 to 3 million molecules. So this suggests that ONT is sequencing about 1 in a few thousand input molecules - quite impressive.
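The molecule count can be checked from first principles. This sketch uses the commonly quoted approximation for single-stranded RNA molar mass (320.5 g/mol per nucleotide plus 159 g/mol for the termini); the mean transcript length is my assumption, since the real poly-A+ pool is a broad length distribution. With a 1,000 to 2,000 nt mean it gives roughly 0.9 to 1.9 million molecules per picogram, the same order of magnitude as the Gemini figure. The capture fraction then hinges on exactly how many picograms were loaded, which I don't have:

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ssrna_molar_mass(length_nt: int) -> float:
    """Approximate molar mass (g/mol) of single-stranded RNA of a given length."""
    return length_nt * 320.5 + 159.0


def molecules_in_mass(mass_pg: float, mean_length_nt: int) -> float:
    """Number of RNA molecules in mass_pg picograms, assuming one mean length."""
    grams = mass_pg * 1e-12
    return grams / ssrna_molar_mass(mean_length_nt) * AVOGADRO


def capture_fraction(reads: int, mass_pg: float, mean_length_nt: int) -> float:
    """Fraction of input molecules that produced a read."""
    return reads / molecules_in_mass(mass_pg, mean_length_nt)


for length in (1000, 1500, 2000):
    print(length, f"{molecules_in_mass(1.0, length):.2e} molecules per pg")
```

For example, `capture_fraction(reads, mass_pg, 1500)` with the actual read count and input mass from the slide would turn the "1 in N" estimate into a number.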
An interesting plot was shown that demonstrates that not only can ONT identify phosphorothioate bonds, but it can distinguish the two stereoisomers of these bonds. I should set an AI to catalog all the variations on DNA which nanopore sequencing can detect. Could a modern day Meselson-Stahl experiment be run with hairpin adapters and nanopore sequencing to distinguish nucleotides with heavy nitrogen from those with light nitrogen? Gemini says that deuterated nucleotides can be distinguished, so why not?
Peptide & Protein Sequencing
One of the most anticipated segments of the future-looking portion is progress on protein sequencing. ONT has presented on this at prior meetings, and many companies are tackling the problem of building new-generation protein sequencers, but so far only Quantum-Si has actually launched a product - and not one that has been a barnburner of a success. This section was presented by Jayne Wallace.
ONT says they routinely generate millions of peptide reads from a PromethION flowcell using pores engineered specifically for peptide sequencing. Since the peptides are tagged with DNA adapters, for which the modified pore generates high-quality signals, it is possible to use many multiplex barcodes for sample mixing.
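How barcodes in the DNA adapter portion might be assigned can be sketched generically. This is not ONT's demultiplexer: the 8-nt barcodes are hypothetical, and Hamming distance is a simplification, since real nanopore demultiplexing has to tolerate indels and would use alignment or edit distance. The idea is to pick the closest barcode, but only when it is both close enough and unambiguous:

```python
from typing import Optional

# Hypothetical 8-nt barcodes for illustration only; real kits use longer,
# carefully designed sets with large pairwise distances.
BARCODES = {
    "bc01": "ACGTACGT",
    "bc02": "TTGCAAGC",
    "bc03": "GATCCTAG",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, barcodes: dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the name of the closest barcode, or None (unclassified) if the
    best match exceeds max_mismatches or ties with another barcode."""
    scored = sorted((hamming(observed, seq), name) for name, seq in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_name


print(assign_barcode("ACGTACGA", BARCODES))  # one mismatch from bc01
```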
Last year ONT showed that they could distinguish a set of known peptides from each other. ONT is now pushing harder on decoding peptides - an amino acid caller. This is clearly a challenging task. ONT has not disclosed how many amino acids fit in a pore, but Gemini claims that each amino acid adds about as much length to a peptide backbone as each nucleotide adds to a DNA backbone. R10 pores fit about 10 nucleotides, so let us assume 10 amino acids as well. Ignoring the huge number of modified amino acids, that means there are 20^10, or roughly 1e13, different peptides one wishes to distinguish. That's an impossibly big space to sample completely - only a clever sparse sampling scheme could train a model. It is likely that not all ten amino acids have similar weighting in the signal, so it may be possible to contemplate smaller training sets - but even 20^6 is still 6.4e7, which is probably impractical to enumerate exhaustively as training peptides.
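The size of that space is easy to tabulate, and the DNA comparison shows why nucleotide models are so much more tractable: a 4-letter alphabet gives only about a million 10-mers. A minimal sketch, using the same assumption of a 20-letter alphabet with no modifications:

```python
AMINO_ACIDS = 20  # standard amino acids, ignoring modified residues


def kmer_space(k: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct length-k sequences over an alphabet of the given size."""
    return alphabet ** k


for k in (4, 6, 8, 10):
    print(f"peptide {k}-mers: {kmer_space(k):.1e}")

# For comparison: DNA 10-mers over a 4-letter alphabet
print(f"DNA 10-mers: {kmer_space(10, alphabet=4):.1e}")
```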
Wallace says they've created a way to generate very large peptide libraries in which each peptide is tagged with DNA of known sequence. My first guess is a series of split-and-pool cycles à la DNA-encoded libraries; I don't know that space well, but there are likely peptide variants of it. At this time, their library has 150K distinct peptides.
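To make my guess concrete, here is an idealized split-and-pool sketch. It is my speculation about the general technique, not ONT's disclosed method, and the 3-nt tags are hypothetical. In each cycle the pool is split into one portion per building block, each portion gets one amino acid added to the peptide and the matching DNA code added to the tag, and the portions are pooled again. After c cycles with b building blocks there are b^c members, and each tag spells out its peptide:

```python
# Hypothetical building blocks: amino acid -> 3-nt DNA code (illustrative only).
BLOCKS = {"G": "AAC", "A": "CTG", "L": "GTA", "V": "TCA"}


def split_pool(building_blocks: dict[str, str], cycles: int) -> dict[str, str]:
    """Idealized split-and-pool library: returns {dna_tag: peptide}.

    Assumes complete coverage, i.e. every member lands in every split portion,
    which a real library only approximates by using many copies.
    """
    pool = [("", "")]  # (peptide, dna_tag) pairs, starting from an empty scaffold
    for _ in range(cycles):
        # Split into one portion per block, extend peptide and tag in lockstep, re-pool.
        pool = [(pep + aa, tag + code)
                for aa, code in building_blocks.items()
                for pep, tag in pool]
    return {tag: pep for pep, tag in pool}


def decode_tag(tag: str, building_blocks: dict[str, str], code_len: int = 3) -> str:
    """Read a peptide sequence back out of its DNA tag."""
    lookup = {code: aa for aa, code in building_blocks.items()}
    return "".join(lookup[tag[i:i + code_len]] for i in range(0, len(tag), code_len))


library = split_pool(BLOCKS, cycles=3)
print(len(library), decode_tag("AACCTGGTA", BLOCKS))
```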
ONT says they've built their first amino acid caller by extracting specific SDS-PAGE bands for 22 different proteins, digesting them to peptides, making a barcoded library from each and pooling them, generating squiggles, and then calling with the amino acid caller. Each set of predicted peptide sequences was aligned back to the complete E. coli proteome to generate a score, which was then used to choose the most likely protein. In every case the top-scoring protein was the correct one, and for only one protein did the second-highest-scoring candidate even rise above baseline. It would be interesting to explore that case more closely - the protein was CspF, and there are paralogs in the proteome.
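The scoring step can be sketched in miniature. ONT didn't detail its alignment or scoring, so this swaps in the simplest possible scheme, exact-substring matching of predicted peptides against a toy proteome (all sequences hypothetical). Note how a paralog sharing a peptide picks up a nonzero second-place score, which is presumably the flavor of the CspF case:

```python
from collections import Counter

# Toy proteome with made-up sequences; protA2 is a "paralog" sharing a prefix with protA.
PROTEOME = {
    "protA": "MKTAYIAKQRQISFVKSHF",
    "protB": "MSKGEELFTGVVPILVELD",
    "protA2": "MKTAYIAKQRLDWEEHHNL",
}


def score_proteins(peptides: list[str], proteome: dict[str, str]) -> list[tuple[str, int]]:
    """Count, per protein, how many predicted peptides occur in it as exact
    substrings; return (protein, score) pairs sorted by descending score."""
    scores = Counter({name: 0 for name in proteome})
    for pep in peptides:
        for name, seq in proteome.items():
            if pep in seq:
                scores[name] += 1
    return scores.most_common()


print(score_proteins(["TAYIAK", "QISFVK", "SHF"], PROTEOME))
```

A real caller would need error-tolerant alignment, since predicted peptides will contain miscalls, and a score normalized for protein length and peptide uniqueness.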
I can't help wondering which amino acids have the most similar effects on peptide signal - my usual "can you tell leucine, isoleucine and valine apart?" question.
ONT hasn't discussed the peptide prep in detail. Adapters must be ligated to the N-terminus and C-terminus at high efficiency, without accidentally adding these to chemically similar side chains such as lysine, aspartate and glutamate. There's also presumably a step to remove N-terminal acetylation - or just lose these peptides? What happens to only partially adapted peptides? Can these foul pores? What about branches in the chain such as ubiquitination? Must glycosylated proteins be stripped of sugars first - hard to see those snarls not blocking a pore.
Wallace also presented a protein barcoding approach in which the coding sequence of a gene is engineered to append a protein barcode. With a set of 1000 barcodes attached to proteins of different expression levels, a very tight linear relationship (in log-log space) is seen between expected and observed barcode counts. No details were given on how the expected barcode counts were generated - whether they come from actual quantitation of the proteins or, more likely, from reference data or RNA expression levels. An important concern about any protein tag is whether it affects the protein in some way - such as by accidentally destabilizing it.
One promise of nanopore peptide sequencing is the possibility of distinguishing post-translational modifications, perhaps the most valuable being phosphorylation, given its importance to intracellular signaling in human health and disease. Wallace presented data on 100 peptides that were processed in both unphosphorylated and phosphorylated forms, with a clear change in signal between the two. A machine learning model achieved well over 99% accuracy in classifying non-phosphorylated vs. phosphorylated peptides.
Wallace also gave a short update on ONT's effort to run intact proteins through pores, showing squiggles generated from full-length proteins. Full-length protein reads would strongly differentiate ONT from all the non-nanopore protein sequencing competitors, including mass spectrometry. Avoiding digestion would also eliminate the many blind spots created when very short digestion products are lost.
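The digestion blind spot is easy to illustrate in silico. ONT didn't say which protease it uses; this sketch assumes trypsin, which cuts after lysine (K) or arginine (R) except when the next residue is proline, and uses a hypothetical 6-residue cutoff for "too short to recover". The sequence is a toy:

```python
import re


def trypsin_digest(protein: str) -> list[str]:
    """In silico trypsin digest: cleave after K or R unless followed by P.

    Assumes complete digestion (no missed cleavages).
    """
    # Zero-width split: after K/R (lookbehind), not before P (negative lookahead).
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def fraction_in_short_peptides(protein: str, min_len: int = 6) -> float:
    """Fraction of residues falling in digestion products shorter than min_len,
    i.e. the part of the sequence at risk of being lost."""
    short = sum(len(p) for p in trypsin_digest(protein) if len(p) < min_len)
    return short / len(protein)


print(trypsin_digest("AAKGGRPCCKDD"))
print(fraction_in_short_peptides("AAKGGRPCCKDD"))
```

Intact-protein reads would sidestep both the choice of protease and the cutoff entirely.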
While ONT has solicited interest from alpha users in the past, there was no hint of any such program this year.
A Product Announcement I Would Like to Wish Into Existence
Perhaps this belonged in the prior piece, but I forgot it: there is one long-term product launch I really wish Oxford Nanopore had announced, the date of next year's London Calling. All the other recurring conferences I've attended or considered publish dates at least one year ahead, often two. Having those dates makes long-range planning practical - which dates should I steer family events away from, or which conferences might I chain together? Last year ESHG was immediately on the heels of London Calling, which was quite nice - a short hop from Gatwick to Milan, and I got a twofer from the transatlantic crossing. It's always been odd that the sign-off from London Calling has never been a "save the date" for the next London Calling.