Thursday, June 27, 2024

ONT T2T Genome Bundle: Hot New Thing or Flash in the Pan?

Last month at London Calling, Oxford Nanopore announced a consumables and reagent bundle which enables generating six telomere-to-telomere (T2T) human genomes for $4K each.  Even in the very friendly audience at London Calling, there was some skepticism over the market viability of this offer - how much would it really drive sales?  T2T human genomes really only became possible in this decade.  The first examples of T2T chromosomes generally used a mix of different technologies, often including PacBio HiFi, ONT Ultralong and BioNano Genomics mapping information.  What ONT is proposing is the ability to routinely generate T2T genomes using only ONT data.  

ONT-Only T2T Genome Bundle (Where's My Royalty Check?)


This is accomplished using three libraries and four PromethION flowcells, along with a complete software workflow.  The most important library, which will consume two flowcells, is an ultralong library. This provides a strong foundation for structural variant detection as well as methylation detection.  

To improve the overall accuracy, one flowcell is used on a "6b4" library, in which DNA is resynthesized using a nucleotide mix that includes two nucleotide analogs, one for A and one for T - hence the moniker "6b4", as there are six bases (6b) resolving to four (4) states. By doping in a base analog, homopolymeric sequences aren't quite as uniform - they will have scattered positions occupied by the base analog, and ONT has basecalling models which can call the analogs. I'd be very interested to learn whether any published descriptions of this idea pre-date my 04 November 2016 blogpost proposing such an approach (which encountered immediate extreme skepticism from some raving PacBio fans).  If ONT filed any patents, that post is definitely prior art to reference!  Only A and T are substituted because those are the dominant long homopolymers in human DNA - but perhaps for other DNAs, C and G analogs might be added as well to yield 8b4 libraries.  
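To make the idea concrete, here's a toy sketch - my own illustration, not ONT's actual chemistry; the lowercase analog symbols, the doping fraction, and the function names are all invented - of how a six-symbol alphabet that collapses losslessly to four bases breaks up a featureless homopolymer:

```python
import random

DOPING_FRACTION = 0.25          # assumed fraction of analog incorporation
ANALOG = {"A": "a", "T": "t"}   # lowercase = hypothetical analog stand-ins

def resynthesize_6b4(seq: str) -> str:
    """Randomly swap A/T for their analogs at the doping fraction."""
    return "".join(
        ANALOG[b] if b in ANALOG and random.random() < DOPING_FRACTION else b
        for b in seq
    )

def decode_to_4(seq6: str) -> str:
    """Collapse the six-symbol alphabet back to the four canonical bases."""
    return seq6.upper()

random.seed(1)
raw = "GG" + "A" * 12 + "CC"      # a 12-base A homopolymer
doped = resynthesize_6b4(raw)
print(doped)                      # analogs scattered through the run
assert decode_to_4(doped) == raw  # six bases resolve to four states
```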

The final flowcell is used for a Pore-C library, ONT's version of Hi-C.  This provides contiguity where the ultralong libraries cannot and assures phasing of haplotypes.
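A cartoon of why long-range contacts assure phasing (a toy of my own, not ONT's pipeline - the block names and counts are invented): alleles on the same chromosome copy co-occur in contacts far more often than alleles on opposite copies, so even a greedy pass can chain phase blocks into haplotypes.

```python
# Hypothetical input: counts of Pore-C reads supporting "cis" (same
# haplotype) vs "trans" (opposite haplotype) linkage between adjacent
# phase blocks along a chromosome.
block_links = [("blk1", "blk2", 40, 3),    # (block A, block B, cis, trans)
               ("blk2", "blk3", 5, 55),
               ("blk3", "blk4", 61, 2)]

orientation = {"blk1": +1}                 # arbitrarily anchor the first block
for a, b, cis, trans in block_links:
    # if cis evidence dominates, b keeps a's orientation; otherwise it flips
    orientation[b] = orientation[a] * (+1 if cis >= trans else -1)

print(orientation)   # {'blk1': 1, 'blk2': 1, 'blk3': -1, 'blk4': -1}
```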

Of course, that $4K/genome number isn't complete - the standard items such as labor costs and real estate are left out. Numerous additional reagents are required to extract high molecular weight DNA and build the libraries.

Are Three ONT Libraries Necessary?

ONT realizes that making three separate libraries has many drawbacks, so they are working towards a vision which has only a single library.  

Getting rid of my contribution, the 6b4 library, would require further improving homopolymer calling.  One described path to this is further engineering of the pores to have a longer reader region. Of course, every pore change necessitates redoing not only the models but the actual flowcells, triggering yet another big chemistry shift that nobody enjoys.  An intriguing alternative path mentioned by Clive Brown is to oscillate the voltage in the flowcells.  In the current scheme a constant voltage is applied to each pore, but ONT believes that a regular modulation of the voltage would provide a metronome-like component to the signal, enabling new models to provide better basecalling - particularly across homopolymers.  By the way, in the spirit of "inventing" 6b4, I will soon throw out yet another idea for how to improve homopolymer calling on ONT.
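As a toy illustration of why a modulated voltage might help - entirely my speculation, not ONT's implementation; the sample rate, carrier frequency, current level and noise are invented numbers - a constant-voltage homopolymer trace is featureless, while a known sinusoidal modulation is easily recovered and could serve as an external clock for counting translocation steps:

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 4000
sample_rate = 4000.0                    # Hz, assumed
t = np.arange(n_samples) / sample_rate

level = 80.0                            # pA, assumed steady poly-A level
noise = rng.normal(0.0, 2.0, n_samples)

constant_v = level + noise              # flat trace: no time anchor at all
modulated_v = level * (1 + 0.05 * np.sin(2 * np.pi * 200 * t)) + noise

# The imposed carrier is trivially recoverable by correlation - the
# recovered phase is the "metronome" a basecalling model could count against.
carrier = np.sin(2 * np.pi * 200 * t)
print("constant  vs carrier corr:", round(float(np.corrcoef(constant_v, carrier)[0, 1]), 3))
print("modulated vs carrier corr:", round(float(np.corrcoef(modulated_v, carrier)[0, 1]), 3))
```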

For avoiding the need for Pore-C, Clive went back to a "seems impossible but ONT's changed the rules before" idea: why not just sequence chromosomes end-to-end?  Of course, that would require successfully isolating intact chromosomes, adapting them for sequencing and getting them into the sequencer - all without shearing.  Clive reported that ONT has successfully sequenced more than one entire yeast chromosome - which is a start (and an exciting one for me, since my day job often involves yeast sequencing).  ONT also described a new Telo-Seq method for adding adapters to the telomeres themselves - slightly different from a method developed by Nobel Laureate Carol Greider's lab, which had a splashy Science paper on human telomere length variability; fittingly, Dr. Greider herself started London Calling's presentations this year.

Clive also threw out the idea of protecting entire chromosomes by reconstituting chromatin around them (which Dovetail Genomics' original Chicago method used) or by not even stripping histones off the DNA prior to sequencing.  ONT's redesign of the MinION/GridION flowcell case includes a single port designed to accept very gloopy libraries - preparing for such a world.  I'll have another post soon exploring the "so old it's new" way to extract entire chromosomes - Clive pointed me at a key author, one of the giants of early molecular biology.

Clive also mentioned a number of ways in which ONT is driving for even higher yields per flowcell, including a buffer change which will be rolled out soon and perhaps future increases in translocation speed.  For a single-flowcell T2T genome, a 2X yield improvement will most likely be required - it's certainly on Clive's plots, but making it real will be much trickier than drawing the possibility.
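Back-of-envelope arithmetic on that 2X target - the per-flowcell yield below is an assumed round number, not an ONT spec:

```python
GB_PER_FLOWCELL = 100      # assumed current PromethION yield, Gb
HUMAN_GB = 3.1

bundle_coverage = 4 * GB_PER_FLOWCELL / HUMAN_GB    # ~129X across 3 library types
single_fc_at_2x = 2 * GB_PER_FLOWCELL / HUMAN_GB    # ~65X from one flowcell

print(f"four-flowcell bundle: ~{bundle_coverage:.0f}X total")
print(f"one flowcell at 2X yield: ~{single_fc_at_2x:.0f}X")
# ~65X of a single do-everything library might suffice where ~129X split
# across three library types was needed - hence a 2X target rather than 4X.
```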

But Will They Sell?

That's all good, but how important are T2T genomes?  Who will use them?  As with every other shiny-and-new genomics approach, the question always arises whether the new approach really justifies its cost over less expensive alternatives.

For example, suppose you are running a population genomics program for a human population that hasn't previously been explored with long read sequencing.  For the price of one pure ONT T2T genome you could generate roughly four HiFi genomes on Revio (perhaps via outsourcing) - or some mix of the two.  If you have $200K to cover sequence generation, should that be 200 HiFi genomes, 50 pure ONT T2T genomes, or something in between, like 20 pure ONT T2T genomes and 120 HiFi genomes?  Or do you start thinking of ways to further reduce costs, perhaps by using short read Hi-C in place of Pore-C and generating that data very inexpensively on a NovaSeq X?  But are there pipelines available to combine the different data types into a unified whole the way ONT is providing?
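The mix trade-off is easy to enumerate - assuming the post's round numbers of ~$1K per HiFi genome and ~$4K per pure ONT T2T genome, consumables only:

```python
BUDGET = 200_000
HIFI_COST = 1_000     # assumed $/HiFi genome
T2T_COST = 4_000      # assumed $/pure ONT T2T genome

for t2t in range(0, BUDGET // T2T_COST + 1, 10):
    hifi = (BUDGET - t2t * T2T_COST) // HIFI_COST
    print(f"{t2t:3d} T2T + {hifi:3d} HiFi = ${t2t * T2T_COST + hifi * HIFI_COST:,}")
# prints the spectrum from 0 T2T + 200 HiFi through 20 + 120 to 50 T2T + 0 HiFi
```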

In the rare disease space, I'd frame the question as: how many cases unsolvable with HiFi + BioNano will be solvable with a T2T genome?  If I were at ONT, I'd be trying to line up an effort to ask exactly that - getting access to samples from one or more consortia with a multitude of cases still awaiting resolution.  Of course, it is a gamble - many of those cases may be unsolvable because even an absolutely perfect genome sequence with perfect methylation calling may not resolve them: either our knowledge to interpret is still too puny, or the existing samples are just not sufficient - perhaps the patients are mosaic and we have the wrong samples, or perhaps the disease isn't actually Mendelian at all.

Of course, some newly launched labs may be very keen to focus on a single sequencing platform and, due to various constraints, unable to outsource to other providers - for such labs the T2T bundle will be very attractive.  Though the question will persist: how often should they use it versus just running ultralong libraries at half the cost, and therefore twice the number of samples?

Ultimately we are faced with a standard question in science: how important is it to measure something which we couldn't measure before?  Population scale T2T sequencing might prove to be resolutely uninteresting, but until we perform it we can't know.  Prior advances in resolving the human genome have proven valuable; I'm willing to bet having thousands of T2T human genomes will prove valuable also.

Is Non-Human the Real Market?

Of course, human isn't the only game in town.  There's nothing human-specific in the kits, so if you can purify uHMW DNA from your species of choice, it is fair game for the bundle.  One catch is that ultralong libraries, if I recall correctly, cannot be barcoded.  So unless you play with flowcell washing, that component of the process comes in economic units of roughly 0.5 human genome equivalents per flowcell.  But that is probably workable: many economically important plant species are similar in scale to human, and if the project is valuable enough, perhaps you don't care if it's ~$4K to get a near-perfect assembly of a 0.8 Gb crop genome.  I also don't know if the downstream pipeline can deal with ploidies higher than two; cultivated genomes are so often polyploid.  But given the large number of agronomically important traits already known to be controlled by structural variation, having the highest resolution genomes for important crop species seems like a very good value proposition.
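To see how that barcoding catch scales, here's a toy calculator - my own assumptions, with the proportional-yield scaling, dict layout and function name all invented - showing why the ultralong component is floored at a whole flowcell per sample while the other libraries could in principle be shared:

```python
import math

HUMAN_GB = 3.1
BUNDLE_FLOWCELLS = {"ultralong": 2, "6b4": 1, "pore_c": 1}  # per human genome

def flowcells_for(genome_gb: float) -> dict:
    scale = genome_gb / HUMAN_GB
    needed = {}
    for lib, n in BUNDLE_FLOWCELLS.items():
        frac = n * scale
        # barcode-able libraries could share flowcells across samples;
        # un-barcodable ultralong needs at least one whole flowcell each
        needed[lib] = math.ceil(frac) if lib == "ultralong" else round(frac, 2)
    return needed

print(flowcells_for(0.8))   # {'ultralong': 1, '6b4': 0.26, 'pore_c': 0.26}
```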

As an aside, a really interesting phenomenon came out in a London Calling presentation on grapevine (well worth watching!).  The green parts of grapevine consist of three distinct tissue layers, and as far as anyone knows these never switch (there's no known plant equivalent of EMT or MET, the epithelial-mesenchymal transitions of animal development).  Since grapevine is propagated vegetatively by grafting, and in some cases has been propagated that way for centuries, not only are different grapevine genomes of the same varietal diverging from each other, but the three different genomes within a single plant (one for each layer) are diverging.  That would give one more use for Hi-C/Pore-C - it can inform which chromosomes are in the same compartment.

Having T2T genomes of vertebrates or complex invertebrate genomes (some insects, such as Anopheles, have very messy genomes) might be yet another academic market.  Perhaps major livestock species such as beef cattle and chickens will provide a bigger market - though ONT has notoriously struggled on chicken genomes.

Parting Thoughts

Back to T2T human. One interesting aspect of the bundle is what is not in it: duplex libraries.  While ONT is still discussing these, and high-duplex flowcells are on the product map, there was definitely much less emphasis on this approach than at the last few London Callings and Community Meetings.  It's a more complex workflow, user yields fall far short of ONT's internal claims, and the basecalling accuracy of the newest models keeps making simplex look pretty good.  It's definitely too soon to say ONT will abandon duplex, but I wonder if it will become a minor product aimed at niche markets (such as protein engineering) where single molecule accuracy is critical - situations where every DNA molecule is truly an independent datapoint and consensus generation isn't a viable option.  

The next big ONT event will be the US Community Meeting, in mid-September in Boston.  Somewhat ironically, I have an invitation to be elsewhere, so sadly I won't be able to take advantage of the proximity and mingle with the amazing people who attend these meetings.  Mid-September also isn't much time to get a read on the T2T bundle or anything else, so we may be forced to wait until next London Calling to see if the T2T bundle is gaining market traction.

2 comments:

  1. A former colleague proposed exactly the same thing circa 2014 or 2015. It was deemed not important enough to patent by the legal beagles.

  2. RE: 6b4. Yeah I'm afraid you're about 5 years too late 😜 https://patents.google.com/patent/US11261487B2/
