Wednesday, February 24, 2016

Amplification-free, library-free sequencing? NanoString wants to be It

Perhaps the most unusual new technology to be unveiled at AGBT16 is NanoString's new approach to sequencing, which is in very early stages of development.  Called Hyb And Seq, the process is remarkable in being a purely hybridization-based single-molecule method -- absolutely no enzymes are harmed during the operation of the system. That's a rarity -- the only enzyme-free (or nearly so) sequencing approaches to deliver serious amounts of data into GenBank are the Maxam and Gilbert approaches (including Church's genomic sequencing and multiplex sequencing), and even those typically required restriction digestion of the target.

This announcement showed I clearly haven't been following NanoString carefully enough.  Their nCounter system uses some clever labeling to analyze nucleic acids without any amplification or processing.  I was most familiar with it back in Infinity days, when it was starting to emerge as a means for detecting gene fusions.  

Sequencing by hybridization had been explored extensively before as a high-throughput sequencing approach, proposed independently by Mirzabekov and by Drmanac and colleagues; Drmanac pursued the approach through a series of companies which have now been largely forgotten, the last of which I think existed until just before the advent of 454 sequencing.  In those days, the idea was to spot lots of short clones on filters and then probe them with a series of probes which would essentially ask "yes or no" to a set of short oligos (I think in the range of hexamers).

Algorithmically, a big challenge is reconstructing the sequence from these binary answers, which is one of the paths which set Pavel Pevzner into thinking about graphs and led to de Bruijn graph assembly methods for short reads (this reference is one of my favorite papers, a tour de force in lucidly explaining a new approach to a long-standing problem).  There was also the severe problem of the answers being binary; repeats of the same kmer in a clone could be a serious problem, since a yes/no answer cannot say how many times the kmer occurs.
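To make the reconstruction problem concrete, here is a minimal Python sketch of the general idea, not any company's actual pipeline. It assumes idealized, error-free yes/no answers, and the function names `spectrum` and `reconstruct` are made up for illustration. Each kmer becomes an edge between its prefix and suffix, and a path that uses every edge once spells out a candidate sequence.

```python
from collections import defaultdict

def spectrum(seq, k):
    """Idealized SBH result: the SET of k-mers present (counts are lost)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(kmers):
    """Spell one sequence by walking an Eulerian path in the de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    Assumes the k-mers form a single path or cycle (no errors, no gaps).
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
        indegree[kmer[1:]] += 1
    nodes = list(graph)
    # Start where out-degree exceeds in-degree; for a pure cycle, start anywhere.
    start = next((n for n in nodes if len(graph[n]) > indegree[n]), nodes[0])
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm, iterative form
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])

simple = reconstruct(spectrum("ATGCGTCA", 3))      # no repeated k-mers
repeat = reconstruct(spectrum("ACGTACGTAC", 4))    # tandem repeat
```

With no repeated kmers the original sequence comes back intact. The tandem repeat collapses to a shorter sequence, because the binary spectrum records each repeated kmer only once.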

On the practical side, the short effective kmer lengths used meant that the inserts to be probed could not be very long, as short as 200bp in one of the abstracts I found.  Ultimately, Drmanac ended up going a different path as well, being one of the founders of Complete Genomics. 

Sequencing by hybridization has pretty much lain dormant for quite a while, and then GnuBio in a sense revived it.  Gnu's chemistry again was conceptually asking sequentially if each sequence contained a given kmer, though using an enzymatic system for the detection and really cool picodroplet technology for generating all the necessary partitions.  Supposedly the GnuBio technology, now under Bio-Rad's umbrella, is still on track for an amplicon-sequencing system, but the lack of reports of progress (such as no trace at AGBT, according to one attendee I spoke with) leads one to wonder whether this technology will ever see the light of day.

I think I've worked out an accurate outline of Hyb And Seq from the company's video, James Hadfield's nice blog summary, the tweets I collected and chatting with Dale Yuzuki, and it involves a Wallenda-esque four-component hybridization. DNA is hybridized to capture probes on the flowcell.  Unlabeled probes are then hybridized to the captured material.  Each probe has a region designed to bind a specific target (I think; I'm a little fuzzy here but it seems required), a region which will bind the bases to be interrogated, plus a tail which effectively expands the sequence in the target-seeking region.  A series of labeled probes is used to read out this tail.  In other words, a position in the unlabeled probe is represented by one of four sequences in the tail, corresponding to the base at that position in the probe.  The labeled probes are designed so that hybridization, imaging and removal are very fast.  After an unlabeled probe has been fully interrogated, it is removed and a new set of probes washed in.

It's a pretty amazing bit of hybridization trickery, carefully building up a four-layer cake and selectively removing either the labeled top probe or the unlabeled layer 3 probe.  Between 1 and 4M locations are interrogated with each probing.
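As I understand the scheme, the readout step amounts to a lookup: each imaging cycle reports which of four labeled probes bound a given tail position, which names the base at that position in the unlabeled probe, and the target bases are then the reverse complement of the probe. A toy Python sketch of that logic follows. The color names, the `COLOR_TO_BASE` mapping, the six-position probe and both function names are assumptions of mine for illustration, not details NanoString has disclosed.

```python
# Hypothetical mapping of reporter color to probe base; the real
# label scheme has not been described in this detail.
COLOR_TO_BASE = {"blue": "A", "green": "C", "yellow": "G", "red": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def decode_probe(colors):
    """One observed color per tail position -> bases of the unlabeled probe."""
    return "".join(COLOR_TO_BASE[color] for color in colors)

def target_bases(probe_seq):
    """The probe hybridizes to its reverse complement on the target strand."""
    return probe_seq.translate(COMPLEMENT)[::-1]

# One spot on the flowcell, imaged over six readout cycles (made-up data).
observed = ["blue", "green", "green", "red", "yellow", "blue"]
probe = decode_probe(observed)
target = target_bases(probe)
```

The point of the tail is visible even in the toy version: the bases are never read directly from the target, only inferred from which probe stuck and what its tail says.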

That sounds almost crazy, but a similar 3-deep hybridization (with 4 components) is the basis for NanoString's user-customizable content scheme, so they have experience.  This is a huge asset to NanoString getting Hyb And Seq to work -- it is a logical extension of technology they have already developed.  Furthermore, the error rate appears to start out relatively low (2%) and to be improved by repeated reprobings, going to 0.1% after two passes. Also, in theory, multiple interrogations of the same molecule can be performed, either serially with probes that will hybridize in the same neighborhood or potentially in parallel with long DNA molecules stretched out on the flowcell.  Note that probe sets would be designed to prevent two probes from hybridizing too close to each other; otherwise the signals from those distinct probes can't be distinguished.

Conversely, it would seem a little complicated to deliver reagents -- there's all the labeled oligo mixes and then potentially a very large number of unlabeled probe sets if you want long "reads".  That's an engineering problem, which means it can be solved, but suggests an instrument with a lot of expensive moving parts.  I don't know how much NanoString's existing instrument sells for, but that can't be inexpensive to start with.  

Still, for some applications Hyb And Seq might be very attractive.  For example, targeted sequencing of oncogenic hotspots in clinical material -- the lack of any library preparation, amplification or enzymes during reading implies a system that should be very robust to the battered state of FFPE-extracted DNA, as well as potentially very fast (projected speed for a 100-gene panel is around 60 minutes).  The system can read RNA -- or both RNA and DNA simultaneously (or even, apparently, RNA and DNA and protein).  Could this make it very interesting for viral pathogen searching, avoiding the problems created by reverse transcriptase as well as enabling simultaneous querying for both RNA and DNA viruses?  Circular RNA or DNA?  No problem -- the system doesn't need an end (though with DNA, keeping the target naked enough for the probes without the complement snapping on is probably a serious challenge).

There's also the possibility of contact-printing nucleic acid samples from a tissue slice onto the capture probes, retaining the two-dimensional information from the sample -- NanoString is promising to show this with their standard hybridization probes at AACR in the spring.

The biggest question mark for Hyb And Seq is when it might actually see real daylight; NanoString's Joe Beechem apparently stated that release would be no earlier than 2017.  That's an eternity in sequencing technology land, but wait we must.
