Saturday, February 27, 2010

Last Day of Eavesdropping on Marco Island

Today was the last day of the Marco Island conference, so I won't be hammering Twitter again for quite a while. The afternoon session focused on emerging technologies.

Complete Genomics appears to have dispelled the skepticism they had been met with last year. It certainly helped that two customers presented data (Anthony Fejes' notes on CG workshop). Apparently they hinted at some additional technological improvements coming down the pike to get even more data out.

Life Technologies presented on their single molecule system, which they hope to get to early access customers by the end of the year. It has many similarities to the Pacific Biosciences system. One interesting twist is that they can add new polymerase when the old ones die, so in theory they can keep sequencing to extremely long lengths. This could be a huge plus for the system in de novo and metagenomic settings.

One other neat PacBio tidbit, thanks to Dan Koboldt, is that the polymerase reaction rates are so uniform that fragments can be sized (and therefore structural variants detected) by the time required to go from end-to-end.

Ion Torrent presented and apparently was received well, though the amount of detail available remotely is still frustratingly thin. A lot of key questions I have don't seem to have been answered, which I'm guessing is due to limited information in their presentation (though one can't rule out blogging fatigue hitting my sources). It also isn't helping that Twitter seems to be experiencing difficulty, perhaps because of the traffic due to the natural catastrophe in Chile & curiosity about tsunamis in the Pacific.

Ion Torrent's general scheme is to trap DNA (single molecules or clusters?) in wells in a micromachined plate (much like 454, though apparently no beads) and detect the release of a proton each time a nucleotide is incorporated. Detection is via a proprietary semiconductor detector built into the bottom of each well.

It isn't clear, for example, whether each of the micromachined wells in the system is watching a single DNA molecule or some sort of cluster of molecules. If the latter, what is the amplification scheme? The run times described seem incompatible with amplification.

How much sample goes in? What preparation is needed upstream? What sort of tagging is needed? Can, for example, the Ion Torrent machine be used to resequence (or QC) libraries from the other systems? Does the sample need to be linear, or can you sequence plasmids directly? (I doubt it, due to supercoiling, but it's worth asking.)

Ion Torrent is making several bold assertions. One is "The Chip is the Machine", which decodes to the fact that the chips (now seen on the website) determine the key performance attributes of the system; the box (reputedly $50K) is simply interface, data collection and reagent fluidics. Another bold claim is that the chips can be fabricated in any CMOS fab in the world. Of course, that presumably leaves out the specialized microfluidic setup on top. Still, that is an impressive supplier base.

Somewhere I saw a throughput of 160Mb per 1 hr experiment for $500 in consumables. The Ion Torrent website's video hints that part of their business model will be selling different chips of different densities for different applications. One nice feature of the consumables is that they should be just standard polymerases and unlabeled nucleotides. Of course, there could easily be some magic buffer components, but one part of the cost of many of the other systems is the need for either labeled nucleotides (everybody but 454) or complicated enzyme cocktails (454). Furthermore, it is the presence of unlabeled nucleotides in the reagents that is a major contributor to loss-of-phase in clonal systems and probably to "dark bases" in single molecule systems. Simple reagents should translate to low costs, and perhaps to high reliability and long reads.

How long? That's another key attribute I haven't seen. Again, knowing whether this is a single molecule system (in which case what would kill reads?) or clonal (with the dephasing problem) would be informative. How many reads per run? For some applications, getting lots of reads is more important than long reads -- and of course for others length is really important.

Error rates or modes? I haven't seen anything beyond an apparent bullet point that Ion Torrent sequenced E. coli (in a single run?) to 13X coverage, 99+% of genome in assembly and 99.9+% accuracy. Supposedly homopolymeric runs can be read out, but how accurately? Is there a length beyond which things get confusing?
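
On the "in a single run?" question, a rough check suggests it is at least plausible. The sketch below assumes an E. coli genome of roughly 4.6 Mb (a textbook figure, not something from the talk), takes the rumored 160 Mb per run at face value, and pretends every sequenced base counts toward coverage. The function name is made up for illustration.

```python
import math

# Assumption: approximate E. coli K-12 genome size in megabases (not from the talk).
ECOLI_GENOME_MB = 4.6

def runs_needed(genome_mb, coverage, mb_per_run):
    """Minimum number of runs to reach a given fold coverage,
    assuming every sequenced base is usable."""
    raw_mb = genome_mb * coverage
    return math.ceil(raw_mb / mb_per_run)

raw_mb = ECOLI_GENOME_MB * 13          # raw sequence needed for 13X
runs = runs_needed(ECOLI_GENOME_MB, 13, 160)
print(f"{raw_mb:.1f} Mb needed, {runs} run(s) at 160 Mb/run")
```

If the 160 Mb figure holds, 13X of E. coli would use well under half of one run's output, so a single run is not ruled out by throughput alone.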

One more neat aspect of the Ion Torrent system: no images. Sure, the traces from each pH run (the world's smallest pH meters, according to the website) should be much more compact, but not nothing -- though it is implied that the signal is sharp enough that there is no need to store them. Hence, unlike all the other systems there's no need for beefy on-board computers and no headache of storing enormous numbers of high resolution images.

A final thought: $500 per 1 hour run is attractive, but if you really kept one instrument going, quite a tab would run up. Suppose one got in 10 runs in a day (does it have any autoloading capability?) -- that's $5K/day. Even keeping that up for the approximately 200 business days in a year is $1M in chips -- something Ion Torrent and their backers are licking their lips over, but which those who get the machines will have to face. Of course, you don't have to run the system constantly (and that's hardly constantly!) -- but if I had one, I'd certainly want to!
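
For anyone who wants to play with the numbers, here is the same back-of-envelope arithmetic as a small Python sketch. The defaults ($500 and 160 Mb per run, 10 runs a day, 200 business days) are the rumored or supposed figures from this post, not confirmed specifications, and the function name is hypothetical.

```python
def consumables_cost(cost_per_run=500.0, mb_per_run=160.0,
                     runs_per_day=10, days_per_year=200):
    """Rough consumables costs from the rumored per-run figures."""
    per_day = cost_per_run * runs_per_day    # dollars per day of running
    per_year = per_day * days_per_year       # dollars per year of business days
    per_mb = cost_per_run / mb_per_run       # dollars per megabase of raw output
    return {"per_day": per_day, "per_year": per_year, "per_mb": per_mb}

costs = consumables_cost()
print(costs)
```

Changing `runs_per_day` or `days_per_year` shows how quickly the annual bill moves with utilization.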


Kevin Davies said...

The lack of detail on the sample prep was perhaps the only detraction from a wonderfully exciting and typically flamboyant presentation from Rothberg. There was speculation afterwards that perhaps it involves emulsion-PCR, which wouldn't be ideal. In any event, Rothberg said it was reason to come hear him give his next talk, whenever/wherever that may be. We're hoping it's at the CHI XGen Congress in 3 weeks in San Diego...

Jack Leonard said...

I attended this very exciting talk. Based on discussions with reliable sources at AGBT, I believe it relies on sequencing amplified clones on beads. Presumably you need to change the H+ concentration enough to have a good S:N ratio, and single molecules in a 1.3 micron well just wouldn't accomplish that. The output per run was unclear. I heard 100 M reads per run, but this output might still be on the drawing board. Combined with 100-base reads, that would give 10 Gb/run, or 50 Gb/run if you can push this to the 500-base read length that is possible on the 454 platform. So this platform appears to have a lot of runway left. Yes, the front end seems unresolved, but the future seems very bright (actually dark, since it is light-free) for Ion Torrent.

Jack Leonard said...

Ion Torrent raw output probably will be influenced by a number of factors including chip size, well size, well density, and active bead occupancy.

Some reasonable assumptions:
Chip size (postage stamp size) ~ 2 cm^2
Well size ~ 1.3 um diam.
Well density/spacing = 1.95 um on center distance.
Active bead occupancy (Wells with beads which give clonal reads) = 50%

So you could fit about 105 M wells onto a 2 cm^2 chip. If the average read length was 100 bases and half of the wells gave usable reads, then one could expect ~5.2 Gb/run. If the average read length was instead 500 bases, then the output would be close to 26 Gb/run.

It might be a tricky business getting a 1 um bead into a 1.3 um well, but if Complete Genomics can get a 200 nm DNA nanoball to stick to a 250 nm spot on their arrays (also made photolithographically?), then I suppose that it must be possible. From a technical perspective, it seems they should work together, even though their business models seem completely at odds (i.e., CG is trying to industrialize genome sequencing, while Ion Torrent is trying to decentralize it). If Ion Torrent could work at Complete Genomics' densities of 750 nm on center distance spacing as described by Rade Drmanac, then by my calculations Ion Torrent would have approximately 711 M potential reads per run which would output 36 Gb @100 base read length or 178 Gb @500 base read length (at 50% active occupancy). Just a thought: Ion Torrent might want to consider a nanoball-like RCA approach which would support a higher density while still producing enough protons for detection.
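
The well counts in this comment can be reproduced with a short Python sketch. It assumes a simple square grid (one well per pitch-by-pitch cell) and, notably, a chip 2 cm on a side (4 cm^2 of area); a literal 2 cm^2 of area would give half as many wells, so the quoted ~105 M and ~711 M figures appear to rest on the 2 cm x 2 cm reading. The function names are hypothetical.

```python
def wells_on_chip(side_cm, pitch_um):
    """Wells on a square chip, assuming a square grid with one well
    per pitch x pitch cell (an assumption, not a known layout)."""
    side_um = side_cm * 1e4            # 1 cm = 10,000 um
    return (side_um / pitch_um) ** 2

def output_gb(wells, read_length, occupancy=0.5):
    """Gigabases per run if `occupancy` of the wells give usable reads."""
    return wells * occupancy * read_length / 1e9

for pitch_um in (1.95, 0.75):          # assumed pitch vs. Complete Genomics-like pitch
    wells = wells_on_chip(2.0, pitch_um)
    for read_length in (100, 500):
        gb = output_gb(wells, read_length)
        print(f"pitch {pitch_um} um: {wells / 1e6:.0f} M wells, "
              f"{read_length} bases -> {gb:.1f} Gb/run")
```

Hexagonal packing or a different occupancy would shift these numbers, so they are best read as order-of-magnitude estimates.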

neekoh said...

Maybe this patent application answers some of your questions:

Unknown said...

Jack - at the meeting an Ion Torrent engineer sketched out similar factors for me on the back of a business card -- which I can't find at the moment, of course -- showing how to calculate estimated run yields for the 2 chip sizes they're currently planning to offer. I think the current numbers for those factors are lower than your assumptions, which may explain why per-run output is projected in the few hundred megabase rather than gigabase range. He did say, though, that there's lots of headroom for fairly large yield increases with relatively simple engineering improvements.