Sunday, January 02, 2011

First of a Torrent?

For the New Year I've resolved to be a bit more regular in posting here, and like all New Year's resolutions it is easy to start out big, so there may be a flurry of posting this week. Of course, the real challenge will be to maintain that energy across an entire year. But, to kick-start things I spent the holiday weekend drafting nearly a week's worth of output.

Ion Torrent continues to attract a lot of attention, though its launch last year hasn't yet resulted in my getting hands or eyes on one. Ideally, an evaluation machine would show up but that's happening only in my dreams. Nor did my attempt to win a free one succeed, though one winning entry was of very similar concept (and both were from Massachusetts!). Most of the press has continued to edge towards breathless and unthinking hype, but the counterpoint is in Nick Loman's well-thought bit of exasperation with that hype.

My own thoughts continue to lie in between. I continue to be frustrated by the absurd hype in various tech press outlets, but I also see this as a useful machine. There are a number of interesting angles, which I've decided to tackle with a small series rather than one big lump. Ideally this splitting will result in more coherent arguments on my part, but that's for you to decide.

To me the most frustrating angle is the view that sequencing is a monolith and a single race, with one winner. For Sanger sequencing, this tended to be the case because the underlying technology was so similar and the various platform makers didn't separate much. ABI took the lion's share of the market, Amersham was a distant second and that was almost it. LiCor had the one somewhat differentiated entry, with a different dye system yielding longer reads, but at the cost of greatly reduced throughput. Even these reads were not so much longer (I think they claimed just over a kilobase, whereas ABI routinely got about 3/4 kilobase) as to really drive a big niche.

But second-gen has evolved in a very different way. Speed, upfront cost, running cost, library prep, pre-sequencer prep, accuracy and read length are multiple variables in which the different platforms have landed in different boxes. Some of this is inherent in the technologies, whereas other differences are due simply to design choices or intellectual property positions. An example of the latter is Illumina's patent lock on bridge PCR, whereas other amplification-requiring platforms appear to nearly all use emulsion PCR (Complete Genomics uses rolling circle).

So, to me, evaluating a platform and where it is going depends on looking at that particular combination and asking what applications it suits best. Once that's worked out, the size of the market can be speculated on as well as who else might be bumping elbows in that space.

Now, Ion Torrent has a number of operational features worth noting. First, it has the lowest upfront cost of a sequencer at around $100K fully loaded (sequencer, server & emPCR robots). This is a first point of my annoyance with many glowing articles: they parrot the "$50K" price which buys you just the sequencer. Even worse are the ridiculous claims of Ion Torrent being 1/10th the price of the competition; this is comparing only to the highest price alternative offerings and not the likely alternate choice.

Second, the run times are quite fast. But again, many of those enamored with the device mindlessly spout the time to acquire data and not all the up-front prep. Some of that prep will depend on the particular application, but it is still on the order of 2-3 days to go from DNA sample to data off the sequencer. Now, I know the pain of anticipating data, having recently gnawed my nails off waiting for a high-stakes paired end SOLiD 4 run (closer to two weeks than one), but the truth is a number of other platforms offer similar speed (more in another installment).

Third, the initial release is claiming about 100,000 reads of 100 bp or more (up to about 200). The chip costs $250 and there is another $250 to prep the sample; it is unclear from anything I've seen what is included in that prep cost and in particular how many runs you can get from one such prep. For example, if that includes library adapters and I'm using a direct PCR approach, then that $250 cost is actually inflated. More importantly, if I need more than 100K reads for an application, does that $250 of prep buy me more than one run (i.e. will 200K reads from one sample cost me $750 or $1000?). Error rate is not clear and homopolymers will be a problem, though the probability of miscalling these isn't well documented.
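To make the $750-versus-$1000 question concrete, here is a minimal sketch of the arithmetic in Python. The prices and the reads-per-chip figure are the rough numbers quoted above; the function name and the `prep_reusable` flag are my own illustrative inventions, and whether a single prep really can feed more than one chip is exactly the unknown.

```python
import math

CHIP_COST = 250          # USD per chip (figure quoted above)
PREP_COST = 250          # USD per sample prep (figure quoted above)
READS_PER_RUN = 100_000  # approximate reads per chip (figure quoted above)

def sample_cost(reads_needed, prep_reusable):
    """Consumables cost in USD for one sample.

    prep_reusable is a hypothetical switch: True assumes one prep can
    feed every chip needed, False assumes a fresh prep per chip.
    """
    runs = math.ceil(reads_needed / READS_PER_RUN)
    preps = 1 if prep_reusable else runs
    return runs * CHIP_COST + preps * PREP_COST

for reads in (100_000, 200_000):
    print(reads, sample_cost(reads, True), sample_cost(reads, False))
# 100000 500 500
# 200000 750 1000
```

The gap between the two assumptions widens with every additional chip, which is why the answer matters for any application needing well over 100K reads.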

Given these fuzzy estimates, what sort of applications will be best for the Ion Torrent platform in its initial state? To me, and clearly to others, the sweet spot is sequencing of targeted and easily interpretable regions. The two U.S. contest winners (was the European giveaway ever executed?) are just along those lines.

A group at MGH is planning to perform PCR-based targeted sequencing of cancer. This is a very appropriate application which fits many of the properties of Ion Torrent. Many cancer mutations are what we call "hotspot mutations"; the same mutations are seen repeatedly. For example, in the very important KRAS oncogene the vast majority of mutations occur in any of the six nucleotides of two adjacent codons. Design your PCR assay correctly, and all you would need is a six base pair read length (indeed, several tests approved for the clinic or on their way there could be seen as 1-bp read length sequencing assays). More realistically, you need to set the primers back a bit from the hotspot and read through the primers, but for this the 100 bp reads of Ion Torrent will be quite good. Now, this hotspot behavior governs most, but not all, activating mutations in oncogenes. This can be seen as it being hard to turn something on by tinkering with it, though in a few cases the tinkering is by removing a whole inhibitory exon and there are many ways to do that. On the other hand, many tumor suppressors are mutated in a diffuse pattern. Sometimes there are hotspots due to particular mutation processes or other forces, but these are never as hot.

The other winning entry was from Woods Hole Marine Biological Laboratory, proposing rapid profiling to identify bacterial contamination of water. Again, this is PCR-based and looks at well-defined signatures, in this case ribosomal RNA profiles.

Each example fits well into the Ion Torrent's capabilities. In both cases, you don't need enormous numbers of reads to do a decent job, though more reads would let you either look for rarer species or assay more loci. Since you are looking for signatures, the assays can be calibrated well in advance versus the error and read length characteristics of the platform. For example, you can know in advance where there are homopolymer runs and adapt for them. Offhand, I can't think of an oncogenic hotspot that involves a homopolymer run and in oncogenes the frame must generally be preserved (again, there are those rare non-coding oncogenic changes) so that would help constrain errors.
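As an illustration of knowing in advance where the homopolymer runs are, here is a small Python sketch that scans a candidate amplicon and reports each run. The sequence is made up for the example (it is not a real oncogene or rRNA amplicon), and the `min_len=3` threshold is an arbitrary assumption rather than a documented property of the platform.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases
    at least min_len long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

# Hypothetical amplicon, invented purely for illustration.
amplicon = "ACGTTTTAGGGCA"
print(homopolymer_runs(amplicon))
# [(3, 'T', 4), (8, 'G', 3)]
```

With a list like this in hand, a signature could be placed to avoid the runs, or calls falling inside them could be treated with extra suspicion.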

Given that sweet spot, who is going to feel Ion Torrent's elbows? The obvious candidate is Roche. The 454 GS Jr is around 2-3X the upfront cost (again, for a complete infrastructure), around 4X the cost per run and will yield 0.5-1X the number of reads -- but much longer ones. However, for both the applications above, long reads aren't really such a great advantage. Again, for many of your signatures you can design the signature around the read length, and really long reads add only a bit more value. For dealing with clinical cancer samples, you really want to keep your PCR amplicons down to 250bp or less because the DNA you get is generally quite fragmented and has other impediments to PCR. With a protocol that can read in from each end of the PCR fragments (perhaps randomly or perhaps in two separate runs, one from each end), Ion Torrent's current length fits well. Short signatures will work better on both platforms in any case, as in the real world you get some reads that peter out much sooner -- short signatures mean more effective reads of a signature per run. 454 has a more established chemistry and performance specs, but I would expect Ion Torrent to be serious competition for the Jr platform, with the 454 family holding on to the applications (such as HLA haplotyping) where length really does matter. Holding on, that is, until Ion Torrent can push their read lengths to similar territory.

That's a pretty big lump. The next installment (not necessarily tomorrow; I might interleave some other topics burning on my desk) will look at the much-stressed tie to the semiconductor industry.

1 comment:

BenK said...

Well, here, there's funding to get an Ion Torrent, to supplement other sequencing efforts and genome characterization methods. However, we are currently short a bioinformatician... if anyone has a green card (or is a citizen) and is looking to support epidemiology/infection control on the cutting edge...