Thursday, November 19, 2009

Three Blows Against the Tyranny of Expensive Experiments

Second generation sequencing is great, but one of it's major issues so far is that the cost of one experiment is quite steep. Just looking at reagents, going from a ready-to-run library to sequence data is somewhere in the neighborhood of $10K-25K on 454, Illumina, Helicos or SOLiD (I'm willing to take corrections on these values, though they are based on reasonable intelligence). While in theory you can split this cost over multiple experiments by barcoding, that can be very tricky to arrange. Perhaps if core labs would start offering '1 lane of Illumina - Buy It Now!' on eBay the problem could be solved, but finding a spare lane isn't easy.

This issue manifests itself in other ways. If you are developing new protocols anywhere along the pipeline, your final assay is pretty expensive, making it challenging to work inexpensively. I've heard rumors that even some of the instrument makers feel inhibited in process development. It can also make folks a bit gun shy; Amanda heard first hand tonight from someone lamenting a project stymied under such circumstances. Even for routine operations, the methods of QC are pretty inexact so far as they don't really test whether the library is any good, just whether some bulk property (size, PCRability, quantity) is within a spec. This huge atomic cost also the huge barrier to utilization in a clinical setting; does the clinician really want to wait some indefinite amount of time until enough patient samples are queued to make the cost/sample reasonable?

Recently, I've become aware of three hopeful developments on this front. The first is the Polonator, which according to Kevin McCarthy has a consumable cost of only about $500 per run (post library construction). $500 isn't nothing to risk on a crazy idea, but it sure beats $10K. There aren't many Polonators around, but for method development in areas such as targeted capture it would seem like a great choice.

Today, another shoe fell. Roche has announced a smaller version of the 454 system, the GS Junior. While the instrument cost wasn't announced, it will supposedly generate 1/10th as much data (35+Mb from 100Kreads with 400 Q20 bases) for the same cost per basepair, suggesting that the reagent cost for a run will be in the neighborhood of $2.5K. Worse than what I described above, but rather intriguing. This is a system that may have a good chance to start making clinical inroads; $2.5K is a bit steep for a diagnostic but not ridiculous -- or you simply need to multiplex fewer samples to get the cost per sample decent. The machine is going to boast 400+bp reads, playing to the current comparative strength of the 454 chemistry. The instrument cost wasn't mentioned. While I doubt anyone would buy such a machine solely as an upfront QC for SOLiD or Illumina, with some clever custom primer design one probably could make libraries useable 454 plus one other platform.

It's an especially auspicious time for Roche to launch their baby 454, as Pacific Biosciences released some specs through GenomeWeb's In Sequence and what I've been able to scrounge about (I can't quite talk myself into asking for a subscription) this is going to put some real pressure across the market, but particularly on 454. The key specs I can find are a per run cost of $100 which will get you approximately 25K-30K reads of 1.5Kb each -- or around 45Mb of data. It may also be possible to generate 2X the data for nearly the same cost; apparently the reagents packed with one cell are really good for two run in series. Each cell takes 10-15 minutes to run (at least in some workflows) and the instrument can be loaded up with 96 of them to be handled serially. This is a similar ballpark to what the GS Junior is being announced with, though with fewer reads but longer read lengths. I haven't been able to find any error rate estimates or the instrument cost. I'll assume, just because it is new and single molecule, that the error rate will give Roche some breathing room.

But in general, PacBio looks set to really grab the market where long reads, even noisy ones, are valuable. One obvious use case is transcriptome sequencing to find alternative splice forms. Another would be to provide 1.5Kb scaffolds for genome assembly; what I've found also suggests PacBio will offer a 'strobe sequencing' mode which is akin to Helicos' dark filling technology, which is a means to get widely spaced sequence islands. This might provide scaffolding information in much larger fragments. 10Kb? 20Kb? And again, though you probably wouldn't buy the machine just for this, at $100/run it looks like a great way to QC samples going into other systems. Imagine checking a library after initial construction, then after performing hybridization selection and then after another round of selection! After all, the initial PacBio instrument won't be great for really deep sequencing. It appears it would be $5K-10K to get approximately 1X coverage of a mammalian genome -- but likely with a high error rate.

With the ability to easily sequence 96 samples at a time (though it isn't clear what sample prep will entail) does have some interesting suggestions. For example, one could do long survey sequencing of many bacterial species, with each well yielding 10X coverage of an E.coli-sized genome (a lot of bugs are this size or smaller). The data might be really noisy, but for getting a general lay-of-the-land it could be quite useful -- perhaps the data would be too noisy to tell which genes were actually functional vs. decaying pseudogenes, but you would be able to ask "what is the upper bound on the number of genes of protein family X in genome Y". if you really need high quality sequence, then a full run (or targeted sequencing) could follow.

At $100 per experiment, the sagging Sanger market might take another hit. If a quick sample prep to convert plasmids to usable form is released, then ridiculous oversampling (imagine 100K reads on a typical 1.5Kb insert in pUC scenario!) might overcome a high error rate.

One interesting impediment which PacBio has acknowledged is that they won't be able to ramp up instrument production as quickly as they might like and will be trying to place (ration) instruments strategically. I'm hoping at least one goes to a commercial service provider or a core lab willing to solicit outside business, but I'm not going to count on it.

Will Illumina & Life Technologies (SOLiD) try to create baby sequencers? Illumina does have a scheme to convert their array readers to sequencers, but from what I've seen these aren't expected to save much on reagents. Life does own the VisiGen technology, which is apparently similar to PacBio's but hasn't yet published a real proof-of-concept paper -- at least that I could find; their key patent has issued -- reading material for another night.

1 comment:

Anonymous said...
This comment has been removed by a blog administrator.