Friday, February 26, 2010
PacBio's big splash
The Pacific Biosciences instrument is officially unveiled now, with those lucky/smart (or SMRT?) enough to go to Marco Island filling in all of us not in that position. Sounds like a great lot of hoopla, though they didn't drag out the Hornet for the splashdown.
First of all, it's a beast. "In this corner, weighing in at nearly an imperial ton...". Too bad their marketing picture offers nothing useful for judging the scale -- it's apparently 6.5 feet wide.
Kevin Davies at Bio-IT World has a wonderfully detailed article and there are a lot of nuggets in the Twitter feed. Anthony Fejes has two different sets of notes out -- one from a workshop and one from another speaker; Dan Koboldt has some good notes too (and if I haven't shouted out your notes, it's probably because I'm oblivious -- leave me a comment pointing to them). There was also a little bit of PacBio science in Elaine Mardis' talk (she's on their SAB) -- see Anthony's notes & the Twitter feed.
Okay, besides worrying about the capacity of floors & freight elevators, what's new? Well, not much on error rates from PacBio (apparently in the Q&A their presenter executed a jig, tango, waltz & rumba when asked) -- though the Mardis talk described resequencing on PacBio of samples that had previously been done on Illumina -- and the results are quite good. Another important note is that their system doesn't seem to have much compositional bias -- bias against high/low %GC has been noted in all of the amplification-based systems and can be a serious problem.
There's also a lot of talk about being able to distinguish various modified bases by their effects on polymerase kinetics. PacBio has also demonstrated direct RNA sequencing (substituting a reverse transcriptase for DNA polymerase) and is talking about watching proteins being made. I haven't quite figured out why you'd want to do that last one, but presumably it's for more than a cool Nature cover.
Read lengths decay exponentially -- but with lots around 1 Kb and quite a few around 5 Kb. The big problem is apparently oxidative damage to the polymerase triggered by the laser -- so they are working on both getting the oxygen out of the system and engineering hardier polymerases (the sort of biz I used to be in). Their strobe sequencing mode -- in which the laser is turned off to enable elongation in safe darkness -- enables multiple reads separated by long gaps.
The instrument definitely raises the bar on sample prep -- it's apparently entirely automated within the monster. YEAH! A machine I can delude myself into thinking I could run! One drawer takes the SMRT cells and another the DNA samples -- 500 ng per sample. That doesn't sound like much (and it's certainly better than the 5-10 ug most library prep protocols call for -- except the ones demanding 20-30 ug), but it seems you don't get a lot of reads from each sample.
The number of reads per cell isn't huge -- but you're still getting about 2 E.coli genome equivalents by my calculation. This is a bit undersized for a lot of applications -- but grand for many others. Mardis' talk discussed using PacBio for sequencing PCR-amplified resequencing samples -- this would appear to be right in the PacBio sweet spot. Perhaps a few hundred long PCR products could be packed into one SMRT run and still get many hundreds of reads per sample -- well, maybe pack fewer amplicons.
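To make that back-of-envelope explicit -- a sketch only, where the reads-per-cell and mean-read-length figures are my own assumptions extrapolated from the reported numbers, not official specs:

```python
# Back-of-envelope check of the "~2 E. coli genome equivalents" figure.
# reads_per_cell and mean_read_length are assumptions, not PacBio specs.
reads_per_cell = 10_000        # assumed usable reads per SMRT cell
mean_read_length = 1_000       # bp; reads decay exponentially, ~1 Kb typical
ecoli_genome = 4_600_000       # bp; E. coli K-12 is ~4.6 Mb

yield_bp = reads_per_cell * mean_read_length
genome_equivalents = yield_bp / ecoli_genome
print(f"~{genome_equivalents:.1f} E. coli genome equivalents per cell")

# Amplicon packing: reads per amplicon at a hypothetical multiplex level.
n_amplicons = 300              # hypothetical long-PCR products in one cell
reads_per_amplicon = reads_per_cell / n_amplicons
print(f"~{reads_per_amplicon:.0f} reads per amplicon at {n_amplicons}-plex")
```

Under these assumptions a 300-plex run gives only ~30 reads per amplicon, which is why packing fewer amplicons looks wiser.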
What might be other good uses? Clearly metagenomics and similar. I just saw a posting on a professional board of someone pondering multiplexing hundreds of samples for an Illumina run (the current barcode schemes are for a few orders of magnitude fewer samples). Blitzing each sample through the PacBio instrument would seem to be obvious -- if the error rates are acceptable. Folks doing whole genome sequencing of small genomes will love having PacBio to generate scaffolds. For bigger genomes, it may just still be too expensive to get much coverage ($100 a SMRT cell sounds cheap, until you start multiplying that out for the numbers you need) -- but perhaps not (much too fried to do that calculation at the moment).
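Since I punted on the big-genome arithmetic, here's a rough sketch: the $100-per-cell price is the reported one, but the per-cell yield is my assumption, so treat the total as order-of-magnitude only.

```python
import math

# Rough cost to cover a large genome on PacBio. Cell price is the
# reported $100; yield_per_cell is an assumption (~10 Mb usable).
cost_per_cell = 100            # dollars per SMRT cell
yield_per_cell = 10_000_000    # bp, assumed usable yield per cell
genome_size = 3_000_000_000    # bp, human-sized genome
target_coverage = 20           # fold coverage desired

cells_needed = math.ceil(genome_size * target_coverage / yield_per_cell)
total_cost = cells_needed * cost_per_cell
print(f"{cells_needed:,} cells, ${total_cost:,}")
```

Even at a modest 20X, the multiplication lands in the hundreds of thousands of dollars -- cheap per cell, not cheap per genome.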
RNA-Seq might be a bit trickier. If you need 500 ng of input material, that's an awful lot of ribosome-depleted or poly-A RNA. Plus, you'd get only tens of thousands of reads, making it hard to see lowly-expressed messages -- but very long reads could be priceless. Still, if you can get tons of RNA, then 100 SMRT cells would be about $10K and offer similar depth to what you can get today with Illumina, but with those super long reads.
Now, who is this going to crimp the most? The instrument is clearly a ways from really threatening Illumina & SOLiD for the large genome market. 454 is the likeliest candidate to see its growth pressured -- though between the new lower-priced "Junior", PacBio's $700K price tag, and PacBio's inability to flood the market with instruments, the pressure will be ameliorated.
PacBio might have almost as much effect on the surrounding sequencing ecosystem. Making library prep reagents for this system is not going to make you lots of money! But, there will be a serious niche for targeted sequencing -- though given the scale it will probably require some rethinking. Stuffing the whole exome into this doesn't really make sense -- if there are ~250K segments of the genome to read & you want 40X coverage of each, that's a lot of SMRT cells. But, intelligently chosen gene sets totaling about 500 regions (or around 20-50 genes) with pre-validated reagents -- now that might be a market (though one which might have 1-2 years of life -- better get cracking!). Simpler library prep will also go nicely with some of the enrichment systems -- a bugaboo of hybridization systems can be "daisy-chaining" of fragments via the amplification adapters -- but, on the other hand, you don't get 500 ng off an array or in-solution system without amplification. As with many disruptive technologies, it won't fit a lot of bills but will nibble off various parts of the business that are individually small but significant in aggregate. As noted above, RNA-Seq might be an initial success story for PacBio -- when RNA is abundant.
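The exome-versus-panel arithmetic can be sketched quickly; the ~250K segments and 40X coverage come from the discussion above, while the reads-per-cell figure is my assumption, so the cell counts are order-of-magnitude only.

```python
import math

# Exome vs. focused-panel cell counts at 40X coverage per target.
# reads_per_cell is an assumed figure, not a PacBio spec.
reads_per_cell = 10_000

exome_segments = 250_000       # ~250K target segments for a whole exome
coverage = 40                  # desired fold coverage per segment
exome_cells = math.ceil(exome_segments * coverage / reads_per_cell)

panel_regions = 500            # a focused, pre-validated gene set
panel_cells = math.ceil(panel_regions * coverage / reads_per_cell)

print(f"whole exome: ~{exome_cells:,} SMRT cells; "
      f"{panel_regions}-region panel: ~{panel_cells} cells")
```

A thousand cells for an exome versus a couple for a focused panel is the gap that makes the panel market plausible.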
IMHO, PacBio does need to get some papers out on applications (Mardis' group apparently is close to having one) and make sure that the next tranche of installations not only includes the Sanger & BGI, but that there are also some core labs or commercial providers. Also, they need to start pumping data into the public domain -- while they signed up a bunch of commercial software providers, it is definitely out of academia that the most radical advances tend to come. Plus, there are a lot of now well-entrenched open source tools that need to be tested with the new kid. Even simple things like the semi-standard SAM/BAM format are going to need tweaking -- SAM/BAM stores all sorts of information on read pairs, but strobe sequencing can generate many more than 2 tags per DNA fragment.
Of course, we have to wait another half day plus to find out what Ion Torrent is really delivering. That could really shake up the landscape -- at least the mental one.
A huge thanks to all the bloggers & twitterers for pouring out so much information. I'm still getting used to scanning past the retweets (is there a way to condense them?) and there is the occasional shock-to-the-system (how could anyone in the field not have heard of Rodger Staden?!?), but that's a tiny price to pay for such fascinating stuff.