Monday, May 02, 2011

How Many More Machines will PacBio Sell This Year?

Amongst last week's news is the item that Pacific Biosciences has officially launched their SMRT sequencing platform. I've been too eye-deep in various projects to figure out how far off schedule that is (I think nearly a year from their original target), but now it is launched.
Beta testing of the instrument involved 11 machines in the field, and apparently another 44 are on order.  Clearly a key to potential commercial success is to empty out the backlog and start landing new customers.  The next key will be identifying those customers.
I'll confess I'm a bit of a bear on the commercial success of PacBio. It's not because of skepticism of the technology itself; SMRT is truly amazing stuff: real-time, single-molecule imaging of polymerases in action, tens of thousands of such systems at a time, plus the demonstrated ability to potentially extend this to other enzyme systems such as ribosomes (though with the attendant challenge of finding a profitable application for such ability). But the test for any company is to sell a product in a competitive marketplace, and an $800K price tag doesn't make that easy. So, I love the machine but am skeptical of the market: perhaps this is the panda bear view of PacBio.

A first challenge is that PacBio must get machines out to meet demand; their IPO documents identified this as a key hurdle, warning that even modest demand could outstrip manufacturing capacity.

But after that, the critical bit is to identify markets which can support the machine and then to demonstrate utility in those markets. The system has three salient positive characteristics which can sell machines: extremely long reads, relatively fast sample-to-sequence time, and the ability to directly detect modified or non-canonical bases. Then it has two huge handicaps: the price tag and the low (~80%) single-pass accuracy (the monster size and weight aren't a plus, but if you really need a machine you'll create the space!).

The extremely long reads, around 5Kb currently according to one source, put it far ahead of any other commercialized sequencing system. LifeTech has the StarLight technology, which can apparently go farther, but it's doubtful that it is progressing, based both on rumor and on the fact that Life is pouring so many sequencing resources (money & expertise) into pushing Ion Torrent ahead. Given the huge potential there and the murky commercial waters for PacBio/StarLight class machines, that's a reasonable decision in a zero-sum game. Still, one wishes Life weren't the master of three different sequencing technologies (SOLiD, Ion & StarLight); if the other two were elsewhere, they wouldn't risk being starved for corporate attention.

The clear market for ultra-long, but low raw quality, reads is in de novo genome sequencing. I'd expect pretty much every academic and commercial genome center to get one SMRT sequencer. But, even with the ongoing up-K-manship of the genome projects (1K human, 5K arthropod, 10K vertebrate), there are only so many genomes to stitch together. Looking at it another way, if you really could keep one machine busy 90% of the time, my math comes out to about $790K in reagents a year ($100 per SMRTcell * 24 SMRTcells per day * 365 days * 90%), plus library construction costs. For that cost, you get on the order of 400Gb of data (at 50Mb per run). That works out to around $2 per Mb (or roughly $180K for a human genome at 30X coverage). So, how many centers will really need two instruments to feed their appetite? For a genome center, that will tend to be the key question: their metric is sustained throughput.
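The reagent arithmetic above is easy to get wrong by a factor of a few, so here is a minimal sketch that recomputes it. The inputs ($100 per SMRT cell, 50 Mb per cell, 24 cells per day, 90% utilization) are the figures quoted in this post, not PacBio specifications; the ~3,000 Mb human genome size is an assumption, and library construction is excluded.

```python
# Back-of-envelope SMRT economics using the figures quoted in the post.
# Assumptions (not vendor specifications): $100 per SMRT cell, 50 Mb per
# cell, 24 cells per day, 90% utilization; library prep costs excluded.


def annual_reagent_cost(cost_per_cell: float, cells_per_day: int,
                        utilization: float, days: int = 365) -> float:
    """Dollars spent on SMRT cells/reagents per year for one instrument."""
    return cost_per_cell * cells_per_day * days * utilization


def annual_output_mb(mb_per_cell: float, cells_per_day: int,
                     utilization: float, days: int = 365) -> float:
    """Megabases of raw sequence produced per year by one instrument."""
    return mb_per_cell * cells_per_day * days * utilization


def genome_cost(cost_per_mb: float, genome_mb: float, coverage: float) -> float:
    """Reagent cost to reach a given fold coverage of a genome."""
    return cost_per_mb * genome_mb * coverage


if __name__ == "__main__":
    cost = annual_reagent_cost(100, 24, 0.9)
    output = annual_output_mb(50, 24, 0.9)
    per_mb = cost / output
    print(f"annual reagents: ${cost:,.0f}")
    print(f"annual output:   {output / 1000:,.1f} Gb")
    print(f"cost per Mb:     ${per_mb:.2f}")
    # ~3,000 Mb is an assumed round figure for the human genome.
    print(f"30X human:       ${genome_cost(per_mb, 3000, 30):,.0f}")
```

Note that utilization and days cancel out of the per-Mb figure, which reduces to simply the cost per SMRT cell divided by the yield per SMRT cell.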

PacBio could do itself a huge favor in this market, and pretty much all others, by getting more papers out. So far the only real application paper published is the cholera sequencing (more on this below), with the paper demonstrating circular consensus sequencing sort of in a grey area. They really need to show what you can do for genome assembly with 5Kb reads.

An interesting twist on the long reads is the strobe sequencing mode: turn the laser on and off and acquire islands of sequence separated by gaps whose lengths are known only probabilistically. But, again, this approach is not well documented in the literature, nor is the informatics needed to profitably use such information. Finally, this application will probably require very pure populations of long DNA molecules, or else there is a risk of "rounding the horn" during a dark phase (going around the terminal hairpin) and acquiring data on the opposite strand from the original islands. In any case, PacBio really needs publications to demonstrate this aspect.
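To make the "rounding the horn" risk concrete, here is a toy simulation of strobe islands on a SMRTbell. It is a hypothetical model, not PacBio's software: the circular template is treated as the insert's forward strand followed by its reverse strand (hairpin adapters ignored), and the number of bases the polymerase travels during each dark phase is drawn from a normal distribution as a stand-in for variable polymerase speed.

```python
import random


def strand_at(pos: int, insert_len: int) -> str:
    """Strand being read at a polymerase position on a simplified SMRTbell.

    Assumed model: forward strand, then reverse strand, repeating; the
    hairpin adapters themselves are ignored.
    """
    return "forward" if (pos // insert_len) % 2 == 0 else "reverse"


def strobe_islands(insert_len: int, on_bases: int, gap_mean: float,
                   gap_sd: float, n_cycles: int, seed: int = 0):
    """Simulate strobe reads as (start, end, strand) islands.

    Gap lengths (bases traversed while the laser is off) are drawn from a
    normal distribution, a hypothetical model of polymerase speed variation.
    An island that straddles a hairpin is labeled "mixed".
    """
    rng = random.Random(seed)
    pos = 0
    islands = []
    for _ in range(n_cycles):
        start, end = pos, pos + on_bases
        same_lap = start // insert_len == (end - 1) // insert_len
        strand = strand_at(start, insert_len) if same_lap else "mixed"
        islands.append((start, end, strand))
        gap = max(0, round(rng.gauss(gap_mean, gap_sd)))
        pos = end + gap
    return islands


if __name__ == "__main__":
    # A 3 kb insert read with 500-base islands and ~1.5 kb dark phases.
    for island in strobe_islands(3000, 500, 1500, 200, 4):
        print(island)
```

With an insert much longer than the total strobe span, every island stays on the forward strand; with a short insert, later islands land on the reverse strand (or straddle a hairpin), which is exactly the mix that downstream informatics would have to untangle.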
What other applications could ultra-long, noisy reads be put to? Long-range haplotyping is one possibility, though the error rate could really hamper this. Haplotyping from strobe sequencing might be even trickier. Another application would be sequencing complete (or nearly complete) long mRNAs, but given the limited throughput of the instrument in terms of molecules (about 15K per SMRT cell), this might not be cost-effective without a good enrichment protocol. Again, publications could show the way.
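A quick way to see why enrichment matters for full-length mRNA work is to count how many of the ~15K molecules per SMRT cell would come from one transcript species. This sketch assumes $100 per SMRT cell (the figure used earlier in the post), optimistically assumes every molecule yields a full-length read, and expresses abundance as "one in N molecules"; the abundances themselves are hypothetical.

```python
# Assumptions: ~15,000 molecules read per SMRT cell and $100 per cell (the
# post's figures); every molecule yields a full-length read (optimistic);
# transcript abundances are hypothetical, given as "1 in N molecules".

MOLECULES_PER_CELL = 15_000
COST_PER_CELL = 100


def expected_reads(one_in: int, molecules_per_cell: int = MOLECULES_PER_CELL,
                   cells: int = 1) -> float:
    """Expected reads of one transcript species without any enrichment."""
    return molecules_per_cell * cells / one_in


def cells_needed(target_reads: int, one_in: int,
                 molecules_per_cell: int = MOLECULES_PER_CELL) -> int:
    """SMRT cells needed to expect `target_reads` reads of the transcript."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(target_reads * one_in) // molecules_per_cell)


if __name__ == "__main__":
    for one_in in (100, 1_000, 10_000, 100_000):
        n = cells_needed(10, one_in)
        print(f"1 in {one_in:,}: {expected_reads(one_in):g} reads/cell; "
              f"{n} cells (${n * COST_PER_CELL:,}) for 10 reads")
```

Because the cost scales linearly with N, a tenfold enrichment of the target cuts the number of SMRT cells (and reagent dollars) roughly tenfold.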

As PacBio demonstrated last year, because their templates are closed circles, one can continuously read around the end of a molecule (what I called "rounding the horn" above, but which PacBio calls circular consensus sequencing) and keep going in the other direction. For molecules known to be shorter than the read length, and with no dark phases, this allows the same regions to be read multiple times, enabling high consensus accuracies. The catch is that the cost per base pair goes up with the redundancy. However, for applications such as short-range haplotyping or finishing, this mode could give some real competition to Sanger sequencing and the 454 platform, if you already have an instrument and a need to do this en masse (or from populations, in the case of Sanger). Again, this is an application without documentation in the refereed literature.
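The accuracy-versus-redundancy trade-off can be sketched with a simple majority vote. This is not PacBio's consensus algorithm: it assumes each pass is independently correct with the ~80% single-pass accuracy quoted above, treats every error as a wrong call at a pre-aligned position, and counts ties as failures. Real SMRT errors are largely insertions and deletions, so an actual consensus needs alignment first; the point here is only the shape of the curve.

```python
from math import comb


def majority_consensus_accuracy(per_pass_accuracy: float, passes: int) -> float:
    """Probability that a strict majority of independent passes is correct.

    Simplifying assumptions: independent, identically distributed errors at
    a pre-aligned position; ties count as failures.
    """
    p, n = per_pass_accuracy, passes
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        acc = majority_consensus_accuracy(0.80, n)
        # Each extra pass consumes another read-length of raw sequence.
        print(f"{n} passes: consensus accuracy ~{acc:.3f}, "
              f"{n}x raw sequence per consensus base")
```

Accuracy climbs quickly with the first few passes, while the raw sequence spent per consensus base grows linearly, which is why circular consensus makes most sense for short targets where several passes fit within one read.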

What PacBio has seemed to focus on is sheer speed; this was the thrust of the cholera paper and the surrounding publicity. Such speed could be very valuable in medical settings or for monitoring pathogen outbreaks. However, is PacBio really a speed demon? In the cholera case, they had the luxury of starting with purified material from cloned organisms. In a real medical or surveillance situation, the time to generate that material must be counted. Systems such as 454, Ion Torrent and MiSeq could well beat PacBio in a true environment-to-informatics race (though curiously Roche has not put much marketing muscle in this direction), as the PCR used to amplify the targeted sequences can incorporate the sequencing primers in its design. It isn't obvious how the SMRT hairpins can be so easily attached, though again, if PacBio has a clever solution, they need to get it in the literature. Ideally, a lab with multiple such systems could run a John Henry-style head-to-head race (preferably with no casualties), or an international sequencer racing association could form.

Getting a slice of the speed market could be valuable, as this is a market where burst (maximum) capacity matters more than overall capacity in designing a facility. Hence, a really serious facility might consider more than one instrument to sustain high short-term sequencing rates, as well as to ensure availability. But PacBio is going to have to beat the competition by a huge margin. If you want a large burst capacity, would you rather buy multiple $60K setups or $800K ones? PacBio's price tag suggests that only the largest medical centers, public health labs and central testing facilities would build out for this instrument.

What about the ability to directly read methylation?  Given the cost of sequencing an entire human genome, this will only be used widely if there is a means to enrich for methylated sequences.  So PacBio really needs to prove they can successfully convert ChIP-seq or related protocols to their platform, eschewing the PCR which commonly shows up in them (since PCR will destroy the methylation marks).

Will PacBio succeed commercially? I hope so. They have already roughly tied the first single-molecule company (Helicos) for instruments in the field. Helicos had trouble in many areas, including documenting multiple interesting applications in the scientific literature. But, as with Helicos, the machines come with a hefty price tag and the company has already gone public, with all the attendant distractions such as quarterly reports and millstones such as Sarbanes-Oxley. I'd love to see PacBio succeed and put 100+ more instruments in the field this year (as some analysts are suggesting), but to me that is an unlikely outcome.

3 comments:

Anonymous said...

I have yet to see evidence of the really long reads PacBio claims.

How do they keep from damaging the polymerases with the high-intensity UV they need to light up the fluorophores?

Anonymous said...

They use visible light, not UV. That helps a lot.

Anonymous said...

Getting ready to use PacBio for 1kb reads/targeted sequencing. I've had enough problems with RainDance that I pray sequencing will work without too many issues. How is PacBio's software? The issues with RainDance were more related to lost orders and customer service debacles. Wonder if I'm alone in this?