Monday, March 04, 2013

PacBio Back of the Envelope Numbers

Back-of-the-envelope calculations can be quite useful, but also quite dangerous. They are meant to be quick estimates and shouldn't be taken too seriously. Still, I try to get them right, and I deeply regret recently overestimating on Twitter the cost of a human genome on PacBio by 3X. Twitter is particularly dangerous: it is tempting to fire off a note, but impossible to pack in the full calculation.

I've been running a couple of such calculations in my head that I think are mildly interesting, though they are again just rough cuts, and the improvements PacBio says are imminent will shift them. I have been thinking about that platform a lot lately, both because it can theoretically tackle some important problems I have and because my first dataset suggests that it can really bite into them (even if not quite completely; the first try got down to 10 contigs, not 1).

Right now, it takes around 10 SMRTcells to generate 1Gb of data. Compare this with 454, which approaches (but does not quite hit) 1Gb per run. Using the UCSD page for PacBio pricing, that means about $4100/Gb on PacBio, ignoring library cost. If you are running bacteria, the library cost does play a significant role, at about $600/sample once you throw in all the key extras (such as MagBead). Looking at another academic site, 454 runs about $10.8K/run. So, 454 is not quite 3 times as expensive per gigabase. Of course, 454 gives higher quality data. Still, it would seem that even now a 1Kb circular consensus sequencing library on PacBio is starting to be competitive with 454 on cost; someone with experience in that space would have to weigh in on quality. If PacBio can really deliver an optics upgrade this spring that doubles the number of ZMWs imaged in a run, then PacBio would probably surpass 454 on price/performance for 1Kb reads in a core facility setting (where you can ignore the amortization cost of the instrument). That means that in the near future, if not already, the 454 may have no applications in which it is clearly the best choice (a potential topic for longer treatment in the near future).
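For anyone who wants to check the cost comparison, here is a rough sketch of the arithmetic in Python. The dollar figures and the 10 SMRTcells/Gb come from the paragraph above; the 454 yield per run (0.9 Gb) is my own placeholder for "just under a gigabase" and is an assumption, as are all the variable names.

```python
# Rough cost-per-gigabase comparison; a sketch, not a price quote.
PACBIO_COST_PER_GB = 4100.0        # USD/Gb, from the text (library cost ignored)
SMRTCELLS_PER_GB = 10              # from the text
ROCHE_454_COST_PER_RUN = 10800.0   # USD/run, from the text
ROCHE_454_GB_PER_RUN = 0.9         # ASSUMPTION: a 454 run yields a bit under 1 Gb


def cost_per_gb(cost_per_run: float, gb_per_run: float) -> float:
    """Cost of one gigabase given the cost and yield of a single run."""
    return cost_per_run / gb_per_run


# Per-SMRTcell cost implied by the $4100/Gb figure (derived, not quoted).
implied_cost_per_smrtcell = PACBIO_COST_PER_GB / SMRTCELLS_PER_GB

roche_454_cost_per_gb = cost_per_gb(ROCHE_454_COST_PER_RUN, ROCHE_454_GB_PER_RUN)
ratio_454_to_pacbio = roche_454_cost_per_gb / PACBIO_COST_PER_GB

print(f"Implied cost per SMRTcell: ${implied_cost_per_smrtcell:,.0f}")
print(f"454 cost per Gb:           ${roche_454_cost_per_gb:,.0f}")
print(f"454 / PacBio cost ratio:   {ratio_454_to_pacbio:.2f}x")
```

Changing `ROCHE_454_GB_PER_RUN` to a full 1.0 drops the ratio to a bit over 2.6, so the "not quite 3 times" conclusion holds across the plausible range of the assumed yield.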

Looking at this another way, if it takes 10-12 SMRTcells to generate a gigabase on PacBio using 120 minute movies, then each machine can generate around a gigabase per day, ignoring downtime. I'm on the mailing list of Amanda Murphy of William Blair, and if I've read her last few notices correctly, the worldwide installed base of PacBio instruments is about 70. So the entire PacBio community, ignoring downtime, can generate around 70-80Gb per day (the high end applies if only 10 SMRTcells are needed per Gbp). The current specification for a dual flowcell HiSeq 2500 2x100 Rapid Run, which takes 27 hours, is 100-120 Gbp. In other words, at this time the entire PacBio installed base delivers somewhat less than the throughput of a single HiSeq 2500, though one of each instrument would cost around the same, somewhere in the $600K-$800K range (I don't have any quotes to go by), not including any floor reinforcing required for the PacBio.
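The throughput comparison can be sketched the same way. The movie length, SMRTcells per gigabase, installed base, and HiSeq run specification are the numbers quoted above; treating a day as 24 hours of back-to-back movies with no loading overhead is a simplifying assumption, and the names are mine.

```python
# Daily throughput: whole PacBio installed base vs. one HiSeq 2500.
MOVIE_MINUTES = 120                 # from the text
INSTALLED_BASE = 70                 # instruments worldwide, from the text
HISEQ_RUN_HOURS = 27                # 2x100 Rapid Run, from the text
HISEQ_GB_PER_RUN = (100, 120)       # low and high spec, from the text

# ASSUMPTION: movies run back to back around the clock, no downtime.
smrtcells_per_day = (24 * 60) // MOVIE_MINUTES


def fleet_gb_per_day(smrtcells_per_gb: int) -> float:
    """Gigabases per day across the installed base at a given cell yield."""
    return INSTALLED_BASE * smrtcells_per_day / smrtcells_per_gb


fleet_low = fleet_gb_per_day(12)    # 12 SMRTcells needed per Gb
fleet_high = fleet_gb_per_day(10)   # 10 SMRTcells needed per Gb

# Normalize the 27-hour HiSeq run to a 24-hour day.
hiseq_low, hiseq_high = (gb * 24 / HISEQ_RUN_HOURS for gb in HISEQ_GB_PER_RUN)

print(f"SMRTcells per instrument per day: {smrtcells_per_day}")
print(f"PacBio installed base: {fleet_low:.0f}-{fleet_high:.0f} Gb/day")
print(f"Single HiSeq 2500:     {hiseq_low:.0f}-{hiseq_high:.0f} Gb/day")
```

Under these assumptions the top of the PacBio range comes out slightly above the rounded 80Gb figure in the text, but still below the low end of a single HiSeq normalized to a day.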

This isn't meant to be a knock on PacBio. Depending on your application, as I have seen, those long reads off a SMRTcell are far more valuable than cheap reads off an Illumina. But it does put things in perspective. A huge investment in RS instruments (somewhere around $50M) has yielded a very special capability, but not very much of it. But perhaps it is enough of a capability. For some applications, getting such high quality assemblies is critical. How big an appetite exists remains an open question. For common bacterial genome sizes, each instrument can crank out about 2 genomes per day (at the high end of the 50X-100X coverage suggested for the HGAP assembler), meaning the world installed base is capable of about 40K microbes per year, assuming 80% instrument utilization (anyone know if that number is realistic?). Of course, not all instruments are used for microbes and some microbes are much bigger (indeed, from my viewpoint it's either big or boring), but again it is one way to look at the available capacity. Still, are there really research budgets out there for 40K high quality microbial genomes in the next year?
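Here is the microbial capacity estimate spelled out, again as a rough sketch. The installed base of about 70 instruments, the roughly one gigabase per instrument per day, the instrument price range, the 100X coverage, and the 80% utilization all come from this post; the 5 Mb genome size is my stand-in for "common bacterial genome sizes" and is an assumption, as are the names.

```python
# World capacity for PacBio microbial genomes; back-of-the-envelope only.
INSTALLED_BASE = 70                     # instruments, from the post
MB_PER_INSTRUMENT_PER_DAY = 1000        # ~1 Gb/day, from the post
COVERAGE = 100                          # high end of 50X-100X for HGAP
UTILIZATION = 0.80                      # from the post
INSTRUMENT_PRICE_USD = (600_000, 800_000)  # guessed range, from the post
GENOME_SIZE_MB = 5                      # ASSUMPTION: a typical bacterial genome

# Megabases of raw sequence needed for one genome at the target coverage.
mb_per_genome = GENOME_SIZE_MB * COVERAGE

genomes_per_instrument_per_day = MB_PER_INSTRUMENT_PER_DAY / mb_per_genome

genomes_per_year = (
    INSTALLED_BASE * genomes_per_instrument_per_day * 365 * UTILIZATION
)

# Total spent on instruments at the low and high ends of the price range.
fleet_cost_low, fleet_cost_high = (INSTALLED_BASE * p for p in INSTRUMENT_PRICE_USD)

print(f"Genomes per instrument per day: {genomes_per_instrument_per_day:.1f}")
print(f"World capacity per year:        {genomes_per_year:,.0f}")
print(f"Instrument investment:          "
      f"${fleet_cost_low / 1e6:.0f}M-${fleet_cost_high / 1e6:.0f}M")
```

A larger assumed genome or a lower utilization scales the yearly figure down proportionally, which is the main reason to treat 40K as an upper-ish bound rather than a forecast.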

Given the positive buzz from AGBT on using PacBio for microbial sequencing, some have asked how many more machines PacBio will sell. I hope the focus changes instead to higher sales of consumables. Now, those instruments are not all equally available to everyone in the research community; a number are owned by companies that reserve them for internal use. But, given sequestration and overall tight research budgets, a very apropos question to ask is whether the world of RS instruments available to academics in general is being utilized to its fullest. Given the excitement, it is reasonable to expect a pop in consumable sales as more researchers jump on this bandwagon. The upcoming instrument upgrade and continuing refinement of library preparation should enable the existing machines to do more. Until machine utilization goes so high as to create undesirable wait times, justifying the purchase of new instruments is difficult. So, for the company that means a focus on consumables. It's a very expensive razor, but PacBio needs to focus on selling the blades, not the handles.

5 comments:

contig said...

Nice post! You may have seen I am thinking along the same lines: http://flxlexblog.wordpress.com/2013/02/11/applications-for-pacbio-circular-consensus-sequencing/

bioduediligence said...

Nice analysis. Your price range for a PacBio instrument is a bit low, if anything. Your data also explains why PACB has no real path towards profitability without a massive increase in the installed base, which just seems implausible. RS is a nice tool and can be used alone or paired with ILMN for certain applications, but will never find much of a home outside of large academic sequencing centers.

Wynajem kasyna warszawa said...

You have a good opinion on various topics.

Anonymous said...

Seems like official stats are often less than what people yield in practice. e.g. Duke tweet with 658 MB from a single cell with P4/c2. So even though the official throughput of the new P5/c3 chemistry - with N50 read length of 10kb - is only 350 MB per cell I would expect actual yields to be higher. As high as 1 GB per cell even?

Keith Robison said...

I agree, PacBio is actually conservative on what is achievable on their chemistries. However, the variance can be quite large between samples due to differences in insert length or getting the loading concentration right, so there is good reason not to rely on getting monster runs every time.