I was relying on memory for my numbers & in the course of writing things I think I also revised those downwards, not out of malice but rather an attempt to be conservative. As a correspondent pointed out, the first pass accuracy on the first commercial system is claimed to be 85%, not 80% as I stated. On the number of reads, I failed to update for the newer SMRT cells which are out; I said 10K and it's probably at least 3X that. However, I do have a bone to pick.
Getting actual numbers on a lot of these new platforms is very challenging; the PacBio press release is effectively devoid of performance statistics. An In Sequence article on the first commercial shipments gives a total yield per SMRT cell of 35-45 Mb, but doesn't report the read length or number of reads. This is a real pain for those of us trying to either masquerade as journalists or design experiments. While for genome or exome sequencing (and to some degree transcriptome sequencing) that total yield is the important number, for a lot of other applications (and particularly the ones I'm contemplating) the number of reads is more critical. Now, of course, if I know the mean read length I can back-calculate the number of reads. Or, if I know the loading efficiency I can back-calculate from the number of sensors. But while PacBio (and Ion as well) love to tout the number of sensors on the chip (probably because it is a big number and fixed), that efficiency number is harder to find.
Calculating from read length is tricky because there are several estimates floating around. If I assume 1000 bp reads, then we're talking 35-45 thousand reads per SMRT cell. If 1500 bp is the right number to use, then that changes to 23-30 thousand. Both are about 3X higher than what I reported; it was very stupid of me not to account for the new chips.
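In the spirit of showing the work, here is a minimal sketch of that back-calculation. The 35-45 Mb yield and the 1000 bp and 1500 bp read lengths are the figures quoted above; the function name is just a label for illustration, and treating every read as exactly the mean length is a simplifying assumption.

```python
def reads_per_cell(yield_mb, mean_read_length_bp):
    """Estimate reads per SMRT cell from total yield and mean read length.

    Assumes every read is exactly the mean length, which is a simplification.
    """
    return yield_mb * 1_000_000 / mean_read_length_bp


# Yield range (Mb per SMRT cell) and candidate read lengths from the text above.
for read_len in (1000, 1500):
    low = reads_per_cell(35, read_len)
    high = reads_per_cell(45, read_len)
    print(f"{read_len} bp reads: {low:,.0f} to {high:,.0f} reads per SMRT cell")
```

Swapping in a different read length or yield makes it easy to see how sensitive the read count is to whichever estimate turns out to be right.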
Of the two mistakes, the quality one has a lot less impact. It's hard to envision applications for such long reads which are really enabled at 85% but not at 80%; indeed I am convinced the reads could be much worse and still be useful. But the number of reads was a painful mistake; this really affects the economics of some projects. For example, it means that about 3 SMRT cells are equal to the original Ion 314 chip in terms of number of reads generated (yes, that is a comparison of sequencers with very different read length and accuracy numbers), though Ion is now claiming 400K reads per 314 chip in some of their materials.
So, unlike Goldman's two memoirs, not funny at all. I slipped up & need to do better -- and particularly if I am deriving numbers I should show my work, so that wrong input values are easy for others to spot. On the other hand, read counts 3X-4X better than what I described still don't change the landscape enough that I would change what I wrote in the large. Getting high coverage of a mammalian genome is still going to be pricey; 40 Mb is about 1% of a mammalian genome, so ~4000 SMRT cells would be needed for 40X coverage; this just won't be economical to do routinely.
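The coverage arithmetic can be sketched the same way. The 40 Mb per cell and 40X target come from the paragraph above; the 3 Gb genome size is an assumed round number for a mammalian genome, not a figure from any PacBio material, and the function name is only illustrative.

```python
def cells_for_coverage(genome_size_bp, yield_per_cell_bp, target_coverage):
    """Number of SMRT cells needed to reach a target fold coverage.

    Uses integer ceiling division so a partial cell counts as a whole one.
    """
    total_bases_needed = genome_size_bp * target_coverage
    return -(-total_bases_needed // yield_per_cell_bp)


GENOME_SIZE_BP = 3_000_000_000   # assumption: ~3 Gb mammalian genome
YIELD_PER_CELL_BP = 40_000_000   # 40 Mb per SMRT cell, from the text
TARGET_COVERAGE = 40             # 40X, from the text

cells = cells_for_coverage(GENOME_SIZE_BP, YIELD_PER_CELL_BP, TARGET_COVERAGE)
print(f"SMRT cells needed for {TARGET_COVERAGE}X: {cells:,}")
```

Under that 3 Gb assumption the count comes out nearer 3,000 than 4,000, because 40 Mb is closer to 1.3% of the genome than 1%. Either way it is thousands of cells, so the conclusion about routine use doesn't change.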
Anyone have some hard numbers from real datasets? Or better yet, real datasets they're willing to share? Those would be far better than trying to finagle the right numbers out of some press releases.