First, a small confession: I sometimes worry I'll be labeled a PacBio partisan, because I've been a strong proponent of using their technology for microbial sequencing. Their steady cadence of 2X performance improvements, along with key informatics advances, has made this a powerful platform for microbial genome sequencing and has really raised the bar on what one should expect from the typical microbial genome project. If you are sequencing a few bugs and can generate high-quality, long DNA, then I think it is clearly the best way to go, and at prices that are quite reasonable. But I think it is important not to dismiss short reads in this area.
One key issue that has been raised by many, and particularly recently by Mick Watson on his blog, is that the cheapest short read platforms deliver bacterial genomes at a far lower cost than PacBio. He estimated the difference at 10X, using publicly available pricing at core facilities. Now, the important caveat here is whether the difference between the finished products actually matters. For most microbial genomes, as shown by the recent Koren et al paper, current PacBio reads can close every replicon (chromosome or plasmid) into a single contig. In contrast, many repeats in microbial genomes are longer than what standard short read sequencing can resolve, and even with mate pairs it is my experience that some regions remain difficult to resolve. So, how much do those areas matter? For some types of study they are critical; for others, more of a nice-to-have. More importantly, for some studies getting a pretty good sequence (lots of contigs) on many isolates is more valuable than near-finished sequence (contiguous, but with some indels remaining) on a smaller number of isolates. The "a few good references plus many resequencings" model has a lot of life left in it.
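To make the repeat problem concrete, here is a toy sketch. The sequence and the read lengths are made up purely for illustration (not real data): reads shorter than a repeat cannot be assigned to a particular copy, while reads longer than the repeat always carry some unique flanking sequence.

```python
# Toy illustration: repeats longer than the read length are unresolvable.
# The sequence and read lengths below are invented for illustration only.
repeat = "ACGTACGTGGATCCAA" * 20                        # a 320 bp repeat
genome = "TTTTT" + repeat + "CCCCC" + repeat + "GGGGG"  # two copies, unique flanks

def shred(seq, read_len, step):
    """Tile error-free reads of length read_len across seq."""
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]

# 100 bp reads: any read wholly inside one repeat copy is an exact substring
# of the repeat, so an assembler cannot tell which copy it came from.
short_reads = shred(genome, 100, 10)
ambiguous = sum(r in repeat for r in short_reads)
print(f"{ambiguous} of {len(short_reads)} short reads are wholly repetitive")

# 400 bp reads exceed the repeat length, so every read also captures some
# unique flanking sequence and can be placed unambiguously.
long_reads = shred(genome, 400, 50)
long_ambiguous = sum(r in repeat for r in long_reads)
print(f"{long_ambiguous} of {len(long_reads)} long reads are wholly repetitive")
```

Real assemblers work with error-containing reads and overlap graphs rather than exact substring checks, but the underlying ambiguity is the same.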
The nature of PacBio's remarkable march is also going to limit further price/performance gains, at least until some other problems are tackled. With the exception of the doubled imaging area in the RS I to RS II upgrade launched this spring, essentially all of the performance improvement in PacBio's platform has come from pushing read lengths to distances which still boggle the mind. Thirty kilobase reads???!?!?!? Clever informatics, such as the HGAP assembly algorithm and the BLASR aligner, can leverage that length to overcome the high (~15%) raw error rate. Those rare mega-reads illustrate what the platform is capable of; much of the further development lies in increasing their frequency.
But pushing in this direction has its challenges. Back-of-the-envelope, PacBio is currently about 7-8 doublings away from the mythical $1K human genome. If reads now average 10Kb, and at least 2X improvement is gained by loading the flowcell at better than Poisson (which has been discussed openly but, as far as I know, has not been given a delivery timeline), then the remaining doublings would require reads over a megabase in length. This is certainly not the plan (presumably another optics upgrade will come at some point), but realistically there are huge challenges even getting into the 50-100 kilobase range.
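The doubling arithmetic above can be sketched out. The starting read length and doubling count are the rough figures from this paragraph; the assumption that loading contributes exactly one doubling, with everything else coming from read length, is my own simplification:

```python
# Back-of-the-envelope: if essentially all remaining throughput gains must
# come from read length, how long do reads need to get?
current_read_len_kb = 10   # rough average read length today
loading_gain = 2           # ~2X from better-than-Poisson flowcell loading

for doublings in (7, 8):   # estimated doublings to the $1K human genome
    # one factor of 2 comes from loading; the rest must come from length
    length_factor = 2 ** doublings / loading_gain
    required_kb = current_read_len_kb * length_factor
    print(f"{doublings} doublings -> ~{required_kb:.0f} kb average reads")
```

At the high end of the estimate this lands at roughly 1.3 Mb average reads, which is where the "over a megabase" figure comes from.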
A key truism is that it is impossible to read novel sequence longer than the insert size. As PacBio pushes into the >20Kb range, it will be outside its current technology for shearing DNA (Covaris' clever G-tube devices). Shearing above that will probably mean going to devices such as the HydroShear, which has been used extensively for cosmid library construction. But nobody made cosmid libraries on the scale of a serious sequencing operation, and that particular device is notorious for clogging. Going beyond about 50Kb means going above the size at which DNA starts shearing simply from being handled as a long molecule. Again, techniques were developed to generate and handle DNA in this size regime (to make YAC and BAC libraries), but it is another leap upward in difficulty and labor. No more simple spin column kits! Finally, for some key samples, such as most clinical cancer samples (and archival pathology samples of all types), going long is useless because the formalin fixation / paraffin embedding process shears DNA.
For small genomes, these future performance improvements may hit a wall: flowcell and library costs. Barcoding on the RS platform is still bleeding edge, and my one experience with it did not go well. The challenge is that if the polymerase "jumps the gun" and begins polymerizing before imaging starts, the barcode in the initial adapter will be missed. PacBio has worked on a "hotstart" technology to minimize this, but in our experiment many reads lacked barcodes. One solution is to read the barcode at the other end as well, but that requires ensuring that reads get that far, which means giving up some effective output as some reads "turn the corner" and read back into the same insert. If output is huge, giving up a bit there might be tolerable, but otherwise most PacBio projects risk a floor price of one library construction plus one flowcell (if you have samples you are certain you can tell apart, you could just pool them and decode each contig at the end). Any genome you want for $500 sounds great, but if you have a lot of small genomes to sequence you might want to do better.
Library construction cost is another potential speed bump. Mick Watson was careful to use published prices that can actually be had, but a number of papers have claimed generating short read libraries for $1-$5 rather than the few hundred dollars a core will charge. Now, this requires making lots of libraries, just as getting really dirt-cheap read costs per sample requires piling many samples into a short read flowcell. But if these library costs can really be reached, then a draft genome on Illumina could in theory be had for <$10/sample (and perhaps closer to $5), as long as you are sequencing hundreds or thousands of isolates.
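The per-sample arithmetic here is simple to sketch. The library cost below is in the claimed $1-$5 range, but the flowcell price and the multiplexing level are hypothetical placeholders I've chosen for illustration, not quoted figures:

```python
# Sketch of per-sample cost for highly multiplexed short-read draft genomes.
# The flowcell price and pooling level are hypothetical placeholders.
library_cost = 2.50        # within the claimed $1-$5 in-house prep range
flowcell_cost = 1500.00    # hypothetical cost of one short-read flowcell run
samples_pooled = 384       # hypothetical number of small genomes multiplexed

per_sample = library_cost + flowcell_cost / samples_pooled
print(f"~${per_sample:.2f} per draft genome")
```

The point is just that once sequencing capacity is amortized over hundreds of barcoded isolates, the library prep dominates, so cheap library protocols are what make the <$10/sample figure plausible.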
The silliest (in my opinion) perceived issue with PacBio, at least in the short term, is the cost of the instrument. Yup, it's a beast on the budget. But as I have emphasized before, there are many good core labs and commercial providers which offer it as a service, and based on my experience the capacity of that installed base is not fully utilized.
I've written all of the above assuming no major shifts in the sequencing technology landscape. Given the amount of investment in new technologies, such as electron microscopy and nanopores, it would be surprising if no new technology emerged over the next few years. I really hope to witness another technology eclose, but when that will happen, and what performance will follow, remain a mystery.