Wednesday, May 13, 2015

PacBio's New Sample Prep Plan: Too Late to the Dance?

Pacific Biosciences had a string of announcements around its earnings release last week.  Of particular interest is a collaboration with RainDance to develop a new sample preparation system for generating long synthetic reads from minuscule inputs.  If some of that sounds familiar, it should: the loose outline in the press release suggests an approach similar to that of 10X.  But is this proposed system arriving too late to the party?

PacBio has been incredibly important in the resurgent interest in high quality genome assemblies driven by long reads, and in the value of true de novo sequencing of human genomes rather than just resequencing them.  PacBio has driven this both by enabling the generation of reads which are tens of kilobases long and by releasing highly innovative bioinformatics approaches to applying these reads to solve important problems such as microbial and diploid genome assembly.

Unfortunately, like many pioneers, PacBio may be rewarded for their pathfinding by being run over by later entrants to the market.  By demonstrating the scientific and business value of long reads, PacBio's success has encouraged others to follow.  There are two approaches which represent different threats to PacBio's dominance in long reads.

One threat is companies such as 10X, which enable Illumina sequencers to generate long synthetic reads.  This gives the existing Illumina sequencer user base access to long reads after only a relatively modest investment (modest, that is, compared to acquiring a PacBio instrument).

The other serious threat is Oxford Nanopore, which is only in a very limited release through their early access program.  While Oxford users are still seeing a wide spread in performance, the most experienced labs appear to be consistently achieving yields of 2Gb of raw data per flowcell.  At present, the yield of high quality reads may be only 10-30% of that, but yields continue to go up, and those "2D" reads (mostly) have accuracies as good as or better than PacBio.  Pricing and timing of the full launch of MinION remain murky, but with those yields MinION is perhaps a few fold more expensive per high quality long read than PacBio.  Good MinION libraries also deliver very long reads more closely approximating the input size distribution, rather than the exponential decay in read lengths seen with PacBio.  Oxford reads over 100kb have been mentioned in public, and ones far longer hinted at.

Two announced extensions of Oxford's platform cast a darker shadow over PacBio.  Oxford has begun talking about a fast mode, in which DNA transit speeds are substantially increased but (it is claimed, with no public data yet) sequence quality is preserved.  With fast mode, Oxford is suggesting a MinION could generate 40Gb of data in a single run.  With the sort of pricing Oxford has discussed previously (on the order of $1K per MinION flowcell), this would have a cost per base far lower than PacBio, even before calculating the extra overhead and capital costs of the bigger instrument.  Even greater economies may be delivered by a second generation MinION, with more pores per flowcell, and the PromethION which gangs 96 flowcells in parallel, but (suggested) at less than 96X the purchase price.  Oxford's London Calling user meeting is later this week, and is anticipated to be a forum for further detail on these refinements, as well as perhaps library preparation methods that eliminate the need for significant molecular biology prowess.
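
As a rough back-of-the-envelope check on those numbers, here is a minimal Python sketch.  It uses only figures quoted in this post (roughly $1K per flowcell, about 2Gb of raw data per flowcell today, a claimed 40Gb in fast mode, and 10-30% of raw yield as high quality reads).  The function name is mine, and the sketch assumes the flowcell price is the only cost, which ignores library prep, labor and instruments.  No PacBio figures are included because none are given here.

```python
def cost_per_gb(flowcell_price_usd, yield_gb, usable_fraction=1.0):
    """Return flowcell price divided by usable yield, in USD per gigabase.

    usable_fraction is the share of raw yield counted as usable,
    e.g. 0.1 to 0.3 for the high quality reads discussed in the post.
    """
    usable_gb = yield_gb * usable_fraction
    if usable_gb <= 0:
        raise ValueError("usable yield must be positive")
    return flowcell_price_usd / usable_gb


# Figures taken from the post; all are approximate or claimed values.
FLOWCELL_PRICE_USD = 1000.0   # "on the order of $1K per MinION flowcell"
CURRENT_YIELD_GB = 2.0        # raw yield in the most experienced labs
FAST_MODE_YIELD_GB = 40.0     # suggested by Oxford, no public data yet

if __name__ == "__main__":
    raw_now = cost_per_gb(FLOWCELL_PRICE_USD, CURRENT_YIELD_GB)
    raw_fast = cost_per_gb(FLOWCELL_PRICE_USD, FAST_MODE_YIELD_GB)
    print(f"raw, current yield: ${raw_now:,.0f} per Gb")
    print(f"raw, fast mode:     ${raw_fast:,.0f} per Gb")
    # High quality reads only, at both ends of the 10-30% range.
    for fraction in (0.1, 0.3):
        hq = cost_per_gb(FLOWCELL_PRICE_USD, CURRENT_YIELD_GB, fraction)
        print(f"high quality at {fraction:.0%}: ${hq:,.0f} per Gb")
```

If the fast mode claim holds up and the flowcell price stays put, the raw cost per gigabase falls twentyfold; whether the high quality fraction holds at that speed is exactly the part with no public data.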

The PacBio and RainDance announcement was short on details, but would use picoliter droplet partitioning (RainDance's strong suit) to perform sample preparation on very long input fragments.  The idea is akin to 10X, but with much longer reads, attempting to reconstruct 100kb+ input molecules by creating many barcoded libraries.  Input material will be amplified in what is claimed to be an unbiased manner.  Each library can be error corrected and assembled independently, generating a set of high quality long synthetic reads for assembling the entire genome.  This will presumably make the bioinformatics significantly less resource intensive; instead of having to compare every read in a sequencing run to every other one, the problem is subdivided by the number of libraries.  Inputs are projected to be as little as 1 nanogram, extending PacBio's reach into biopsies.
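
To put a number on that subdivision, here is a small Python sketch counting read-versus-read comparisons with and without barcoded partitions.  It is an illustration only: the read and library counts are made up, the reads are assumed to spread evenly across libraries, and real overlappers use indexing tricks rather than literally comparing every pair.  None of this comes from the press release.

```python
def pairwise_comparisons(n_reads):
    """Number of unordered read pairs in an all-vs-all comparison."""
    return n_reads * (n_reads - 1) // 2


def partitioned_comparisons(n_reads, n_libraries):
    """Pairs compared when reads are only compared within their own library.

    Assumes reads are spread as evenly as possible across libraries,
    which is a simplification for illustration.
    """
    if n_libraries < 1:
        raise ValueError("need at least one library")
    base, extra = divmod(n_reads, n_libraries)
    # 'extra' libraries get one additional read each.
    sizes = [base + 1] * extra + [base] * (n_libraries - extra)
    return sum(pairwise_comparisons(size) for size in sizes)


if __name__ == "__main__":
    reads = 1_000_000      # hypothetical run size
    libraries = 1_000      # hypothetical number of barcoded libraries
    whole = pairwise_comparisons(reads)
    split = partitioned_comparisons(reads, libraries)
    print(f"all-vs-all:      {whole:,} pairs")
    print(f"within-library:  {split:,} pairs")
    print(f"reduction:       about {whole / split:,.0f}x")
```

With evenly sized libraries the pair count drops by roughly the number of libraries, which is the sense in which the problem is "subdivided".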

PacBio is pitching this solution as superior, given that historically PacBio has shown very little sampling bias.  However, since the new system will employ amplification (though not PCR) and some sort of in-droplet fragmentation, whether those processes will remain unbiased remains to be seen.  Furthermore, whether 10X and similar approaches really have an issue with bias also remains to be seen -- and most importantly whether the market is bothered by any bias that is present.  To date, Oxford appears to exhibit little or no sampling bias either.

However, the press release offers very few details, and in particular gives no idea of timing or pricing.  Since both 10X (as well as Dovetail) and Oxford are already being trialed, this is an important issue.  If PacBio and RainDance had announced this collaboration a year ago they would have been leaders; now they appear to be playing catch-up.  Pricing is another key factor, not only for the actual instrument and reagents but also for the required oversampling in sequencing, which increases the number of PacBio reads (expensive relative to Illumina per basepair) which must be generated.

Can PacBio preserve their leadership in long read sequencing?  It is far too soon to conclude that they will be passed by the new entrants, but unless this new collaboration can start delivering very quickly, it is difficult to see how it can help them hold their lead.


Brian Krueger said...

So you think synthetic long reads that still can't get at certain important SVs are going to be enough to dethrone PacBio?

I think most of us are just going to wait for true cheap long reads and the synthetic stuff is going to fill a small niche market only in the short term.

I too look forward to better data output from long read sequencers that don't require multiple micrograms of input material for human scaffolding. Nanopore does seem closest to that if they really can deliver 40GB+ of PacBio equivalent data.

Geneticist from the East said...

We need true synthetic long reads that can resolve repeats. The synthetic long reads we have now are only good for phasing.

Mark Chaisson said...

It wasn't worth it in terms of throughput to go to that dance in the past. With an RS targeting 16Gbp/day this year, that may change. If the cost is low and the protocol is robust, it definitely changes how I think about the future of de novo assembly.