Thursday, April 04, 2013

For What is a 454 Still Great?

I've been mulling this item ever since AGBT, but have struggled with the title.  I don't want to sound like I have a grudge against 454 (truth is I just got some good datasets off this technology), but I do believe that they are a few papers away from being stampeded.  Or perhaps not; perhaps the community is really wedded to this platform.
Now, being wedded (or welded) to a sequencing platform is an issue I don't have.  Since all our sequencing is outsourced, I have the flexibility of choosing platforms freely.  Yes, there are informatics differences that mean it is not cost-free to try a new platform, but I certainly never risk falling for the sunk cost fallacy.  As a result, between Infinity and Warp I've run data on virtually every sequencing platform currently available (Helicos & Ion Proton are the exceptions).  I match projects to platforms, attempting to pair them off in a way that maximizes scientific value (balanced against the required spending).

454 deserves a lot of kudos; it was the first sequencing system to be commercialized in the "next generation" space.  A lot of great science has been run on the platform.  However, in terms of performance it has seen primarily improvements in read length, not density.  As a result, it has not seen the geometric productivity gains of other platforms, and the per-run cost is quite high.

What is particularly dangerous for the 454 platform is a two-pronged attack on what has been its defining characteristic: read length.  The full-size instrument reads around a kilobase now, with the benchtop GS Junior topping out around 400-500 bp.

The problem is that on the low end Ion Torrent and Illumina keep moving upwards.  I've heard from others that the Ion 400 bp kits work (whereas the 300 bp kits were atrocious; I actually tried that one, with dismal results).  MiSeq now routinely operates in 2x250 mode; for a 400 bp fragment (such as an amplicon) that means the weakest data is read twice, enabling correction & merging.  The buzz now is that Illumina's new basecalling software doesn't choke on low diversity material such as amplicon libraries, solving a major problem for Illumina -- and removing one possible justification for running a Junior.  On top of that, a MiSeq run generates mountains more data than a GS Junior run for similar cost -- actually far more than the big 454 system generates in a run.  Roche had a poster at AGBT on modifying Junior to handle the kilobase reads, but that can buy only so much breathing room.  It is also notable that announcements of clinical sequencing assays seem to roll out daily, and they are dominated by Ion and MiSeq (off-hand, actually, I don't remember any recent ones mentioning 454).  Illumina & Ion long ago pushed past the size range needed for analyzing FFPE samples (e.g. clinical cancer specimens).
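To put a number on the "read twice" point for a 400 bp fragment in 2x250 mode, here is a minimal sketch of the overlap arithmetic.  It assumes both reads run their full length from opposite ends of the fragment and ignores adapter trimming and quality clipping; the function name is just illustrative.

```python
def paired_end_overlap(fragment_len, read_len):
    """Number of fragment bases covered by both reads of a pair.

    Assumes each read starts at one end of the fragment and runs for
    read_len bases (or to the end of the fragment if it is shorter).
    """
    overlap = 2 * read_len - fragment_len
    # Reads that never meet give 0; a fragment shorter than one read
    # is covered twice along its whole length.
    return max(0, min(fragment_len, overlap))


if __name__ == "__main__":
    # The case from the text: 400 bp amplicon, 2x250 reads.
    print(paired_end_overlap(400, 250))  # 100
    # A 600 bp fragment would leave a gap between the two reads.
    print(paired_end_overlap(600, 250))  # 0
```

So the last 100 bases of each read, which is where quality is typically lowest, line up against each other in the middle of the amplicon.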

Or should it buy any?  The other assault on 454 comes from above, with the surging performance of the Pacific Biosciences platform.  We recently ran a number of fragment libraries on multiple chips and got 25K-45K reads per chip -- but running a SMRT cell costs only about $300.  For $10K an academic can get 1M reads on the big 454 -- so 30 SMRT cells cost about the same as one 454 run, and either will deliver about 1M reads.  It also makes correcting PacBio reads with 454 reads seem a little silly, but at least one commercial house touts this strategy as money-saving.
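The back-of-the-envelope comparison above can be spelled out as follows.  This is only a sketch using the round figures quoted in this post ($300 per SMRT cell, 25K-45K reads per cell, $10K and 1M reads for a big 454 run); none of it is vendor pricing, and the function names are made up for illustration.

```python
# Round figures quoted in the post (assumptions, not price-list values).
SMRT_CELL_COST_USD = 300
SMRT_READS_PER_CELL = (25_000, 45_000)  # low and high observed yields
RUN_454_COST_USD = 10_000
RUN_454_READS = 1_000_000


def pacbio_totals(n_cells):
    """Return (total cost, low read estimate, high read estimate)."""
    low, high = SMRT_READS_PER_CELL
    return n_cells * SMRT_CELL_COST_USD, n_cells * low, n_cells * high


def cost_per_thousand_reads(cost_usd, reads):
    """Dollars per 1,000 reads."""
    return cost_usd * 1000 / reads


if __name__ == "__main__":
    cost, low, high = pacbio_totals(30)
    print(f"30 SMRT cells: ${cost:,} for {low:,}-{high:,} reads")
    print(f"PacBio: ${cost_per_thousand_reads(cost, high):.2f}-"
          f"${cost_per_thousand_reads(cost, low):.2f} per 1,000 reads")
    print(f"454:    ${cost_per_thousand_reads(RUN_454_COST_USD, RUN_454_READS):.2f}"
          " per 1,000 reads")
```

With these numbers, 30 cells come to $9,000 for somewhere between 750K and 1.35M reads, which brackets the 454 figure of $10 per thousand reads.  This counts reads only; it says nothing about read length or accuracy.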

The catch is the low initial quality of PacBio reads -- but with the read lengths getting routinely long, their circular consensus sequencing (CCS) mode should deliver similar performance to 454.  If multiple groups start reporting this, then 454 is in deep trouble.  PacBio CCS should (theoretically) have a consistent error distribution rather than a skewed one, and if your amplicon is bigger than 1 kb you're not stuck with a hard limit.  Furthermore, PacBio has indicated that they will roll out an optics subsystem upgrade this quarter which will double throughput (only half the SMRT cell is imaged currently).  So if PacBio is borderline equivalent now, the upgrade (if kept on schedule) should really make PacBio a contender for the long amplicon market.

Not that I expect a rash of RS sales.  If PacBio is going to succeed in pushing Roche aside, then more folks need to hop on the "rent, don't own" sequencer bandwagon.  PacBio recently started prominently listing available service providers for their platform, which should help get this going -- but it will require a mind shift for many researchers.  In particular, there are almost certainly owners of instruments who don't recognize them (or refuse to accept them) as sunk costs.

If Illumina/Ion (for shorter pieces) and PacBio (for long amplicons) can beat 454 on price/performance, then what will slow the shift away from pyrosequencing?  One will be proof: some bold labs will have to stick their necks out, run experiments on the new instruments and publicize the results.  Another will be informatics.  If you are comfortable and cozy with existing software suites designed to deal with the specific error issues of 454, you'll probably need something different to deal with the new data.  Finally, there is some argument to be made for running a single platform across an entire project, though if you have a really long-term project that is hardly realistic.

So, I'd honestly like to hear constructive comments from 454 fans as to why they don't buy the arguments above.  Have I missed something?  For what sorts of projects is having a sequencer in the lab really critical, so critical that it is worth paying an enormous premium on running costs?


5 comments:

Anonymous said...

Currently 454 is only useful for one thing: genes or parts thereof that are within the 500-700 bp read length and for which phase information and accuracy are important (polymorphic, polygenic). So is there still such a market? Yes, there is: every hospital that does tissue (HLA) typing for (bone marrow/organ) transplantation does sequence-based typing, as do most blood centers. Currently the gold standard for this is still Sanger sequencing. As far as costs go, anything less expensive is still a win in this market. It is a market Roche has been heavily focused on, and in which it seems to have an edge at the moment. Of course, given the amount of money in transplantation/insurance, most of the large centers will switch to any other long-read platform (PacBio) the moment accuracy is good enough for a clinical setting, since the price of the instruments will not be an issue, and turnaround time is much faster.

Dirk Dittmer said...

We still run the GS Junior for two reasons, and maybe I am simply not using it right.

(1)
When we do de novo assembly with viral genomes, only the single long read length gets us through repeats. We have done the side-by-side, and Illumina paired-end 200 failed in de novo assembly. We used HiSeqs, and even with 10-fold as many reads we could not get across repeats, because crossing repeats doesn't scale linearly with coverage.

(2)
I love the PacBio (we've been mapping both 454 and Illumina reads back for error correction on 10K and 1K runs). But the enormous data glut from the Illumina is maxing out our server (we probably should have multiplexed, but then Illumina is cheaper than the manual effort and bioinformatics overhead of multiplexing). It is actually not trivial to align a HiSeq lane to a PacBio run.

How about this for thought: The driving cost in NextGen sequencing is no longer the instrument, but the amount of server you have to buy to handle each particular platform. For the 454, it is two days from DNA to Figure on a desktop station.

Andrew said...

Thanks for the great analysis. PacBio keeps promising that a bunch of great new publications are just around the corner... but as you allude to, I just don't see a path to a profitable business model for them - where are all those incremental instrument sales going to come from? I don't see it, even if they have fixed the performance issues.

Anonymous said...

We run a GS Junior and a MiSeq in my lab. We had our Junior first (by about two years) and developed several good assays that still work well. Many of these have migrated to the MiSeq, but there are cases (esp. with viral genomes and highly complex immune genes) where 400 bp single-end reads are more valuable than 2x250 bp paired-end reads.

I expect that our GS Junior will continue getting occasional use for the next year or two, until it is gradually replaced by improvements in MiSeq (and other, newer) technologies. If you have 100 amplicons to sequence, 1000 reads per amplicon is often enough.

If someone is contemplating buying a new machine, I wouldn't advise purchasing a GS Junior if they could get a MiSeq instead. But if they already have access to one, or have protocols that are working well on one, it can still be a capable platform.

James@cancer said...

Hi Keith,
Some "ifs", but I don't think they're too big...

If Illumina get to paired-end 500bp on MiSeq and if they can push this out to HiSeq 2500; what will the community do with 500M 1000bp reads for a few $1000?