Thursday, August 04, 2011

Ion Throws A Long Punch At MiSeq

The benchtop sequencer wars are heating up!  Illumina and Life are engaged in a fierce war of pamphlets and datasets to convince the world that they have the edge.  I won't attempt to give a complete play-by-play, but will hit on the latest developments, which include Ion releasing a dataset of 250+ bp reads.

Ion is coming out slugging with a new 314 dataset and application note, showing read lengths of over 250 bases (they assure me that it will also work on the 316, but nothing would beat a dataset!).  This is long awaited; as I've noted before, for applications such as amplicon sequencing the current fragment sizes of <150 bases and read lengths mostly in the 70s are challenging to work with.  For my own applications, 250 is actually a very good number.  When working with formalin-fixed, paraffin-embedded cancer samples, the recovered DNA is generally already sheared to about 450 bases, so you need to design smaller amplicons for good luck.  A bit of 200 will cover many exons quite well (some common longer ones will require two amplicons; there are of course monsters out there that are special cases requiring many) and still leave room for a barcode.  In my analysis of this dataset, Ion has an aligned read depth of over 200K at position 240 (below).

On the Illumina side, a recent application note took aim at the newly released 316 chip; Ion has now fired a salvo back.  A very noisy part of the debate is around read quality, with the two sides to some degree talking past each other by picking different metrics.  Illumina compared public Ion runs to MiSeq data for E. coli (either DH10B or MG1655), showing higher rates of error-free reads for MiSeq.  They also point to the higher homopolymer indel rates of Ion and show that the Ion homopolymer issue isn't solved by depth; false indel consensus calls are still made.  Ion's note looks at substitution errors, and claims that MiSeq makes consensus errors.  But Ion doesn't address the homopolymer issue at all, other than to say that they have improved 10X over 6 months and to promise a similar improvement over the next 6 months.
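For readers who want to poke at the homopolymer question themselves, here is a minimal sketch of the first step: locating homopolymer runs in a reference so that indel calls can later be tallied by run length. The function names and the `min_len=3` cutoff are illustrative assumptions, not anything taken from either vendor's application note.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases.

    Only runs of at least `min_len` bases are reported; positions are 0-based.
    The default cutoff of 3 is an arbitrary choice for illustration.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def run_length_histogram(seq):
    """Count how many runs of each length occur in the sequence."""
    hist = {}
    for _, group in groupby(seq.upper()):
        n = sum(1 for _ in group)
        hist[n] = hist.get(n, 0) + 1
    return hist


if __name__ == "__main__":
    toy_reference = "ACGGGTAAAA"  # made-up sequence, not real data
    print(homopolymer_runs(toy_reference))
    print(run_length_histogram(toy_reference))
```

With the run positions in hand, indel calls from an alignment could be grouped by the length of the run they fall in, which is the comparison that matters for this debate.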

Illumina also goes after Ion's claims for good coverage at all %GC ranges, which have been repeated often and also show up in the supplementary materials of Ion's recent Nature paper.  There, for three different bacterial genomes Ion showed very even coverage; for human the coverage was generally even but did suffer at very high %AT.  However, for the public DH10B data run on a 316, Illumina plots Ion as deviating substantially from even coverage, whereas MiSeq is closer to the mark.  On Ion's side, their long read dataset claims better overall coverage of DH10B, 99.98% vs. 94.17% for MiSeq, despite much lower coverage depth (10X for Ion; 421X for MiSeq).
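Both kinds of coverage claim here can be checked from a per-base depth vector. The sketch below is one plausible way to do it, assuming the depth has already been extracted from an alignment as a list of integers the same length as the reference; the window size, number of GC bins, and function names are all hypothetical choices.

```python
from collections import defaultdict


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depth:
        raise ValueError("depth vector is empty")
    return sum(1 for d in depth if d >= min_depth) / len(depth)


def coverage_by_gc(reference, depth, window=100, n_bins=10):
    """Mean depth per GC bin, using non-overlapping windows.

    Bin i holds windows whose GC fraction lies in [i/n_bins, (i+1)/n_bins);
    windows that are 100% GC are placed in the last bin. A trailing partial
    window is ignored.
    """
    if len(reference) != len(depth):
        raise ValueError("reference and depth must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    ref = reference.upper()
    for start in range(0, len(ref) - window + 1, window):
        chunk = ref[start:start + window]
        gc_count = sum(1 for b in chunk if b in "GC")
        # Integer arithmetic avoids floating-point trouble at bin edges.
        bin_index = min(gc_count * n_bins // window, n_bins - 1)
        sums[bin_index] += sum(depth[start:start + window]) / window
        counts[bin_index] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    toy_ref = "AAAA" + "GGGG"          # made-up sequence
    toy_depth = [2, 2, 2, 2, 6, 6, 6, 0]  # made-up depths
    print(breadth_of_coverage(toy_depth))
    print(coverage_by_gc(toy_ref, toy_depth, window=4, n_bins=2))
```

A flat profile across GC bins would support the "even coverage" claim; a strong slope would support the opposing plot. Breadth and mean depth are separate numbers, which is why a 10X run can still report a higher percentage of the genome covered than a 421X run.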

Things get really hairy when talking phred scores.  Illumina's pamphlet emphasizes higher called quality scores by read position.  Ion's pamphlet compares actual error rates by read position (using, of course, a different Ion dataset than Illumina looked at).  MiSeq's plot has some odd structures to it, suggesting sharp drops at particular base positions which are later recovered from.  I've seen similar weirdness in other datasets; indeed one author suggested that such oddities can be specific to a particular instrument (and perhaps related to age).  Ion underscores this by showing deviation of called qualities and actual error rates by 5-10 phred points, which is substantial.  Furthermore, my own analysis shows that Ion is still under-calling phred scores.  An observed-vs-called plot (below) for the long read dataset for called phred=17 is shown; other called qualities show similar values: Ion is consistently underestimating phred values by as many as 10 points.  So Ion was smart to show actual qualities.  But note that the Ion piece does not address homopolymer issues.
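The called-versus-observed comparison rests on the standard phred definition, Q = -10 * log10(P_error). A minimal sketch of the arithmetic follows, assuming mismatches against the reference have already been tallied per called quality value; the counts in the example are invented for illustration and are not taken from the Ion or MiSeq datasets.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a called phred score."""
    return 10.0 ** (-q / 10.0)


def empirical_phred(errors, total):
    """Observed quality from counted errors: Q = -10 * log10(errors / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if errors == 0:
        return float("inf")  # no errors seen; the quality is unbounded
    return -10.0 * math.log10(errors / total)


def observed_minus_called(counts):
    """Map each called Q to (observed Q - called Q).

    `counts` maps called Q -> (mismatching bases, total bases with that Q).
    Positive values mean the caller is under-estimating quality.
    """
    return {q: empirical_phred(e, n) - q for q, (e, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical tallies: 20 errors among 10,000 bases called at Q17.
    hypothetical_counts = {17: (20, 10_000)}
    for q, delta in observed_minus_called(hypothetical_counts).items():
        print(f"called Q{q}: observed - called = {delta:.1f}")
```

For scale, a called Q17 implies about a 2% error rate, so a base population that is really making errors at 0.2% is being under-called by roughly 10 points.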

Clearly, it is important to have independent eyes looking at these datasets, and ideally a community consensus on what the important measures are.  I'd like to build up such a kit, but it will be slow.  In that light, it's worth noting in this space some third-party analyses of older Ion data; newly minted blogger Lek2K takes a look at homopolymer calling on test fragments in an Ion run, while Nick Loman shared data from his lab's initial 316 runs.  Having more independent datasets available will also allow judging the degree to which the manufacturers' datasets are hand-picked.  These analyses also set standards which new entrants in the field should pay heed to.

For those of us who would like to try these out, both are a bit of a tease.  Ion says the long read protocol won't be released until October-ish, and it isn't clear how much of the improvement is on the wet side and how much comes from software changes.  Ion apparently just upgraded everyone's instruments with new plumbing to support higher flow rates, in conjunction with release of the 316 chip.  MiSeq units have started shipping -- but to early access customers.  Illumina says it has booked 135 MiSeq orders, but no service provider has announced MiSeq support yet -- and demand is expected to outpace supply at first.  Illumina is also saying they expect most customers to get either MiSeq or HiSeq, with not much demand in between, suggesting that the GAIIx may not be long for this world.

[Fri AM update: added 2 hyperlinks I alluded to but forgot (Nick & Lek's blog entries) and to the Nature paper]
[Fri PM update: fixed switched hyperlink & updated one]

5 comments:

Anonymous said...

Similar to an earlier commenter on this blog, I would like to see analysis of a published dataset by an objective party. Without this information buying a new machine is like buying the emperor's new clothes.

Anonymous said...

Illumina has been promoting MiSeq since before AGBT, very much trying to take the wind out of Ion's sails. They have had seven months to finish designing and then building 100 or so instruments to satisfy the early demand. If they don't meet this then a lot of customers who have delayed entering the quick run sequencer market might be giving Life Tech a call in the new year!

Anonymous said...

Illumina strikes back:
http://www.illumina.com/documents/analysis_of_inaccuracies_in_ion_torrent_long_read_application.pdf

Dan said...

Hey Keith,

I just got re-hooked on your blog!

Keep the posts coming.

Dan

NN555 said...

First, I don't know how Ion Torrent is going to cope with read length due to steric hindrance of fragments bigger than 151-181bp, as per the 316 protocol.

Second, MiSeq does do 300bp, with a theoretical good chance at a 600bp paired end... Roche, here they come!

The pudding:

http://www.illumina.com/systems/miseq/featured_researchers.ilmn?sciid=2011009IBN8#broad-webinar