Wednesday, September 14, 2011

Illumina Calls for a Flag on the Play

Today I continue the sports analogies in my coverage of the benchtop sequencer war, though I'm switching games. Alas, I can't invoke instant replay: the usual set of procrastination excuses means this is being filed very late after I learned of it (first via a comment on the blog, then from a friendly chap at Illumina). In any case, Illumina has responded to Ion Torrent's claims about long reads and overestimated MiSeq quality, mostly by crying "Foul!".
The item's title opens the barrage: "Analysis of Inaccuracies in Ion Torrent’s Long Read App Note". Some of the key complaints about Ion's analysis are:
  1. Ion omits indel errors from its quoted statistics; including them pushes MiSeq quality well above Ion's
  2. Ion compared trimmed PGM data to untrimmed MiSeq data
  3. The analysis of predicted quality scores vs. observed errors for MiSeq was miscomputed; computing it with GATK shows good agreement between the two
  4. Ion's claim that PGM gives better consensus accuracy than MiSeq rests on a procedure that favored Ion data
  5. Ion's claim of higher genome coverage depends on how ambiguously mapping reads are placed: Illumina's aligner discards them, whereas Ion's aligner assigns each to a randomly chosen candidate location
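To make complaints 1 and 3 concrete, here is a minimal Python sketch of the two calculations at issue: an error rate computed with and without indels, and an empirical check of reported quality scores (observed errors binned by reported Q, then converted back to the Phred scale via Q = -10·log10(p)). The function names, the one-error-per-indel-base convention, and the add-one smoothing are my own assumptions for illustration; this is not GATK's recalibration algorithm, nor either company's pipeline.

```python
import math
from collections import defaultdict


def phred(p):
    """Convert an error probability to a Phred-scaled quality: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def error_rates(bases_examined, mismatches, inserted_bases, deleted_bases):
    """Return (substitution-only rate, rate including indels).

    Assumption: each inserted or deleted base counts as one error; counting
    one error per indel *event* instead would give different numbers.
    """
    subs_only = mismatches / bases_examined
    with_indels = (mismatches + inserted_bases + deleted_bases) / bases_examined
    return subs_only, with_indels


def empirical_quality(observations):
    """Bin (reported_q, is_error) pairs by reported Q.

    Returns {reported_q: empirical_q}. Add-one smoothing, (errors + 1) / (n + 2),
    keeps bins with zero observed errors finite; this smoothing is an assumption.
    A well-calibrated instrument gives empirical_q close to reported_q.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for q, is_error in observations:
        counts[q][0] += int(is_error)
        counts[q][1] += 1
    return {q: phred((e + 1) / (n + 2)) for q, (e, n) in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical counts: 10 mismatches plus 50 indel bases in 10,000 bases.
    subs, with_indels = error_rates(10_000, 10, 30, 20)
    print(round(phred(subs), 1), round(phred(with_indels), 1))
```

With these made-up counts the substitution-only figure corresponds to Q30 while the indel-inclusive figure drops to about Q22, which is why leaving indels out of a platform comparison matters so much.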

Overall, I'd say the most important lesson to draw from all this is that the broader, independent bioinformatics community needs to step in and draw up the rules of engagement through well-defined, open protocols for these assessments. Tools such as Mauve Assembly Metrics are a step in the right direction. Still, that is not a complete benchmark suite, nor is it a set of procedures for evaluating data from each platform. The divergent treatment of ambiguous mappings is a good reminder of how hard this is: it is probably most informative to use the aligner best tuned for each platform, but ensuring consistent treatment of issues such as mapping ambiguity requires attention to detail.
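The coverage effect of the ambiguous-mapping policy is easy to see in a toy model. This Python sketch computes the fraction of a genome covered at least once when multi-mapping reads are either discarded or placed at a randomly chosen candidate. The function name and the (candidate_starts, read_length) data layout are hypothetical; real aligners report multi-mapping in their own ways (for example through mapping quality or secondary alignments), and this is not either vendor's code.

```python
import random


def covered_fraction(genome_length, reads, policy, seed=0):
    """Fraction of genome positions covered by at least one placed read.

    reads: list of (candidate_starts, read_length), 0-based starts.
    policy: "discard" drops reads with more than one candidate start;
            "random" places such reads at a uniformly chosen candidate.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    covered = [False] * genome_length
    for starts, length in reads:
        if not starts:
            continue  # unmapped read: contributes nothing under either policy
        if len(starts) == 1:
            start = starts[0]
        elif policy == "discard":
            continue
        elif policy == "random":
            start = rng.choice(starts)
        else:
            raise ValueError(f"unknown policy: {policy}")
        for pos in range(start, min(start + length, genome_length)):
            covered[pos] = True
    return sum(covered) / genome_length


if __name__ == "__main__":
    # One uniquely placed read, plus one read that maps equally well to two repeat copies.
    reads = [([0], 50), ([50, 75], 25)]
    print(covered_fraction(100, reads, "discard"))  # 0.5
    print(covered_fraction(100, reads, "random"))   # 0.75
```

The repeat region counts as covered only under random placement, even though nothing about the underlying data changed; that is why a coverage comparison needs both platforms' reads run through the same ambiguity policy.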

The complaint about omitting indels from the quality analysis hits close to home, because I've been guilty of this too. Partly this was because it was easy to set them aside at first, and partly it stems from the fact that indels don't really fit into the Phred score paradigm I've been using (that's a whole 'nother stalled blog post). I've tried to be upfront about that, but it is certainly an issue. In some applications, reads with homopolymer indel errors can be seen as just a tax on your data. For example, if I know I'm only looking for activating substitutions in an oncogene, those must be in-frame changes, so I can discard any reads with indels in the vicinity of my codon(s) of interest. But in most cases indels really are an issue. On the slightly scary side, the Illumina note cites this site as a source of independent information; if I'm going to have that mantle thrust upon me, I'd better live up to a high standard!
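The oncogene filtering idea can be sketched in a few lines of Python: walk each read's CIGAR string, collect the reference positions touched by insertions or deletions, and drop any read with an indel within a window around the codon of interest. The function names and the 5-base default window are hypothetical choices, and coordinates here are 0-based, whereas the SAM format's POS field is 1-based.

```python
import re

# CIGAR operations as defined by the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_ref_positions(ref_start, cigar):
    """Return 0-based reference positions touched by insertions or deletions.

    An insertion sits between reference bases; here it is recorded at the
    reference position immediately following it (a convention chosen for
    this sketch). Soft clips (S), hard clips (H) and padding (P) consume no
    reference bases.
    """
    positions = []
    pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "I":
            positions.append(pos)
        elif op == "D":
            positions.extend(range(pos, pos + length))
            pos += length
        elif op in "MN=X":
            pos += length
    return positions


def keep_read(ref_start, cigar, codon_start, window=5):
    """Keep a read unless it has an indel within `window` bases of the codon."""
    lo, hi = codon_start - window, codon_start + 3 + window
    return not any(lo <= p < hi for p in indel_ref_positions(ref_start, cigar))


if __name__ == "__main__":
    print(keep_read(100, "50M", codon_start=120))       # True: no indels at all
    print(keep_read(100, "20M1D29M", codon_start=120))  # False: deletion inside the codon
    print(keep_read(100, "5M1I44M", codon_start=140))   # True: insertion far from the codon
```

Discarding rather than correcting such reads is the simplest choice; it costs some depth at the codon but keeps indel artifacts from being mistaken for substitutions.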

So, the slugging continues. With a little effort by us outside observers, it can stay informative and not degenerate into a pure marketing squabble. Consistent and open protocols (with the intent of the analytic design also made clear) will also help new entrants understand the rules of the game.


lek2k said...

Great summary of the latest developments and major points of Illumina's application note. I like the application note because it is a well-structured, point-by-point rebuttal. What lets it all down, though, is that there are NO FOOTNOTES on slide 13, arguably the most important slide.

Anonymous said...

I hope ASHG lives up to this battle, I want to see fist fights and bloody noses. Women pulling hair and kicking!

Anonymous said...

I just wanted to mention that Illumina was nominated by the Science Advisory Board for various categories in the upcoming Life Science Awards next month.