The item's title starts the barrage: "Analysis of Inaccuracies in Ion Torrent’s Long Read App Note". Some of the key complaints with Ion's analysis are:
- Ion fails to include indel errors in their quoted statistics; including these pushes MiSeq quality well above Ion
- Ion compared trimmed PGM data to untrimmed MiSeq data
- The analysis of predicted quality score vs. observed errors for MiSeq was miscomputed; computation with GATK shows good agreement between the two
- Ion's claim that PGM gives better consensus accuracy than MiSeq was based on a procedure that favored Ion data
- Ion's claim of higher genome coverage is based on placement of ambiguously mapping reads; Illumina's aligner discards them whereas Ion's aligner assigns them to a random choice from the candidates
Overall, I'd say the most important lesson to draw from all this is that the broader, independent bioinformatics community needs to step in and draw up the rules of engagement through well-defined open protocols for these assessments. Tools such as Mauve Assembly Metrics are a step in the right direction. Still, that's not a complete benchmark suite, nor is it a set of procedures for evaluating data from each platform. The different treatment of ambiguous mappings is a good reminder of how hard this is. It is probably most informative to use the aligner best tuned for a given platform, but ensuring consistent treatment of issues such as mapping ambiguity requires attention to detail.
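To make the mapping-ambiguity point concrete, here is a toy sketch of how the two policies (discard ambiguous reads vs. place each at a random candidate) can change a coverage figure on the very same reads. Everything in it is hypothetical: the function name `covered_fraction`, the read representation, and the numbers are made up for illustration and are not taken from either vendor's pipeline.

```python
import random


def covered_fraction(genome_length, reads, policy, seed=0):
    """Fraction of genome positions covered by at least one read.

    reads  -- list of (candidate_starts, read_length) tuples; a read with
              more than one candidate start maps ambiguously (hypothetical
              representation, not any aligner's real output format).
    policy -- "discard": drop ambiguous reads entirely;
              "random":  place each ambiguous read at one randomly
                         chosen candidate position.
    """
    if policy not in ("discard", "random"):
        raise ValueError(f"unknown policy: {policy!r}")

    rng = random.Random(seed)
    covered = [False] * genome_length

    for candidates, length in reads:
        if not candidates:
            continue  # unmapped read
        if len(candidates) > 1:
            if policy == "discard":
                continue
            start = rng.choice(candidates)
        else:
            start = candidates[0]
        # Mark every position the read spans, clipped to the genome end.
        for pos in range(start, min(start + length, genome_length)):
            covered[pos] = True

    return sum(covered) / genome_length


# Toy 40-base genome with a two-copy repeat at positions 20 and 30.
toy_reads = [
    ([0], 10),       # unique
    ([10], 10),      # unique
    ([20, 30], 10),  # ambiguous: fits either repeat copy equally well
]

print(covered_fraction(40, toy_reads, "discard"))  # 0.5
print(covered_fraction(40, toy_reads, "random"))   # 0.75
```

Neither number is "wrong"; they answer different questions. That is exactly why a comparison needs to state which policy was applied to each platform's data.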
The complaint about omitting indels from the quality analysis hits close to home, because I've been guilty of this too. Partly this was because it was easy to avoid them at first, and partly it stems from the fact that indels don't really fit into the phred score paradigm I've been using (that's a whole 'nother stalled blog post). I've tried to be upfront about that, but it is certainly an issue. In some applications the reads with homopolymer errors can be seen as just a tax on your data. For example, if I know I'm only looking for activating substitutions in an oncogene, those must be in frame and I can discard the reads with indels in the vicinity of my codon(s) of interest. But in most cases indels really are an issue. On the slightly scary side, the Illumina note cites this site as a source of independent information; if I'm going to have that mantle thrust upon me, I'd better live up to a high standard!
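As a rough illustration of how much the indel question can matter, the sketch below converts an observed error rate into a phred-scaled value (Q = -10 log10 of the error rate) with and without indels counted. The counts are invented, and counting each indel event as a single error is a simplifying assumption of mine; as noted above, indels don't map cleanly onto per-base phred scores, so treat this as a back-of-envelope view rather than a proper method.

```python
import math


def phred(error_rate):
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    if error_rate <= 0:
        return math.inf  # no observed errors
    return -10 * math.log10(error_rate)


def empirical_quality(aligned_bases, substitutions, indels, include_indels):
    """Empirical quality from error counts.

    Assumption (mine, for illustration): each indel event counts as one
    error, regardless of its length.
    """
    errors = substitutions + (indels if include_indels else 0)
    return phred(errors / aligned_bases)


# Hypothetical counts for a platform whose errors are mostly indels.
aligned_bases = 1_000_000
substitutions = 1_000
indels = 9_000

q_subs_only = empirical_quality(aligned_bases, substitutions, indels, False)
q_all_errors = empirical_quality(aligned_bases, substitutions, indels, True)

print(f"substitutions only: Q{q_subs_only:.0f}")   # about Q30
print(f"including indels:   Q{q_all_errors:.0f}")  # about Q20
```

With these made-up numbers, leaving out indels flatters the data by ten phred units, which is a tenfold difference in error rate.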
So, the slugging continues. With a little effort by us outside observers, it can stay informative and not degenerate into a pure marketing squabble. Consistent and open protocols (with the intent of the analytic design also made clear) will also help new entrants understand the rules of the game.