But in any case, paired ends are here. Still, the whole affair has an air of being a throwaway. The data files are somewhat scrambled in their names (the BAM files contain different reads than the FASTQ files with similar names, though there is a pattern to the mismapping). Ion's TMAP aligner doesn't yet support paired ends, though supposedly BWA will align them. The posted datasets use only the short-read chemistry and cover only fully overlapping reads, in which the library inserts are about one read length long. So it's a useful advance for the platform, but one which feels incomplete.
A component of the Ion Server software recognizes these fully overlapping pairs and merges them. Unmergeable reads are emitted as an additional FASTQ file. So after this preprocessing, the reads fit into the standard workflow. But this leaves out the opportunities available for non-overlapping paired ends, particularly for structural variant detection, haplotyping and de novo assembly. TMAP would need to support paired ends to make the first two work; as for the third, Ion doesn't have a blessed open source assembler (the wonderful MIRA suite is probably the closest to that, though I don't believe it has paired end support for Ion yet).
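To make the merge step concrete, here is a minimal Python sketch of how an overlapping pair can be collapsed into a single read. It is emphatically not Ion's algorithm, which I haven't seen; the function names (`reverse_complement`, `merge_pair`) and the thresholds (`min_overlap`, `max_mismatch_frac`) are my own inventions for illustration. It also does only an ungapped comparison, so it would cope poorly with exactly the indel errors that matter on this platform; a real merger would need a gapped alignment and would use the quality values.

```python
# Toy paired-end merger: an illustrative sketch, NOT Ion's actual method.
# All names and thresholds here are assumptions made for the example.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its mate, or return None if they don't overlap.

    The mate is reverse-complemented so both reads are on the same strand,
    then the end of the forward read is compared (ungapped) against the
    start of the mate, trying the longest possible overlap first.
    """
    mate = reverse_complement(rev)
    longest = min(len(fwd), len(mate))
    for olap in range(longest, min_overlap - 1, -1):
        tail = fwd[-olap:]
        head = mate[:olap]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch_frac * olap:
            # Keep the forward read's bases in the overlap; a real tool
            # would pick the higher-quality base at each position.
            return fwd + mate[olap:]
    return None  # unmergeable: this pair would go to the extra FASTQ file


# Example: a 30 bp fragment sequenced with 20-base reads from each end.
fragment = "ACGTTGCATGCCGATAGGCTTAACGGATCC"
fwd_read = fragment[:20]
rev_read = reverse_complement(fragment[10:])
merged = merge_pair(fwd_read, rev_read)
print(merged == fragment)
```

With a fully overlapping pair, both reads cover the same bases and the merge is just the read itself, which is the case where a real tool gets to vote between two observations of every base. A pair that returns `None` is the sort that would land in the additional FASTQ file.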
What Ion shows in their application note is that this procedure reduces indel errors by about 5-fold on an E. coli DH10B dataset. That is quite significant, as the indels are the bugaboo of the system. However, only this summary information is given; it would be valuable to have a much richer dataset with a greater breakdown. For example, how does the improvement change with the length of a homopolymer?
It's unfortunate that Ion didn't post a paired end long read dataset. Paired 240 base reads could enable generating ~400 bp merged reads, which could be quite useful for de novo assembly and other tasks. This is still a major domain for 454 sequencing, and Ion could start chewing in at much lower cost. Yes, Roche does now offer much longer reads in the kilobase range, but 400 bp Ion reads would eat up the markets where 1 kb is a nice-to-have and not a must-have. However, one issue could be the yield of such reads. I'm not sure what fragment size range the current Ion bead prep supports (and BTW, anyone thinking they can't go longer should consider that both Roche and GnuBIO are up in the kilobase range with ePCR) -- again, that's information I can't get routinely due to the stupid tiered security system on the Ion Community. Furthermore, perhaps only a fraction of inserts would have high quality long reads on each end. Still, for some applications the price for such data could well be worth it; Ion should be trying to build out in that direction.
There are several reasons paired ends deserve some attention from Ion's development team. First, read lengths are proving more challenging to advance than perhaps expected; one old GenomeWeb quote from Jonathan Rothberg predicted 400 base pair reads by the end of 2011. Second, and perhaps more importantly, MiSeq is potentially poised to make some serious noise in this space. MiSeq supports 2x150 reads out of the gate, with several open source tools supporting merging those if the inserts are small enough. As I noted previously, a number of groups have demonstrated 1x300 runs or even longer; Eric Olivares has estimated that the cartridges are filled with about 360 cycles of reagent.
Now, one commenter previously suggested that MiSeq can already support 2x300 paired ends, but that's a bit of an overstatement. MiSeq, as I understand it, comes with pre-filled reagent cartridges. So to go to 2x300, which needs roughly 600 cycles rather than the 360 or so estimated to be in a cartridge, Illumina would need to make available some super-filled cartridges (I really have no clue what maximum volume the onboard reagent cooler would support). I think it would behoove Illumina to create an advance access program to get such cartridges out to experimenters unafraid of risk. Similarly to what I described above, 2x300 reads on ~500-550 bp inserts could be very valuable for de novo assembly, and would start compressing 454's space (which is also a bit vulnerable to attack from PacBio, with its very long & noisy reads or shorter high-quality circular consensus reads).
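The arithmetic behind these insert sizes is simple enough to sketch. Assuming both reads run their full length and the insert is no longer than two read lengths, the merged read is as long as the insert, and the reads overlap by twice the read length minus the insert length. The helper names below are hypothetical, and the cycle count ignores any index reads, so treat it as a back-of-the-envelope check rather than a specification.

```python
# Back-of-the-envelope paired-end arithmetic; helper names are made up here.


def overlap_bases(read_len, insert_len):
    """Bases covered by both reads of a pair; a negative value means a gap."""
    return 2 * read_len - insert_len


def cycles_needed(read_len):
    """Sequencing cycles for a paired run, ignoring index reads (an assumption)."""
    return 2 * read_len


# Figures taken from the text: paired 240-base Ion reads merged to ~400 bp,
# and hypothetical MiSeq 2x300 reads on 500-550 bp inserts.
for platform, read_len, insert_len in [
    ("Ion 2x240", 240, 400),
    ("MiSeq 2x300", 300, 500),
    ("MiSeq 2x300", 300, 550),
]:
    print(platform, "insert", insert_len, "bp -> overlap",
          overlap_bases(read_len, insert_len), "bases")

# A 2x300 run needs 600 cycles, against the ~360 estimated per cartridge.
print("cycles for 2x300:", cycles_needed(300))
```

So a 400 bp merged read from 240 base reads leaves 80 bases of overlap to anchor the merge, while 2x300 on 500-550 bp inserts leaves only 50 to 100, which is where read quality at the far ends starts to matter.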
It's also worth noting that MiSeq has an operational edge in the paired end department, which is why Ion can't let it get too far ahead in performance -- and really should be shooting to always stay ahead. Paired end sequencing on MiSeq is part of the automated workflow and requires no operator intervention. Paired end sequencing on PGM apparently requires the chip to be taken off, some enzymatic steps performed, and then the chip returned to the instrument. Of course, the details of this are far too dangerous to be trusted to the hoi polloi, so I can't tally how much work is really involved. Still, it can't beat no hands-on time.