Saturday, January 07, 2012

Ion Torrent Pairs: To What End?

Ion Torrent quietly released a set of paired end datasets over the holiday break.  This is a bit embarrassing for me, as in my last post on Ion I stated the platform "will probably never have paired ends" when in fact Ion had already announced the protocol.  Oy!  I also missed their mate pair protocol being released, though the document itself is another victim of Ion's incredibly counterproductive security policy.  If you don't own a PGM, you can't access the document -- never mind if you are trying to plan for a potential buy or are preparing a library for a friend/collaborator to run.

In any case, paired ends are here, though the whole affair has an air of being a throwaway.  The data files are somewhat scrambled in their names (the BAM files contain different reads than the FASTQ files with similar names, though there is a pattern to the mismapping).  Ion's TMAP aligner doesn't yet support paired ends, though supposedly BWA will align them.  The read datasets are only the short chemistry and only for fully overlapping reads, in which the library inserts are about one read length long.  So it's a useful advance for the platform, but one which feels incomplete.

A component of the Ion Server software recognizes these fully overlapping pairs and merges them.  Unmergeable reads are emitted as an additional FASTQ file.  So after this preprocessing, the reads fit into the standard workflow.  But this leaves out the opportunities available for non-overlapping paired ends, particularly for structural variant detection, haplotyping and de novo assembly.  TMAP would need to support paired ends to make the first two work; Ion doesn't have a blessed open source assembler (the wonderful MIRA suite is probably the closest to that, though I don't believe it has paired end support for Ion yet).
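To make the merging idea concrete, here is a minimal Python sketch of how an overlapping pair could be collapsed into one read.  This is not Ion's algorithm (which I haven't seen); the function names, the 10-base minimum overlap and the 10% mismatch cutoff are all my own assumptions for illustration.  It also compares the reads without gaps, which is a poor fit for Ion's indel-dominated errors; a real merger would presumably need a gapped alignment.

```python
# Hypothetical sketch of overlap merging; NOT the Ion Server implementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd_seq, fwd_qual, rev_seq, rev_qual,
               min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into (sequence, qualities), or None if unmergeable.

    Qualities are lists of integer Phred scores. The thresholds are
    assumed values chosen only for illustration.
    """
    # Put the reverse read on the same strand as the forward read.
    rc_seq = reverse_complement(rev_seq)
    rc_qual = rev_qual[::-1]

    # Slide the reverse read along the forward read (ungapped) and take
    # the first placement whose mismatch rate is acceptable.
    for offset in range(len(fwd_seq) - min_overlap + 1):
        overlap = min(len(fwd_seq) - offset, len(rc_seq))
        if overlap < min_overlap:
            continue
        mismatches = sum(
            a != b
            for a, b in zip(fwd_seq[offset:offset + overlap], rc_seq[:overlap])
        )
        if mismatches <= max_mismatch_frac * overlap:
            break
    else:
        return None  # no acceptable overlap found

    # Forward-only prefix, then the overlap, then whichever read extends further.
    seq = list(fwd_seq[:offset])
    qual = list(fwd_qual[:offset])
    for i in range(overlap):
        fq, rq = fwd_qual[offset + i], rc_qual[i]
        if fq >= rq:  # keep the base call with the higher quality
            seq.append(fwd_seq[offset + i])
            qual.append(fq)
        else:
            seq.append(rc_seq[i])
            qual.append(rq)
    seq.extend(fwd_seq[offset + overlap:])
    qual.extend(fwd_qual[offset + overlap:])
    seq.extend(rc_seq[overlap:])
    qual.extend(rc_qual[overlap:])
    return "".join(seq), qual
```

Pairs for which `merge_pair` returns `None` would correspond to the "unmergeable" reads that get written to the extra FASTQ file.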

What Ion shows in their application note is that they are able to reduce indel errors by about 5-fold on an E. coli DH10B dataset using this procedure.  That is quite significant, as the indels are the bugaboo of the system.  However, only this summary information is given; it would be valuable to have a much richer dataset with a greater breakdown.  For example, how does this change with the length of a homopolymer?

It's unfortunate that Ion didn't post a paired end long read dataset.  Paired 240 base reads could enable generating ~400 bp merged reads, which could be quite useful for de novo assembly and other tasks.  This is still a major domain for 454 sequencing, and Ion could start chewing in at much lower cost.  Yes, Roche does now offer much longer reads in the kilobase range, but 400 bp Ion reads would eat up the markets where 1 kb is a nice-to-have and not a must-have.  However, one issue could be the yield of such reads.  I'm not sure what fragment size range the current Ion bead prep supports (and BTW, anyone thinking they can't go longer should consider that both Roche and GnuBIO are up in the kilobase range with ePCR) -- again, that's information I can't get routinely due to the stupid tiered security system on the Ion Community.  Furthermore, perhaps only a fraction of inserts would have high quality long reads on each end.  Still, for some applications the price for such data could well be worth it; Ion should be trying to build out in that direction.
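The arithmetic behind that ~400 bp figure is simple enough to write down.  As a back-of-the-envelope sketch (the function name is mine, and it assumes both reads reach their full length), the overlap shared by the two reads is twice the read length minus the insert length:

```python
def pair_overlap(read_len, insert_len):
    """Bases covered by both reads of a pair; a negative value means a gap."""
    return 2 * read_len - insert_len


# Two 240-base reads on a 400 bp insert share 80 bases for the merger to use.
print(pair_overlap(240, 400))  # 80
```

So a 400 bp merged read from 2x240 leaves an 80 base overlap, which in practice would shrink with any reads that fall short of full length.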

There are several reasons paired ends deserve some attention from Ion's developers.  First, read lengths are proving more challenging to advance than perhaps expected; one old GenomeWeb quote from Jonathan Rothberg predicted 400 basepair reads by the end of 2011.  But perhaps more importantly, MiSeq is potentially poised to make some serious noise in this space.  MiSeq supports 2x150 reads out of the gate, with several open source tools supporting merging those if the inserts are small enough.  As I noted previously, a number of groups have demonstrated 1x300 runs or even longer; Eric Olivares has estimated that the cartridges are filled with about 360 cycles of reagent.

Now, one commenter previously suggested that MiSeq can already support 300x2 paired ends, but that's a bit of an overstatement.  MiSeq, as I understand it, comes with pre-filled reagent cartridges.  So to go to 300x2, Illumina would need to make available some super-filled cartridges (I really have no clue what maximum volume the onboard reagent cooler would support).  I think it would behoove Illumina to create an advance access program to get such cartridges out to experimenters unafraid of risk.  Similarly to what I described above, 300x2 reads on ~500-550 bp inserts could be very valuable for de novo assembly, and would start compressing 454's space (which is also a bit vulnerable to attack from PacBio, with its very long & noisy reads or shorter high-quality circular consensus reads).

It's also worth noting that MiSeq has an operational edge in the Paired End department, which is why Ion can't let it get too far ahead in performance -- and really should be shooting to always stay ahead.  Paired End sequencing on MiSeq is part of the automated workflow and requires no operator intervention.  Paired End on PGM apparently requires the chip to be taken off, some enzymatic steps performed, and then the chip returned to the instrument.  Of course, the details of this are far too dangerous to be trusted with the hoi polloi, so I can't tally how much work is really involved.  Still, it can't beat no hands-on time.  


Anonymous said...

Nice review!

Is the Ion Torrent protocol really Paired-End when it appears to simply sequence the other strand of the same fragment? I think of Paired-end giving two independent reads from the same fragment, sometimes overlapping on the ends. The Ion Torrent looks to read one strand, then read the other strand of the same small fragment (i.e. read the strand just synthesized on the bead). Other than possibly increasing the quality scores, it looks to more than double the run time for little benefit. If they really did a Paired-end run, that would be useful. Also sequencing the synthesized strand could introduce errors. Am I missing something here?

ECO said...

Thanks for the plugs Keith.

2x250 PE officially on the MiSeq by midyear with a 500 cycle kit.

Keith Robison said...

Anon: According to the Application Note, the synthesized strand is extended to the end and then a new primer is created by nicking and some sort of exonuclease trick. While different in the details from Illumina's scheme, it should be logically the same -- you get another read of the same fragment coming in from the other end.

TigerSeq said...

Great write up!
I have been running 2x150 PE runs on a MiSeq, but I am interested in how to set up a 1x300 run using the 2x150 kits that we have. Is this possible? If so, how?