Monday, May 23, 2011

MiSeq's First Light

Someone was kind enough to send me a copy of a poster by Illumina reporting results from the MiSeq.  Now, to be very upfront, by "someone" I mean "a person from a PR firm contracted by Illumina," and by "kind enough" I mean that she was doing her job.  I don't have illusions about motive here; the author list is all from Illumina or Epicentre, but this was a poster presented at the recent Cold Spring Harbor Biology of Genomes meeting.  It certainly isn't peer-reviewed data, but it is something. Of course, we can't know to what degree these are cherry-picked results.  If you want to be really cynical, call it messaging and not data.  Yes, I've taken some flak in the comments recently about how favorable my coverage of Ion has been, and I'm trying to adjust.  And don't worry; I have some new bones to pick in that area.
So, with that said, it is a chance to see how Illumina is positioning the system commercially. The poster presents a small amount of background material and touts speed and ease of use, but the core is really five different data vignettes.

Two of these, one presenting a human library and one an E. coli library, are side-by-side comparisons of data generation with MiSeq and HiSeq.  The human case shows off 2x101 base reads on each instrument, and the numbers are very comparable.  For example, for read one and read two of a pair, HiSeq scored 80/73 percent perfect reads and MiSeq 82/76 percent perfect, with 76.9% of bases >Q30.  A 2x151 run of a human library (shown without comparable HiSeq data) achieved 72/50 percent perfect reads. HiSeq achieved 961K clusters per square millimeter; MiSeq 944K (unfortunately, nowhere was the total number of reads stated; who keeps flowcell areas in their head?).  Both had very similar coverage profiles by GC content, with a small enrichment for reads in the middle %GC range.
For E. coli, it appears they absolutely oversequenced the poor beast, with 137X coverage for HiSeq and 140X on MiSeq.  When the MiSeq data were assembled with Velvet, the longest contig was 265 kb and the N50 was 132,865 bp, with only 12 contigs needed to cover half the assembly.
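For anyone wanting to compute a "% of bases >Q30" figure on their own runs, here is a minimal sketch. It assumes FASTQ quality strings in the Phred+33 (Sanger / Illumina 1.8+) encoding and counts a base as passing when its score is at or above the threshold; whether the poster's figure used exactly this convention is an assumption. The function names are hypothetical helpers, not part of any Illumina tool.

```python
def fraction_at_or_above_q(quality_strings, offset=33, threshold=30):
    """Fraction of base calls whose Phred score is >= threshold.

    Assumes Phred+33 encoding by default; older Illumina pipelines used
    offset 64, so pass offset=64 for those files.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Decode the ASCII character back to a Phred quality score.
            if ord(ch) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no base calls supplied")
    return passing / total


def fastq_quality_lines(path):
    """Yield the quality line of each record in a plain 4-line-per-record FASTQ."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:  # header, sequence, '+', quality
                yield line.rstrip("\n")
```

Usage would look like `fraction_at_or_above_q(fastq_quality_lines("run.fastq"))`, where the file name is a placeholder. Streaming the quality lines keeps memory flat even for a full run's worth of reads.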
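N50 is easy to misread, so here is a short sketch of the usual definition. Sort contigs longest first and walk down the list until the running total reaches half the total assembly length. The contig where that happens gives the N50, and the number of contigs walked (often called L50) is one plausible reading of the "12 contigs" figure, though that interpretation is an assumption. This version measures against assembly size, not the reference genome size (the latter variant is usually called NG50).

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the contig at which the running total, taken from the
    longest contig downward, first reaches at least half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
```

For example, contigs of 100, 80, 60, 40 and 20 bp total 300 bp. The running total passes 150 bp at the second contig, giving an N50 of 80 and an L50 of 2.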
Based on a few conversations, this parallelism with HiSeq is striking a chord.  Labs already possessing an Illumina instrument are likely to go to MiSeq for their fast sequencing, as this will mean reusing the same library prep protocols and being able to QC libraries on the MiSeq without any special handling.  Ion has been smart to try to develop library QC as an application, especially since this is a far less quality-demanding task than genome sequencing or variant calling, but once MiSeq is out it will be a tougher sell.
Two vignettes show off direct amplicon sequencing.  Bacterial 16S sequencing with 4x150 is shown finding different diversity in various samples (I'd like to see more on the set of dog-owner pairs!).  Another is in my neighborhood: amplicon sequencing of various tumor and normal samples, including the dreaded formalin-fixed, paraffin-embedded (FFPE) ones, for two cancer mutational hotspots (KRAS codons 12/13 and BRAF V600E).  In this assay with 2x77 reads, the background frequency of calling positives was estimated at 0.45%, which isn't bad.
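One way a background rate like that 0.45% gets used is as the null hypothesis when deciding whether a hotspot is really mutant. The sketch below treats 0.45% as a per-read background rate (an assumption, since the poster's exact definition isn't given here) and asks how likely the observed number of mutant reads would be from noise alone, using a one-sided binomial tail. The significance cutoff `alpha` and the helper names are hypothetical, and this is a generic test, not the assay's actual calling method.

```python
import math

BACKGROUND_RATE = 0.0045  # the 0.45% figure, assumed here to be a per-read rate


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(total, 1.0)  # guard against rounding slightly above 1


def call_hotspot(alt_reads, depth, background=BACKGROUND_RATE, alpha=1e-6):
    """Return (is_positive, p_value) for mutant-read support at one hotspot.

    Positive when that many mutant reads would be very unlikely if every one
    of them came from background noise at the given rate.
    """
    p_value = binom_upper_tail(alt_reads, depth, background)
    return p_value < alpha, p_value
```

At 1,000x depth the background alone predicts about 4.5 mutant reads, so a handful of supporting reads is unremarkable while 100 is not. That is why a low background rate matters so much for FFPE material, where true mutant fractions can be small.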
The fifth vignette shows off the Nextera library generation system, which Illumina acquired when it bought Epicentre.  In this case, a PCR fragment spanning a structural variant in a YRI trio was generated from the paternal, maternal and child samples and then converted to libraries with Nextera (to pick a nit, it most certainly wasn't "pooled, indexed and converted to Nextera libraries" as the poster states; pooling must have come after indexing, and indexing after Nextera).  This is the first case I've heard of using Nextera on such small fragments; the long form is about 350 bp and the short about 150 bp.
I was at an event last week and met an analyst for a Wall Street house (I honestly forget which one). He felt that Epicentre could have pushed for a higher price than the $90M it commanded, based solely on the value of the kits.  If that is the case, then Illumina truly got a steal, as the strategic value of Nextera will be huge -- not only will Illumina have it, but nobody else will.  
We're just over a month away from the beginning of the third quarter, which is the quarter in which Illumina says it will start shipping MiSeqs.  Now, I have gotten more cynical about ship dates, and figure any marketing person will consider the job done if one leaves the loading dock on a chartered truck at 23:59 on September 30. But, if you are the competition you really can't afford to count on that or on schedule slippage.  Even if bulk shipments aren't until the end of the year, that would give Ion about 6 months to get their performance specs up, their array of applications in place and ideally not have a mob of frustrated bioinformatics folk chasing them with torches.  


Shaun said...

In terms of read numbers: I was at a talk at the Whitehead today by Gary Schroth from Illumina, and he claimed 5 million reads per MiSeq run.

James@cancer said...

The Nextera amplicons are Illumina's attack on Sanger sequencing, bye-bye CE!

I've been comparing MiSeq to HiSeq flowcells over at and it looks like you might ultimately get 1/3rd of a HiSeq flowcell lane from MiSeq. That's a lot of data!