On the genomics front, the outbreak has demonstrated how quickly bacterial genomics can be run on the current class of instrumentation. BGI Europe knocked off the sequence in a series of Ion Torrent runs in 3 days, and a group at University of Muenster worked at similar speed with the Ion setup as well. Later, sequences have come in from the Illumina and 454 platforms. The public release of this data has engendered a number of public analysis projects.
What is surprising is who is missing from the pack: Pacific Biosciences. PacBio made a splash last winter with a quick run at the Haitian cholera bug, and talks from Eric Schadt have suggested biosurveillance as a market of great interest to the company. So it is quite surprising that they haven't made an announcement. Now, it is possible that they are holding back with a view towards publishing in a journal which frowns on advance publicity (let's hope PLoS makes sure to encourage the open analysis effort to publish within their pages!), but I suspect not. More likely is the problem that there just aren't enough PacBio RS instruments in the field, and not enough connections to make sure that PacBio headquarters got a sample quickly.
Contrast this with OpGen, which has now generated a physical map of the outbreak bug. OpGen has a cool single-molecule restriction mapping technology, but one which I think is in very great danger of being overtaken by sequencing-based mapping approaches. OpGen's big challenge is convincing researchers to buy an expensive instrument which does exactly one thing. In particular, PacBio could well start threatening OpGen's market if they can straighten out strobe sequencing, and other approaches (such as HAPPy mapping or colony-free large insert cloning) could also push it out of the way. To succeed, they need to claw out a foothold before those other technologies become common. The real nightmare for OpGen would be for one of the nanopore or electron microscopy sequencing companies to start generating long reads with some useful data.
The public deposition of the Ion Torrent runs also gives an opportunity to get more data on the current state of this platform. I've done a quick analysis of the 7 BGI runs and 8 from U Muenster/Ion by mapping them to one of the available assemblies. For a number of applications, the important thing to estimate is how many mapped reads I can expect from a run. Both groups had averages a bit over the current 100K spec: 109K for BGI and 143K for UM. But they also had standard deviations of over 40% of those averages. BGI actually had only two runs over the 100K spec, but one of these had 218K mapped reads. Their worst run was 72K mapped reads. UM, on the other hand, showed wider swings: 3 runs generated >190K mapped reads, but one run managed only 41K.
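For anyone who wants to repeat this sort of tally on their own runs, here is a minimal Python sketch of the per-group summary: mean mapped reads, standard deviation as a percentage of the mean, and the number of runs over spec. The function name `summarize_runs` and the counts in the usage lines are made up for illustration; they are not the BGI or UM values. It also assumes the sample standard deviation, which may or may not match what I used in the numbers above.

```python
from statistics import mean, stdev

def summarize_runs(mapped_counts, spec=100_000):
    """Summarize mapped-read counts for a set of runs.

    mapped_counts: one mapped-read count per run (needs at least 2 runs).
    spec: the per-run read count to compare against (assumed 100K here).
    """
    avg = mean(mapped_counts)
    sd = stdev(mapped_counts)  # sample standard deviation (an assumption)
    return {
        "mean": avg,
        "sd_percent_of_mean": 100.0 * sd / avg,
        "worst": min(mapped_counts),
        "best": max(mapped_counts),
        "runs_over_spec": sum(1 for c in mapped_counts if c > spec),
    }

# Placeholder counts, purely illustrative
example = summarize_runs([100_000, 200_000, 150_000])
print(example)
```

Reporting the spread as a percentage of the mean makes groups with different average yields directly comparable.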
Useful read lengths, estimated by summing the number of match positions in the CIGAR strings for mapped reads, varied quite a bit as well. BGI had between 40 and 68% of their mapped reads delivering 100+ alignable bases, whereas for UM this range was 27-41%. Neither saw a run where fewer than 60% of the mapped reads had at least 80 alignable bases.
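The CIGAR bookkeeping can be sketched in a few lines of Python. This is a rough illustration rather than the exact script behind the numbers above: the helper names `alignable_bases` and `fraction_at_least` are hypothetical, and it assumes that "match positions" means the `M`, `=` and `X` operations of the SAM format (so `M` counts mismatched bases too, since SAM does not distinguish them).

```python
import re

# One CIGAR element: a length followed by a SAM operation code
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that place a read base against a reference base (assumed definition)
MATCH_OPS = {"M", "=", "X"}

def alignable_bases(cigar):
    """Sum the lengths of the match operations in one CIGAR string."""
    if cigar == "*":  # SAM uses '*' when no alignment is available
        return 0
    return sum(int(length) for length, op in CIGAR_ELEMENT.findall(cigar)
               if op in MATCH_OPS)

def fraction_at_least(cigars, threshold):
    """Fraction of reads whose alignable bases reach the threshold."""
    lengths = [alignable_bases(c) for c in cigars]
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= threshold) / len(lengths)

# Made-up CIGAR strings, purely illustrative
reads = ["5S60M2I40M3S", "50M1D30M", "30M"]
print([alignable_bases(c) for c in reads])
print(fraction_at_least(reads, 100), fraction_at_least(reads, 80))
```

Soft clips, insertions and deletions are deliberately left out of the sum, so a read padded with clipped bases does not look longer than the stretch that actually aligned.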
I haven't tried to calculate error rates. Yes, that's a huge omission, but the problem is I'm not a specialist in E. coli any more (and haven't been for a long time) and am not sure what to trust in the assemblies. So I'll leave that to others.
Jonathan Rothberg apparently spoke at the Personal Genomics meeting here in Boston (I'm saving my conference time & fees for the big cancer genomics meeting next week, so I must rely on press reports for the Personal Genomics meeting) and sketched more grand vistas for Ion Torrent performance: 400-base reads by year end, $1000 human genomes early next year and other tantalizing dreams. But it's too late to dissect those now; perhaps tomorrow night.
[Corrected Jun 16th to fix the stupid name substitution I made in the original, as pointed out by Karol. Very embarrassing, especially with the high frequency that my own name is mutated]