It's always tricky writing pieces about new technologies, and I'm aware of a number of factors which can influence my take on a given story.
An obvious one was Oxford granting me an advance interview with top personnel. Being treated like a real journalist is one thing, and getting to write in advance is a big help. Knowing a big secret, even for fewer than 24 hours, can be a bit heady. There is always a risk that at some level, perhaps unconscious, I'll lean away from offending a company out of misplaced gratitude or worse.
On the other hand, I must also assign a prior based on whom I was speaking with. I've known Ewan Birney for a long time, having been very impressed with his Dynamite paper at ISMB back in 1997. I even hosted him in my apartment when I was trying to recruit him to Millennium. Ewan has done a lot of amazing stuff and is pretty damn blunt in his speech; he's someone I'd trust not to be a tout. I hadn't met Clive Brown before, but he has a reputation as a straight shooter, and perhaps more importantly he and much of the same team launched the Illumina technology; he's someone with a favorable track record.
An even more complicated calculus comes from the performance specs. Now, as regular readers of this space may have discerned, I believe that a fair evaluation of any sequencing technology depends on the application it is going to be put to. Any given technology will be right for some applications and wrong for others, so it generally irritates me to see people dismissing a technology because it won't fit their pet application. But the flip side of that is that I may go easier on something that looks like it might fit my problem du jour.
If I were still back at Infinity, I'd have only a mild interest in Oxford, because the claimed error rate is pretty high for reading out clinical mutations, though it must be said that most of those errors are deletions in specific (though unspecified) sequence contexts. Much as with Ion Torrent's homopolymer issue, this may not matter if you are looking for in-frame missense changes; if you get a frameshift, you toss the read.
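To make the "toss the read" idea concrete, here is a minimal sketch of such a filter, assuming reads have already been aligned in-frame to a coding region and carry SAM-style CIGAR strings. The function names (`parse_cigar`, `has_frameshift`, `keep_for_missense_calling`) are hypothetical and not part of any vendor's pipeline; it also checks each indel on its own, so it ignores compensating pairs such as a 1-base insertion next to a 2-base deletion.

```python
import re

# One CIGAR operation: a length followed by an operation code, e.g. "50M1D49M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject malformed strings: the parsed pieces must rebuild the input exactly.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def has_frameshift(cigar: str) -> bool:
    """True if any insertion (I) or deletion (D) is not a multiple of 3 bases."""
    return any(op in "ID" and n % 3 != 0 for n, op in parse_cigar(cigar))

def keep_for_missense_calling(cigar: str) -> bool:
    # Hypothetical policy: a frameshifting indel is treated as a likely sequencing error.
    return not has_frameshift(cigar)

reads = {"r1": "100M", "r2": "50M1D49M", "r3": "40M3D57M"}
kept = [name for name, cigar in reads.items() if keep_for_missense_calling(cigar)]
print(kept)  # ['r1', 'r3']
```

The 1-base deletion in `r2` shifts the reading frame, so the read is discarded, while the in-frame 3-base deletion in `r3` survives for downstream review.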
But for my current shop, Oxford pushes an awful lot of buttons. A slightly less enigmatic website has gone up, which gives me a better idea of what I should and shouldn't reveal. We're doing a lot of de novo sequencing of small genomes, and with only short fragment reads they don't assemble well enough. So one must make mate pair libraries, which are DNA hogs, slow, labor intensive, and sometimes still don't solve the problem. So perhaps you go to longer mate pairs, meaning more DNA, more work and more time. I was going to be thrilled with Oxford even if it offered accuracy substantially worse than PacBio's; the idea of getting 50Kb reads of any sort from minimal input is just too intoxicating. Throw in an apparent tolerance of impure DNA and you really have something.
There's also, of course, the risk of getting too drawn in by the technical details. Each one of these sequencing technologies is pretty amazing, with all sorts of gadgetry bordering on sci-fi.
But on reflection, I wouldn't change much in the piece, other than to add more "if"s and calls for data release. One person I consulted wondered whether Oxford might have overfitted on lambda and phiX, and I wished I had thought of that on my own. I must say I am glad I did not go with my original title. I thought long and hard about the title, as I like something that fits and ideally has some word play. The initial idea was terse (8 printing characters!) and pseudo-alliterative. But it used an expression someone my age probably shouldn't attempt, and it would certainly have fueled the feeling that I wasn't being objective. Yes, it was fun to think up, but the wrong title would have been: ONT? OMG!
In any case, Oxford really needs to release data pronto to allay the skepticism. It won't cure it entirely; many will think the data have been cherry-picked. This has been a refrain with each new sequencing technology: in previous years it was PacBio or Ion who were slow to release data (which makes Jonathan Rothberg's complaint about no data a bit, well, interesting). The only real solution is to let some independent labs generate data. Of course, if you want that data propagated quickly and publicly, you might be wise to let a blogger take a crack at it...
There's a second reason Oxford should think hard about getting data out soon, no matter how messy it is. I believe that bioinformatics makes a difference. Support by the academic software community is a critical asset for any sequencing technology. This isn't to say that commercial tools aren't important, but in the end they offer only a narrow spectrum of capabilities.
Now, many tools are relatively platform-agnostic. However, a lot are not, sometimes in subtle ways and sometimes not so subtly. This was very clearly true with SOLiD, whose completely different data format (colorspace) required special handling. Tools supporting colorspace were slower to appear than Illumina-oriented tools, and that can't have helped SOLiD.
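For readers who never dealt with colorspace, here is a minimal sketch of why it needed special handling. It assumes the standard 2-bit base assignment (A=0, C=1, G=2, T=3), under which each SOLiD color is the XOR of two adjacent bases; the `encode`/`decode` helpers are illustrative only and skip details of real SOLiD reads such as the leading primer base.

```python
# 2-bit codes for each base; the color of a dinucleotide is the XOR of the two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq: str) -> str:
    """First base, then one color (0-3) for each pair of adjacent bases."""
    colors = (str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)

def decode(cs: str) -> str:
    """Walk the colors from the known first base; each color fixes the next base."""
    bases = [cs[0]]
    for c in cs[1:]:
        bases.append(BASE[CODE[bases[-1]] ^ int(c)])
    return "".join(bases)

read = "ATGGA"
cs = encode(read)
print(cs)           # A3102
print(decode(cs))   # ATGGA

# One miscalled color corrupts every base after it when decoded naively.
bad = cs[:2] + "3" + cs[3:]
print(decode(bad))  # ATAAG
```

That error propagation is why colorspace-aware aligners compare reads to the reference in color space: a true SNP changes two adjacent colors, while an isolated measurement error usually changes just one, a distinction that is lost once a read is naively translated to bases.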
Specialized error modes are an even bigger problem. For example, Illumina reads have a low indel rate and are therefore very suitable for de Bruijn graph strategies for sequence assembly. Contrast that with 454 and Ion, with their frequent indels. There is a huge flock of academic (and mostly open source) assemblers tested and tuned on Illumina data; only a handful for 454 and Ion. Worse, what many claim to be the best assembler for Ion data (Newbler) is controlled by 454, and therefore not available to most Ion users.
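As a rough illustration of why error profiles matter to de Bruijn graph assemblers, here is a toy sketch (not how Velvet, SOAPdenovo or any real assembler is implemented) in which nodes are (k-1)-mers and every k-mer in a read adds an edge. The genome, reads and value of k are made up for the example. A single 1-base deletion in one read creates a spurious branch the assembler must later recognize and prune.

```python
from collections import defaultdict

def kmers(seq: str, k: int) -> list[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])
    return graph

def branching_nodes(graph: dict[str, set[str]]) -> list[str]:
    """Nodes with more than one successor, i.e. places where the path is ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

k = 4
genome = "ATGCGTACCTGA"
clean = [genome[0:8], genome[4:12]]   # two overlapping error-free reads
with_del = genome[:6] + genome[7:]    # same region, missing the A at index 6

print(branching_nodes(de_bruijn(clean, k)))               # []
print(branching_nodes(de_bruijn(clean + [with_del], k)))  # ['CGT']
novel = set(kmers(with_del, k)) - set(kmers(genome, k))
print(sorted(novel))  # ['CGTC', 'GTCC', 'TCCT']
```

Real assemblers prune branches like this one using k-mer coverage, which works best when errors are rare and scattered; this sketch does no pruning at all, so it only shows where the extra work comes from.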
I believe this story has repeated itself with the other two existing platforms, Helicos and PacBio. Both have tried to have semi-open user communities (as has Ion), but neither has much specific software support. I believe there is a virtuous circle that few platforms have achieved: the availability of a wide variety of tools drives more use of the platform, which in turn drives the generation of more tools. Diversity of tools for a given application can be confusing, but can also lead to improvement (competition is a good thing!). Diversity of applications supported is even more important, as someone's minor niche may grow into a major application.
So, if Oxford wants to get this virtuous circle rolling, they need to start feeding it data ASAP: a variety of data, from a variety of organisms. ONT data is likely to fit poorly into most tools; they won't be prepared for the read lengths, and the idiosyncratic error profile will require adaptation. But the ONT long reads will spur novel applications, which will need novel tools to support them. Indeed, I generally feel that a technology like Oxford's can succeed primarily by generating completely new applications that are well-suited to its idiosyncrasies; applications enabled by Oxford's strengths and tolerant of its weaknesses.
A last thought: for any company considering doing this, please learn from the laggards. Ion, PacBio, SOLiD and Helicos all tried to create custom environments for data release and software developers, areas requiring registration and learning new passageways. Your team might think those systems are intuitive and simple, but they are invariably a pain and off-putting (try using wget to pull data from IonTorrent's site!). Do yourself a favor: build a registration-free data release site and just use SEQAnswers as the discussion forum. Your marketing people will grouse that you've missed an opportunity to collect data, but just ignore them. You will have something far more precious: lots of ADHDish programmers trying to play with your platform's data.