Friday, October 25, 2013

Spanish Prisoner, ZX-81 or Turbo Pascal?

In the movie The Spanish Prisoner, a brilliant inventor is paranoid that "The Process" he has invented will be stolen by deceitful competitors, and everyone speaks with a highly distinctive cadence. The entire movie is suffused with deceit, starting with the title, which refers to a notorious con scheme akin to the modern Nigerian scam. I spent last evening in some of the space in which the movie was filmed, listening to a scientist in that mold (& with that distinctive speech) describe a process his group has invented (indeed, by lucky chance I helped him find the venue). But many remain unconvinced that Clive Brown and Oxford Nanopore are not themselves the pullers of ocular wool.

Lex Nederbragt has a good post on the hubbub which likens the MinION sequencing platform to a washing machine.  If a company claimed to have a radical new washing machine which used its own radical new detergent, would you lay out $1K to try it out?  Sure, the deposit is refundable -- just like the deposits made on behalf of the widows of deceased dictators who contact you via email.  That is the über-cynical view: that while I and others loaded liquid into a device and saw squiggles appear on a laptop screen, the fact that Oxford Nanopore still hasn't released actual sequences or error rates -- and that we took none home last night -- means we might have run the most over-engineered buffer detection system ever built. Perhaps the MinION-flowcell-lookalike USB memory sticks we got will be the only useful tools to come out of Oxford Nanopore.

To me, the chance to wager a bit of time and not a whole lot of money (in the scheme of things!) to try out a potential ground-breaker is a no-brainer.  Whereas other companies, notably Ion Torrent, have talked about rethinking the way product launches are run, Oxford is really doing it, allowing a diverse group of scientists who are not the usual suspects to beta test their instrument. If it really is vapor, we'll find out soon.  Some product launches go well; some are awful.

I'm reminded of some earlier experiences with trying out the bleeding edge.  In the mid-70s, my brother and father assembled a single-board computer kit called the DATAC-1000; my contribution to assembly was limited to sorting resistors by their color codes.  Similar to the KIM-1, the DATAC never really took off, particularly after two guys named Steve launched a computer with the same microprocessor but built-in video.  A few years later, Sinclair Research launched a new computer, the ZX-81, in the U.S. with two purchasing options.  One was to buy it fully assembled, the other to get it as a kit for 2/3 the price.  Dad saw this as a great opportunity for his youngest offspring, so this time I did all the assembly and soldering.  Moore's Law plots have become cliché, but I could see it with my own eyes: whereas the DATAC had a herd of over 16 memory chips and supported neither video nor anything higher than machine language, the Sinclair had the same memory in a single chip and 4 integrated circuits total, yet had both video and BASIC.  My output with that machine never got close to my ambitions; the membrane keyboard was a hassle, the cassette tape program storage a pain and the graphics limited, but just building it was well worth the money.

A few years later, a keyboard showed up under our Christmas tree, to be soon followed by the second IBM PC clone on the market. Dad had invested in an add-on board with an older microprocessor in order to support a Pascal compiler that was then available; that compiler turned out to be slow and painful.  So when ads appeared claiming a $30 Pascal compiler that included an editor and was faster than anything out there, our attitude was that this was clearly a con job.  Better, faster & cheaper? Yeah, right.  But $30 was within Dad's mad-money limits, so he decided to buy it.  Shockingly, Turbo Pascal was everything it was advertised to be and became my preferred programming language for a number of years, and indeed what I wrote my first few bioinformatics programs in.

Is Oxford Nanopore's MinION a con? A silly sideshow? A clever but ultimately not terribly useful toy?  Or a revolution in sequencing?  The only way to find out is to invest some time and money, and with the lucre part being so low, that mostly means a bit of time. And perhaps Clive Brown is correct that no matter what data ONT released, nobody would trust it anyway. After all, I work in crazy GC space, which most sequencer comparisons have ignored.  Clive claims the device can tease out all sorts of DNA damage ("it's the first sequencer to read the strand you give it") and even abasic sites; if you are in that field, the most some data would do is convince you to run your own tests.  I won't claim to not be a bit frustrated with the lack of information on throughput or quality (among the rare hard details is their transposon-based single-tube library scheme, as detailed nicely by Yaniv Erlich), but would I rather have a lot of claims but no access for months and months?  No doubt about it, Oxford is sticking its neck way out, and should they prove delusional about their performance it will be disastrous to their reputation.

So stay tuned.  A small army of genomicists expect to get devices late this year. Oxford has also indicated that an API to the instrument will be made available. Oxford is way behind their original schedule, and so few will cut them slack if they miss their new public promises.



Anonymous said...

The squiggles have been around for quite some time now. Both the Akeson and Gundlach labs have published the squiggles. Credit to Oxford for putting things in a nice package, but unless I see them convert the squiggles to sequence, I, sadly, will have to assume they are being evasive and just trying to buy time. It shouldn't be that hard to convert the squiggles to sequence, unless there are horrible demons in the details (I can imagine many reasons there would be). It would also explain why there have been no academic reports of sequencing so far. Only squiggles.

Here is another thought to ponder. I read in a candid interview posted on a blog that they needed to use multiple pores to get the errors down. How can you ensure the same DNA goes through the different pores? PCR? How does their Y-shaped adapter survive PCR? Shotgunning with transposomes won't create clonal copies. The only way is to convert squiggles to bases for every piece of DNA and do a sequence-level error reduction. This means post-processing (and pre-processing, to figure out which pore has which error profile, maybe with a known piece of DNA, or maybe they do the calibration with known DNA at their site), so no streaming, high-quality answer out.

This other thought I keep having because I hear how secretive and paranoid Oxford are. I am beginning to wonder if it is just a ruse to hide serious deficiencies they are trying, or unable, to fix (with lots of investors and the world looking at them). If you look at their entire ASHG performance and early-access announcements from this perspective, it makes a lot more sense. Whatever it really is, I am just left with a really uneasy feeling. Keith has captured it beautifully in this post. Kudos.

Anonymous said...

From what I have seen GC is a problem for nanopores too. And a good deal of ONT's published patents are with MspA, to which Illumina has an exclusive license. As they say, where there's smoke there's fire. If the data is as great as claimed, there is no reason for ONT to be as coy as they are. They are stalling for some reason; we are yet to find out why.

Keith Robison said...

With regard to the multiple pores, it isn't strictly necessary to have the same DNA molecule go through both pores; for assembly it could be sufficient to see the same stretch of sequence through each pore type.

I think you are correct that running standard DNA spiked-in to identify the pore types could be used to distinguish the pores on a given chip.

There is no PCR in the standard ONT workflow.

Anonymous said...

I think the idea that the reason behind ONT's secretive approach can only be due to quality issues is naive and shows a lack of understanding of inter-company politics. They're a tiny UK company trying to unseat a massive US market-dominating company. There is no advantage to public release of data. The idea that investors wouldn't insist on seeing sequence data is naive, and ONT just raised 40 million.

Anonymous said...

Interesting discussion here. I too went ahead and glanced through OxNano's patent applications. Based on my very quick review, here is what I found:
1. They have done an extensive amount of work on MspA, but Illumina has licensed the pore now. Not sure what is going on there.
2. The closest I saw to anything related to sequencing was a bioinformatics patent application that describes how to convert k-mer signatures to sequence - there wasn't any sequencing data in that application.
3. They seem to have some cool motors. Helicases from some extremophiles.
4. One new pore called lysenin; not sure how it compares to MspA. No sequencing information in the patent application.
5. Some algorithms to build consensus among traces (squiggles, if that is the terminology we are using).
6. There was one application where they did some algorithmic learning on a known piece of DNA; I didn't see that extended to an unknown sample.

Regarding the comment that investors probably saw sequencing data, that could be true. Wonder if that caused the rift with Illumina. They clearly convinced Illumina with a sequencing scheme, now they have convinced other investors of a different scheme. Time to convince the users.

Anonymous said...

As I understand it, one of the weaknesses of PacBio technology is a high sequencing error rate. Errors seem to offset some of the advantages offered by PacBio's long-read technology. How might ONT be impacted by a high error rate (I have heard 4%)? If many researchers require 99.99% accuracy and if errors are randomly distributed, what would be the average depth of coverage required to achieve 99.99% accuracy starting with a 4% error rate? There must be a way to draw a curve showing error rate vs. depth of coverage to achieve a certain level of accuracy. I do realize that some base calling errors are non-random (systematic), however.

Keith Robison said...

Yes, the key with PacBio is that most errors are stochastic, so they can be corrected with depth. This is the crux of the CCS technique (reading the same fragment many times by going round & round the SMRTbell hairpins) and HGAP (error correcting with depth). In several recent comparisons of final accuracy & SNP calling, PacBio has fared well despite a raw error rate on the reads around 15%.

ONT can't read the same molecule more than twice without some trickery. The big questions which remain are the raw error rate and the degree to which errors are stochastic.
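The coverage question above can be given a back-of-the-envelope answer. The sketch below is a toy model of my own, not any vendor's actual consensus algorithm: it assumes errors are independent and random, and that each site is called by a simple majority vote that fails only when more than half the reads are wrong.

```python
from math import comb

def consensus_error(depth, per_read_error):
    """Probability that a majority-vote consensus call is wrong at one site,
    assuming independent, random per-read errors (a pessimistic toy model:
    it treats all erroneous reads as agreeing on the same wrong base)."""
    k_min = depth // 2 + 1  # smallest number of bad reads that flips the vote
    return sum(comb(depth, k)
               * per_read_error**k
               * (1 - per_read_error)**(depth - k)
               for k in range(k_min, depth + 1))

# How deep must coverage be to push a 4% raw error rate below 1 in 10,000?
for depth in (1, 3, 5, 7, 9):
    print(depth, consensus_error(depth, 0.04))
# Under this model, 7x coverage already drops below 1e-4.
```

Of course, systematic (non-random) errors violate the independence assumption and no amount of depth fixes them, which is exactly why the stochastic-vs-systematic question matters so much.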

Anonymous said...

I thought they read the same DNA twice by having a hairpin loop and reading the complementary strand?

Keith Robison said...

Yes, you are correct -- you can read the same DNA twice by reading both strands. But not more than twice without some ingenuity.

Anonymous said...

MspA is a natural product so can't be patented in its own right. It can be for an application but in order to infringe that patent the exact same, I mean identical, methodology has to be used.

If MspA has been modified in any way, even by one amino acid, and that's not covered in the preceding patent, then ONT would have freedom to operate.

Arch Robison said...

Though the DATAC did not come with video support, I did design and build a bit-mapped graphics card for ours, using wirewrap and a 6845 (