Oxford Nanopore has been stepping up deliveries of PromethION instruments, with quite a few more groups posting pictures of their new boxes -- sometimes still in the crates. So my previous estimate -- which was a few boxes short -- is now quite out-of-date. Perhaps the stock analyst who used that number will pay attention (and maybe cite me as more than "a blogger" next time!). Clive Brown has estimated that 40 will be in the field by the end of the month, though Clive is always a bit optimistic about his crew's shipping schedules (case-in-point: the new ligation kit is now slated for end-of-May, roughly two months late). The instruments can run 24 flowcells, or half of the design goal of 48.
Groups that have posted results seem to be quite happy, with 48 runs delivering between 50 and over 80 gigabases per flowcell -- so a 30X human genome can be reliably generated with just 2 flowcells. In the first few hours, data races out at a 2.5Gb/hour clip.
I believe the current speed record for going from a patient sample to a clinical VCF file is about 17 hours on Illumina HiSeq 2500 in Rapid Mode. So that's a useful benchmark. Since Illumina builds the reads one base per cycle, you can't easily speed that up by throwing more machines at the problem.
What's interesting about PromethION is that the sequencing is so fast -- 450 bases per second -- that you really can generate a faster 30X genome by throwing more machines at the problem. If each flowcell generates 2.5Gb per hour, then one fully-operational PromethION (all 48 flowcells running) could generate the requisite data (once a library is ready) in about three quarters of an hour.
Of course, that would be quite expensive. The pricing for PromethION flowcells depends on how big a commitment you make, but even locking in the best pricing of $625 each (by pledging to buy $1.8M worth) that genome data comes in at $30K -- and that's not including any other costs. Technically one could wash the flowcells and run other things, but it's still an expensive proposition.
But perhaps there are other points on the curve. With that low pricing, 4 flowcells would be only $2500 but could still deliver the 30X genome in about 8 hours (using Wouter de Coster's tweeted yield plot).
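For anyone who wants to play with other points on that curve, here is a rough back-of-envelope sketch in Python. It uses the 2.5Gb/hour early-run rate and the $625 flowcell price quoted above; the 3.1Gb genome size, the constant-rate model, and the function names are my own assumptions for illustration, not anything from Oxford Nanopore.

```python
import math

GENOME_GB = 3.1           # ASSUMPTION: approximate human genome size in gigabases
COVERAGE = 30             # target depth (30X)
RATE_GB_PER_HOUR = 2.5    # early-run output per flowcell, as quoted in the post
PRICE_PER_FLOWCELL = 625  # best-tier flowcell price in USD, as quoted in the post


def race_estimate(flowcells, rate=RATE_GB_PER_HOUR, price=PRICE_PER_FLOWCELL):
    """Return (hours, dollars) to reach 30X with `flowcells` running in parallel.

    ASSUMPTION: every flowcell holds the early-run rate for the whole run,
    which real flowcells do not do; this is a simplification, not a yield model.
    """
    target_gb = GENOME_GB * COVERAGE
    hours = target_gb / (flowcells * rate)
    return hours, flowcells * price


def flowcells_for_full_run(yield_gb_per_flowcell):
    """Flowcells needed for 30X if each one is run to a given total yield."""
    target_gb = GENOME_GB * COVERAGE
    return math.ceil(target_gb / yield_gb_per_flowcell)


if __name__ == "__main__":
    for n in (4, 24, 48):
        hours, cost = race_estimate(n)
        print(f"{n:2d} flowcells: {hours:.2f} hours, ${cost:,}")
    print("flowcells at 50 Gb each:", flowcells_for_full_run(50))
```

With 48 flowcells this lands at roughly three quarters of an hour and $30,000, matching the numbers above. For 4 flowcells the constant-rate model gives about 9.3 hours rather than the roughly 8 hours read off the yield plot, which is a reminder that this is only a sketch and the real yield curve is what matters.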
Now, just about anyone reading this would point out that this is just the beginning of the cost -- and the time. So a complete racing program would start with a cell pellet and end with a VCF file. Some sort of criteria (considering both false positives and false negatives) would need to be devised to define a minimally useful VCF file. But that's the general idea.
So clearly not long, ultra-careful DNA preps; something relatively rapid will be needed. On the compute end, there's a lot of interesting space. How big a cluster (perhaps in the cloud) do you throw at the problem? How much computing do you attempt to do during the run -- or do you save it all for the end? Do you try to use read-until, as Clive Brown has suggested, to more carefully sample the genome and therefore get to a lower total coverage but still deliver useful results? And all those runs were 1D not 1D^2, so is the hit in accuracy a big problem?
So on the one hand this could be seen as just a stunt or a spectacle of much ado over speed in something with no practical purpose -- just like the events in Louisville and Indianapolis this month. Or, it can be seen as a worthy technical challenge, one that pushes methods and practices in very useful directions. Which actually can happen in auto racing -- one of my childhood friends once pointed out that rearview mirrors first appeared in racing circles.
So raise the bugle and the checkered flag! Pour a mint julep and a glass of milk and cut some roses. Ladies and gentlemen, start your sequencing engines!