Auto racing involves applying extreme conditions to many things: engineering, drivers and the cars themselves. This is particularly true of Le Mans, both because it runs for 24 hours (allowing for driver switches) and because it runs in all sorts of weather. You don't have to like racing to appreciate the amazingly practiced teamwork it takes in the pits to swap out tires or other parts in nearly no time at all. Because it is such a crucible for cars, racing often generates technical improvements. Indeed, parts of the movie are a portrayal of the challenges of engineering, the give-and-take with mechanical limits -- and with rule books.
Ultimately, races have at least an appearance of simplicity: the first car to cross the finish line after the required number of laps wins, so long as it has followed the other rules.
A recent Twitter exchange about a news item on a rapid clinical test in Europe got me thinking. Wouldn't it be interesting to have a Le Mans for clinical genomics? Now, it probably wouldn't be run the same way -- perhaps the Blue Riband is more of a model with competitors trying at any time rather than congregating in one place. Also, we don't expect icebergs to sink a sequencer.
The currently recognized holder of the prize is Stephen Kingsmore's group, with a 26-hour pediatric sequencing. Well, that was the original claim -- it may have been shaved down since. Unlike auto racing, there is an inherent value to turning around a genome that quickly when one of a new parent's worst nightmares is unfolding: a seriously ill newborn with mysterious symptoms.
So the basic idea I propose is a race against the clock to go from a clinical blood sample (or reasonable simulacrum) to a VCF. Now, there would need to be some reasonable rules set for the quality of the VCF -- I'm out of my field here, so I really can't do a proper job of writing them. But there would be some thresholds for both sensitivity and accuracy of SNP calls. I'd also propose that scoring be restricted to a well-defined exome equivalent, so that if someone wants to race with an exome method they can. Those are the most actionable variants anyway.
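To make the scoring idea concrete, here is a minimal sketch of how an entry's SNP calls might be judged against a truth set, counting only positions inside the agreed exome-equivalent regions. Everything named here is hypothetical: the `Snp` record, the `score_calls` and `qualifies` helpers, and the threshold parameters are illustrations, not an existing benchmark tool. I also read "accuracy" as precision (the fraction of calls that are correct), which is an assumption on my part, and a real comparison would have to handle genotypes, indels and variant normalization.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Snp(NamedTuple):
    """A single SNP call (hypothetical record; a real entry would parse a VCF)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


# A target region is (chrom, start, end), 1-based and inclusive on both ends.
Region = Tuple[str, int, int]


def in_targets(snp: Snp, targets: Iterable[Region]) -> bool:
    """True if the SNP falls inside any scored region (linear scan, fine for a sketch)."""
    return any(c == snp.chrom and start <= snp.pos <= end for c, start, end in targets)


def score_calls(truth: Set[Snp], calls: Set[Snp], targets: list) -> Tuple[float, float]:
    """Return (sensitivity, precision) restricted to the target regions."""
    truth_t = {s for s in truth if in_targets(s, targets)}
    calls_t = {s for s in calls if in_targets(s, targets)}
    true_pos = len(truth_t & calls_t)
    sensitivity = true_pos / len(truth_t) if truth_t else 0.0
    precision = true_pos / len(calls_t) if calls_t else 0.0
    return sensitivity, precision


def qualifies(sensitivity: float, precision: float,
              min_sensitivity: float, min_precision: float) -> bool:
    """An entry's time only counts if both quality thresholds are met."""
    return sensitivity >= min_sensitivity and precision >= min_precision
```

Calls outside the target regions are ignored entirely, so a whole-genome entry is neither rewarded nor penalized for what it reports elsewhere.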
Two important guidelines on setting thresholds. First, this is intended to be attainable, not aspirational. A bunch of years back there was an X-Prize for genome sequencing which set the bar so high that no technology has yet reached it. That doesn't interest me; I want to have regular winners. So let's start with where the Kingsmore system was when published. Second, the thresholds should be ratcheted tighter on a regular basis, perhaps every year. Ideally this would be done objectively -- perhaps the level reached by 90% of the qualifying submissions in one year would become the threshold for the following year.
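The ratchet could be as simple as the sketch below. It assumes each metric (say, sensitivity) is ratcheted independently, that "reached by 90%" means the value that at least 90% of qualifying submissions met or exceeded, and that a threshold never loosens from one year to the next. The function names and the `percent` parameter are my own invention for illustration.

```python
from typing import Sequence


def level_reached_by(values: Sequence[float], percent: int = 90) -> float:
    """Highest level that at least `percent`% of the submissions met or exceeded."""
    if not values:
        raise ValueError("need at least one qualifying submission")
    ranked = sorted(values, reverse=True)
    # Ceiling division in integers avoids floating-point surprises.
    needed = (percent * len(ranked) + 99) // 100
    return ranked[needed - 1]


def next_threshold(current: float, values: Sequence[float], percent: int = 90) -> float:
    """Next year's threshold: ratchet up to the 90% level, but never loosen (assumption)."""
    if not values:
        return current  # no qualifying submissions this year, so leave the bar alone
    return max(current, level_reached_by(values, percent))
```

With ten qualifying submissions, the new bar is the ninth-best score; with fewer than ten, it is simply the lowest qualifying score, so small fields ratchet slowly.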
Beyond that, you can do what you want. The X-Prize had some goal on cost, but that's always a funny number to compute given regional variation in labor and overhead. And real car racing doesn't worry about that: if you want to spend some absurd sum to build a faster car, go for it!
So if your method for processing the data requires burning huge sums on cloud compute, that's completely kosher. Or if a Nanopore entry wishes to run multiple PromethION flowcells in parallel, that's fine too.
It would certainly be interesting to see what a no-holds-barred entry on the ONT platform would look like, particularly since Clive Brown is very competitive and his tweets suggest he likes performance automobiles. As noted, given the limits of the sample provided, how many flowcells could be run in parallel? What sort of compute architecture could boil the data down? And is ONT anywhere near able to compete with Kingsmore's system when it comes to sensitivity and accuracy in calling rare coding variants? And wouldn't it be fun to watch video of a pit crew swarming over a PromethION to refuel flowcells?
On the Illumina side (or MGI), the cycle-by-cycle nature of sequencing-by-synthesis means you must wait until every read is complete before the data is all in. Or must you? How much pre-computation of the data can be usefully done with only partial reads? For example, if one is sequencing the whole genome but only interested in exonic regions, can reads be excluded from the final analysis based on mapping partial reads?
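As a thought experiment, the triage step might look something like this sketch. It assumes some upstream process (not shown, and entirely hypothetical) has already mapped the first cycles of each read to a chromosome and position; the code then just asks whether that position lies within a padded target region. The `build_index` and `keep_read` names and the `pad` parameter are made up for illustration, and what to do with partial reads that map ambiguously or not at all is left open.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), inclusive coordinates


def build_index(targets: Iterable[Region], pad: int) -> Dict[str, Tuple[List[int], List[int]]]:
    """Pad each target by `pad` bases, merge overlaps, and index by chromosome.

    `pad` should be at least the full read length, so that a read starting
    just upstream of a target but running into it is still kept.
    """
    merged: Dict[str, List[List[int]]] = {}
    for chrom, start, end in sorted(targets):
        lo, hi = max(0, start - pad), end + pad
        intervals = merged.setdefault(chrom, [])
        if intervals and lo <= intervals[-1][1]:
            # Overlaps the previous padded interval: extend it.
            intervals[-1][1] = max(intervals[-1][1], hi)
        else:
            intervals.append([lo, hi])
    return {c: ([iv[0] for iv in ivs], [iv[1] for iv in ivs]) for c, ivs in merged.items()}


def keep_read(index: Dict[str, Tuple[List[int], List[int]]], chrom: str, pos: int) -> bool:
    """True if a provisional mapping position falls inside a padded target."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]
```

Whether this saves real time depends on how reliable mappings from partial reads are, which is exactly the open question; a cautious entry might only discard reads whose partial mapping is confidently far from any target.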
As noted, the hope is that this would yield more than just fevered competition and bragging rights: there's real clinical value to pushing genomic technology to its limits.