When I was at Millennium and the company was going great guns on EST and BAC sequencing, we had a huge room full of sequencers and the associated colony pickers, DNA preparation robots and the like which could generate a megabase or two (or three) a day.
Next I was at Codon Devices, and one small room held colony pickers and a hallway held three ABI 3730 capillary sequencers, and that was a megabase per day.
Neither of my next stops has had its own sequencers, but the world quickly shifted. Massively parallel sequencing approaches from 454, ABI and Illumina meant an end to colony pickers and huge DNA preparation robots. The original Solexa instrument was rated at 1 gigabase per run, which meant megabases per hour, not per day. The numbers have kept going up, but even smaller instruments have appeared -- a modern MiSeq generates more data in less time and at lower cost than that original Solexa, while occupying far less space.
However, for really big projects big iron has remained popular. Complete Genomics did go down the factory route, as I wrote about long ago. Last year Illumina rolled out the X10 system, a set of at least 10 souped-up HiSeq instruments (now also available in packs of 5) at $1M per instrument. Not only does this cost at least $10M upfront, but the beast's cost advantages are attained only by keeping it fed with 10K-17K genomes per year, which at $1K/genome means an annual reagent bill at least as large as the purchase price.
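The economics above can be sketched in a few lines; the figures are just the public list price and the "$1K genome" quoted here, so treat it as back-of-envelope arithmetic rather than a pricing model.

```python
# Rough economics of a HiSeq X10 installation, using the figures
# quoted in the post: ~$1M per instrument, 10 instruments minimum,
# and a rated utilization of 10K-17K genomes/year at ~$1K/genome.
instruments = 10
price_per_instrument = 1_000_000
capital = instruments * price_per_instrument

cost_per_genome = 1_000
genomes_per_year = (10_000, 17_000)  # rated utilization range

print(f"upfront capital: ${capital:,}")
for g in genomes_per_year:
    print(f"{g:,} genomes/yr -> ${g * cost_per_genome:,}/yr in running cost")
```

Even at the bottom of the utilization range, a year of operation matches the $10M capital outlay, which is why only committed high-volume sites can justify the purchase.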
Earlier this year Complete Genomics (now acquired by BGI) announced they would launch two sequencing systems, one for small jobs and one for "nation-scale" genome sequencing. At the European Human Genetics Conference this weekend, the big unit was announced. Unfortunately, BGI hasn't seen fit to reach out to the blogging community, nor is there a wealth of details. But some general characteristics of the machine can be grasped.
First, it is an end-to-end beast. Included in the $12M price tag are fully automated robots for sample and library preparation. The instrument has a rated throughput of 10K genomes per year, with a planned expansion to 30K/year. Sequencing is based on Complete's combinatorial probe-anchor ligation (cPAL) technology, which gives very short reads (28 bp paired-end) using 48B spots on a patterned flowcell. Complete had a somewhat involved mate-pair system to get more information out of those short reads (though I'm blanking on the details); it is unclear whether that is supported in the new system. As far as I can tell, Long Fragment Read technology for haplotyping is not in the package. Turnaround time is unclear. The system can start different sequencing runs at different times, though it isn't clear how many independent units are available. Both human whole-genome and human exome sequencing will be supported by the integrated software, which performs the full range of analyses, including local reassembly of reads which don't perfectly match the reference. The system requires 1500 square feet (140 square meters) of space, about a third of the listed space of the original starbase; this is not a startup-friendly instrument!
My reaction? Well, I certainly wouldn't have picked the name. Clearly an attempt to fuse Revolution and Velocity, it's a bit of a mouthful. The end-to-end aspect will appeal to groups that aren't experienced in genomics, and indeed the two outfits which have already signed up are not well known in the genomics space. The multiple independent units may appeal to some, but if you really intend to run at full throttle, planning for that will take over your scheduling.
A looming question is how data from the Revolocity will stack up against Illumina's X10. Both claim human genomes at around $1K, but are these equivalent genomes? Given that the X10 offers 150 bp paired-end reads, more than 5 times the read length, what sorts of variation can the X10 detect better? For example, how well can STRs be called on each instrument, and what is the upper limit on allele size? Complete has released a number of datasets in the past, and I would urge them to release some standard human genomes as raw data (such as the Genome in a Bottle reference).
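The read-length point about STRs can be made concrete: a read can only size an STR allele it fully spans, with some anchoring bases on each flank. The flank requirement and repeat-unit size below are my own illustrative assumptions, not specifications of either platform.

```python
def max_spannable_str(read_len, flank=5, unit=4):
    """Largest STR allele (in repeat units) a single read can size.

    Assumes the read must fully span the repeat plus `flank`
    uniquely mapping bases on each side; `unit` is the repeat
    period in bp (e.g. 4 for a tetranucleotide STR). Both are
    illustrative choices, not platform specs.
    """
    spannable = read_len - 2 * flank
    return spannable // unit

for read_len in (28, 150):  # Revolocity vs X10 read lengths
    print(f"{read_len} bp reads: up to {max_spannable_str(read_len)} repeat units")
```

Under these toy assumptions a 28 bp read can size only a handful of repeat units while a 150 bp read can span dozens, which is why read length matters so much for repeat-rich variation even at identical coverage.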
It will be interesting to see how well Revolocity's promise of end-to-end automation and low staffing resonates with the marketplace. I realized from this that I have no idea which library preparation methods are popular at X10/X5 sites. For example, a small number (3-4) of NeoPrep instruments could in theory keep a HiSeq X10 farm fed, though these require DNA to be sheared upstream of the NeoPrep. Hopefully the end-to-end, any-biological-sample aspect of Revolocity will prove more than a lure for those with purchasing authority but far removed from actual operation.
Which gets to the big question: how many more institutions are really looking to plunk down $10+M in capital for a lab that burns another $10M-$20M a year in reagents (plus the rental cost of the floor space!)? At the outset, BGI's machine appears slightly inferior on throughput to Illumina's -- but will prospective buyers care about that difference? BGI/Complete is also priced well above Illumina's X5 system, ceding the market of folks who want to run only 5K-7K genomes per year.
Off in the mist is a very different vision of population-scale human genome sequencing. Oxford Nanopore proposes that their PromethION device, running an array of 48 flowcells each sporting 4000 pores at the future higher speed, will meet or exceed the throughput of the X10 or Revolocity at potentially far lower cost. Coupled to the Voltrax sample/library preparation device, end-to-end operation might be possible. More importantly, this would offer long (but noisy) reads, capable of resolving far more complex structural variants (if the input material is also long). All of this in the footprint of an iPad! But the important catch is that while Oxford has now demonstrated success with the original MinION, the PromethION, denser chips, higher pore speeds and Voltrax are all future releases.
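How plausible is that throughput claim? Only the 48 flowcells and 4000 pores per flowcell come from the announcement; the pore speed and duty cycle below are my own guesses, not Oxford Nanopore figures, so this is purely a what-if calculation.

```python
# What-if PromethION throughput. Configuration (48 x 4000) is from
# the announcement; speed and duty cycle are guesses, not ONT specs.
flowcells = 48
pores_per_flowcell = 4000
speed_bps = 500        # hypothetical future bases/sec per pore
duty_cycle = 0.3       # hypothetical fraction of pores productive
seconds_per_day = 86_400

gb_per_day = (flowcells * pores_per_flowcell * speed_bps
              * duty_cycle * seconds_per_day) / 1e9
genomes_per_day = gb_per_day / 100   # ~100 Gb for a 30x human genome

print(f"{gb_per_day:,.0f} Gb/day ~ {genomes_per_day:.0f} genomes/day")
```

Under these guesses the instrument would land in the same thousands-of-genomes-per-year territory as the factory systems, but every input is speculative -- which is exactly the gap between Oxford's vision and a shipping product.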
Which vision of population-scale genomics will dominate the next few years? An integrated factory of uber-short read sequencers? A factory of somewhat longer (but still short) read sequencers with user-defined sample and library prep? Or long, noisy reads run on something that will fit in an overhead bin -- but not available (if at all) for many months? In the span of two decades we've come a long way from dreaming of a 1Mb/day factory -- but do we still need a factory?