I've been asked several times recently about rumors coming out of BGI. They've started claiming they have a super sequencer which will radically beat Illumina's offerings on both cost and accuracy. The recent 10K Genomes meeting apparently had a quick talk from BGI which led to some limited Twittering, and judging from this Mendel's Pod interview at least one person believes the buzz (though the same individual quotes a price per PacBio human genome that is high by at least a factor of 25). The claim is that this summer at ESHG BGI will release two boxes: one a benchtop model which I haven't seen any details on, and the other claimed to offer throughput superior to a HiSeq with better accuracy. What might be backing up these claims?
The most obvious speculation would be technology based on Complete Genomics (CG). CG had gone for a model of building factories rather than shippable instruments. If this is the core technology, then a bunch of design decisions probably needed to be revisited, but that's life in the engineering world. If BGI is really intending to sell boxes, rather than sequencing-as-a-service, they'll also have to deal with shipping their consumables around the world, a difficult task as Oxford Nanopore has discovered. Virtually everything imaginable can go wrong: boxes are crushed, customs holds up shipments, things are shaken, novel interpretations are read into protocols, etc. It is very hard to project a technology across the world!
CG's technology relied on sequencing-by-ligation, with the innovative "rolony" (aka nanoball) template amplification. Rather than using emulsion PCR (454, SOLiD, Ion) or bridge amplification PCR (Illumina), CG's approach uses a localized rolling circle amplification. The Polonator was also going in this direction, so QIAGEN's long-delayed box may be as well. High throughput sequencing's efficiency arises from deriving increasing amounts of data from a given amount of pricey reagents. While read length increases have contributed to this, the major route to throughput gains has been to increase the number of templates sequenced in parallel. This also has the advantage of increasing efficiency for a fixed unit of time; read length increases for a given platform always engender longer run times. Sometimes those longer times can be compensated by improvements elsewhere, but it will always be the case that a given chemistry can run half the read length in half the time.
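The scaling argument above can be made concrete with a toy model (illustrative numbers only, not any vendor's specs): total bases per run grow with both template count and read length, but run time grows only with read length.

```python
# Toy model of short-read sequencing throughput. Numbers are hypothetical,
# purely to illustrate the scaling argument, not actual platform specs.

def run_throughput(templates: int, read_length: int) -> int:
    """Total bases per run: each template yields one read of read_length."""
    return templates * read_length

def run_time_hours(read_length: int, hours_per_cycle: float) -> float:
    """Run time grows linearly with chemistry cycles:
    half the read length takes half the time."""
    return read_length * hours_per_cycle

# Doubling the number of templates doubles throughput at constant run time...
assert run_throughput(2_000_000, 100) == 2 * run_throughput(1_000_000, 100)

# ...whereas doubling read length doubles throughput AND doubles run time.
assert run_time_hours(200, 0.5) == 2 * run_time_hours(100, 0.5)
```

This is why density, not read length, has been the dominant lever for throughput: it buys more bases without lengthening the run.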
Even back when I was a graduate student, George Church was emphasizing that the ultimate goal for any image-based sequencing system would be to read one base per imaging pixel. If BGI can use nanoballs to achieve far greater densities than the current class of Illumina instruments, then they might have an angle. However, Illumina is full of clever people and it would be a mistake to count them out. Who would have thought the same chemistry that a decade ago delivered 25bp reads could do 10 times that, while consistently increasing accuracy and reducing cycle times? 2x100 in a week and a half in the early part of this decade was amazing; getting 2x250 in shorter time on more templates far more so. So I think it is foolish to assume that Illumina has little running room left in their platform. Given some fierce competition, it is likely that the San Diegans will be willing to take bigger risks.
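A back-of-the-envelope calculation shows how much headroom the one-base-per-pixel ideal implies. The numbers below are hypothetical, chosen only to illustrate the gap between a cluster that spans many pixels and the theoretical limit.

```python
# Hypothetical imaging field: a 20-megapixel sensor (illustrative only).
sensor_pixels = 20_000_000

# If each cluster occupies ~100 pixels (an assumed figure), the field
# resolves ~200,000 templates per image.
pixels_per_cluster = 100
templates_now = sensor_pixels // pixels_per_cluster

# At the Church ideal of one template per pixel, the same sensor
# resolves 20 million templates -- a 100x density gain with no change
# in optics or run time.
templates_ideal = sensor_pixels

assert templates_ideal // templates_now == pixels_per_cluster
```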
For example, Illumina has achieved high cluster densities on their top-of-the-line HiSeqs (first the X series, now the 4000) with patterned flowcells. Rather than relying on random processes to lay clusters out on the surface, the flowcell has nanoengineered wells, each of which can amplify only a single template by a process termed exclusion amplification. Since the pattern is defined and regular, cluster localization is much simpler. Furthermore, clusters cannot grow into each other, which should reduce the variability in cluster size between densely clustered and less densely clustered regions. This is intended to enable greater tolerance to variations in the amount of library loaded on the flowcell; if exclusion amplification really works, then the system should be insensitive to overloading.
Last year's launch of the X10 was their first foray into this technology, which seemed poised for a gradual rollout to the rest of the line. If BGI launches a benchtop instrument, as some of the rumors hold, that could easily accelerate Illumina pushing this to instruments in the NextSeq and MiSeq class (per my previous speculation). That would defend the low end. On the high end, it seems unlikely that Illumina has been maximally aggressive with cluster density on the patterned flowcells; etch a greater density of wells and the flowcell can support more reads. Obviously, there are a lot of issues involved -- the wells will be closer together and smaller, meaning more strain on the biochemistry and less tolerance for manufacturing defects in the smaller features. Smaller wells with smaller clusters mean less signal.
Complete Genomics has historically had truly short reads, I think more in the 35-50 range than the 100+ that passes for short in the sequencing world. While that can be more of a problem for downstream mapping, as the industry demonstrated a long time ago one can get a lot of mileage from short reads. BGI/CG may have decided that very small features (the nanoballs) were the right route, offering very high numbers of templates sequenced. Small means less signal and more noise, particularly as reads "dephase" when whatever interrogation process (sequencing by synthesis, sequential ligation) fails to work in perfect unison on all the molecules in a cluster. Short reads may have been the price to pay for extreme density.
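The dephasing argument can be sketched with a minimal model: assume some fixed fraction of molecules in a cluster fails to advance each cycle, so the in-phase, signal-generating fraction decays geometrically. The failure rate below is an assumed figure for illustration, not a measured value for any platform.

```python
# Toy dephasing model: each chemistry cycle, a fraction p_fail of the
# molecules in a cluster fails to advance, so the fraction still in phase
# (and contributing clean signal) decays geometrically with cycle number.

def in_phase_fraction(cycle: int, p_fail: float = 0.005) -> float:
    """Fraction of a cluster's molecules still in phase after `cycle` cycles,
    assuming a constant 0.5% per-cycle failure rate (hypothetical)."""
    return (1.0 - p_fail) ** cycle

# At a 0.5% per-cycle failure rate, ~78% of molecules remain in phase at
# cycle 50, but only ~29% at cycle 250. A smaller cluster starts with less
# signal, so it crosses the noise floor at a shorter read length.
```

Under this model, extreme density (tiny clusters) and long reads pull in opposite directions, which is consistent with CG accepting 35-50bp reads.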
Could Illumina go the same route and abandon hard-won read length gains in favor of extreme density? There's the engineering question and then the business question. On the engineering side, if clusters can support 250 cycles of chemistry they must be very bright at the beginning, but going for extremely high densities would engender challenges both for positioning the clusters densely and then imaging them. Solexa/Illumina I think always envisioned very high densities, but the market pulled them in a different direction. That gets to the business side: could Illumina veer sharply off their chosen path, going for very high densities with short reads, without major disruption of their commercial message?
Sequencing-by-ligation potentially offers an accuracy advantage, as ligases are extremely finicky about insisting on correct base-pairing at the ligation junction. This can certainly be enhanced via protein engineering and/or directed evolution. Furthermore, with a ligation approach the unnatural labels can be located far away from where the enzyme is focused, whereas with a modified nucleotide the polymerase is confronted with the oddity. Hence, every sequencing-by-synthesis platform has also worked on engineering polymerases to be as blind as possible to the fluorescent labels, or at least color-blind so that none is preferred over another.
Will BGI be able to deliver on their promises in time to ruin a past prediction of mine? I'd love to see it, but it would hardly be surprising if the announcement in June fills in some details, with one of those details being a much later launch of the system. Even so, the possibility of a short read platform competitive with Illumina should make nearly anyone in the market happy. I even think that Illumina would relish going head-to-head with a serious rival. Whether that will emerge from Shenzhen remains to be seen.