Wednesday, February 26, 2020

MGI Deconstructs the Sequencer

At some fancy restaurants one can get a "deconstructed dish".  As I understand it (I don't frequent such restaurants), a deconstructed BLT would have the bread, bacon, lettuce and tomato each as their own individual item, but prepared in a novel way which highlights the strengths of each ingredient.  When I got a preview last night of Rade Drmanac's closing AGBT talk on achieving a $100 human genome (reagents price only), that was the vision I had: Drmanac and his team have created their Tx system by deconstructing the optical high-throughput sequencing-by-synthesis instrument.
For nearly two decades, since the commercial launch of the 454, a core element of optical cyclic sequencing devices has been the flowcell.  These precision devices consist of a solid surface on which the clustered DNA is arrayed and then imaged as sequencing progresses, along with temperature control and appropriate channels and such to enable microfluidic delivery of the reagents.  Every such sequencer (454, Polonator, SOLiD, Helicos, Illumina, GeneReader, Complete Genomics/BGI/MGI and the rest) has this conceptual element, though they differ in many details: the nature of the DNA (single DNAs, polonies, DNA nanoballs), how the templates are mounted (smooth surface, patterned flowcells, beads docking to wells), the size of the flowcell (tiny for the new NextSeq 2000, ginormous for the NextSeq 500 series) and so forth. Flowcells are miracles of modern manufacturing, created en masse to exacting specifications with multiple materials.

The rest of the sequencer is built around the flowcell.  There are fluid lines fed by pumps and controlled by valves that link to upstream reagent reservoirs and downstream waste repositories.  There are temperature control elements and all the machinery to enable the imaging.  The lines and pumps and valves enable the precision delivery of expensive reagents in minute amounts to that all-important flowcell.  Synchronizing all of this is what makes this class of modern sequencing happen.

To hit their $100 target, MGI has taken the heretical approach of discarding every piece of hardware I just described, starting with the flowcell.

Instead of a flowcell, MGI has just a simple silicon surface.  Taking an 8 inch diameter circular silicon wafer, they lop off three of the four petals that would be formed if you drew the maximal square on the disc.  The remaining petal serves as a handle.  The square area is patterned with docking sites for their DNA nanoball templates, which are formed by a brief (10-20 minute) rolling circle reaction.  Because they are highly negatively charged, the nanoballs repel each other, so that once one nanoball has occupied a docking site it is highly unfavorable for a second one to bind.  This enables the surface to be packed with nanoballs, achieving loadings above 90%.  The Tx uses twice the density of their T7 sequencer so that on that huge surface they can generate enough data for a bit less than 100 human genome resequencings at 30X using 2x150 paired end chemistry in about three days.  The nanoballs also don't generate duplicates the way exclusion amplification can, where duplicate rates can run as high as 20%, as one speaker at AGBT mentioned (and as I have seen in my own data).
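For the geometrically inclined, here is a minimal sketch of the wafer layout described above. It assumes an idealized 8 inch circle with the largest possible inscribed square and ignores any edge exclusion or notch; the function name is mine, not MGI's.

```python
import math

def wafer_square_geometry(diameter_in: float = 8.0) -> dict:
    """Areas (square inches) for a circular wafer and its maximal inscribed square."""
    radius = diameter_in / 2
    circle_area = math.pi * radius ** 2
    # The diagonal of the inscribed square equals the wafer diameter.
    side = diameter_in / math.sqrt(2)
    square_area = side ** 2
    # The four identical "petals" are what is left of the circle outside the square.
    petal_area = (circle_area - square_area) / 4
    return {
        "circle_area": circle_area,
        "square_side": side,
        "square_area": square_area,
        "petal_area": petal_area,
        "square_fraction": square_area / circle_area,
    }

if __name__ == "__main__":
    for name, value in wafer_square_geometry().items():
        print(f"{name}: {value:.3f}")
```

Under those assumptions the patterned square is about 5.66 inches on a side and covers roughly 64% of the original disc (the ratio is 2/pi regardless of diameter).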

Instead of pumps, valves and tubes to deliver the CoolMPS reagents to the surface, the wafer itself is delivered to the liquids.  Robot arms dip the wafer, using that handle area, in reservoirs of each reagent.  Imaging takes place in separate imagers, fed by another robot arm plus some rollers built into the imagers that slide the wafers between the chemistry work area and the imager-serving arm.  The imagers are believed to have the widest field of view for any such high resolution, high numerical aperture camera, able to read 8.5 million spots per image.  Only two minutes are required to image each wafer with this camera.  The logic of the design is that far less reagent is wasted than in a typical flowcell, where only a tiny fraction of the reagents is actually consumed (since they are supplied in high concentration to drive the reaction).

At the strain factory we'd call the setup an integrated robotic workcell.  This one is really complex, with three arms choreographed so they keep things moving but never crash into each other.  With everything at max performance, the system can handle 8 wafers simultaneously.  So that's theoretically 700 genomes every three days, 7,000 per month, 84,000 per year. That's roughly the equivalent of 10 NovaSeqs running flat out.  Drmanac also pointed out that by halving the DNB size, enabled by brighter CoolMPS chemistry, and being less conservative with the spacing of the array (the pitch), 64-fold more spots could be placed on the wafer.  In response to a question, he said that they have gotten quite good results with only 15X coverage.  So a future iteration of Tx might deliver over 10,000 genomes per wafer!
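The back-of-the-envelope throughput numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constants come from the figures quoted in this post, while the constant and function names, the 30-day month and the assumption that the system never sits idle are mine.

```python
# Figures quoted in the post; names are illustrative only.
GENOMES_PER_RUN = 700   # all wafers together, 30X coverage, 2x150
WAFERS_PER_RUN = 8
RUN_DAYS = 3

def genomes_per_wafer() -> float:
    """Implied per-wafer yield ("a bit less than 100" genomes)."""
    return GENOMES_PER_RUN / WAFERS_PER_RUN

def genomes_per_month(days_in_month: int = 30) -> int:
    """Assumes back-to-back three-day runs with no downtime."""
    runs = days_in_month // RUN_DAYS
    return runs * GENOMES_PER_RUN

def genomes_per_year() -> int:
    return 12 * genomes_per_month()

def future_genomes_per_wafer(spot_gain: float = 64,
                             current_cov: float = 30,
                             future_cov: float = 15) -> float:
    """Scale the per-wafer yield by more spots and by lower coverage per genome."""
    return genomes_per_wafer() * spot_gain * (current_cov / future_cov)

if __name__ == "__main__":
    print(genomes_per_wafer())         # 87.5
    print(genomes_per_month())         # 7000
    print(genomes_per_year())          # 84000
    print(future_genomes_per_wafer())  # 11200.0
```

The last figure is where "over 10,000 genomes" comes from: 64 times the spots and half the coverage per genome is a 128-fold gain over roughly 87.5 genomes per wafer today.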

Tx isn't a specific instrument but a concept which can be realized in many ways.  MGI sees it as a custom solution which will be optimized for each customer.  It operates in a standard laboratory environment; no need for a clean room or extra sophisticated control of ambient temperature.  MGI has had a smaller unit in operation long enough to generate 10 petabasepairs of data.  

Rade Drmanac explained all of this to me in the MGI Lanai Suite, his eyes flashing.  He's clearly proud of what three years of effort has produced and loves to share all the small details: how the DNA nanoball preparations have 100 million per microliter, that it's nearly impossible to overload the wafer because of the repulsion between nanoballs and how that enables high loading with low concentration libraries.  How the upstream library prep quickly compresses multiple samples to a single tube, making processing simple.

Overall it's a radical approach to scaling up sequencing to reduce the cost per datapoint.  Drmanac's team isn't done; he thinks they can probably go even denser, perhaps using smaller nanoballs.  But will anyone else try to copy this?  The trend has been for sequencers to be ever more integrated, smaller and cheaper; this isn't part of that trend.  There's also the question of how big the market is for machines that can generate 40+ terabases per week at a clip of $140K per week or $7M a year.
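A rough sketch of the reagent spend implied by those figures follows. It assumes the $100 per genome reagent target and 700 genomes per run from earlier in the post; treating a week as two full runs is my own assumption about how the $140K figure arises, and the names are illustrative.

```python
REAGENT_COST_PER_GENOME = 100   # dollars, the stated target (reagents only)
GENOMES_PER_RUN = 700           # 8 wafers over a three-day run

def weekly_reagent_cost(runs_per_week: float = 2.0) -> float:
    """Reagent dollars per week; two runs per week is an assumption."""
    return runs_per_week * GENOMES_PER_RUN * REAGENT_COST_PER_GENOME

def annual_reagent_cost(runs_per_week: float = 2.0, weeks: int = 52) -> float:
    return weeks * weekly_reagent_cost(runs_per_week)

if __name__ == "__main__":
    print(weekly_reagent_cost())       # 140000.0
    print(annual_reagent_cost())       # 7280000.0
    # Running continuously (7/3 runs per week) would cost more:
    print(round(weekly_reagent_cost(7 / 3)))
```

With two runs a week this gives $140K per week and about $7.3M a year; a system kept running around the clock would spend closer to $163K per week on reagents.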

So maybe Tx won't be the vanguard of a new generation of sequencer design -- too strange, too far-out, too outside the mainstream of design.  But it's certainly fun to watch both the instrument run and how the rest of the sequencing industry responds to this radical rethink of how to execute the cyclic chemistry and optical imaging which dominates the genomics world.


Anonymous said...

There’s nothing radical here. It is de-automation. Commercially it is providing a factory with cheap labour. Complete Genomics started here, having missed the boat on making a shippable instrument, eclipsed by ILMN and others. Nothing new, nothing radical about the “system”. Chemistry on the other hand looks very interesting indeed.

Anonymous said...

Only someone who has never had to deal with stability and contamination issues of dunk baths can think that this is remotely economically feasible. Count me unimpressed as this looks to be nothing more than a cheap PR stunt. Despite comments in this article that flowcells are a miracle of modern technology, in reality they are very low tech devices that only cost as much as they do because they are made in low volumes with methodology that eschews almost every lesson that the semiconductor industry has learned in the past 50+ years.

Anonymous said...

I guess the blue plastic cat litter tray on the floor there is a 'radical rethink' of spillage management too.

Anonymous said...

I'm curious to know what the comment on 2/28/2020 was referring to regarding the high cost of flow cells. I'm just not familiar enough with these manufacturing processes. Could you elaborate on what methods they could be leveraging?