Omics! Omics!: CoolMPS Revealed

Having summarized MGI's announcement that they are launching into the U.S. market this spring and started digging into the performance characteristics of MGI's instrument lineup, let us now turn to their BioRxiv pre-print on the CoolMPS chemistry, as it has many useful technical details.

CoolMPS Terminator Chemistry

Illumina's sequencing-by-synthesis platform, as well as QIAGEN's defunct one, uses fluorescently labeled reversible terminators. In an ideal world the fluorophore would be a component of the terminator chemistry, but that just doesn't work in the real world, presumably due to steric constraints. So these nucleotides have the fluorescent moiety hanging off the nucleotide somewhere. Two implications of that: there are separate chemical steps to remove the fluorophore and terminator residues (though they might be done simultaneously), and the removal leaves some sort of chemical 'scar' on the nucleotide; there just isn't a chemistry to remove the fluorophore and leave behind a native nucleobase. Those scars build up and are believed to interfere with further polymerization. The chemistry to make these reversible terminators is also complicated, as you need to jam two different items onto a nucleotide and those could conflict with each other. And in chemistry, more steps always means lower yield. Some of that lower yield can be problematic: dark terminators lower the signal, and bright un-terminated nucleotides can lead to dephasing.

MGI's CoolMPS still uses reversible terminators, but they are unlabeled. Instead, there are specific antibodies to detect each of the terminator elements. The more I think about this the more audacious an idea it is and the more boldness I assign to MGI for actually pursuing this. Their terminators are simply native nucleotides except for a 3'-O-azidomethyl terminator group. They immunized rabbits with that linked by N-hydroxysuccinimide to keyhole limpet hemocyanin, a standard trick for raising antibodies to small molecules. But in the end they needed an antibody that recognized the terminated nucleotide as the last base of a base-paired 3' recessed structure! That is not at all what they immunized with, and the other strand is presumably masking many of the distinguishing chemical features of each nucleotide. But they have succeeded.

The pre-print details the screening process, bringing back memories of a rocky undergraduate class I had in laboratory immunology. Spleens were removed from the immunized rabbits and cells extracted. Initial screening by ELISA led to promising clones. Further screening used the antibodies in actual sequencing reactions with readout via a labeled secondary antibody. Heavy and light chains were cloned out of these to allow production in the 293 cell line. Labeling at this time is just by random labeling, so they had to titrate to find highly labeled antibodies which retain high specificity. This points to one substantial advantage of antibodies over nucleotides: nucleotides can only accept one fluorophore each but you can load up a huge protein. It apparently isn't just steric considerations either; trying to multiply label a nucleotide can lead to quenching effects.

An interesting bit about the antibodies is that they are recognizing in part the 3'-O-azidomethyl terminator moiety; take that off and the signal drops precipitously. This was tested by running some cycles with detection first and terminator cleavage second, and then running some cycles with detection after cleavage. MGI suggests that some of the binding may involve "end breathing" of the DNA, so perhaps the terminator is not paired when recognized by the antibody. Binding can be as fast as 30 seconds at 35-40C in a low salt buffer; longer times did not improve signal substantially. High pH and temperatures over 55C were found to remove the antibody efficiently, particularly if unlabeled reversible terminators are included to serve as competitors.

Bright Advantages of Antibody Labeling

As noted above, one huge advantage of labeling an antibody over labeling a nucleotide is the ability to load up the antibody with multiple fluor moieties. The pre-print describes up to 3-fold greater brightness of antibody labeling over labeling on the terminator.

There are many parameters which may be explored to enhance the chemistry. Protein engineering on the antibodies could lead to even better binding or enable brighter labeling by non-random methods. All the different times and buffers may be amenable to further optimization.

The pre-print looks at possible quenching effects from scars on the nucleotides that arise from labeled reversible terminators. In their prior chemistry, there is a suppressive effect on the G signal if the previous base is a T. Unfortunately they don't show the structures expected for the scars; this would be interesting and I don't remember seeing it previously. With CoolMPS, no suppressive effects are seen.

In a test of real sequencing, they tried 200 cycles using DNA nanoballs (the rolling circle replication-prepared globs of DNA that serve as sequencing templates) prepared from a 300 basepair mean insert length E. coli library. The pre-print shows both a plot of the signal attenuation as the number of cycles grows and what they call the positional discordance, the amount of noise in the signal compared to the reference base. They attribute most of the positional discordance to out-of-phase signal being confused with dye cross-talk and signal loss, particularly in DNA nanoballs with low template copy number. Ultimately they estimate that after 200 cycles 30% of the total signal is -1 or +1 out-of-phase signal. Curiously, there is a note in the figure legend mentioning that after cycle 185 the discordance measure shows a sharp increase that is probably due to short inserts that are reading adapter. It certainly would have been appropriate to use a much better size-controlled library for this test, or alternatively to have generated insert-specific references incorporating the adapters so that this important parameter could be accurately measured.
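To put that 30% figure in per-cycle terms, here is a back-of-envelope sketch. It rests on my own simplifying assumption, not anything stated in the pre-print: every cycle knocks the same fixed fraction of strands out of phase and they never come back, so the in-phase fraction after n cycles is (1 - p)^n. The function names are made up for illustration.

```python
def per_cycle_dephasing(out_of_phase_fraction: float, cycles: int) -> float:
    """Constant per-cycle rate p such that 1 - (1 - p)**cycles equals the given fraction.

    Assumes dephasing is independent and identical in every cycle (a toy model).
    """
    in_phase = 1.0 - out_of_phase_fraction
    return 1.0 - in_phase ** (1.0 / cycles)


def in_phase_fraction(p: float, cycles: int) -> float:
    """Fraction of strands still in phase after the given number of cycles."""
    return (1.0 - p) ** cycles


# 30% out-of-phase signal after 200 cycles, as estimated in the pre-print.
p = per_cycle_dephasing(0.30, 200)
print(f"implied dephasing per cycle: {p:.3%}")
print(f"in-phase fraction after 400 cycles: {in_phase_fraction(p, 400):.2f}")
```

Under this toy model the implied loss is a bit under 0.2% per cycle, which gives a feel for how small the per-cycle error has to be for long reads to work at all.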

A test of overall accuracy used 100 basepair reads on a PCR-free E. coli library. Since DNA nanoball construction does not involve PCR -- and Illumina's template prep does -- MGI is keen to highlight possible errors from PCR. Picking the best areas of the array in terms of fluidics, optics and DNA nanoball size, they find overall discordance of 0.029%. They estimate that bases assigned a phred quality score of 20 or better have an error rate of 1 in 20,000, aka phred 43.
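For anyone who wants to check the conversion between error rates and phred scores, the standard definition is Q = -10 * log10(p). A minimal sketch, with function names of my own choosing:

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Error probability corresponding to a phred quality score q."""
    return 10.0 ** (-q / 10.0)


# The two figures quoted above: 1 error in 20,000 and 0.029% overall discordance.
print(round(phred_from_error(1 / 20000), 1))
print(round(phred_from_error(0.029 / 100), 1))
```

The first value comes out at about 43, matching the pre-print's figure; the 0.029% overall discordance works out to a bit over phred 35.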

One benefit of brighter signal is high sequence quality even when the DNA nanoballs have low copy number. To test this, MGI used a library with 400bp mean inserts, which in a 10 minute rolling circle reaction generate about 50 copies, and saw an error rate of 0.055%.

Paired End and Longish Reads

MGI has also devised a paired end strategy. Standard DNA nanoballs are many copies of only one strand, as they are prepared by a rolling circle mechanism. For paired end sequencing, a primer is used in a multiple displacement (MDA) reaction to generate complementary strands. Because it is MDA, the polymerase will generate multiple such strands, hanging off the original DNA nanoball in a hyperbranched geometry.

In a 2x100 test, the second read suffers significantly lower signal and a drop in accuracy, but still generates useful results with over 99% mapping rate. This is one point where it would be particularly useful to have an available dataset to independently analyze; the lack of such a dataset is a serious shortcoming of the pre-print.

The other direction they try is to go for unusually long reads for a sequencing-by-synthesis system: 400 bases single end. Now, in this day of PacBio and Oxford Nanopore one can't call those long reads, but they are longer than anything Illumina or Ion Torrent has ever offered -- though 454 did have kits giving up to a kilobase (though I'm told only a tiny minority would be that long). 91% of these bases were Q30. Again, it would be useful to have a dataset to tackle as I'm not in love with any of the provided statistics. One interesting note is that after 300 bases they expect only 30 template copies are still giving correct signal, though this anecdote would be better if a number of starting template copies was given. They propose that with increased template copies per nanoball, better labeling and chiseling down of the out-of-phase incorporation and signal loss, that they can get to 500 or even 700 base reads.

Four Color Sequencing on Two Color Imaging

For a while most of Illumina's instruments have used two-color imaging, which requires less time for imaging but leaves G bases with no labeling at all. Hence data coming off the strain factory's NextSeq and NovaSeq instruments sometimes has long runs of Gs -- meaning long runs of no signal.

MGI points out that with their chemistry they can simply use two rounds of labeling, with one reading two bases and the other reading the other two, and all are labeled. In addition to simpler hardware, this has the advantage that they can pick the two dyes which are the best separated by the available imaging hardware. Also, since they are well separated, the need for expensive bandpass filters is eliminated, which also means signal from the fluors is not discarded by such filters. MGI calls this approach "four color CoolMPS on two color imaging" or 4cs2ci, with the standard as 4cs4ci. For 2 color mode, they first detect the purines A and G and then after stripping detect the pyrimidines C and T.
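The basecalling logic implied by 4cs2ci can be sketched in a few lines. This is purely illustrative: the purines-then-pyrimidines order comes from the description above, but which dye goes with which base, the intensity layout, and the pick-the-brightest rule are all my assumptions rather than MGI's actual basecaller.

```python
# Hypothetical mapping of (labeling round, optical channel) to base.
# Round 0 reads the purines, round 1 the pyrimidines; the channel
# assignment within each round is an assumption for illustration.
SLOT_TO_BASE = {
    (0, 0): "A",
    (0, 1): "G",
    (1, 0): "C",
    (1, 1): "T",
}


def call_base(intensities):
    """Call one base from intensities[round][channel] at a single spot.

    Picks the brightest of the four (round, channel) slots; a real
    basecaller would also correct for cross-talk and phasing.
    """
    best = max(SLOT_TO_BASE, key=lambda slot: intensities[slot[0]][slot[1]])
    return SLOT_TO_BASE[best]


def call_read(cycles):
    """Call a read from a list of per-cycle intensity grids."""
    return "".join(call_base(grid) for grid in cycles)


# Toy intensities for three sequencing cycles (made-up numbers).
toy_cycles = [
    [[0.9, 0.1], [0.1, 0.0]],
    [[0.1, 0.1], [0.1, 0.8]],
    [[0.0, 0.7], [0.1, 0.1]],
]
print(call_read(toy_cycles))  # ATG
```

The point of the sketch is that every base produces a positive signal in exactly one slot, so there is no "dark" base that has to be inferred from the absence of signal.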

Comparing data generated with 4cs4ci and 4cs2ci, there are slightly more reads after filtering with the 2-color scheme. Looking at overall errors, the improvement is 5.1-fold on average, but that ranges from 1.7-fold for miscalling what should be T to 18.6-fold for miscalling at G positions. Looking at four of the twelve possible specific miscalls, G to C errors were reduced by an eye-popping 52.2-fold. I wish they had included the full set of 12 miscall modes, and again releasing the datasets is the best way to quiet such kvetching from the likes of me.
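If the datasets were available, tabulating all twelve miscall modes would be straightforward. A minimal sketch, assuming the reads have already been aligned so that reference and called bases line up position by position (indels are ignored); the function name and the toy strings are mine, not data from the pre-print.

```python
from itertools import permutations

BASES = "ACGT"
# All 12 ordered (reference, called) pairs where the two bases differ.
MISCALL_MODES = [f"{ref}>{call}" for ref, call in permutations(BASES, 2)]


def tally_miscalls(reference: str, called: str) -> dict:
    """Count each of the 12 substitution modes between two equal-length strings."""
    if len(reference) != len(called):
        raise ValueError("reference and called sequences must be the same length")
    counts = {mode: 0 for mode in MISCALL_MODES}
    for ref, call in zip(reference, called):
        # Skip matches and anything that is not a plain A/C/G/T (e.g. N).
        if ref != call and ref in BASES and call in BASES:
            counts[f"{ref}>{call}"] += 1
    return counts


# Toy example: one G>C and one G>A miscall.
toy_counts = tally_miscalls("ACGTGG", "ACCTGA")
print({mode: n for mode, n in toy_counts.items() if n})
```

Running the same tally on matched 4cs4ci and 4cs2ci runs and dividing the per-mode rates would give the full twelve-entry version of the fold-improvement numbers.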

Summarizing the Advantages

Near the end the pre-print has a table summarizing all the advantages of CoolMPS and CoolMPS on the DNBSEQ platform over competing approaches. It's definitely in a very squishy zone between objective summary and advertising copy, but useful regardless. So unlabeled nucleotides are credited with lower synthesis cost, higher incorporation rates and less interference due to scars. Antibody detection is touted as enabling higher intensity labeling, less DNA photodamage since the labels are further away and the advantages of the 2-color scheme noted above. Then goes the sales pitch for DNBSEQ: PCR free and no barcode swapping, high densities with 200nm DNA nanoballs, high density per spot leading to imaging advantages, and something definitely not explored in the preprint of "over 90% of DNBs made are loaded without overloading", which is claimed to enable lower DNA input.

In the text accompanying this section, they point out that the strong signal seen with low copy DNA nanoballs suggests they can generate smaller DNA nanoballs to enable tighter packing and more spots per flowcell. They also note that it would be possible to label antibodies with multiple dyes and require a basecall to have the correct signature from the two dyes. For this they reference a patent issued in 2009 which I believe has been part of the basis of their countersuits against Illumina.

They also present the now somewhat obvious extension of 4 color in 2 optical channels to 4 color with one optical channel. Presumably this is used in their small benchtop cube sequencer under development (the DNBSEQ E series).

I'm hoping to get a chance to chat with some of the technical experts from MGI's team during AGBT and there's also Rade Drmanac's closing talk promising $100 human genomes. Even if that doesn't excite you, just imagine the hunger for reads that is being stoked by interest in single cell and spatial sequencing. But don't worry, this won't be a channel devoted to MGI -- there's going to be a target rich environment at AGBT and I plan to cover all the exciting stuff. So please stay tuned and keep the comments flowing!

2 comments:

Torben, Wednesday, February 26, 2020 5:04:00 AM
"400 bases single end. Now, in this day of PacBio and Oxford Nanopore one can't call those long reads, but they are longer than anything Illumina or Ion Torrent has ever offered"

Ion Torrent supports both 400bp and 600 bp reads, however, only for the smaller chips (PGM and S5 510, 520, and 530).

Anonymous, Wednesday, February 26, 2020 9:50:00 AM
It’s not ‘scars’ limiting the length thats an old myth from Solexa days, its photo damage that craps out the system over time.

Saturday, February 22, 2020
