Tuesday, March 18, 2025

Mission Impossible: Methylomics

Good morning Mr. Hunt.

Today's briefing will have a bit more background than usual - you haven't tangled with biotechnology since that odd little company in Australia with the fancy office tower and actual labs in caves.

As you may know, DNA has four letters, or bases, which form pairs: A with T and C with G. C can also be modified by methylation to form 5-methyl-C, which can in turn be oxidized to 5-hydroxymethyl-C. In humans these marks occur almost entirely at a C followed by a G, called a CpG site. There is great interest in reading the methylation of DNA from blood, as this "cell-free DNA" may be an oracle for current and future health conditions.
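Since the marks of interest sit at CpG sites, the first thing any analysis does is find them. Here is a minimal Python sketch, assuming a plain A/C/G/T string and scanning only the strand given; the function name `cpg_positions` is purely illustrative.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based positions of the C in every CpG dinucleotide of `seq`.

    Only the given strand is scanned. CpG is palindromic, so the opposite
    strand carries a matching CpG whose C pairs with the G at position i + 1.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


if __name__ == "__main__":
    print(cpg_positions("ACGTTCGACCG"))  # -> [1, 5, 9]
```

Note that the C at position 8 is not reported: it is followed by another C, not a G.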

The best technologies for reading these marks are the single-molecule sequencers from Oxford Nanopore and Pacific Biosciences, as they read them directly with no preprocessing of the DNA beyond what the sequencer already requires just to read the bases: construction of sequencing libraries. But these platforms have relatively high input requirements, and any amplification of the DNA by PCR or similar techniques erases the methylation, since the copies are built from unmodified nucleotides.
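To see how quickly amplification washes out the signal, consider an idealized PCR in which every strand is copied every cycle and new strands are built only from unmodified nucleotides. Only the two original strands of a starting duplex keep their marks while the strand count doubles each cycle, so the methylated fraction is 2 / 2^(n+1) = 1 / 2^n after n cycles. The Python sketch below just encodes that arithmetic; perfect efficiency is an assumption, and the function name is illustrative.

```python
def methylated_strand_fraction(cycles: int) -> float:
    """Fraction of strands still carrying the original methylation after
    `cycles` rounds of idealized PCR (every strand copied each cycle).

    Copies are synthesized from unmodified dNTPs, so only the two original
    strands of the starting duplex keep their marks, while the total number
    of strands doubles every cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    original_strands = 2
    total_strands = 2 ** (cycles + 1)
    return original_strands / total_strands


if __name__ == "__main__":
    for n in (0, 1, 10, 20):
        print(f"{n:>2} cycles: {methylated_strand_fraction(n):.3g} of strands still methylated")
```

After a typical library-prep amplification of ten or more cycles, under a thousandth of the strands still carry the original marks.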

It is possible to read methylation on the popular short-read sequencers from Illumina and other companies, but only with a trick. The most popular method is to treat the DNA chemically with bisulfite, a rather nasty reagent; even with your disdain for danger, you really should read and adhere to the MSDS on this stuff. Bisulfite converts unmethylated Cs to U, which pairs and sequences like T; modified Cs are untouched. Please do not bring up this technique with the bioinformatician joining your team; they are known to rage about bisulfite being "a weapon of mass sequence destruction". Similar methods using enzymes produce the same result. Because the conversion writes the methylation status into the sequence itself, this DNA can then be amplified without losing that information. But it also makes the DNA useless for calling genetic variants, since a genuine C-to-T variant is indistinguishable from a converted unmethylated C; a separate unconverted library must be used for that.
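The bisulfite logic, and the variant-calling problem it creates, can be sketched in a few lines of Python. This assumes idealized, 100%-efficient conversion of a single strand and a read already aligned to the reference without gaps; the function names and the `methylated` set of positions are illustrative, not any real tool's API.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Idealized, 100%-efficient bisulfite conversion of a single strand.

    Every C whose 0-based position is not in `methylated` becomes U, which
    pairs and sequences like T, so it is written here as T. Methylated Cs are
    protected and still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, read: str) -> Dict[int, str]:
    """Classify each reference C from an aligned, gap-free converted read."""
    calls: Dict[int, str] = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = "methylated"
        elif obs == "T":
            # A converted unmethylated C and a genuine C>T variant look the
            # same, which is why variant calling needs an unconverted library.
            calls[i] = "unmethylated_or_CtoT_variant"
        else:
            calls[i] = "other"
    return calls


if __name__ == "__main__":
    reference = "ACGTTCGACCG"
    read = bisulfite_convert(reference, methylated={5})
    print(read)  # -> ATGTTCGATTG
    print(call_methylation(reference, read))
```

The best this caller can say about a T at a reference C is "unmethylated or a variant", which is exactly the bioinformatician's complaint.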

Watchmaker Genomics has a clever chemistry that performs a much more limited transformation: only methylated Cs are converted to Us. An even more clever biochemistry is offered by Biomodal, which copies one strand of a DNA fragment into a second, linked strand and then treats both with enzymes. After both linked sides are sequenced, the pattern of matches and mismatches between the two reads calls genetic variants, 5-methyl-C and 5-hydroxymethyl-C.
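The linked-strand idea can be illustrated with a small decode table. The scheme below is a deliberately simplified, hypothetical one and not Biomodal's published chemistry: it assumes read 1 is the original strand after a conversion that turns only unmodified C into T, and read 2 is the linked copy, left unconverted and already reverse-complemented so it lines up with read 1. It ignores 5-hydroxymethyl-C, but it shows how matches and mismatches separate unmodified C, 5-methyl-C and a genuine T.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL decode table for a simplified linked-strand scheme; an
# illustration of the idea only. Keys are (original read base, copy read base).
DECODE: Dict[Tuple[str, str], str] = {
    ("A", "A"): "A",
    ("G", "G"): "G",
    ("T", "T"): "T",    # both reads agree on T: a genuine T, so no C/T confusion
    ("T", "C"): "C",    # mismatch: unmodified C converted on the original only
    ("C", "C"): "5mC",  # C survived conversion, so it was modified
}


def decode_linked_reads(original_read: str, copy_read: str) -> List[str]:
    """Combine two aligned reads of the same fragment into one call per position.

    Combinations not in the table (e.g. sequencing errors) are reported as 'N'.
    """
    if len(original_read) != len(copy_read):
        raise ValueError("linked reads must be the same length")
    return [
        DECODE.get((o, c), "N")
        for o, c in zip(original_read.upper(), copy_read.upper())
    ]


if __name__ == "__main__":
    # Fragment ACGTTCGA with the C at position 5 methylated and the C at
    # position 1 unmodified.
    print(decode_linked_reads("ATGTTCGA", "ACGTTCGA"))
```

Because a true T reads as T on both sides, while an unmodified C reads as T on only one, the variant ambiguity from single-strand bisulfite disappears.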

Roche has recently unveiled a new single-molecule sequencing technology called SBX, but it requires first copying the DNA of interest into a bizarre, highly modified "expandomer" form. Since copying discards methylation, SBX shouldn't be able to read it. But at AGBT, SBX boffin Mark Kokoris offered, in response to a question on the topic, that there was clearly room for a fifth signal level in their traces, and hinted that Roche was working on using that fifth level. But how, if SBX requires copying the input DNA first?
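What "room for a fifth signal level" buys can be sketched with a nearest-level base caller. The signal values below are entirely hypothetical, since no actual SBX levels are given here, and real basecallers are far more sophisticated; the point is only that a fifth distinguishable level lets a signal that would otherwise be forced into one of four bins be called as something new, provided the mark survives into the copied molecule at all.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL mean signal levels in arbitrary units; not real SBX values.
# The fifth level is simply placed in otherwise unused signal space.
FOUR_LEVELS: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
FIVE_LEVELS: Dict[str, float] = {**FOUR_LEVELS, "5mC": 5.0}


def call_bases(trace: Sequence[float], levels: Dict[str, float]) -> List[str]:
    """Assign each measured signal to the base whose mean level is nearest."""
    return [min(levels, key=lambda base: abs(levels[base] - x)) for x in trace]


if __name__ == "__main__":
    trace = [1.1, 4.9, 2.2, 3.0]
    print(call_bases(trace, FOUR_LEVELS))  # the 4.9 signal is forced to T
    print(call_bases(trace, FIVE_LEVELS))  # with a fifth level it reads as 5mC
```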

We first thought of sending you in to extract the secret from Roche, but even we can't just go performing espionage on a legitimate company with no apparent plans for world domination; Illumina's monopolization of the sequencing market has never moved us to action, so that isn't a justification either.

Instead, your mission, should you choose to accept it, is to realize the impossible: devise a system for making SBX molecules that use a third basepair (such artificial basepairing schemes have been realized in the lab), so that the set of six bases can resolve A, C, G, T and 5-methyl-C; we will save 5-hydroxymethyl-C for a future mission. Ideally this would be achieved by specifically converting 5-methyl-C into a base of the third basepair, but if you devise a scheme that converts C or T into the extra basepair and then converts 5-methyl-C into whichever base you freed up, that's acceptable as well.
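Whichever route you take, the requirement is that the five input letters land on five distinct bases of the six available, so the original can be recovered unambiguously. The Python sketch below checks the three acceptable schemes described above against a bisulfite-like mapping that fails. X and Y are hypothetical placeholders for the two halves of an unnatural basepair, only one strand is modeled (the partner base on the complementary strand is ignored), and all names are illustrative.

```python
from typing import Dict, List

INPUT_LETTERS = ("A", "C", "G", "T", "5mC")
# X and Y are hypothetical placeholders for the two bases of an unnatural pair.
SIX_BASES = {"A", "C", "G", "T", "X", "Y"}

Scheme = Dict[str, str]

# Preferred: convert 5-methyl-C straight into X.
DIRECT: Scheme = {"A": "A", "C": "C", "G": "G", "T": "T", "5mC": "X"}
# Acceptable: move every unmodified C to X, then turn 5mC into the freed-up C.
SWAP_C: Scheme = {"A": "A", "C": "X", "G": "G", "T": "T", "5mC": "C"}
# Acceptable: move every T to X, then turn 5mC into the freed-up T.
SWAP_T: Scheme = {"A": "A", "C": "C", "G": "G", "T": "X", "5mC": "T"}
# For contrast, a bisulfite-like readout on four bases collides C with T.
BISULFITE_LIKE: Scheme = {"A": "A", "C": "T", "G": "G", "T": "T", "5mC": "C"}


def is_decodable(scheme: Scheme) -> bool:
    """True if each input letter maps to a distinct, available base."""
    outputs = [scheme[letter] for letter in INPUT_LETTERS]
    return set(outputs) <= SIX_BASES and len(set(outputs)) == len(outputs)


def decode(read: str, scheme: Scheme) -> List[str]:
    """Invert a decodable scheme to recover the original letters."""
    if not is_decodable(scheme):
        raise ValueError("scheme maps two input letters to the same base")
    inverse = {base: letter for letter, base in scheme.items()}
    return [inverse[base] for base in read]


if __name__ == "__main__":
    for name, scheme in [("DIRECT", DIRECT), ("SWAP_C", SWAP_C),
                         ("SWAP_T", SWAP_T), ("BISULFITE_LIKE", BISULFITE_LIKE)]:
        print(name, is_decodable(scheme))
    print(decode("AXGC", SWAP_C))  # -> ['A', 'C', 'G', '5mC']
```

The swap schemes cost an extra conversion step, but on paper they are just as decodable as the direct route.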

Our technical directorate has outlined two general strategies, which could be used independently or in combination.

In one, a purely chemical conversion would change one of the bases into a member of the new basepair. No chemical reaction has ever been proposed in the literature for such a transformation of a natural base into one of an unnatural basepair.

In the other plan of attack, protein engineering would be used to generate one or more enzymes to perform the transformation.  No enzyme is known that would provide an obvious starting point for such a protein evolution campaign.  Perhaps one of the new machine learning models can take a crack, but this is far beyond anything demonstrated with AI-based protein engineering.  

Either way, in the event of success you may be required to disguise yourself when presenting the results at AGBT.

A note about commercial considerations, and we don't mean the usual business of holding your beverages with the label facing outwards. Whatever process you devise must not have an inordinate number of steps, should not require a fume hood, and must not have an exorbitant cost of goods. Nor should your spending during the project be exorbitant; if you exceed your budget authority, the Secretary will disavow all knowledge of your actions.

This message will self-destruct in 5 seconds - or would have if we hadn't spent the micro-explosives budget on DNA sequencing reagents.
 

2 comments:

Anonymous said...

Keith, any thoughts about Illumina's 5-base sequencing offering, now purportedly in early access? It is claimed to convert mC directly to T without touching C. See here: https://emea.illumina.com/content/dam/illumina/gcs/assembled-assets/marketing-literature/illumina-5-base-methylation-flyer-m-gl-03401/illumina-5-base-methylation-flyer-m-gl-03401.pdf

Keith Robison said...

Over on Bluesky, NEB's amazing Laurence Ettwiller pointed out an NEB preprint I had forgotten about: they found a 5mC-selective deaminase. So something along those lines could be Illumina's approach. It also gives me another idea for how to solve the SBX problem, which I should either blog or patent :-)