Tuesday, May 07, 2024

On The Expanding Versatility of Single Molecule Sequencing for Detecting Anomalous DNA

An exciting aspect of true single molecule sequencing has been the detection of methylated bases.  Both Oxford Nanopore and Pacific Biosciences technologies generate altered signals if methylated bases are present.  For Oxford Nanopore this is hardly surprising, as it would seem any change in the DNA should alter the complex interaction with the protein pore, so recognizing that signal should become just a computational challenge.  PacBio is a bit more surprising, but the kinetics of base incorporation are apparently sensitive to the complementary base.  I wanted to point out, though without much deep analysis, three recent preprints that demonstrate detection of other modifications to DNA and thereby enable some interesting applications (and of course, some wild speculations on my part).  It's also interesting because of the overlap between the papers, as they are interconnected to a degree in their methods.

Gotta first toot a tiny little horn of my own here. The potential for nanopore sensing to detect base methylation was suggested long before the MAP put ONT devices into users' hands, but I did post what I think was the first example of this in the Nanopore Community.  I realized that an E. coli dataset generated and posted by Nick Loman was from a Dam+ Dcm+ strain, and these two methylases have frequent sites in the genome.  If I aligned the data back to the reference E. coli genome, computed miscall rates and then plotted those as a function of the distance from the nearest Dam or Dcm site (I think I did both; it was almost a decade ago), one might see a rise in the error rate around the methylation sites, since the ONT basecaller considered an extended window of sequence (and indeed, the pore sees more than one base at a time).  And indeed that was the case!!!  Alas, I think that was the old community, as a search for "Robison Dam" finds nothing.
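For anyone who wants to try that kind of analysis, here is a minimal Python sketch of the idea, not the code I actually ran back then.  It assumes you have already reduced your alignments to two per-position lists, `miscalls` and `coverage` (both hypothetical inputs), and it uses the standard recognition motifs GATC for Dam and CCWGG for Dcm.  Distance is measured to the start of the nearest motif, which is a simplification.

```python
import re
from bisect import bisect_left
from collections import defaultdict

DAM_SITE = re.compile(r"GATC")      # Dam recognition motif
DCM_SITE = re.compile(r"CC[AT]GG")  # Dcm recognition motif (CCWGG)


def site_positions(reference, pattern):
    """Sorted 0-based start positions of every motif match in the reference."""
    return [m.start() for m in pattern.finditer(reference)]


def distance_to_nearest(pos, sites):
    """Distance from pos to the closest entry of the sorted, non-empty list sites."""
    i = bisect_left(sites, pos)
    candidates = []
    if i < len(sites):
        candidates.append(sites[i] - pos)
    if i > 0:
        candidates.append(pos - sites[i - 1])
    return min(candidates)


def miscall_rate_by_distance(reference, miscalls, coverage, pattern, max_dist=20):
    """Pool miscalls and coverage by distance to the nearest motif start.

    miscalls[i] and coverage[i] are assumed to be counts of miscalled and
    aligned bases at reference position i (hypothetical, from your aligner).
    """
    sites = site_positions(reference, pattern)
    if not sites:
        return {}
    errors = defaultdict(int)
    depth = defaultdict(int)
    for pos in range(len(reference)):
        d = distance_to_nearest(pos, sites)
        if d <= max_dist:
            errors[d] += miscalls[pos]
            depth[d] += coverage[pos]
    return {d: errors[d] / depth[d] for d in sorted(depth) if depth[d] > 0}
```

Plotting the returned dictionary (distance on the x axis, rate on the y axis) for `DAM_SITE` and `DCM_SITE` separately is the picture I remember: a bump near zero if the basecaller is being confused by the modified bases.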

Okay, back to the serious recent stuff.  There's a whole collection of methods that have been published that I'm going to call "Long Read Chromatin Footprinting" (but won't call LRCF because I don't like to inflict ugly acronyms on the world) because the community can't seem to settle on a single name/acronym - my spreadsheet tracking these has 21 different names, though some are variations on a theme due to slightly different protocols with slightly different goals (e.g. SAMOSA vs. SAMOSA-CHAAT).  But the basic idea is to extract chromatin from cells and then treat with something that labels the accessible DNA but not that shielded by chromatin proteins.  Single molecule sequencing then reveals the marks and thereby which chromatin was open.

In most of these protocols, the marking method generates 6-methyladenine, which is nearly non-existent in eukaryotic genomes.  That means that the marking doesn't interfere with the detection of native 5-methylcytosine and 5-hydroxymethylcytosine.  And of course mutations can be detected - so these methods potentially generate three orthogonal signals.  Early methods used adenine methylases with four-base specificity, but NEB has an Enzymes for Innovation product line of wonderfully strange enzymes and one of those is EcoGII methyltransferase - which seems to have no context specificity for methylating adenines.  So now nearly every publication uses EcoGII.

Angelicin

As stated: nearly every publication.  A recent preprint from Angela Brooks' lab introduces a new marking agent: angelicin, a plant natural product that intercalates DNA and can be covalently cross-linked with UV exposure (and named for the common garden plant Angelica, not the senior author of the paper!). The proposed advantage of this small molecule for marking is that very short linker regions between nucleosomes are not labeled by methyltransferases due to steric clashes with the nucleosomes.  Importantly, each angelicin molecule crosslinks only to a single DNA strand, so the DNA is still competent for nanopore sequencing.  Angelicin shows a preference for intercalating in the order TA > AT >> TG > GT, as shown in their plot - the TATATA k-mer shows a notable shift in nanopore raw signal not seen in GGCGCG or CGTTAC.

BrdU

The second thread is the nucleotide analog BrdU - bromodeoxyuridine, which replaces the methyl group of the thymine base with bromine.  BrdU is incorporated in place of thymine into DNA, but oddly will pair with guanine during replication (why the asymmetry?  I haven't been able to find an explanation) and so is classed as a mutagen -- though apparently a mild one.  BrdU has been used for many DNA studies - for example, that heavy bromine atom can be used to assist phasing in X-ray crystallography. It can also be picked up by electron microscopy - and now that I think of it, was probably part of the labeling scheme for attempts at direct sequencing of DNA by electron microscopy - commercialization attempts that, I've been told by an inside source, were starved of funding after ONT's big 2012 AGBT announcement.

A number of papers have shown that BrdU can be detected in nanopore data; if used in a pulse-quench or pulse-chase experiment, the BrdU will label DNA synthesized during a specific time period - a method that long predates nanopore sequencing.  So with nanopores, one can perform this genome-wide.  The recent preprint that caught my eye showed that human replication initiation occurs at more sites than previously thought, but in preparing this piece I discovered the long trail of preprints detecting BrdU with nanopores.

Why not both?  

A technique called RASAM uses both BrdU labeling of nascent replication and methyltransferase marking of open chromatin simultaneously, potentially providing in one assay four different 'omics readouts - sequence variation, native methylation, open chromatin and nascent DNA synthesis.  RASAM builds on a chromatin footprinting protocol called SAMOSA; I knew before that the latter was a component of South Indian cuisine and now I understand the former is too (and the same group has the variant SAMOSA-CHAAT; I'm getting hungry writing this!). Interestingly, these use PacBio for detection - PacBio can also be trained to recognize BrdU.

Speculations

There are probably many more ways to leverage the detection of angelicin and BrdU in single molecule sequencing.  

For example, there is an interesting technique I've considered writing up (sadly, I have mislaid a draft) called Strand-Seq, and it relies on BrdU labeling to provide phasing information in difficult-to-map regions of the genome.  While long reads, and particularly ultra-long read sequencing, have largely solved this, there might still be a niche for long read Strand-Seq.  The chromatin footprinting schemes may be a generally interesting approach to studying genomic samples in rare disease research, in order to detect allele-specific open chromatin and methylation simultaneously - the utility of this has already been demonstrated in one case.

I've also cooked up some educational uses.  Sequencing BrdU-labeling of DNA synthesis could be demonstrated in a relatively early lab experience, making those diagrams in intro bio much more relatable.  If single molecule DNA library preparation becomes streamlined and inexpensive, then students could perform an updated version of the "most beautiful experiment in biology" (watch the video at the link - it's amazing)  - the Meselson-Stahl experiment, demonstrating that DNA replication is semiconservative.  If it all works, then after one replication one should find BrdU-labeled fragments mapping to one strand in each cell and BrdU-clean strands mapping to the other.  That should rule out -- check my logic - both dispersive (which is from Max Delbrück!) and fully conservative. 
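To check that logic, here is a toy Python sketch of what each replication model predicts after one round of growth in BrdU.  Everything in it is my own illustrative assumption: a duplex is just a pair of numbers giving the BrdU fraction of each strand, and the dispersive model is approximated as every daughter strand being an even patchwork of old and new material.

```python
def replicate(duplex, model):
    """Return the two daughter duplexes after one round of replication in BrdU.

    duplex is (watson, crick): the fraction of each strand that is
    BrdU-labeled (0.0 = clean, 1.0 = fully labeled).
    """
    watson, crick = duplex
    if model == "semiconservative":
        # Each daughter keeps one parental strand and gains one new strand.
        return [(watson, 1.0), (1.0, crick)]
    if model == "conservative":
        # The parental duplex stays intact; the copy is entirely new.
        return [(watson, crick), (1.0, 1.0)]
    if model == "dispersive":
        # Every daughter strand is half parental material, half new.
        mixed = ((watson + 1.0) / 2, (crick + 1.0) / 2)
        return [mixed, mixed]
    raise ValueError(f"unknown model: {model}")


def describe(duplex):
    """Label each strand of a duplex as 'clean', 'labeled' or 'patchy'."""
    names = {0.0: "clean", 1.0: "labeled"}
    return tuple(names.get(fraction, "patchy") for fraction in duplex)


if __name__ == "__main__":
    parent = (0.0, 0.0)  # unlabeled duplex before the BrdU pulse
    for model in ("semiconservative", "conservative", "dispersive"):
        print(model, [describe(d) for d in replicate(parent, model)])
```

Only the semiconservative model gives one clean and one labeled strand in every daughter duplex.  Note that conservative replication yields the same overall mix of clean and labeled strands, so the strand-by-strand pairing within each cell is what separates the two.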

This idea of updating Meselson-Stahl to modern molecular biology is dedicated to the anonymous Harvard undergraduate who heard either Meselson himself or Bill Gelbart explain Seymour Benzer's elegant phage recombination mapping experiments that attained single basepair resolution - and asked why he hadn't "just sequenced them".

Back in the experimental world, one of my favorite modifications to see a single molecule method trained on is phosphorothioate linkages.  These are often used in biotechnology for oligos to be used in vivo,  because they are resistant to many nucleases.  As with many seemingly clever inventions of humans, biology beat us by a million years or more.  Some bacteria incorporate phosphorothioate into their DNA in a limited fashion.  It's thought to be a restriction-modification system, but isn't well understood.  It's quite a serious change, since the phosphorothioate linkages must be incorporated by nicking the DNA and limited resynthesis.

What other DNA modifications will fall next?  And I've focused on DNA - the world of RNA modifications is known to be vast and 

Long Read Chromatin Footprinting References

Note: I believe this is complete at this time but it is easy to miss a reference.  I don't plan to maintain this publicly, but if you'd like to leave any missed or new ones in the comments, please do so!

6mA-Sniper: Quantifying 6mA sites in eukaryotes at single-nucleotide resolution | Science Advances
BIND&MODIFY: a long-range method for single-molecule mapping of chromatin modifications in eukaryotes | Genome Biology
Data-adaptive methods in detecting exogenous methyltransferase accessible chromatin in human genome using nanopore sequencing | Bioinformatics | Oxford Academic
DiMeLo-seq: a long-read, single-molecule method for mapping protein-DNA interactions genome-wide - PMC
DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools
Examining chromatin heterogeneity through PacBio long-read sequencing of M.EcoGII methylated genomes: an m6A detection efficiency and calling bias correcting pipeline | bioRxiv
Long-range single-molecule mapping of chromatin accessibility in eukaryotes | Nature Methods
Mapping protein-DNA interactions with DiMeLo-seq | bioRxiv
Massively multiplex single-molecule oligonucleosome footprinting | eLife
Nucleosome density shapes kilobase-scale regulation by a mammalian chromatin remodeler | Nature Structural & Molecular Biology
Probing chromatin accessibility with small molecule DNA intercalation and nanopore sequencing | bioRxiv
Profiling Chromatin Accessibility in Humans Using Adenine Methylation and Long-Read Sequencing
RNA polymerases reshape chromatin and coordinate transcription on individual fibers
scNanoATAC-seq: a long-read single-cell ATAC sequencing method to detect chromatin accessibility and genetic variants simultaneously within an individual cell
Sensitive multimodal profiling of native DNA by transposase-mediated single-molecule sequencing
Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing
Simultaneous profiling of histone modifications and DNA methylation via nanopore sequencing | Nature Communications
Single-molecule long-read sequencing reveals the chromatin basis of gene expression - PMC
Single-Molecule Multikilobase-Scale Profiling of Chromatin Accessibility Using m6A-SMAC-Seq and m6A-CpG-GpC-SMAC-Seq
Single-molecule regulatory architectures captured by chromatin fiber sequencing | Science
Single-molecule simultaneous profiling of DNA methylation and DNA-protein interactions with Nanopore-DamID | bioRxiv
Spatial chromatin accessibility sequencing resolves high-order spatial interactions of epigenomic markers
Synchronized long-read genome, methylome, epigenome, and transcriptome for resolving a Mendelian condition - PMC
The single-molecule accessibility landscape of newly replicated mammalian chromatin | bioRxiv
