A new preprint based on Genomics England data has identified a set of single-base insertion mutations (predominantly a specific A insertion) in a spliceosomal RNA that are responsible for about 0.5% of previously undiagnosed genetic cases of syndromic neurodevelopmental disorders. That's a remarkably high-frequency mutation to have gone unnoticed until now, but the fact that it was hiding in a non-protein-coding RNA (a spliceosome component called RNU4-2) had much to do with that: this gene won't be in any exome panels. The mutation always appears to be de novo and therefore the pathogenic phenotype is dominant. I'd like to write down a few other thoughts, mostly in the form of questions, with the caveat that I've never worked on a rare disease project and to describe me as a detached armchair voyeur of the field would be far too generous.
One observation in the paper is that while the insertion might be expected to affect splicing, the authors can't detect any alteration in splicing within RNA-Seq datasets. Which has me wondering: are existing RNA-Seq datasets well-powered to detect alterations in splicing? This thought clearly draws inspiration from Pengfei Liu's work presented at the Ultima Napa Valley shindig describing ultra-deep RNA sequencing for rare disease interpretation. How are such studies affected by details of the sequencing platform? If we compare equal-cost experiments using short reads and nanopore reads, will the long nanopore reads win, or is the greater number of short reads superior for detection? For paired-end reads, would sensitivity be improved significantly by using longer inserts? Illumina's move to patterned flowcells and Exclusion Amplification locked that platform into 500 bp inserts; Element can handle well over a kilobase.
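To make the "well-powered" question a bit more concrete, here is a back-of-the-envelope sketch rather than anything from the preprint. It assumes the reads spanning one splice junction are binomial draws around a "percent spliced in" (PSI) value, and asks how often a two-proportion z-test would catch a shift in PSI at a given junction depth. The function name, the PSI values and the depths are all my own made-up illustration. Real data are overdispersed and need replicates and multiple-testing correction, so these numbers are optimistic.

```python
from math import sqrt
from statistics import NormalDist


def splicing_power(psi_control, psi_case, reads_per_sample, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Assumption (mine, for illustration): junction-spanning reads in each
    sample are binomial with inclusion probability PSI, and both samples
    have the same number of junction-spanning reads.
    PSI values must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    delta = abs(psi_case - psi_control)
    # Standard error under the null (pooled PSI) and under the alternative.
    p_bar = (psi_control + psi_case) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / reads_per_sample)
    se_alt = sqrt(
        (psi_control * (1 - psi_control) + psi_case * (1 - psi_case))
        / reads_per_sample
    )
    # Normal approximation; the negligible opposite tail is ignored.
    return nd.cdf((delta - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical scenario: PSI drops from 0.90 to 0.85 at one junction.
    for depth in (50, 500, 5000):
        power = splicing_power(0.90, 0.85, depth)
        print(f"{depth:>5} junction reads per sample: power ~ {power:.2f}")
```

Under these assumptions a modest PSI shift is nearly invisible at a few dozen junction-spanning reads and close to certain at a few thousand, which is at least consistent with the intuition that ultra-deep sequencing could matter for subtle splicing changes.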
Along those lines, would it make sense to generate extremely deep RNA-Seq datasets from multiple technologies on a panel of cell lines engineered with different mutations known to affect constitutive or alternative splicing? Or perhaps some on wildtype cell lines treated with splicing-altering drugs such as spliceostatins and pladienolide? How diverse an underlying panel of cell lines would be required? Could one cell be engineered and then converted to a bunch of cell types via induced pluripotent stem cell (iPSC) technology?
How poor are we at detecting non-coding pathogenic variants? DNA and RNA "foundation" machine learning models are all the rage now; will these boost our ability to call out pathogenic non-coding variants? Or do we not have an appropriate knowledge base to train such models? If so, are the RNA-Seq panels I describe above the best way to build that knowledge, or does something else fit these machine learning schemes better?
I'll confess I know nearly nothing in detail about spliceosome biology. RNU4-2 encodes a U4 snRNA. I saw some suggestion that the mutations are occurring near a binding site identified by yeast genetics. If we mimic these mutations in Saccharomyces, what happens? That yeast has few introns and little (no?) alternative splicing. What if we instead mimic them in Drosophila or Caenorhabditis - will there be a gross phenotype?
The authors speculate that local secondary structure may make this site prone to insertion mutations. What sort of experimental systems could be devised to study this further? I can already start envisioning a frameshifted reporter (or selectable) gene as a scheme to look for genes which, when knocked out, increase the rate of such insertions -- though it seems likely that most hits would be already-identified DNA repair genes.
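As a toy illustration of the frameshifted reporter logic (my own sketch, with an invented 21-base sequence that has nothing to do with RNU4-2 or any real reporter): start with an open reading frame, delete one base from a short A run so everything downstream is read out of frame, and then check that a single-base insertion at that run restores the original protein. In a real screen the downstream portion would be the reporter or selectable marker, so only cells that picked up the insertion would score.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from position 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(dna, position, base):
    """Return dna with one base inserted before the given 0-based position."""
    return dna[:position] + base + dna[position:]


# Hypothetical in-frame ORF; bases 6-8 form the "AAA" run used as the hotspot.
INTACT = "ATGGCTAAAGAACTGTTCTGG"
# Reporter as built: one A removed from the run, so it starts out of frame.
FRAMESHIFTED = INTACT[:8] + INTACT[9:]

if __name__ == "__main__":
    print("intact      :", translate(INTACT))
    print("frameshifted:", translate(FRAMESHIFTED))
    # A single A insertion back into the run restores the reading frame.
    print("reverted    :", translate(insert_base(FRAMESHIFTED, 8, "A")))
```

The design point is simply that the readout is binary: any single-base insertion within the run restores function, so the revertant frequency becomes a proxy for the insertion rate at that site.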
I'll welcome any sort of suggestions or critiques of the above -- don't hesitate to point out where I've been painfully naïve or missed some key literature!