Saturday, June 13, 2026

Long Reads for Rare Diseases Hits New England Journal of Medicine

Alexander Hoischen and colleagues have a brief piece in the New England Journal of Medicine that published early this morning, in conjunction with the European Society of Human Genetics (ESHG) meeting.  I'm now in the thrall of ESHG FOMO - I attended last year in part because it was immediately after London Calling and it seemed silly not to extend my trip (plus Alex had asked me personally to go), but this year I reluctantly concluded my travel load was getting high and decided to make only one transatlantic flight in the May-June timeframe.  Hopefully I picked the right one.  The piece is the latest installment of a long-running program to demonstrate that long read sequencing, specifically PacBio HiFi sequencing, can replace a battery of older tests when trying to diagnose rare genetic diseases.



The centerpiece of the publication is a Sankey diagram (y'know, like the diagram of Napoleon's army marching to and from Moscow so loved by Edward Tufte) of diagnostic classification for a large cohort which received both standard-of-care (SoC) and long read sequencing.  SoC here was a heterogeneous mix, with not all patients receiving the same battery of tests.  Also notably, it included whole exome but not whole genome short read sequencing.  So, to rattle off a few numbers, 389 had exome sequencing, 262 targeted NGS (short read?), 254 Sanger sequencing, 152 Multiplex Ligation-dependent Probe Amplification - and the list goes on.  Of course many patients had multiple tests and these may have been run sequentially.

In general, there was a high degree of concordance - but the two approaches did not always agree.  Long read found 11 insertion/deletion variants missed by SoC, while SoC found 1 missed by long read.  Similarly it was 36:1 long read to SoC for single nucleotide variants.  Long read found two additional structural variants and two copy number variants missed by SoC, but SoC had no solo wins in those categories.  It would be interesting to hear the details on the two examples of definitive variants missed by long reads, or the two additional cases where possible diagnoses were supplied by SoC but missed by long read.

Overall, long read has a 2.5 percentage point advantage over SoC - which means if applied to the 15,150 index patients this consortium had referred to them in 2024 it would have been expected to yield 373 more diagnoses.  Each of those is a patient moving beyond a diagnostic odyssey.  Interestingly, most of this comes in the form of converting possible diagnoses to conclusive diagnoses; the number of no diagnosis cases would be expected to drop by only 33.
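As a quick sanity check on that arithmetic, here is a minimal sketch in Python.  The function names are my own inventions for illustration, not anything from the paper.  Note that the rounded 2.5 point figure works out to about 379 rather than 373; presumably the paper's number comes from the unrounded advantage, which would be about 2.46 points - that's my inference, not something the piece states.

```python
def expected_extra_diagnoses(cohort_size: int, advantage_pct_points: float) -> float:
    """Extra diagnoses expected if a test is better by the given percentage points."""
    return cohort_size * advantage_pct_points / 100.0


def implied_advantage_pct_points(cohort_size: int, extra_diagnoses: int) -> float:
    """Percentage point advantage implied by a quoted count of extra diagnoses."""
    return 100.0 * extra_diagnoses / cohort_size


INDEX_PATIENTS_2024 = 15_150  # index patients referred in 2024, as quoted above

# Using the rounded 2.5 percentage point advantage
print(round(expected_extra_diagnoses(INDEX_PATIENTS_2024, 2.5)))

# Working backwards from the 373 extra diagnoses quoted above
print(round(implied_advantage_pct_points(INDEX_PATIENTS_2024, 373), 2))
```

The same two helpers work for the no-diagnosis number too: a drop of 33 cases out of 15,150 is only about 0.2 percentage points.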

Which is the sobering part of this study - long read gives full diagnoses to 18.9% of patients and possible diagnoses to another 6.4% - together just about a quarter of the patients.  I hate to be a glass half empty guy, but that's much worse than half empty.  We are acquiring a great deal of information, but not solving the cases.  Now, it could be that this group is already seeing a population enriched in hard cases - the simple stuff doesn't get referred to these specialists.  But that's still an ugly statistic.

What will it take to move this number substantially?  Or to frame it a different way, what are the likely root causes behind these diagnostic failures?

PacBio sequencing can struggle with regions which are strongly biased for purines (A,G) on one strand, pyrimidines (C,T) on the other - but even at 20X coverage it was determined that only 0.5% of autosomal genes have coverage issues, mostly due to this purine bias.

Methylation information is embedded in PacBio data, but it isn't clear from the paper whether it is used routinely.  It could also be that we just don't yet have the right downstream tools and databases.  It does mention that three cases were diagnosed as examples of skewed X-inactivation.  Last year at ESHG I remember a talk that found an X-autosome translocation in which inactivation spreading into the autosomal region silenced a locus, causing the disorder.

How much would chromatin state information help?  I've covered the fiber-seq/chromatin stenciling approach in the past (known by at least a half dozen other names as well).  This has been used to identify some disorders; I think last year in one session there was an example where this information helped nail down a diagnosis to a damaged chromatin insulator domain.  Again, do we have the right tools and databases to interpret such data?  And these approaches do require specialized chromatin extraction and processing.

One clear theme at the sessions I attended at ESHG last year - whenever there were parallel sessions I picked the one most focused on rare disease diagnosis by sequencing - is that tools for predicting the splicing impact of non-coding mutations definitely have much room for improvement.  Talk after talk showed a non-coding variant that was linked to aberrant splicing by RNA-Seq or other methods showing specific aberrant RNAs, and all too often SpliceAI did not flag them as likely problematic.  This is an area where many academics and companies claim to have new and improved foundation models of genomes that can make such predictions, but so far it isn't clear there is any substantial improvement.  It would be amazing if some of these clinical genomics groups could run their no-call genomes through such new tools and score how many true positive and false positive calls are made on SpliceAI negatives.  Of course, knowing truth requires having or generating RNA data from appropriate tissues with the variants in question - perhaps by generating induced pluripotent stem cells (iPSCs) from patients.
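To make that scoring exercise concrete, here is a rough sketch of the bookkeeping in Python.  Everything in it is hypothetical: the `Variant` record, the `new_tool_score` field, both cutoffs and the toy variants are my own placeholders, not real data or the output format of any actual tool.  The idea is simply to keep only the variants SpliceAI scored below its cutoff and then tally the new tool's calls against RNA-based truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    spliceai_score: float         # hypothetical: max delta score from SpliceAI
    new_tool_score: float         # hypothetical: score from the tool under test
    rna_confirmed_aberrant: bool  # truth from RNA data in a relevant tissue


def score_on_spliceai_negatives(variants, spliceai_cutoff=0.2, new_tool_cutoff=0.5):
    """Tally the new tool's calls, restricted to variants SpliceAI did not flag.

    Both cutoffs are assumptions chosen for illustration only.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for v in variants:
        if v.spliceai_score >= spliceai_cutoff:
            continue  # SpliceAI already flagged this one; not a SpliceAI negative
        called = v.new_tool_score >= new_tool_cutoff
        if called and v.rna_confirmed_aberrant:
            counts["tp"] += 1
        elif called:
            counts["fp"] += 1
        elif v.rna_confirmed_aberrant:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Entirely made-up toy variants, just to exercise the function
toy_variants = [
    Variant("v1", 0.05, 0.90, True),
    Variant("v2", 0.10, 0.70, False),
    Variant("v3", 0.02, 0.10, True),
    Variant("v4", 0.01, 0.05, False),
    Variant("v5", 0.85, 0.95, True),  # flagged by SpliceAI, so excluded
]
print(score_on_spliceai_negatives(toy_variants))
```

The hard part, of course, is not this tally but filling in the `rna_confirmed_aberrant` column honestly.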

There are more exotic explanations which are likely rare, but perhaps explain some.  Genome mosaicism - such as aneuploidy rescue - could mean that the sampled tissue just isn't the same genome as the problematic tissue.  Or one early cell of an embryo might have suffered a genetic lesion that disproportionately contributed to a target organ. X-inactivation occurs in a mosaic pattern in most individuals, even in the lymphocytes (I had Gemini check this - I could have imagined a case where all lymphoid tissue had a common post-inactivation progenitor).   Or it could be that some of these patients do not have a genetic disease - probably very rare, but look at enough patients and even the ultrarare might manifest itself.

Solving this gap in rare disease diagnosis is most importantly a step towards ending diagnostic odysseys.  Next it will help focus efforts on bringing treatments forward - an area where there has been some success but not nearly enough.  And often that failure is more around funding and distribution of therapies, not a lack of viable therapeutic interventions.  Ultimately, rare diseases have routinely provided striking and novel windows into biological function which influence large swathes of biomedicine.

Next year ESHG is in Rotterdam at around the same time as this year, so I'll probably be faced with similar timing dilemmas vis-à-vis London Calling.  In the near term, between some themes from my Venter series and some of the issues raised here, I really should get cracking on a pile of bookmarked publications that are worth covering.
