A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery
Monday, December 29, 2025
The Joy of Rediscovery
Tuesday, December 16, 2025
UltraMarathonRT: When Your Reverse Transcription Must Go Long
The 1960s, 1970s and 1980s were both the early years and golden years for nucleic acid enzymology. Scientists unraveling the secrets of DNA replication and repair, RNA transcription, viral replication and other basic processes purified enzymes responsible for numerous processes. Other scientists envisioned practical applications for these enzymes and put them to work in the recombinant DNA revolution that began just over 50 years ago. Due in no small part to the great body of literature that has arisen around these pioneer enzymes, they tend to be important still today in biotechnology - sometimes retaining monopolies on a particular type of in vitro biochemistry. But there are new entrants, and today I’m going to explore a new player in the reverse transcription space, UltraMarathonRT from a small Connecticut company, RNAConnect. RNAConnect has launched two new products in the second half of this year, a kit for cDNA synthesis back in August and today a kit for generating long direct RNA reads on Oxford Nanopore platforms.
Two reverse transcriptases have dominated the field of cDNA generation for cloning, sequencing, RT-qPCR and other applications, both arising from early research on retroviruses. Each is named for the retrovirus it was found in: Avian Myeloblastosis Virus (AMV) Reverse Transcriptase and Moloney Murine Leukemia Virus (M-MuLV, MMLV) Reverse Transcriptase. Many of the reverse transcription kits on the market today are either formulations of one of these two enzymes or made with versions carrying a small number of point mutations. While these enzymes have served biotechnology well, they do have shortcomings, particularly the lack of helicase activity, which can cause them to stall on templates that have formed complex and strong secondary structures.
UltraMarathonRT is not from a eukaryotic retrovirus, but instead from a prokaryotic group II self-splicing intron. UltraMarathonRT is far more processive than the venerable eukaryotic RTs and able to reverse transcribe transcripts of 30 kilobases or longer. A key difference from another group II intron RT on the market, Induro from NEB, is that UltraMarathonRT has a temperature optimum of 30°C vs 55°C for Induro; higher temperatures risk more damage to template RNA.
Template switching is a property of reverse transcriptases in which the enzyme's 3’ terminal transferase activity adds a predictable set of untemplated nucleotides to the 3’ end of the first strand cDNA; for UltraMarathonRT this is three As. By including an oligo with the complementary sequence in the reaction, second strand cDNA synthesis can be triggered in the same reaction. Since the template switching oligo (TSO) can carry barcode and unique molecular identifier sequences as well, this makes template switching particularly valuable for sequencing applications.
RNAConnect today launched a kit for Oxford Nanopore direct RNA sequencing which uses UltraMarathonRT. While direct RNA sequencing can work without any reverse transcription, having a first strand cDNA bound to the RNA improves performance, particularly since the helicase activity of UltraMarathonRT can unwind secondary structures which might not be unwound by the motor protein in ONT’s chemistry. RNAConnect has run comparisons of their kit to a process using Induro and shown 67% more mapped reads in excess of 10 kb, with this advantage dropping to 24% for mapped reads >5 kb and only 11% for reads >2 kb. This matters given the increasing research interest in long non-coding RNAs (lncRNAs): the best known lncRNA, XIST, which drives X chromosome inactivation, is 17 kilobases long. The kit also has the convenience of including all required components, other than those unique to ONT.
Higher processivity and greater ability to push through complex secondary structures are both highly desirable properties in a reverse transcriptase. As the price/performance ratio of both PacBio and ONT improves for cDNA sequencing, continued improvement of metrics for ONT direct RNA and now meso-length reads from Roche’s SBX chemistry will all enable greater surveys of RNA at longer scales. Such studies can be reasonably expected to sharpen our understanding of splicing.
When I was an undergraduate nearly 40 years ago, we were taught that alternative splicing existed but was a rare phenomenon that seldom deserved attention. I think we were taught about the alternative splicing that drives soluble vs membrane-bound antibodies in B cells, but probably no other examples. That view has changed radically over the last 40 years, with alternative splicing now recognized as a generator of both protein diversity and regulation.
As a graduate student, I discovered an overlooked set of alternative exons in a Drosophila visual protein gene. My labmate Carlos Alvarez demonstrated, by clever PCR assays, that while there are theoretically 8 different splice forms possible (all of which would generate valid ORFs), only three of these are detectable in flies. Made a nice PNAS paper. Nowadays we’d do that by sequencing. Cataloging such coordinated splicing would be one clear use for long read direct RNA or long read cDNA sequencing.
Another emerging class of splicing events of great interest is “poison exons” and “detained introns”. Detained introns are introns that are normally spliced out, but are systematically retained in certain contexts. If the retained sequence introduces a premature stop codon which leads to nonsense-mediated decay of that mRNA, then it is a poison exon. A number of labs have reported on poison exons that appear to be very carefully regulated, providing yet another opportunity for cells to regulate production of a particular gene product. Inadvertent retention of poison exons is yet another way that mutations can negatively affect mRNAs and trigger rare genetic disorders.
Clearly for a complete survey of alternative splicing, poison exon usage, and other types of retained/detained introns it is important to have an unbiased view of a transcript, not degraded by secondary structure or biased to the 3’ end. UltraMarathonRT shows approximately 2X higher detection of retained introns than other RTs when applied to the Universal Human Reference RNA (UHRR) sample.
RNAConnect the company is sited in Branford CT, not far from Yale University where founder Dr. Anna Marie Pyle teaches. For this piece I spoke with Andrew Bond, previously at gene synthesis company Gen9, and Jason Underwood, ex-PacBio.
UltraMarathonRT appears to be a useful new tool in the molecular biology toolbox, enhancing the ability to sequence long and difficult RNA templates. It shows promise for advancing the understanding of splicing as well as the medical consequences of inappropriate splicing. UltraMarathonRT citations in PubMed are currently only from Dr. Pyle’s lab; it will be interesting to see what new discoveries are made as kits with this enzyme enter widespread use in RNA sequencing laboratories.
Thursday, December 04, 2025
Countable Labs: New Approach to Enumerating DNA
Countable Labs, formerly Enumerix, was founded by serial entrepreneur Stephen Fodor, who originally stormed onto the molecular tools scene with Affymetrix. I caught up with their new CEO, Giovanna Prout, at ASHG the other week and got a rundown on their new approach to counting molecules with PCR.
Prout recently found herself out of the CEO job at Scale Biosciences after its acquisition by 10X Genomics. She says she resolved to become the best stay-at-home mother ever and her kids loved having her home - but soon urged her to find a new gig as they recognized it was what made her happiest. So she quickly landed the CEO role at Countable Labs. Formerly Enumerix, the company is yet another molecular tools company from prolific scientific entrepreneur Stephen Fodor, best known for Affymetrix.
Countable’s standard workflow is simple. A 50 microliter reaction of sample DNA, probes, primers, PCR mastermix, and Countable’s proprietary matrix consumable is placed in a spin column. Centrifuging the columns generates a matrix with approximately 30M individual picoliter-scale compartments capturing individual DNA molecules. After a brief (60 minute) PCR amplification in a conventional thermocycler (Countable has a list of preferred instruments), the tubes are placed in Countable’s benchtop instrument for light sheet microscopy imaging of the compartments, requiring 5 minutes of imaging per tube. There’s no dead volume - every picoliter-scale chamber in the tube will be imaged. Because there are so many compartments, the system has a dynamic range of 6 logs! Countable’s instrument holds 96 tubes in the form of 24 strips of 4 tubes each. The instrument is priced at $150K with consumables adding up to $16 per sample. A full set of 96 samples can be processed in half a workday.
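As a back-of-the-envelope check on those numbers, here is a minimal Python sketch. The 50 microliter volume and 30M compartment count come from the description above; the assumption that molecules distribute across compartments independently (Poisson loading) is mine, not something Countable has stated, and the function names are purely illustrative.

```python
import math

REACTION_VOLUME_L = 50e-6     # 50 microliter reaction (from the post)
N_COMPARTMENTS = 30_000_000   # approximately 30M compartments (from the post)


def compartment_volume_pl(volume_l=REACTION_VOLUME_L, n=N_COMPARTMENTS):
    """Average compartment volume in picoliters, if the whole reaction is partitioned."""
    return volume_l / n * 1e12


def multi_occupancy_fraction(n_molecules, n=N_COMPARTMENTS):
    """Fraction of occupied compartments expected to hold more than one molecule.

    Assumes Poisson loading (my assumption). Intended for inputs of roughly
    a thousand molecules or more; tiny inputs lose precision to rounding.
    """
    lam = n_molecules / n                 # mean molecules per compartment
    occupied = -math.expm1(-lam)          # P(at least one molecule)
    exactly_one = lam * math.exp(-lam)    # P(exactly one molecule)
    return (occupied - exactly_one) / occupied


if __name__ == "__main__":
    print(f"compartment volume: {compartment_volume_pl():.2f} pL")
    for molecules in (1_000, 100_000, 1_000_000):
        frac = multi_occupancy_fraction(molecules)
        print(f"{molecules:>9,} molecules: {frac:.3%} of occupied compartments hold >1")
```

Under these assumptions each compartment averages under 2 picoliters, and even at a million input molecules only a small minority of occupied compartments would hold more than one template, which is consistent with the claimed 6 logs of range.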
Assays can be designed using Countable’s universal multiplexing kit or conventional TaqMan hydrolysis probes can be included. The universal multiplexing kit has the advantage of enabling the use of inexpensive, fast-arriving ordinary oligos which simply require a 5’ tail sequence on the forward primer to enable linking (via primer extension) the universal multiplexing codes to the user primers. Countable provides a software tool to streamline converting existing assays into universal multiplexing assays and analyze the resulting multiplex primer designs for undesirable cross-reactivity. Assays based on the universal multiplexing kit can be designed and tested in under a week.
Countable is developing a high degree of multiplexing by managing the optical system so that it is capable of distinguishing 10 different fluorescent dyes. By imaging in 9 different channels, each channel a different pairing of excitation wavelength and emission wavelength, each dye can be distinguished by the unique fingerprint of intensities in each channel. This yields 10 clearly separable labeling schemes. Theoretically 48 different labeling schemes can be distinguished in this way. One ASHG poster from Countable demonstrated 8-fold multiplexing.
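To make the fingerprint idea concrete, here is a toy Python sketch that assigns a compartment's 9-channel intensity vector to whichever reference dye fingerprint it most resembles, using cosine similarity. This is not Countable's algorithm, which I haven't seen; the dye names, the fingerprint values and the choice of cosine similarity are all hypothetical and only for illustration.

```python
import math

# Hypothetical reference fingerprints: relative intensity in each of 9 channels.
FINGERPRINTS = {
    "dye_A": [9, 4, 1, 0, 0, 0, 0, 0, 0],
    "dye_B": [0, 1, 6, 8, 2, 0, 0, 0, 0],
    "dye_C": [0, 0, 0, 0, 1, 3, 9, 5, 1],
}


def _unit(vec):
    """Scale a vector to unit length so overall brightness does not matter."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot classify an all-zero signal")
    return [v / norm for v in vec]


def classify(signal, fingerprints=FINGERPRINTS):
    """Return (dye name, cosine similarity) for the best-matching fingerprint."""
    s = _unit(signal)
    best_name, best_score = None, -1.0
    for name, reference in fingerprints.items():
        r = _unit(reference)
        score = sum(a * b for a, b in zip(s, r))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    observed = [0.1, 1.2, 5.5, 8.3, 2.1, 0.2, 0.0, 0.1, 0.0]  # made-up noisy reading
    print(classify(observed))
```

Normalizing to unit length means a dim compartment and a bright one carrying the same dye match the same fingerprint; a real system would also need to handle background, crosstalk calibration and compartments carrying more than one label.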
So what can you do with so many colors? And with 6 logs of dynamic range?
One poster presented at ASHG focused on BRAF oncogenic mutation detection. Three clinically relevant mutations are seen at position 600 of the protein: V600E, V600K, and V600R. By first using 18 cycles of a PCR design agnostic to the status of codon 600 as a pre-amplification and then using allele-specific primers covering the four alleles (three oncogenic variants plus wildtype), with each primer labeled in a different color, Countable was able to demonstrate detection of the variants when present in a sample at a frequency of 0.08% - much better than the 1-5% achievable with qPCR and 0.1% for digital PCR. That’s also not a trivial detection limit to achieve with a sequencing assay - but at about $16 per sample the Countable assay will be far less expensive than any NGS assay unless you can batch to a very high degree. These results also leverage the high dynamic range of Countable’s assay - detecting between 400 and 700K molecules with a single assay system.
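To get a feel for what a 0.08% variant frequency means in molecules, here is a small Python sketch. The 0.08% figure and the 700K molecule count are from the poster as described above; treating the number of mutant molecules as Poisson-distributed, and ignoring the 18-cycle pre-amplification entirely, are simplifying assumptions of mine.

```python
import math


def expected_mutant_copies(total_copies, variant_fraction):
    """Expected number of mutant template molecules in the input."""
    return total_copies * variant_fraction


def prob_at_least(k, mean):
    """P(X >= k) for a Poisson-distributed count with the given mean."""
    below_k = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    vaf = 0.0008  # 0.08% variant frequency (from the poster)
    for total in (400, 10_000, 700_000):
        mean = expected_mutant_copies(total, vaf)
        print(f"{total:>7,} input copies: {mean:8.2f} expected mutant copies, "
              f"P(at least 1) = {prob_at_least(1, mean):.3f}")
```

The point of the exercise: rare-variant sensitivity is bounded by how many template molecules you load. At 700K copies a 0.08% variant is hundreds of molecules, while at a few hundred copies it would usually be absent from the tube altogether.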
In another poster, Countable demonstrates measuring mitochondrial genome copy number, using multiple distinguishable probes targeting the mitochondria plus an additional one to get the nuclear genome as a reference.
Countable is also touting that they have built the system for GMP workflows, with the built-in software providing audit trails and other required security features for 21 CFR Part 11 compliance.
Another interesting feature of Countable PCR is the ability to recover samples post-amplification - a protocol is provided to extract DNA out of the matrix. So you can count from a precious sample and then potentially fully sequence it as well.
Tuesday, November 11, 2025
Bell Labs Wasn't Built in a Day. Or Two Years.
Tuesday, November 04, 2025
Nineteen
Wednesday, October 15, 2025
ASHG Posters: The Agony and The Ecstasy
Tuesday, October 14, 2025
PacBio: $300 Genome Via Chemistry Update
Thursday, October 09, 2025
Thursday, August 07, 2025
10X Scoops Scale
Sunday, June 29, 2025
Could It Have Been Found With Short Reads?
Thursday, June 26, 2025
Food For Thought On ONT's Proteomics Push
Monday, June 02, 2025
Roche Gives SBX Updates - and a Name!
Wednesday, May 21, 2025
Oxford Nanopore Should Spin Out Protein Sequencing
Tuesday, May 20, 2025
London Calling 2025: What I'm Thinking About
Monday, May 19, 2025
Clive Brown At ONT: A Belated Retrospective
Wednesday, April 23, 2025
AGBT Flashback: Scale Biosciences’ QuantumScale
Thursday, April 17, 2025
Stellaromics Dives into the Thick of Spatial Genomics
Monday, April 14, 2025
Will the Result Be GeneXpertION?
Tuesday, March 18, 2025
Mission Impossible: Methylomics
We first thought of sending you in to extract the secret from Roche, but even we can't just go performing espionage on a legitimate company with no apparent plans for world domination; monopolization of the sequencing market by Illumina has never triggered us to action so that isn't a justification.
Saturday, February 22, 2025
Is Midi Read Sequencing A Thing?
Thursday, February 20, 2025
Roche Ripple Predictions
In the prior piece, I covered the technical details unveiled by Roche for their SBX technology, but generally tried to avoid predicting its effects on the marketplace. Here I put on the pundit’s hat. The TL;DR is this is a major new sequencing platform and if you’re at one of the competitors you have about a year before it fully hits the market - though in reality the action has already started as Roche starts grabbing hearts-and-minds. What can we anticipate about the effect on each of the current players? As noted in the prior piece, some key aspects - in particular purchase price and run cost - aren’t being disclosed by Roche and complicate prognostication.
Roche Xpounds on New Sequencing Technology
Bar bets can be a powerful force in human society. One of the best known books on the planet, The Guinness Book of World Records, originated from the need to equitably settle wagers. Many entries in that tome are questions of immense scale - the largest this or heaviest that. Shortly before this posted, Roche unveiled a sequencing technology that per its inventors may be the result of such a bar bet: how large a dangling bit can you stick on a nucleotide and still have it incorporated by a polymerase.
Monday, January 27, 2025
Olink Reveal: Focused Proteomics, Simplified
I’ve covered a lot of genomics in this space, but there is an inherent challenge to studying biology via DNA - DNA is the underlying blueprint, but that blueprint must pass through multiple steps before actual biology of interest emerges. RNA-Seq gets closer, but much of the real action is at the level of proteins (though much is not - let’s not forget all the metabolites!). When I set out in this space 18 years ago, I thought I’d cover more proteomics but that didn’t materialize - time to plunk one piece on the proteomics side of the ledger!
Proteomics has multiple challenges, but two inherent ones are the diversity of proteoforms and the dynamic range within the proteome.
The diversity of proteins within a human is astounding, even if we discard the inherently hypervariable antibodies and T cell receptors which have specific means of diversification within an individual that include random generation of sequence during VDJ recombination and somatic hypermutation of antibodies. The rest of the bunch are subject to transcript-level diversification by features such as alternative promoters, alternative splicing and RNA editing and then another wealth of post-translational proteolysis, phosphorylation, glycosylation and a heap more covalent modifications. If we really wanted to make things complex, we’d worry about protein localization, who a protein is partnered with and even alternative protein conformations - but let’s just stick to primary proteoforms and a diversity that is estimated in excess of 1 million different forms.
The key part here is that there is no analytical method capable of resolving all of these. Any proteomics method is to some degree ignoring much of the proteome entirely, and for many other proteins compressing many forms into a single signal. Indeed, most proteomic tools look at very short windows of sequence or perhaps patches of three dimensional structure, and will rarely if ever be able to directly connect two such short windows or patches - they will be stuck correlating them. The key takeaway here is that all proteomics methods work on a reduced representation of the proteome.
The dynamic range in the proteome is astounding, with some potentially challenging effects. For example, blood serum is utterly dominated by a handful of proteins such as serum albumin, beta 2 microglobulin and immunoglobulins - for methods that look at the total proteome there is a serious danger of flooding out your signal with these abundant but relatively dull proteins and not being able to see interesting ones such as hormones that are many logs lower in concentration.
Proteomics has been dominated by mass spectrometry, which has had over three decades to develop into a mature science. Mass spec is inherently a counting process and on its own can’t focus or filter out the dull stuff. Even more so, you don’t fly intact proteins in a mass spec, but peptides, and there are only a few useful proteases out there. Peptides don’t ionize consistently, so that adds a layer of challenge to quantitation. But as noted, this has been an intensely developed field for multiple decades and so there are very good mass spectrometry proteomics techniques using liquid chromatography (LC-MS) and other methods to remove abundant dull proteins and fractionate complex peptide pools into manageable ones.
But, protein LC-MS is very much its own discipline, and most proteomics labs aren’t strong in genomics or vice versa - though there are certainly collaborations or dual-threat labs. LC-MS setups require serious capital budgets for the instruments and their accompanying sample handling automation and highly skilled personnel.
A number of companies are attempting to apply the strategies of high throughput DNA sequencing to peptide sequencing or identification. Quantum-Si is the only one to make it to market, but other startups such as Erisyon are plugging away. These methods look a bit like mass spectrometry in their sample requirements, as they will also be counting peptides - and the current Quantum-Si instrument doesn’t count nearly enough to be practical for complex samples such as serum or plasma.
The other “next gen proteomics” approach is to use affinity reagents such as antibodies or aptamers and tag them with DNA barcodes, then sequence those barcodes on high throughput DNA sequencers. (One lesson not learned from the DNA sequencing world is the problem of calling something “next-gen”: this year will be the 20th anniversary of the commercial launch of 454 sequencing.) By using affinity reagents, the problem of boring but abundant proteins goes away - just don’t give them any affinity reagents. Dynamic range can be addressed as well - the exact details aren’t necessarily disclosed by manufacturers but one could imagine only labeling a fraction of a given antibody to tune how many counts are generated from a certain concentration of targeted analyte.
Olink Proteomics, now a component of Thermo Fisher, is one company offering a product in this space. Olink’s Proximity Extension Assay (PEA) relies on two antibodies to each protein of interest and requires hybridization between the probes on both antibodies to enable extension by polymerase in order to generate a signal. This increases the specificity of the signal and tamps down any signal from non-specific binding - or from just having antibodies in solution.
Olink has released a series of panels targeting increasing numbers of target proteins in the human proteome. This is generally a good thing - except counting more proteins means generating more DNA tags which means a bigger sequencing budget per sample. The other knock on Olink’s (and their competitor SomaLogic, now within Standard Biotools and also marketed by Illumina) approach is a complex laboratory workflow that mandates liquid handling automation. So this has meant that the big Olink Explore discovery panels are inevitably going to be run at huge genome centers that have both the big iron sequencers and the liquid handling robots that are required. And this strategy has started paying out scientific dividends - some of which were covered by the Olink Proteomics World online symposium last fall that featured speakers such as Kari Stefansson. Olink’s and Ultima’s recent announcement on starting to process all of the UK Biobank is an example of such grand plans, and this will be run at Regeneron’s genome center.
Academic center core labs and smaller biotechs often power important biomedical advances, but if Olink Explore is only practical with NovaSeq/UG100 class machines and fancy liquid handlers, then few of these important scientific constituencies will be able to access the technology. Which would be unfortunate, since small labs often cultivate very interesting sample sets that very large population-based projects like UK Biobank might not have. Large population based and carefully curated small projects are complementary, but is only one able to access Olink’s technology?
And that’s where Olink’s newest product, Olink Reveal, comes in, enabling smaller labs to process 86 samples. First, a select set of about 1000 proteins is targeted, bringing the required sequencing for a panel of samples plus controls to fit on a NextSeq-class flowcell - only 1 billion reads required. Second, the laboratory workflow has been made very simple and practical to execute with only multichannel pipettes. The product is shipped with a 96-well plate that contains dried down PEA reagents; simply adding samples and controls to the wells activates the assay for an overnight incubation. The next day, PCR reagents are added to graft sample index barcodes onto the extension products and then that is pooled to form a sequencing library. The library prep costs $98 per sample (list price) - $8,428 per kit. Throw in sequencing costs of $2K-$5K per run (depending on the instrument) and this isn’t out of line with other genomics applications.
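The arithmetic behind those figures can be sketched in a few lines of Python. The prices, sample count, protein count and read count are the ones quoted above; spreading the sequencing run cost evenly across the 86 samples, and spreading reads evenly across all 96 wells and about 1000 targets, are simplifications of mine (real read distributions will not be flat).

```python
SAMPLES_PER_KIT = 86              # samples per kit (from the post)
LIST_PRICE_PER_SAMPLE = 98        # USD, library prep list price (from the post)
READS_PER_RUN = 1_000_000_000     # about 1 billion reads (from the post)
WELLS_PER_PLATE = 96              # samples plus controls share the plate
TARGET_PROTEINS = 1000            # approximate panel size (from the post)


def kit_price():
    """Library prep cost for one full kit."""
    return SAMPLES_PER_KIT * LIST_PRICE_PER_SAMPLE


def cost_per_sample(sequencing_run_cost):
    """Prep plus sequencing per sample, run cost split across samples only."""
    return LIST_PRICE_PER_SAMPLE + sequencing_run_cost / SAMPLES_PER_KIT


def reads_per_protein_per_well():
    """Average read budget per target per well, assuming a perfectly even spread."""
    return READS_PER_RUN // (WELLS_PER_PLATE * TARGET_PROTEINS)


if __name__ == "__main__":
    print(f"kit price: ${kit_price():,}")
    for run_cost in (2_000, 5_000):
        print(f"${run_cost:,} run: ${cost_per_sample(run_cost):.2f} per sample all-in")
    print(f"~{reads_per_protein_per_well():,} reads per protein per well")
```

So the all-in cost lands somewhere around $120-$160 per sample under these assumptions, with roughly ten thousand reads available per protein per well on average.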
Of course, this is a reduced representation over the larger “Explore” sets. But Olink has selected the proteins to be a useful reduced representation. They’ve used sources such as Reactome to prioritize proteins, and have also prioritized proteins that have been shown to have genetically-driven expression variability in the human population - protein QTLs aka pQTLs. If the new panel is cross-referenced to studies using the larger panels, most of these studies would have found at least one protein showing statistically significant change in concentration. This can be seen in the plot below, where each row is a study colored by disease area. On the left is the distribution of P-values for the actual Olink Explore data and the right the same data filtered for proteins in the Olink Reveal panel.
It’s also robust - Olink has sent validation samples to multiple operators and compared the results, and the values from each lab are tightly correlated.
So Olink with their affinity proteomics approach is basically following the same playbook as genomics did with exomes. When hybrid capture approaches for exome sequencing first came out, it was thought these would be used for only a few years and then be completely displaced by whole genome sequencing (WGS). But exomes have proven too cost effective - even with drops in WGS costs, it is still possible to sequence more samples with exomes for the same budget. Yes, the risk of missing causal variants outside the exome target set was always a concern - the recent excitement around lesions in non-coding RNAs such as RNU4-2 has demonstrated that - but many investigators saw exomes as enabling studies that otherwise wouldn’t happen. Plus sometimes the bigger worry is biological noise obscuring a signal you could see, and that is dealt with by more samples.
The new Olink Reveal product fills a gap between Olink’s large “Explore” discovery sets and very small custom panels. In the Proteomics World talks many speakers described work run with PEA panels of only two dozen or so targets, often using PCR as a readout rather than sequencing. This shows one bit of synergy in the Olink acquisition by Thermo Fisher, as Thermo has an extensive PCR product catalog including array-type formats. Thus PEA follows the well-worn patterns in genomics: huge discovery panels for some studies, high value panels that balance cost and coverage for many studies and focused custom panels for validating findings on very large cohorts. The Proteomics World talks even suggested some of these focused panels might soon be seriously evaluated as in vitro diagnostics. With developments like these, targeted proteomics via sequencing will be a very interesting space to watch.


