Friday, May 17, 2024

HiFi WGS As A (Nearly) Unified Tool For Rare Genetic Disease Diagnosis

Way back in February, Alexander Hoischen presented a talk at AGBT describing early results from an effort to apply PacBio HiFi sequencing at scale to solve rare disease cases.  Hoischen passionately made the case for how providing a diagnosis can change affected families.  It's also worth noting how important rare disease genetics has been to the history of biology, illuminating new processes and entire pathways.  Something I hadn't appreciated until his presentation is how many technologies are thrown at a case in current workflows, because each technology covers a few types of mutations but misses others.  So this is a good snapshot of the current state of human genomics technology, with hints of where it might be going.  And Hoischen made a strong case that many other technologies - but not all of them - can be retired if PacBio HiFi sequencing is the lead approach.  A longer, similar talk is also available as a PacBio-sponsored webinar given by Lissenka Vissers from the same institution, and some of the data is in a preprint linked below.

Sunday, May 12, 2024

DoveTail Transposes Their Hi-C Methodology

Technologies vying for state-of-the-art in human genome analysis are a recurrent theme in this space, and there are many ideas on this in the collection I really need to get out over the next two weeks before my brain is overwhelmed by London Calling.  Up today: Dovetail Genomics popping back on the scene (as a subsidiary of Cantata Bio) with an AACR poster several weeks ago showing early results from a "LinkPrep" kit that will commercialize tagmentation (in vitro transposition to fragment DNA and add adapters) for Hi-C library generation, with the promise of enabling short read sequencing to deliver both SNVs and long-range structural information from the same library.
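
A Hi-C-style library mixes ordinary short-range read pairs, which behave much like a standard short-insert library for SNV calling, with pairs that join distant loci or different chromosomes, which carry the structural information. A common way to summarize such a library is to bin pairs by the distance they span. The minimal sketch below does that; the 10 kb cutoff and the function names are assumptions for illustration, not part of Dovetail's LinkPrep pipeline.

```python
from collections import Counter

def classify_pair(chrom1, pos1, chrom2, pos2, long_range_bp=10_000):
    """Bin one read pair by the genomic span it covers (cutoff is an assumption)."""
    if chrom1 != chrom2:
        return "trans"  # inter-chromosomal contact
    if abs(pos2 - pos1) >= long_range_bp:
        return "cis_long"  # distant contact on the same chromosome
    return "cis_short"  # behaves like a conventional short-insert pair

def summarize(pairs, long_range_bp=10_000):
    """Count pairs per category; each pair is (chrom1, pos1, chrom2, pos2)."""
    return Counter(classify_pair(*p, long_range_bp=long_range_bp) for p in pairs)
```

For example, `summarize([("chr1", 100, "chr1", 400), ("chr1", 100, "chr1", 2_000_000), ("chr1", 100, "chr7", 500)])` counts one pair in each category.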

Tuesday, May 07, 2024

On The Expanding Versatility of Single Molecule Sequencing for Detecting Anomalous DNA

An exciting aspect of true single molecule sequencing has been the detection of methylated bases.  Both Oxford Nanopore and Pacific Biosciences technologies generate altered signals if methylated bases are present.  For Oxford Nanopore this is hardly surprising, as any change in the DNA should alter the complex interaction with the protein pore, and detecting it should become just a computational challenge of recognizing that signal.  PacBio is a bit more surprising, but the kinetics of base incorporation are apparently sensitive to modifications on the template base being copied.  I wanted to point out, though without much deep analysis, three recent preprints that demonstrate detection of other modifications to DNA and thereby enable some interesting applications (and of course, some wild speculations on my part).  The papers are also interesting because of the overlap between them, as they are interconnected to a degree in their methods.
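
For PacBio, the kinetic signal is commonly summarized as an interpulse duration (IPD) ratio: the average pause before incorporation at a template position in native DNA, divided by the same average in an unmodified control (for example, amplified DNA). The minimal sketch below computes that ratio per position and flags positions above a cutoff. The data layout, units, and the 2.0 threshold are assumptions for illustration; production tools use more careful statistics than a fixed cutoff.

```python
from statistics import mean

def ipd_ratios(native_ipds, control_ipds):
    """Per-position IPD ratio: mean native IPD / mean control IPD.

    Both arguments map template position -> list of interpulse durations
    (hypothetical units) observed across reads covering that position.
    """
    ratios = {}
    for pos, ipds in native_ipds.items():
        ctrl = control_ipds.get(pos)
        if ipds and ctrl:  # skip positions lacking coverage in either sample
            ratios[pos] = mean(ipds) / mean(ctrl)
    return ratios

def flag_modified(ratios, threshold=2.0):
    """Positions whose IPD ratio meets a hypothetical threshold, sorted."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)
```

A polymerase that slows down over a modified template base shows up as a ratio well above 1 at that position, while unmodified positions hover near 1.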

Wednesday, May 01, 2024

First Illumina Complete Long Reads Preprint

Readers of this space might have detected a significant slant towards skepticism in my coverage of Illumina Complete Long Reads (iCLR), exacerbated by now-deposed Illumina CEO Francis deSouza claiming it isn't a synthetic read technology.  Illumina's posters on iCLR at AGBT this year seemed to reinforce my view that Illumina was marketing purely on short-read-like terms - call SNPs in a few more hard-to-map regions of the genome, but not really compete head-to-head with the true long read platforms.  But now there is a preprint out on MedRxiv that reports iCLR results for a Genome In A Bottle (GIAB) sample as well as seven samples from individuals with potential genetic diseases of unresolved cause.  The GIAB sample was also sequenced with some of the latest Oxford Nanopore chemistry (Duplex R10.4.1) and as HiFi libraries on PacBio Revio, enabling comparisons of the platforms.  The preprint is probably going to be revised and expanded - I'm certainly hoping some of my comments are found constructive - but it is very useful to see.  And perhaps it will soften positions such as mine on iCLR's utility.

Tuesday, April 30, 2024

A Peek At QuantumSI's Protein Sequencer

A number of academic labs and startups have been trying to build new ways of sequencing large numbers of peptides in parallel, using schemes whose logic closely resembles the highly parallel DNA sequencing schemes often highlighted in this space; QuantumSI is the first (and so far only) such company to actually commercialize a product.  Resemblance to NGS, but not identity - for a few important reasons.

The biggest such challenge is the lack of anything resembling Watson-Crick basepairing in proteins. Sequencing chemistries almost invariably rely on basepairing, with the notable exceptions of Maxam-Gilbert reactions and nanopore sequencing.  Even ONT's scheme ends up leveraging basepairing at times, such as the sequencing adapters and various incarnations of double-stranded sequencing (2D, 1D^2, duplex). And very notably, there is not, and probably never will be, an equivalent of PCR for peptides; any peptide sequencing technology will inherently be a single-molecule approach.

Furthermore, the enzymology for manipulating peptides just isn't as well developed.  There are some known proteases with degrees of specificity, but nothing like the wide catalog of restriction enzymes you can order from NEB or other vendors.  There are no polymerases, of course, and even tools like ligases just don't have as wide a scope - though again, ligation is often driven by some basepairing.  Nature didn't make this space easy!

For these reasons, nearly all of the proposed chemistries are degradative in nature, with nanopore direct reading of peptides making up the rest. N-terminal degradation is an old concept; Edman developed his chemistry around the same time Fred Sanger was first solving the sequence of a protein (insulin), about 70 years ago.  Performing such analysis on single peptides, rather than pools, will clearly be challenging - though it does eliminate the phasing problem and the problem of dealing with mixed populations of input peptides, such as we did in a paper back yonder.

So the general concept will be to digest proteins into peptides, likely with trypsin, tether those peptides to a solid surface by their C-termini and then progressively read each N-terminal amino acid followed by removal of that terminal amino acid to expose the following one.
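
The digestion step is easy to mimic in silico. Below is a minimal sketch using the textbook trypsin rule (cleave after lysine or arginine, but not when the next residue is proline). Real digests have missed cleavages and tools differ on the exact rule; `tryptic_digest` is a hypothetical helper, not part of any QuantumSI software.

```python
import re

def tryptic_digest(protein, min_length=1):
    """Split a protein into peptides with the classic trypsin rule.

    The zero-width pattern matches just after K or R, unless P follows,
    so re.split cuts there without consuming any residues.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    # A trailing K/R produces an empty final fragment; drop it and any too-short peptides.
    return [p for p in peptides if len(p) >= min_length]
```

For example, `tryptic_digest("MAKPLRGESK")` keeps the K-P bond intact and returns `["MAKPLR", "GESK"]`; each peptide would then be tethered by its C-terminal end and read from its N-terminus.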

One idea for next-gen protein sequencing, pursued by Encodia among others, is to try to build what is in effect a "reverse translatase" - progressively disassemble a protein and encode the released amino acids as DNA to be sequenced on a high throughput sequencer.  Each amino acid is coded back into DNA using some sort of code words, based on oligo-tagged recognizers.  One challenge with such a concept is the difficulty of distinguishing closely related amino acids, with leucine vs. isoleucine perhaps the trickiest.  Another is that each amino acid must have its own recognizer.  Of course, it might be acceptable to have some compression - maybe isoleucine and leucine aren't distinguished and that is dealt with in downstream search software.  But even if the amino acid sequence space must, by necessity, be compressed, the total space of interest is huge if common post-translational modifications are to be in scope.  And many of these modifications may complicate the selection of recognizers.
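
A toy illustration of the reverse-translation idea with compression: each amino acid class gets a DNA code word, and leucine and isoleucine deliberately share one, so the ambiguity is pushed to downstream search. The 3-base code words, the I-to-L merge, and all function names are assumptions for illustration; this is not Encodia's scheme, and a real code would need error-tolerant words.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical compression: isoleucine is encoded as leucine.
MERGED = {"I": "L"}

def build_codebook(word_length=3):
    """Assign each (possibly merged) amino acid class a distinct DNA code word."""
    classes = sorted({MERGED.get(aa, aa) for aa in AMINO_ACIDS})
    words = ("".join(w) for w in product("ACGT", repeat=word_length))
    return dict(zip(classes, words))  # 64 words available for 19 classes

def encode_peptide(peptide, codebook):
    """Reverse-translate a peptide, N-terminus first, into concatenated code words."""
    return "".join(codebook[MERGED.get(aa, aa)] for aa in peptide)

def decode(dna, codebook, word_length=3):
    """Map code words back to amino acid classes (I and L both come back as L)."""
    inverse = {w: aa for aa, w in codebook.items()}
    return "".join(inverse[dna[i:i + word_length]]
                   for i in range(0, len(dna), word_length))
```

Round-tripping `"PEPTIDE"` through `encode_peptide` and `decode` gives `"PEPTLDE"`: the sequence survives except for the I/L distinction that was given up by design.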

QuantumSI is detecting the recognizers directly using optics. Importantly, they are using the time domain as well -- something a reverse encoder probably can never leverage. In fact, they use the time domain two different ways.  

First, each recognizer is labeled with dyes with different fluorescent lifetimes but the same absorbance and emission spectra.  This enables a monochrome optical system, and monochrome is always simpler and higher resolution than a polychromatic system.  Put another way, they've shifted possible optical and/or mechanical complexity into the chemical domain.
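
Lifetime-based discrimination can be sketched simply: after each excitation pulse, photon arrival times from a single-exponential dye follow an exponential distribution whose mean is the lifetime, so the mean arrival time estimates the lifetime and the closest reference dye wins. The dye names and lifetimes below are hypothetical, and the sketch ignores instrument response and finite time windows; it is not QuantumSI's actual calling method.

```python
import random
from statistics import mean

# Hypothetical dye lifetimes in nanoseconds; all dyes share one emission channel.
DYE_LIFETIMES_NS = {"dye_A": 1.0, "dye_B": 2.5, "dye_C": 4.0}

def estimate_lifetime(arrival_times_ns):
    """Maximum-likelihood lifetime for an ideal single-exponential decay: the mean."""
    return mean(arrival_times_ns)

def classify_dye(arrival_times_ns, lifetimes=DYE_LIFETIMES_NS):
    """Call the dye whose reference lifetime is closest to the estimate."""
    tau = estimate_lifetime(arrival_times_ns)
    return min(lifetimes, key=lambda dye: abs(lifetimes[dye] - tau))

def simulate_photons(tau_ns, n, rng):
    """Draw n photon arrival times from an exponential decay with lifetime tau_ns."""
    return [rng.expovariate(1.0 / tau_ns) for _ in range(n)]
```

With a few hundred photons per binding event, e.g. `classify_dye(simulate_photons(2.5, 500, random.Random(0)))`, the estimate lands near 2.5 ns and the call is `"dye_B"`, all from a single color channel.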

Second, the dynamics of the recognizers binding the N-terminus of a peptide are a key part of the signal. Rather than some sort of 1:1 pairing of recognizers to amino acids, each recognizer will display a certain pattern of binding kinetics with each possible terminal amino acid.  QuantumSI says they can distinguish leucine from isoleucine, as they display different kinetic signals. The biggest advantage is that a small number of recognizers can potentially differentiate a very large number of amino acids - QuantumSI's latest chemistry uses just nine recognizers.  They aren't yet claiming decoding all the funky amino acids - from my Millennium life I have not only a love for phosphorylation but also ubiquitination and its kin - but their system may have a shot at many of these without requiring a custom recognizer for each one.
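
The kinetic idea can be illustrated with a nearest-centroid call: a given recognizer binds different N-terminal residues with different characteristic pulse durations, so one recognizer can separate several residues. The signature table, the recognizer names, and the use of log mean pulse duration as the only feature are assumptions for illustration; QuantumSI's real calling uses richer kinetic information.

```python
import math
from statistics import mean

# Hypothetical kinetic signatures: residue -> (recognizer, mean pulse duration in s).
SIGNATURES = {
    "L": ("R1", 2.0),
    "I": ("R1", 0.8),
    "F": ("R2", 1.5),
    "Y": ("R2", 3.0),
}

def call_residue(recognizer, pulse_durations, signatures=SIGNATURES):
    """Call the N-terminal residue from pulses of one recognizer.

    Compares log mean pulse duration to each candidate's reference on a log
    scale, since durations differ multiplicatively. Returns None if the
    recognizer has no known signatures.
    """
    observed = math.log(mean(pulse_durations))
    candidates = {aa: d for aa, (r, d) in signatures.items() if r == recognizer}
    if not candidates:
        return None
    return min(candidates, key=lambda aa: abs(math.log(candidates[aa]) - observed))
```

Here a single hypothetical recognizer, `R1`, separates leucine from isoleucine purely by how long it stays bound, which is the sense in which a handful of recognizers can cover many residues.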

A very interesting design choice from QuantumSI is to make their system a single-pot chemistry; there is no chemical cycling as with their corporate cousin 454.bio.  This makes for a much simpler instrument - a great deal of microfluidic complexity avoided - and saves on reagents since none of the expensive components are lost.  Unlike 454.bio, QuantumSI doesn't even need to remove incorporated labels, since they are degrading the analyzed peptides.  

But this does complicate things.  There's basically always a race going on for access to the N-terminus of each peptide. Recognizers will come and go, but eventually the N-terminal aminopeptidase strides in and clips off an amino acid - and hopefully leaves without clipping another.  In the ideal case, a set of recognizers flit in and out, giving a complex and useful signal, before the clipping - but there's no guarantee of that.  The scheme also seems like a nightmare for any homopolymeric stretch - I doubt QuantumSI will be used to count glutamines within huntingtin.  But with database lookups these should be manageable issues, and the incumbent technique of mass spectrometry has its own challenges.
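
The race can be put in rough numbers by treating recognizer binding and aminopeptidase cleavage as competing processes with constant rates, a simplifying assumption. Each next event is a binding with probability q = k_bind / (k_bind + k_cleave), so seeing at least n bindings before the clip has probability q^n, and a residue that is clipped with no binding at all is invisible. The rates and function names below are hypothetical.

```python
def p_at_least_n_bindings(n, k_bind, k_cleave):
    """Probability of >= n recognizer binding events before cleavage,
    assuming memoryless competing processes with constant rates."""
    q = k_bind / (k_bind + k_cleave)  # chance the next event is a binding
    return q ** n

def expected_silent_residues(run_length, k_bind, k_cleave):
    """Expected residues in a stretch that are removed without any binding."""
    return run_length * (1 - p_at_least_n_bindings(1, k_bind, k_cleave))
```

With binding three times faster than cleavage, only 75% of residues give even one pulse and about 56% give two or more; in a 20-residue run, about 5 residues would be expected to slip by silently, which is why counting glutamines in a homopolymer is such a stretch.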

How simple is the workflow?  QuantumSI says their communications guy ran it.  One hour of hands-on time to digest the sample and click-label the C-termini for attachment to the flowcell, followed by 10 hours of running.  Automation of this workflow is on their development roadmap.

On the recognizer front, QuantumSI has made steady progress.  Their publication in Science used only three recognizers; at launch they had five and the newest kits have six.  This really emphasizes how their kinetic analysis can extract a great deal of data from a small number of recognizers.  Some post-translational modifications can already be detected, though the high-value space of detecting phosphorylation is still in development.

On the informatics side, QuantumSI provides a hierarchy of data, with "what proteins are we identifying" on top, counts of individual peptides the next rung down and detailed kinetic information on each residue at the bottom.

If QuantumSI is the Answer, What is the Question?

A core challenge with biological mixtures of proteins is their extreme dynamic range. For example, with human blood (or serum or plasma) you can remove something like 99.99% of serum albumin and the dominant signal will still be serum albumin.  Solve serum albumin and a new set of abundant proteins must be batted down. The really interesting stuff is many orders of magnitude less abundant than all that.  Which is one of the reasons immunoassays such as home pregnancy tests are so amazing - they detect absurdly dilute targets in a sea of abundant proteins yet can be made cheaply and run with essentially no training.

Some in the mass spec field have not been shy about pointing out this issue; indeed, some have been downright obnoxious about it. Unless you can sequence enormous numbers of peptides - or figure out some extremely clever ways to deal with those abundant proteins - sequencing approaches will be swamped by boring background.
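
The "enormous numbers of peptides" point is simple sampling arithmetic. If a peptide of interest makes up a fraction f of the molecules read, the number of times it is sampled in N reads is roughly Poisson with mean N times f, so detecting it at all with 95% probability takes about 3/f reads. The sketch below finds the smallest such N; the function name and example fractions are hypothetical, and it ignores the losses a real platform would add on top.

```python
import math

def reads_needed(fraction, prob_detect=0.95, min_copies=1):
    """Smallest read count N such that a peptide at relative abundance
    `fraction` is sampled >= min_copies times with probability >= prob_detect,
    using a Poisson approximation to random sampling."""
    def p_at_least(n_reads):
        lam = n_reads * fraction
        # P(X >= k) = 1 - sum_{i<k} e^-lam * lam^i / i!
        return 1 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                       for i in range(min_copies))

    hi = 1
    while p_at_least(hi) < prob_detect:  # grow an upper bound
        hi *= 2
    lo = 1
    while lo < hi:  # binary search for the smallest sufficient N
        mid = (lo + hi) // 2
        if p_at_least(mid) >= prob_detect:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

A peptide at 1% of the pool needs about 300 reads, but one at one part per million needs roughly three million reads just to be seen once, before any abundant-protein depletion or quantitation requirements.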

QuantumSI's answer to this is to not take on such difficult challenges, at least not yet.  Instead, they point out that biologists have for ages used tools such as Coomassie staining, Western blots and ELISAs to study abundant proteins in simplified mixtures, and they propose that QuantumSI can provide higher information content with workflows that are simple to learn and use.  After all, one drawback to mass spectrometry is that it requires a very expensive set of instruments and a high degree of training to operate.  Mass spectrometers with associated liquid chromatographs are not something every lab is going to splurge on; doubly so for the mass spectrometrist to go with them.  QuantumSI claims their sample prep workflow is just a simple set of biochemical steps; no chromatography is required if your inputs are simple.

At $85K an instrument, QuantumSI certainly isn't going to be as ubiquitous as a simple gel box. Perhaps more seriously, the current instrument processes only two samples at a time, with runtimes of roughly overnight.  That's much less throughput than a simple gel box.  QuantumSI says that for applications so far they are resolving more peptides than required, so expanding the number of samples per run is high on their priority list.  This also points to another place where nucleic acids have a leg up - it's really easy to design barcoding schemes for DNA or RNA since we can easily design, synthesize and tack on such barcodes; the equivalent technology isn't well developed for direct peptide reading (the mass spectrometrists do have fancy mass-encoded tags).  But there are already case studies using QuantumSI to read out genetically encoded peptide barcodes, so there's already progress there.
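
Reading out genetically encoded peptide barcodes is, at the informatics end, ordinary demultiplexing. The minimal sketch below assigns a peptide read to the unique barcode within one mismatch and treats leucine and isoleucine as equivalent in case they are not resolved. The barcode sequences, the mismatch budget, and the function names are hypothetical, not taken from any QuantumSI case study.

```python
# Hypothetical peptide barcodes, chosen to differ at >= 3 positions pairwise.
BARCODES = {"sample_1": "GSWDEK", "sample_2": "GSYPAK", "sample_3": "GAFNEK"}

def _normalize(seq):
    # Treat isoleucine and leucine as equivalent in case they are not distinguished.
    return seq.replace("I", "L")

def mismatches(a, b):
    """Position-wise mismatches; any length difference also counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def assign_sample(read, barcodes=BARCODES, max_mismatches=1):
    """Return the sample whose barcode is the unique match within max_mismatches, else None."""
    hits = [sample for sample, bc in barcodes.items()
            if mismatches(_normalize(read), _normalize(bc)) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

Because the hypothetical barcodes differ pairwise at three or more positions, a single miscalled residue still assigns unambiguously, while a read two mismatches from two barcodes is left unassigned rather than guessed.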

Among applications mentioned by QuantumSI: reading out protein-protein interaction partners detected by immunoprecipitation, verifying protein engineering results, quality control for antibody production, and verifying if an engineered protein mutation is being correctly expressed.  All are applications where the number of abundant proteins is sufficiently low to avoid the signal of interest being swamped out.

QuantumSI commented on the sorts of conferences they've attended and the response.  The Festival of Genomics - I first saw a box in the wild at FOG Boston last autumn - has been very successful, as have other genomics-oriented conferences.  In their view, genomics practitioners are reluctant to invest in mass spectrometers.  They also go to proteomics-oriented conferences and encounter a much more mass spec oriented audience, along with that community's skepticism of NGS-like approaches.  Currently they are selling directly in North America and Europe and using distributors to sell into the Asia-Pacific region.

It will be interesting to watch the further development of this space.  QuantumSI launched at the end of 2022 and is still the only NGS-like protein sequencing platform that has launched.  The new kits just announced have increased the number of peptides read out by roughly two- to seven-fold.  Personally, I think having more sample chambers per run is likely to be very popular; nobody ever ran a two lane gel!  And it may take time to identify the "killer apps" which will drive labs to buy into the platform, though even a few splashy publications could create some significant buzz.

A final thought: it's interesting that QuantumSI gets attention at genomics-oriented meetings, but how much low-complexity protein sequencing are genome-focused labs interested in?  Perhaps it is a new direction that some are contemplating branching out into, but in general I don't see the QuantumSI approach - at its current level of sample throughput or tolerance for sample dynamic range - being a frequent companion for high throughput genome sequencing, RNA-Seq or spatial analysis.  There is an apparent fit for smaller scale synthetic biology and protein engineering labs, perhaps - it remains to be seen how many such labs will try this technology out.  Rather than core labs, I suspect the better fit for QuantumSI is individual principal investigators or their equivalent in industry.  That is a very diffuse market with weaker network effects to drive adoption (versus genome labs that love to get on the latest bandwagon).

Tuesday, April 23, 2024

Bruker Wins NanoString Auction

NanoString declaring bankruptcy on the eve of 2024's edition of AGBT was a shock to many at the meeting, and then there was confusion: would one of the sponsors have a dark booth? The aggressive 10X Genomics legal strategy that forced the bankruptcy raised a degree of polite ire. But NanoString marketing carried on, and CSO Joe Beechem delivered a fiery speech saying "we're not going anywhere". Then an investment firm, Patient Square Capital, appeared to be the front runner for acquiring the assets, with speculation they would combine NanoString with their other spatial omics portfolio company, Resolve Biosciences.  But last week, as the genomics world was still processing PacBio's turmoil, news broke that Bruker had significantly outbid Patient Square - $392.6M vs $220M.  So Bruker takes NanoString home - and that gives me an entree to float an ontology of spatial technologies I've been fermenting, as Bruker will now have instruments in the four major spatial approaches.  And 10X now has a more formidable opponent in the ongoing patent wars.

Wednesday, April 17, 2024

PacBio Plummets

PacBio announced preliminary earnings yesterday, and the nearly immediate result was a 50% plunge in their share price.  Along with the earnings, the company announced significant cost cutting.  The details of those cuts were not made available, but some clever tea-leaf readers noted a significant omission from the list of what the company said it would continue.  The ASeq Discord channel on PacBio absolutely blew up, with opinions ranging from "PacBio is in a death spiral" to "PacBio must be for sale", with significant numbers of "Christian Henry won't be CEO by year's end".