Omics! Omics!

Monday, January 27, 2025

Olink Reveal: Focused Proteomics, Simplified

I’ve covered a lot of genomics in this space, but there is an inherent challenge to studying biology via DNA - DNA is the underlying blueprint, but that blueprint must pass through multiple steps before actual biology of interest emerges. RNA-Seq gets closer, but much of the real action is at the level of proteins (though much is not - let’s not forget all the metabolites!). When I set out in this space 18 years ago, I thought I’d cover more proteomics but that didn’t materialize - time to plunk one piece on the proteomics side of the ledger!

Proteomics has multiple challenges, but two inherent ones are the diversity of proteoforms and the dynamic range within the proteome.

The diversity of proteins within a human is astounding, even if we discard the inherently hypervariable antibodies and T cell receptors which have specific means of diversification within an individual that include random generation of sequence during VDJ recombination and somatic hypermutation of antibodies. The rest of the bunch are subject to transcript-level diversification by features such as alternative promoters, alternative splicing and RNA editing and then another wealth of post-translational proteolysis, phosphorylation, glycosylation and a heap more covalent modifications. If we really wanted to make things complex, we’d worry about protein localization, who a protein is partnered with and even alternative protein conformations - but let’s just stick to primary proteoforms and a diversity that is estimated in excess of 1 million different forms.

The key part here is that there is no analytical method capable of resolving all of these. Any proteomics method is to some degree ignoring much of the proteome entirely, and for many other proteins compressing many forms into a single signal. Indeed, most proteomic tools look at very short windows of sequence or perhaps patches of three dimensional structure, and will rarely if ever be able to directly connect two such short windows or patches - they will be stuck correlating them. The key takeaway here is that all proteomics methods work on a reduced representation of the proteome.

The dynamic range in the proteome is astounding, with some potentially challenging effects. For example, blood serum is utterly dominated by a handful of proteins such as serum albumin, beta 2 microglobulin and immunoglobulins - for methods that look at the total proteome there is a serious danger of flooding out your signal with these abundant but relatively dull proteins and not being able to see interesting ones such as hormones that are many logs lower in concentration.

Proteomics has been dominated by mass spectrometry, which has had over three decades to develop into a mature science. Mass spec is inherently a counting process and on its own can’t focus or filter out the dull stuff. Even more so, you don’t fly intact proteins in a mass spec, but peptides, and there are only a few useful proteases out there. Peptides don’t ionize consistently, so that adds a layer of challenge to quantitation. But as noted, this has been an intensely developed field for multiple decades and so there are very good mass spectrometry proteomics techniques using liquid chromatography (LC-MS) and other methods to remove abundant dull proteins and fractionate complex peptide pools into manageable ones.

But, protein LC-MS is very much its own discipline, and most proteomics labs aren’t strong in genomics or vice versa - though there are certainly collaborations or dual-threat labs. LC-MS setups require serious capital budgets for the instruments and their accompanying sample handling automation and highly skilled personnel.

A number of companies are attempting to apply the strategies of high throughput DNA sequencing to peptide sequencing or identification. Quantum-SI is the only one to make it to market, but other startups such as Erisyon are plugging away. These methods look a bit like mass spectrometry in their sample requirements, as they will also be counting peptides - and the current Quantum-SI doesn’t count nearly enough to be practical for complex samples such as serum or plasma.

The other “next gen proteomics” - one lesson not learned from the DNA sequencing world is the problem of calling something “next-gen” - this year will be the 20th anniversary of the commercial launch of 454 sequencing - approach is to use affinity reagents such as antibodies or aptamers and tag them with DNA barcodes, then sequence those barcodes on high throughput DNA sequencers. By using affinity reagents, the problem of boring but abundant proteins goes away – just don’t give them any affinity reagents. Dynamic range can be addressed as well - the exact details aren’t necessarily disclosed by manufacturers but one could imagine only labeling a fraction of a given antibody to tune how many counts are generated from a certain concentration of targeted analyte.

Olink Proteomics, now a component of Thermo Fisher, is one company offering a product in this space. Olink’s Proximity Extension Assay (PEA) relies on two antibodies to each protein of interest and requires hybridization between the probes on both antibodies to enable extension by polymerase in order to generate a signal. This increases the specificity of the signal and tamps down any signal from non-specific binding - or from just having antibodies in solution.

Olink has released a series of panels targeting increasing numbers of target proteins in the human proteome. This is generally a good thing - except counting more proteins means generating more DNA tags which means a bigger sequencing budget per sample. The other knock on Olink’s (and their competitor SomaLogic, now within Standard Biotools and also marketed by Illumina) approach is a complex laboratory workflow that mandates liquid handling automation. So this has meant that the big Olink Explore discovery panels are inevitably going to be run at huge genome centers that have both the big iron sequencers and the liquid handling robots that are required. And this strategy has started paying out scientific dividends - some of which were covered by the Olink Proteomics World online symposium last fall that featured speakers such as Kari Stefansson. Olink’s and Ultima’s recent announcement on starting to process all of the UK Biobank is an example of such grand plans, and this will be run at Regeneron’s genome center.

Academic center core labs and smaller biotechs often power important biomedical advances, but if Olink Explore is only practical with NovaSeq/UG100 class machines and fancy liquid handlers, then few of these important scientific constituencies will be able to access the technology. Which would be unfortunate, since small labs often cultivate very interesting sample sets that very large population-based projects like UK Biobank might not have. Large population based and carefully curated small projects are complementary, but is only one able to access Olink’s technology?

And that’s where Olink’s newest product, Olink Reveal, comes in, enabling smaller labs to process 86 samples. First, a select set of about 1000 proteins is targeted, bringing the required sequencing for a panel of samples plus controls to fit on a NextSeq-class flowcell - only 1 billion reads required. Second, the laboratory workflow has been made very simple and practical to execute with only multichannel pipettes. The product is shipped with a 96-well plate that contains dried down PEA reagents; simply adding samples and controls to the wells activates the assay for an overnight incubation. The next day, PCR reagents are added to graft sample index barcodes onto the ligation products and then that is pooled to form a sequencing library. The library prep costs $98 per sample (list price) - $8,428 per kit. Throw in sequencing costs of $2K-$5K per run (depending on the instrument) and this isn’t out-of-line for other genomics applications.
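
The per-plate arithmetic above can be sketched in a few lines of Python. The kit size, list price, read budget and sequencing cost range are the figures quoted in this post; the assumption that the remaining 10 wells hold controls and that reads are split evenly across all 96 wells is mine, as are the function and variable names.

```python
# Back-of-envelope cost model for one Olink Reveal plate.
# Quoted in the post: 86 samples, $98/sample list price, ~1 billion reads per
# plate, $2K-$5K sequencing per run.
# ASSUMPTION: all 96 wells (samples plus controls) share the reads evenly.

SAMPLES_PER_KIT = 86
LIST_PRICE_PER_SAMPLE = 98        # USD, library prep only
WELLS_PER_PLATE = 96              # samples plus controls (assumed to fill the plate)
READS_PER_RUN = 1_000_000_000


def kit_cost() -> int:
    """List price of one kit in USD."""
    return SAMPLES_PER_KIT * LIST_PRICE_PER_SAMPLE


def total_cost_per_sample(sequencing_cost_per_run: float) -> float:
    """Library prep plus sequencing, spread over the 86 real samples."""
    return (kit_cost() + sequencing_cost_per_run) / SAMPLES_PER_KIT


def reads_per_well() -> float:
    """Average reads available per well under the even-split assumption."""
    return READS_PER_RUN / WELLS_PER_PLATE


if __name__ == "__main__":
    print(f"Kit cost: ${kit_cost():,}")
    for seq_cost in (2_000, 5_000):
        per_sample = total_cost_per_sample(seq_cost)
        print(f"Sequencing at ${seq_cost:,}/run -> ${per_sample:,.2f} per sample")
    print(f"Reads per well: {reads_per_well():,.0f}")
```

Under these assumptions the all-in cost lands at roughly $120 to $160 per sample, with on the order of ten million reads per well. Labor and instrument time are left out.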

Of course, this is a reduced representation over the larger “Explore” sets. But Olink has selected the proteins to be a useful reduced representation. They’ve used sources such as Reactome to prioritize proteins, and have also prioritized proteins that have been shown to have genetically-driven expression variability in the human population - protein QTLs aka pQTLs. If the new panel is cross-referenced to studies using the larger panels, most of these studies would have found at least one protein showing statistically significant change in concentration. This can be seen in the plot below, where each row is a study colored by disease area. On the left is the distribution of P-values for the actual Olink Explore data and the right the same data filtered for proteins in the Olink Reveal panel.

It’s also robust - Olink has sent validation samples to multiple operators and compared the results, and the values from each lab are tightly correlated.

So Olink with their affinity proteomics approach is basically following the same playbook as genomics did with exomes. When hybrid capture approaches for exome sequencing first came out, it was thought these would be used for only a few years and then be completely displaced by whole genome sequencing (WGS). But exomes have proven too cost effective - even with drops in WGS costs, it is still possible to sequence more samples with exomes for the same budget. Yes, the risk of missing causal variants outside the exome target set was always a concern – the recent excitement around lesions in non-coding RNAs such as RNU4-2 has demonstrated that - but many investigators saw exomes as enabling studies that otherwise wouldn’t happen. Plus sometimes the bigger worry is biological noise obscuring a signal you could see, and that is dealt with by more samples.

The new Olink Reveal product fills a gap between Olink’s large “Explore” discovery sets and very small custom panels. In the Proteomics World talks many speakers described work run with PEA panels of only two dozen or so targets, often using PCR as a readout rather than sequencing. This shows one bit of synergy in the Olink acquisition by Thermo Fisher, as Thermo has an extensive PCR product catalog including array-type formats. Thus PEA follows the well-worn patterns in genomics: huge discovery panels for some studies, high value panels that balance cost and coverage for many studies and focused custom panels for validating findings on very large cohorts. The Proteomics World talks even suggested some of these focused panels might soon be seriously evaluated as in vitro diagnostics. With developments like these, targeted proteomics via sequencing will be a very interesting space to watch.

Illumina & NVIDIA Team to Remake How to Train Your DRAGEN

If you've been in a movie theater recently, you may have seen a trailer for a mixed live action and animation spectacle called How To Train Your Dragon. Having seen the purely animated original - and wished I had gone to a 3D showing as the flight scenes must have been amazing - it was a bit unsettling, as the animated dragon in the new looks exactly like the one in the old. It's apparently a shot-for-shot remake of the original, but this time with live human actors. So effectively a port of a script from one cinematic language to another. In a similar vein, at last week's J.P. Morgan Conference, Illumina and NVIDIA announced they will start porting Illumina's DRAGEN applications onto NVIDIA GPU hardware.

Benchtop HiFi: PacBio Unveils Vega

Well, the embargo has passed and one of the worst kept rumors of genomics land has come true: at this year's ASHG PacBio has unveiled their benchtop instrument, Vega. At about 2 feet for each dimension, it should fit easily in many labs, and there are no utility requirements beyond standard power (120V in US/220 V for Europe). With a list price of $169K or the alternative reagent rental pricing of $80K and a two year reagent commitment, it should fit many budgets. Vega runs a single 25M (Revio) flowcell in 24 hours to produce one 20X HiFi human genome.

Revio Refresh

ASHG is ongoing and tonight PacBio has a big party planned, with an unnamed musical guest. Rumors swirl as to what will be announced at that event. But in advance of the meeting, last week PacBio described multiple updates to the Revio platform, an instrument which made its debut two years ago at ASHG. PacBio CEO Christian Henry was kind enough to chat with me last week about the upgrades.

MiSeq Makeover

MiSeq is the oldest instrument in Illumina's lineup, first unveiled back in 2011. MiSeq's launch stole much of the thunder from the Ion Torrent PGM at the time. Illumina brought out other instruments to push the lower boundary of their line: MiniSeq came in 2016 and iSeq 100 in 2018 - but MiSeq remained the most popular instrument of that batch. It has a warm place in my heart; at Starbase we contracted out many MiSeq runs since the necessary batch size was often very appropriate for us. In the meantime, various other instruments came and went - HiSeq originally launched about the same time as MiSeq and later there was HiSeq X, and in that time period we've seen Ion PGM be replaced by Ion Proton, PacBio cycle through multiple models, and 454 abandon the market - as well as fizzles such as Genapsys. But today Illumina announced a new instrument family under the MiSeq moniker - and the iSeq 100 moniker - called the MiSeq i100, which harmonizes the low end of their line with the higher end.

QuantumScale: Two Million Cells is the Opening Offer

I'm always excited by sequencing technology going bigger. Every time the technology can generate significantly more data, experiments that previously could only be run as proof-of-concept can move to routine, and what was previously completely impractical enters the realm of proof-of-concept. These shifts have steadily enabled scientists to look farther and broader into biology - though the complexity of the living world always dwarfs our approaches. So it was easy to say yes several weeks ago to an overture from Scale Bio to again chat with CEO Giovanna Prout about their newest leap forward: QuantumScale, which will start out enabling single cell 3' RNA sequencing experiments with two million cells of output - but that's just the beginning. And to help with it, they're collaborating with three other organizations sharing the vision of sequencing at unprecedented scale: Ultima Genomics on the data generation side, NVIDIA for data analysis, and Chan Zuckerberg Initiative (CZI) which will subsidize the program and make the research publicly available on Chan Zuckerberg Cell by Gene Discover.

Scale Bio is launching QuantumScale as an Early Access offering, originally aiming for 100 million cells across all participants - though since I spoke with Prout they've received over 140 million cells in submitted proposals. The first 50 million cells would be converted to libraries at Scale Bio and sequenced by Ultima (with CZI covering the cost), with the second 50 million cells prepped in the participants' labs with Scale Bio covering the library costs (and CZI subsidizing sequencing cost). Data return would include CRAMs and gene count matrices. Labs running their own sequencing have a choice of Ultima or NovaSeq X - the libraries are agnostic, but it isn't practical to run these libraries on anything smaller. Prout mentioned that a typical target is 20K reads per cell, though Scale Bio and NVIDIA are exploring ways to reduce this, so with 2M cells that's 40B reads required - or about two 25B flowcells on NovaSeq X.
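
The read budget is simple enough to sketch in Python. The 20K reads per cell, the 2M cells and the 25B flowcell size come from the paragraph above; the function names are my own, and rounding up to whole flowcells is an assumption about how a lab would actually buy sequencing.

```python
import math

# Figures from the post; names are illustrative only.
READS_PER_CELL = 20_000
FLOWCELL_READS = 25_000_000_000   # NovaSeq X "25B" flowcell


def reads_required(cells: int, reads_per_cell: int = READS_PER_CELL) -> int:
    """Total reads needed to hit the per-cell target."""
    return cells * reads_per_cell


def flowcells_needed(total_reads: int, flowcell_reads: int = FLOWCELL_READS) -> int:
    """Whole flowcells needed (ASSUMPTION: no partial flowcells are purchased)."""
    return math.ceil(total_reads / flowcell_reads)


if __name__ == "__main__":
    total = reads_required(2_000_000)
    print(f"{total:,} reads -> {flowcells_needed(total)} flowcells")
```

For two million cells this gives 40 billion reads, which is 1.6 flowcells' worth and so rounds up to two. Any reduction in reads per cell feeds straight through the same arithmetic.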

How do they do it? The typical Scale Bio workflow has gotten a new last step, for which two million cells is expected to be only the beginning. The ScalePlex reagent can be first used to tag samples prior to the initial fixation, with up to 1000 samples per pool (as I covered in June). Samples are fixed and then distributed to a 96-well plate in which reverse transcription and a round of barcoding take place. Those are then pooled and split into a new 96-well plate which performs the "Quantum Barcoding", with around 800K barcodes within each well. Prout says full technical details of that process aren't being released now but will be soon, but hinted that it might involve microwells within each well. Indexing primers during the PCR add another level of coding, generating over 600 million possible barcode combinations. This gives Scale Bio, according to Prout, a roadmap to experiments with 10 million, 30 million or perhaps even more cells per experiment - and multiplet rates "like nothing".

As noted above, the scale of data generation is enormous, and that might stress or break some existing pipelines. Prout suggested that Seurat probably won't work, but scanpy "might". So having NVIDIA on board makes great sense - they're already on the Ultima UG100 performing alignment, but part of the program will be NVIDIA working with participants to build out secondary and tertiary analyses using the Parabricks framework.

What might someone do with all that? I don't run single cell 3' RNA experiments myself, but reaching back to my pharma days I can start imagining. In particular, there are a set of experiment schemes known as Perturb-Seq or CROP-Seq which use single cell RNA readouts from pools of CRISPR constructs - the single cell data both provides a fingerprint of cellular state and reveals which guide RNA (or guide RNAs; some of these have multiple per construct) are present.

Suppose there is a Perturb-Seq experiment and the statisticians say we require 10K cells per sample to properly sample the complexity of the CRISPR pool we are using. Two million cells just became 200 samples. Two hundred seems like a big number, but suppose we want to run each perturbation in quadruplicate to deal with noise. For example, I'd like to spread those four replicates around the geometry of a plate, knowing that there are often corner and edge effects and even more complex location effects from where the plate is in the incubator. So now only 50 perturbations - perhaps my 49 favorite drugs plus a vehicle control. Suddenly 2M cells isn't so enormous any more - I didn't even get into timepoints or using different cell lines or different compound concentrations or any of numerous other experimental variables I might wish to explore. But Perturb-Seq on 49 drugs in quadruplicate at a single concentration in a single cell line is still many orders of magnitude more perturbation data than we could dream about two decades ago at Millennium to pack into three 96-well plates.
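
The whittling-down above is easy to play with as a small Python sketch. The cell counts, replicate number and plate format are the ones in the paragraph; the dataclass and function names are hypothetical, and the single vehicle control is the only control assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class PerturbSeqBudget:
    samples: int      # wells that each receive cells_per_sample cells
    conditions: int   # distinct perturbations, including controls
    drugs: int        # conditions left after setting aside controls
    plates: int       # 96-well plates needed to hold every sample


def budget(total_cells: int, cells_per_sample: int, replicates: int,
           controls: int = 1, wells_per_plate: int = 96) -> PerturbSeqBudget:
    """Divide a cell budget into samples, replicated conditions and plates."""
    samples = total_cells // cells_per_sample
    conditions = samples // replicates
    drugs = conditions - controls
    plates = math.ceil(samples / wells_per_plate)
    return PerturbSeqBudget(samples, conditions, drugs, plates)


if __name__ == "__main__":
    # 2M cells, 10K cells per sample, quadruplicate, one vehicle control.
    print(budget(2_000_000, 10_000, 4))
```

Adding a second dimension such as timepoints or concentrations just divides `conditions` again, which is how two million cells stops looking enormous so quickly.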

And that, as I started with, is the continuing story: 'omics gets bigger and our dreams of what we might explore just ratchet up to the new level of just in reach.

The announcement of QuantumScale also has interesting timing in the industry, arriving a bit over a month after Illumina announced it was entering the single cell RNA-Seq library prep market with the purchase of Fluent Biosciences. While nobody (except perhaps BGI/MGI/Complete Genomics) makes their single cell solution tied exclusively to one sequencing platform, the connection of Scale Bio and Ultima makes clear business sense - Illumina is now a frenemy to be treated more cautiously and boosting an alternative is good business. Ultima would of course love if QuantumScale nudges more labs into their orbit, and these 3' counting assays perform very well on Ultima with few concerns about homopolymers confusing the results (and Prout assures me that all the Scale Bio multiplex tags are read very effectively). And as is so often the case, NVIDIA finds itself in the center of a new data hungry computing trend.

Will many labs jump into QuantumScale? Greater reach is wonderful, but one must have the budgets to run the experiments and grind the data. PacBio in particular and to a degree Illumina have seen their big new machines face limited demand - or in the case of Revio the real possibility that everyone is spending the same money to get more data (great for science, not great for PacBio's bottom line). But perhaps academic labs won't be the main drivers here, but instead pharma and perhaps even more so the emerging space of tech companies hungry for biological data to train foundation models - sometimes not even having their own labs but instead relying on companies such as my employer to run the experiments.

A favorite quote of mine is from late 1800s architect Daniel Burnham; among his masterpieces is Washington DC's Union Station. "Make no little plans. They have no magic to stir men's blood and probably will not themselves be realized." I can't wait to see what magic is stirred in women's and men's blood by QuantumScale, which is certainly not the stuff of little plans.

[2024-10-02 tweaked wording around how program is funded]

Thursday, August 29, 2024

Illumina Would Like to Change the Conversation

A maxim from the great but fictional advertising executive Don Draper: "if you don't like what people are saying, change the conversation". In an online strategy update presented two weeks ago ( Slides / Replay ), Illumina announced they'd like a new conversation around sequencing costs. No longer will they tout reagent cost per basepair, but instead will be focused on the total cost of sequencing workflows. The obvious cynical response is that Illumina is conceding defeat on the raw cost, having been severely beaten by Ultima Genomics (and Complete Genomics aka MGI, but that group continues to face stiff headwinds) and even matched - if you have the volume - by Element Biosciences. Total cost of ownership is what really matters, right? The catch is how is it being calculated and who is doing the calculating?

It has always been known that cost per gigabase or per million reads was a convenient fiction. Convenient because only simple arithmetic was required to convert performance specs and list prices into the metrics. But a fiction since all the other costs didn't magically go away. But which costs are we now counting? And how do you count them? For example, if the library prep requires 4 hours of hands-on time, whose hands? A Ph.D. paid at Boston rates or a fresh B.S. graduate paid at U.S. heartland rates? (not knocking either - but cost-of-living in Boston is particularly painful for those starting out and that is reflected in higher wages). Illumina would particularly like to highlight the value of their DRAGEN computational acceleration platform - but when comparing it to conventional compute, what number do you pencil in? It all runs afoul of a dictum thrown out at a class on product financial modeling back at Millennium: keep it simple - "why spend the effort to invent a lot of numbers when you can just invent a few?"

Illumina would like to calculate from having a purified DNA sample to results on the other end, which fits with their strategy of offering - but not insisting on - vertical integration. So the calculation covers library prep, running the sequencer, primary bioinformatics and secondary bioinformatics. The same webinar teased two new library prep products that will further fit this model, though they are a year to a year-and-a-half away (if they keep schedule).

Other companies have already been taking potshots at Illumina on cost angles that might not make it in Illumina's official numbers. For example, Ultima Genomics' UG100 has a "daily care and feeding" arrangement which differs greatly from Illumina's "load a new run after the last has finished" - since Illumina runs often annoyingly exceed an even multiple of 24 hours, full Illumina instrument utilization will ultimately require night and graveyard shifts. Oxford Nanopore would similarly tout the ability of PromethION to launch new runs at will. Element and Oxford would both point to lower capital costs. And so on.

Which also brings up the question of which scenario we are calculating costs under. One with enough samples arriving all-at-once to get maximum cost efficiency on a NovaSeq X 25B flowcell? Or a scenario favoring Element where you must run now with a much smaller batch of samples - which seems to be a more practical model for the majority of core labs? So many ways for each company to frame the problem to favor themselves and prevent any sort of apples-to-apples comparison!

Two New Library Preps -- in the Future

Illumina touted two new library prep approaches they are developing - one which claims it will perform library prep on the flowcell and another offering "5 base" sequencing which would call 5-methylcytosine (5mC). No details were provided as to how either of these would accomplish this.

Element has been leading in moving library processes onto the flowcell, though in their case it isn't the initial library prep but hybrid capture enrichment. The Illumina prep won't be cost feasible without some sort of pre-instrument operation; the input DNAs must be tagged because there are just about no applications which call for running an entire 25B flowcell on a single sample. Perhaps this would just be tagging with barcoded Nextera (Tn5), but then the samples can be pooled and placed on the flowcell to complete the process. Another speculation I've seen is that the PIPseq templating technology acquired from Fluent would somehow apply.

Illumina not only is promising a simplified workflow, but also that the quality of the final data would be better than any other solution out there - and they were clearly aiming at (but without naming) PacBio HiFi data. That is certainly in the category of "show me the data!", as that is a very hard challenge - particularly since good long range contiguity data requires high molecular weight preps going into the process. This claim might suggest they are using the PIPseq technology to generate linked reads ala the old 10X Genomics kit - but I still remain skeptical that such data can deliver in the face of certain types of repetitive content, such as Variable Number of Tandem Repeat (VNTR) alleles where the repeat array is longer than the actual read length. And there are a range of applications - perhaps not yet as big as whole human genomes but someday - which require high accuracy single molecules - each single molecule read is the datapoint.

The other big promise is a 5-base reading chemistry. The first thing to note is it isn't the same as the "on instrument library prep". Illumina also didn't talk about reading 5-hydroxymethylcytosine (5hmC), the rarer but potentially buzzier additional mammalian epigenetic mark. The claim is their method will be a simple workflow with a single library, so not a case of running one bisulfite or enzymatically modified library to read 5mC and another native one to read the genome itself. A speculation I'll throw out is again around PIPseq - perhaps some partitions would have the enzymes to recode 5mC to something else (or all the non-5mC cytosines to U, as most modification methods do).

The most advanced approach in this space is Biomodal, which is overdue for a focused approach (and was founded by the creator of Solexa technology, Shankar Balasubramanian, originally under the name Cambridge Epigenetix). Biomodal creates libraries which effectively are duplexes, with one read reading one strand and the other reading the other. By a clever series of enzymatic steps, the end result is that comparing the two strands can reveal both 5mC and 5hmC while still reading the underlying sequence - 6 base sequencing. Of course, there ain't no such thing as a free lunch - any advantages of having paired end reads for mapping are no longer available, and there's always the danger of creating noise by the enzymes not always hitting their marks.

Illumina didn't announce a purchase of Biomodal, so they must have found a different way of converting. They also promised a simple workflow - a knock I've heard on Biomodal is the workflow is not simple.

One smaller tease from Illumina is a goal of putting XLEAP chemistry on the MiSeq - which would certainly tidy up their product line. But would this be existing MiSeqs or is a next generation MiSeq under development? That was left ambiguous - as well as what would happen to MiniSeq and iSeq in the process.

All-in-all, it is a welcome change to see Illumina acting as if competition exists - the webinar was full of claims that the company is listening to their customers and seeking input. So they are going to talk the talk of not being stuck in monopolist mode - but will they walk the walk? Let's see how the next few years play out.

Musings on Possible Fixes To PacBio & ONT's Achilles Heels

I recently tried to place a claim that I had first conceived Oxford Nanopore's "6b4" strategy for solving homopolymers, but that appropriately brought a number of citations for the concept that predated my blog piece. Not one to give up easily (and as hinted in that piece), I'm going to spend part of this piece trying to stake claim on some new concepts for fixing Oxford Nanopore's homopolymer issues - and PacBio's trouble with polypurine stretches. To be honest, much of this piece will consist of me posing questions I haven't bothered to try to chase down if they've already been answered in the literature. But not only might someone do that, but it may well be that data already exists in the public sphere to explore proof-of-concept! But I haven't checked that either - though doing so was on my list of "what to do if management gave me the summer off" - but they didn't.

Tagify: seqWell's Line of Tagmentation Reagents Awaits Your Creative Thoughts!

One of the most important enzymes in the sequencing world, one which enables spectacular creativity on the part of novel assay designers, is Tn5 transposase. Personally, I spend many times each month thinking about how to use Tn5 and its ability to tagment - both tag and fragment - input DNA. There are even reports that Tn5 can tagment RNA-DNA hybrids such as from reverse transcription or even long single-stranded DNA. I’ve covered seqWell in the past, with their fully kitted reagents; now the company (which just turned ten) is launching a Tagify product line that is focused on enabling NGS dreamers to easily explore new Tn5-based library preparation methods.

mRNA Therapeutic / Vaccine Quality Control: A Major ONT Opportunity?

Oxford Nanopore is in the process of morphing into a product-focused company, and so must identify specific markets in which they believe nanopore sequencing can compete or even dominate. One such market that was spotlighted this year at London Calling is the quality control of mRNA therapeutics, where nanopore sequencing may be able to replace a kitchen sink of technologies and often provide superior data.

Pharmaceutical and diagnostic quality control is both similar and very different to research. While many sequencing research experiments are to some degree a fishing expedition, in a quality control assay very specific hypotheses are tested with specific, pre-determined thresholds. Consistency of results is the most critical; an assay run today must be comparable with one run last month or last year. These markets may be less sensitive than research to cost; if a QC test is part of qualifying a vaccine batch which will sell for millions of dollars, spending a thousand on that assay isn't unreasonable at all.

It's worth reviewing the process of how mRNA vaccine drug substance is made. The initial vaccine design is synthesized into a plasmid; this design includes a poly-A tail followed by a restriction site (which cannot occur within the vaccine design, though it could occur elsewhere in the promoter backbone). Enormous batches of plasmid are grown in E. coli and extracted and then linearized with the restriction enzyme that cuts after the poly-A tail and has no sites within the vaccine design. In vitro transcription is used to transcribe the linear template, with the nucleotide mix containing a uridine analog such as 5-pseudouridine in place of uridine. If the BioNTech process is used, then the nucleotide pool also contains a guanine analog which contains a 5' cap structure (CleanCap). If Moderna's process is used, then the in vitro transcription product is treated with a capping enzyme (typically Vaccinia Capping Enzyme aka VCE; please see conflict-of-interest disclosure at the bottom of this piece). After purification and concentration of the active drug substance (removing nucleotides, process enzymes, uncapped product, etc), drug product is ready for the finish-and-fill steps of encapsulating it in the lipid nanoparticles and filling vials for distribution.

QC is all about detecting what might go wrong and ensuring consistency of product. mRNA therapeutics and vaccines are complex products, with many possible parameters to measure.

First, there's the question of "is this the right product?". mRNA vaccines continue to evolve and expand in scope, with new designs targeting specific SARS-CoV-2 variants, influenza and RSV. If a vaccine product should be one specific variant, it is mislabeled and unusable if it is really a different variant. Many vaccines are now polyvalent, targeting multiple viruses or multiple variants within a single virus. This adds a whole new dimension: not only must the correct set of vaccines have been blended together, but the fraction of the whole for each one must fall within defined bounds. As RNA products, there is also the question of whether the RNA is what was intended and no mutations have arisen during propagation of the plasmid.
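To make the polyvalent question concrete, here is a small sketch of the bookkeeping, assuming each read has already been assigned to a component by some upstream alignment step. The component names, spec bounds and read counts are all invented for illustration, and read fraction is only a stand-in for whatever unit a real specification is written in.

```python
from collections import Counter


def blend_report(read_assignments, spec):
    """Compare observed read fractions against (low, high) bounds per component.

    read_assignments: iterable of component names, one per read (assumed input).
    spec: dict of component name -> (low, high) allowed fraction.
    Returns (report, unexpected): report maps each spec component to
    (fraction, in_spec); unexpected lists components seen but not in the spec.
    """
    counts = Counter(read_assignments)
    # Fractions are computed over reads assigned to spec components only.
    total = sum(counts[name] for name in spec)
    report = {}
    for name, (low, high) in spec.items():
        fraction = counts[name] / total if total else 0.0
        report[name] = (fraction, low <= fraction <= high)
    unexpected = sorted(set(counts) - set(spec))
    return report, unexpected


# Hypothetical trivalent blend with a few reads from something that shouldn't be there.
reads = ["A"] * 48 + ["B"] * 30 + ["C"] * 22 + ["D"] * 3
spec = {"A": (0.40, 0.60), "B": (0.25, 0.35), "C": (0.25, 0.35)}

report, unexpected = blend_report(reads, spec)
for name, (fraction, ok) in report.items():
    print(f"{name}: {fraction:.2f} {'in spec' if ok else 'OUT OF SPEC'}")
print("unexpected components:", unexpected)
```

In this toy case component C comes up short and a stray component D appears, which are exactly the two failure modes described above.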

Similarly, was the correct uridine analog used in production? In vitro transcription may generate undesirable products, such as double-stranded forms of the intended product. How much of these are present? What fraction of the transcripts are capped? Are the RNAs full length or are there partial or degraded versions present? How much plasmid is left, and is it linear or closed-circular form? How much E. coli genomic DNA contamination is present?

Many "old school" technologies exist for many of these questions. A standard gel can be used to assess the length distribution. Sanger or short read sequencing can be used for sequence verification - though Sanger will be a poor choice for multivalent designs. HPLC may be used for a number of the questions. But typically each assay asks a single question, and often with significant constraints. For example, if a problem is discovered in a multivalent vaccine in which there are out-of-spec shorter RNAs present, can Sanger or short reads tell which component is degraded?

Pfizer has published an approach using specific RNA cleavage (harking back to how Woese sequenced RNA to create the Archaea hypothesis - and much before) feeding into mass spectrometry. In some ways it looks like really short short read sequencing - some fragments are indistinguishable. The perceived advantages are that this method can distinguish fragments with the correct uridine analog vs. those with just uridine and it can distinguish capped 5' end fragments from uncapped ones. I've meant to do a deep dive on this for over a year after Kevin McKernan had pointed me to it; time to re-prioritize that!

ONT is proposing that Direct RNA sequencing can be used to build a single assay to test nearly all - if not all - attributes of the final drug product, with standard DNA sequencing used for assessing batches of circular or linearized plasmids. As noted in my piece on ElysION and TraxION, this sort of "applied market" would be very appealing to ONT in terms of providing a steady source of revenue. Direct RNA is the only currently marketed sequencing approach that can look at the modified bases, potentially giving ONT a large edge. Many of the questions of interest are better answered with long reads - the distribution of RNA species lengths, which RNA species are which length - giving any long read platform an edge. Should there be a problem, long read sequencing can quickly identify correlations between different anomalies.
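The "which RNA species are which length" point is where long reads shine, since each read carries both its identity and its length. A rough sketch of that tabulation follows, assuming reads have already been reduced to (component, aligned length) pairs by an upstream step; the component names, designed lengths and the 5% tolerance are all hypothetical choices of mine, not anything ONT presented.

```python
from collections import defaultdict


def full_length_fraction(reads, designed_length, tolerance=0.05):
    """Fraction of reads per component that are (nearly) full length.

    reads: iterable of (component, aligned_length) pairs (assumed input).
    designed_length: dict of component -> intended transcript length.
    tolerance: a read counts as full length if it covers at least
               (1 - tolerance) of the designed length (arbitrary cutoff).
    """
    totals = defaultdict(int)
    full = defaultdict(int)
    for component, length in reads:
        totals[component] += 1
        if length >= (1 - tolerance) * designed_length[component]:
            full[component] += 1
    return {component: full[component] / totals[component] for component in totals}


# Toy bivalent product in which one component is badly degraded.
designed = {"flu_A": 1700, "flu_B": 1750}
reads = (
    [("flu_A", 1700)] * 9 + [("flu_A", 800)]
    + [("flu_B", 1750)] * 5 + [("flu_B", 600)] * 5
)

for component, fraction in full_length_fraction(reads, designed).items():
    print(f"{component}: {fraction:.0%} full length")
```

A gel would show the short material in this toy example but not which component it came from; the per-component table answers that directly.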

Of course, this does require levels of precision and accuracy. Data was presented suggesting that minor variants can be detected at around 1% frequency. Improved algorithms for poly-A length determination appear to enable very precise determination.
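For a feel of what 1% detection demands in read depth, here is a back-of-envelope binomial calculation. It is only a sampling argument under assumptions I am making up for illustration: the requirement of at least five supporting reads is arbitrary, and sequencing error (which sets the real floor for any platform) is ignored entirely.

```python
from math import comb


def prob_at_least(k: int, n: int, f: float) -> float:
    """P(X >= k) for X ~ Binomial(n, f): chance of seeing at least k variant
    reads among n reads when the variant is present at true frequency f."""
    return 1.0 - sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k))


# How does the chance of seeing >= 5 supporting reads for a 1% variant
# change with depth? (Both the 5-read rule and the depths are illustrative.)
for depth in (100, 500, 1000, 2000):
    print(depth, round(prob_at_least(5, depth, 0.01), 3))
```

The takeaway is simply that sampling alone pushes the required depth per position into the high hundreds or more before error rates are even considered.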

ONT dreams of covering more angles. For example, nanopore sequencing on its own probably can't determine whether the 5' cap structure is present. But with some sort of pre-processing - perhaps resembling Cappable-Seq/Recappable-Seq - it may be possible to tag either correctly capped or non-capped messages. Similarly, it may be possible to differentially tag single-stranded and double-stranded RNA.

In terms of scale, Direct RNA sequencing in the current ONT protocol cannot be barcoded. For huge infectious disease batches that may not be an issue; for small personalized cancer vaccine batches cost may be more of an issue. Flowcell washing may be one solution, or ONT may be driven to enable barcoding (there are apparently external protocols for this).

How big a market will RNA vaccines be for ONT? That is of course the big question. mRNA vaccines seem to be here to stay, but how many more vaccines will be launched? Delivering other therapeutics by mRNA is still an unproven market. If mRNA delivery turns out to be a growth market, ONT can ride that wave. If it remains a niche market, there's still gain for ONT but not what will drive them to profitability. Lacking a reliable crystal ball, everyone must simply wait to see how this unfolds.

Conflict of Interest Disclosure / humble brag / me pretending to do Business Development. I am (still!) employed by and hold stock in Ginkgo Bioworks. During the pandemic Ginkgo Bioworks developed a new fermentation process for producing Vaccinia Capping Enzyme (VCE). This process is ten-fold more productive than the baseline process. Ginkgo licensed this process to Aldevron, which is now owned by Danaher. So production of mRNA therapeutics with capping using VCE may, through an opaque process, benefit me financially. Little to no evidence of that so far, but it could happen! And if you have a fermentation process that could be tuned up, feel free to reach out to me!

ONT T2T Genome Bundle: Hot New Thing or Flash in Pan?

Last month at London Calling, Oxford Nanopore announced a consumables and reagent bundle which enables generating six telomere-to-telomere (T2T) human genomes for $4K each. Even in the very friendly audience at London Calling, there was some skepticism over the market viability of this offer - how much would it really drive sales? T2T human genomes really only became possible in this decade. The first examples of T2T chromosomes generally used a mix of different technologies, often including PacBio HiFi, ONT Ultralong and BioNano Genomics mapping information. What ONT is proposing is the ability to routinely generate T2T genomes using only ONT data.

ScalePlex: Easing High Sample Count 3’ scRNA Sequencing

Scale Bioscience officially rolled out today - their rep was already talking about it at the Boston Single Cell Symposium I attended yesterday - a new cell indexing reagent called ScalePlex to streamline single cell 3' RNA sequencing of multiple samples.

Aftermath

I have multiple drafts of posts trying to finish up my London Calling items and then a long list of ideas in various stages of gestation - and I've been dangled a new tech update under embargo. But today, I'm on a mission - to help my now former colleagues. My employer, Ginkgo Bioworks, has executed an approximately 25% layoff. I survived the cut, but the list of talented, wonderful people who have been cast away is long and covers a wide range of talents. You really could start multiple quality small biotechs with these newly unemployed people.

I've been laid off twice before and it's miserable. I was lucky each time and had only a short period of unemployment - but biotech was doing well each of those times. The industry is in a serious slump right now, with many companies cutting back and some closing altogether. Even large companies are slashing away - Takeda is setting free over 600 employees here in Boston - perhaps some are remnants of the Millennium acquisition. Far too few companies are being created.

So if you have leads on open positions, I am listening. You can leave comments, email me (keith.e.robison on Gmail), connect on LinkedIn, DM on Twitter, etc. I've never received a message by carrier pigeon, but if that's your style I won't object. All will be passed on to a Ginkgo alumni community.

As a meme I saw put it "this too shall pass - perhaps pass like a kidney stone, but it shall pass". The long-term societal upside from biotechnology is too great for this to be anything but a temporary dip - but temporary can be a very long time.

Friday, June 07, 2024

CariGenetics: Breakthrough Breast Cancer Genetics in the Caribbean - but Also a Template for ONT Clinical Push?

London Calling isn't nearly as exhausting as AGBT, but the first day of talks is packed and then follows with the social event that goes late - this year with CEO Gordon Sanghera living out his dream of being the frontman for a band. Then if you'd like you can follow the crowd to a pub to drink on ONT's tab (that and crashing the ONT wrap-up dinner is the extent of my drawing personal benefit from ONT, contrary to a commenter on the prior piece who wrongfully believes they fund my LC expenses), and when that pub closes to another one (I peeled off after the first pub). So one can be a bit draggy heading into the second morning, but that was solved quickly by CariGenetics CEO Dr. Carika Weldon, who wowed with an exuberant strut down the central runway to a lively calypso beat - and then wowed everyone further with a stellar presentation. She also gave the lunchtime Product Demo talk (alas, I can't find that talk either on YouTube or in Nanopore Community) in the central product area, filling in some colorful details on her young company's early travails - all resolutely conquered.

ElysION vs. TraxION: Divergent Shots at Applied Market End-To-End Automation

London Calling was a particularly good opportunity to take stock of Oxford Nanopore's progress to a "fire-and-forget" sample-to-answer solution for "applied markets" such as food safety, public health and biotherapeutics quality control. ElysION (formerly Project TurBOT) and TraxION represent very different approaches targeting different subsets of this broad market opportunity - and I heard from some interested parties that neither is quite what they want. That doesn't mean they aren't right, but it does mean ONT may need to think of more approaches.

The broad concept is to have a device that takes some sort of biological input, with minimal to no upstream processing, and performs all necessary steps so that nanopore sequencing data emerges from the instrument, with no human intervention after the run is set up. ONT is envisioning these being placed in clinical labs, public health labs, biotherapeutic quality control labs, etc.

Thoughts on A Decade of Oxford Nanopore Sequencing

I'm writing this the eve of Oxford Nanopore's London Calling conference. This is a big one, as this summer marks the 10th anniversary of ONT releasing devices into the wild. It's been a long, interesting journey and I'm much too jet-lagged to try to review old posts or even link to them, but a bunch of thoughts have been in my head the last few days.

FOMO Index at All Time High This Week

I'm in the airport getting set to jet off to London Calling - already spotted a kindred spirit at Logan doing the same - but I could easily be going somewhere else - or staying home. There are three major 'omics conferences this week, all in incompatible geographies and overlapping. Then there is a major vendor announcement day - also in London and perhaps about nanopores and conflicting. There's also a pub meetup in London that thankfully doesn't conflict. Back in Boston, there are also two free NGS vendor events I might have gone to. Not only can't I attend them all, I can't attend most - and it will be impossible to monitor Twitter in real time much of the time.

Business travel is always a two-sided coin. On the one hand I enjoy seeing new sights and revisiting other favorites. I've been lucky that I've fit some sort of fun into just about every business trip I've taken, even if it's meant riding a nearly empty cable car to a deserted Fisherman's Wharf. Two exceptions were both day trips, but how I couldn't sneak in excitement at Monmouth Junction New Jersey or East Haven Connecticut won't exactly haunt me. But I'm also torn about travel: home is where the dog is. Also the spouse and F1.

Conferences are also an exhausting rush; London Calling isn't as bad as AGBT in this regard as it is shorter, there's only a late evening one night and there's not afterparties or European friends defeating your well intentioned plans to sleep by bringing a bottle of port and fine Dutch chocolates. But it's always intense trying to catch talks, make meetings, visit demonstrations and booths, catch up with friends you only see at a specific meeting and take notes to share with colleagues and have a shot at writing something coherent for this space. I'm not complaining - I could always quit, but I won't anytime soon. This year London Calling is three days of talks instead of the usual two, so a bit more of the thrilling grind.

And of course, at least for the next several weeks, there's my current day job to attend to. Conferences are great and often lead to valuable connections and information; at AGBT I even tried on a Business Development persona which seems to have yielded multiple legitimate leads for our enzyme discovery & engineering business. But in the end, the team I'm on plus myself would like to see progress on the projects I've taken on, and I'm many solar orbits past the age where I can conference all day and code all night.

This year London is also featuring a Diagnostics Day from Roche on Wednesday. There have been persistent rumors that Roche will finally, over a decade since acquiring Genia and four years after acquiring Stratos, announce a nanopore sequencing platform. Or maybe not. But counterprogramming that against London Calling is downright annoying!

If you are in London for either event, the folks at Plasmidosaurus have announced a pub night on Thursday at The Court. It's over near University College London, which isn't very close to Old Billingsgate where London Calling is, but it appears to be a simple tube ride around the Circle Line.

There's a list of conferences I'd like to attend but the timing never works; the Biology of Genomes has been "I'll go next year" for two decades. SFAF is higher on the list, given the location - I love the American West scenery and Santa Fe can be such a great launchpad for so many interesting adventures - plus it sounds like a great conference. SFAF has a reputation for taking on more of the early stage commercialization technologies that don't have a strong home at AGBT anymore. Last year it was bumped for the practical concerns of a college graduation, moving the F1 back home and a celebratory trip. Now it's in conflict with London Calling and put off for another year.

If I'd stayed in Boston, I might have thought about catching NextFlow Summit. Understanding workflow languages is becoming critical in bioinformatics (don't ask if I've followed that advice), and NextFlow is one of the leaders.

Back home, PacBio has the Boston edition of their PRISM series -- I caught it two years ago and the talks were very good, but last year some schedule conflict or another caught me. Complete Genomics is having a grand opening celebration at their new "Customer Experience Center" in Framingham, the biomedically immortal Boston suburb. I won't begin to claim I've checked other geographies; I did see a Nanostring event in central Europe going on this week also.

So I'm off to Gatwick in under an hour. It horrified my British colleague I mentioned this to - Heathrow does have cachet plus often great views of Windsor Castle on approach - but I pointed out she's the one who got me hooked on picking up delectables at Borough Market, and if you get off the train from Gatwick one stop early at London Bridge the market is almost under the station. I'll get there just around the time it opens - and it's a quite reasonable walk to my hotel near The Tower of London from there. JetBlue made it even easier last year by sending my bag to Heathrow and then delivering it, but I can't count on such service on a regular basis!

Going to London Calling or the pub outing? Look for my hat celebrating the birthplace of Taq polymerase and please do say hi!

Friday, May 17, 2024

HiFi WGS As A (Nearly) Unified Tool For Rare Genetic Disease Diagnosis

What is now way back in February, Alexander Hoischen presented a talk at AGBT which described early results from an effort to apply PacBio HiFi sequencing at scale for solving rare disease cases. Hoischen passionately made the case for how providing a diagnosis can change affected families. It's also worth noting how important rare disease genetics has been to the history of biology, illuminating new processes and entire pathways. Something I hadn't appreciated until his presentation is how many technologies are currently thrown at a case in current workflows because each technology can cover a few types of mutations but miss others. So this is a good snapshot of the current state of human genomics technology with hints of where it might be going. And Hoischen made a strong case that many other technologies - but not all of them - can be retired if PacBio HiFi sequencing is the lead approach. A longer, similar talk is also available as a PacBio-sponsored webinar given by Lissenka Vissers from the same institution and some of the data is in a preprint linked below.

DoveTail Transposes Their Hi-C Methodology

Technologies vying for state-of-the-art in human genome analysis are a recurrent theme in this space, and there are many ideas on this in the collection I really need to get out over the next two weeks before my brain is overwhelmed by London Calling. Up today: Dovetail Genomics popping back on the scene (as a subsidiary of Cantata Bio) with an AACR poster several weeks ago showing early results from a "LinkPrep" kit that will commercialize tagmentation (in vitro transposition to fragment DNA and add adapters) for Hi-C library generation, with the promise of enabling short read sequencing to deliver both SNVs as well as long-range structural information all from the same library.

AQTUAL: Arthritis Drug Selection Via Assaying Cell Free Chromatin

[Note: after I initially released this, Dr. Abdueva spotted some glitches; I pulled it back for editing & then got swept into London Calling; this revised version is finally emerging]

Liquid biopsies - the idea of peering into the disease state somewhere in the body by looking at “cell-free DNA” in the blood - are quite the rage these days. There are a host of companies and approaches, and I haven’t quite found the discipline to start trying to build a census of all of them. The field started with Non-Invasive Prenatal Testing (NIPT), and then some early NIPT cases had odd DNA that looked oncogenic in healthy mothers - who turned out to actually have cancer. Oncology has been the primary focus, but there have been many hints that liquid biopsies may be valuable in a wide range of diseases. A bit over a week ago, Dr. Diana Abdueva, founder and CEO of AQTUAL, walked me through (over Zoom) that company’s liquid biopsy approach to inflammatory disease management.
