Kevin McKernan has played a key role in the development of genomics technology; if I am not mistaken he contributed to SPRI beads for cleanup and size selection, the SOLiD sequencer and improving Ion Torrent. He later had a run of startups which have focused on various applications of genomics - I remember hearing him speak at one conference on mitochondrial diseases and most recently he has headed Medicinal Genomics which is focused on cannabis and other producers of psychoactive natural products. Pre-pandemic, I was even on one teleconference with him.
Kevin's Anandamide substack, his tweets, and regular YouTube videos have presented his personal views on SARS-CoV-2 and the pandemic. I won't detail them here; the first post I link below has a long summary of some of those views. Early in the pandemic I criticized on Twitter one of Kevin's YouTube videos, which seemed to me to be engaging in serious wishful thinking in terms of viral phylogenies. Later I critiqued other tweets, but I finally realized that the time I was spending and the stress of it just wasn't worth it, so I stopped. But it is still the case, for whatever reason, that if I comment on SARS-CoV-2 then it is not uncommon for Kevin to accuse me of being a shameless pandemic profiteer, since my employer has done business in COVID testing, SARS-CoV-2 sequencing and licensed out a production process for a key component for making some mRNA vaccines. I have in the past been actively involved in the SARS-CoV-2 sequencing and am extremely proud of my contributions. But if you, like Kevin, feel that is a COI that disqualifies me from ever commenting on SARS-CoV-2 policy, so be it. And by the way, a reminder that this is my personal blog and the views are purely mine and not something dictated or even pre-reviewed by my employer.
As one last bit of preamble, the available data shows that the mRNA vaccines for SARS-CoV-2 have saved enormous numbers of lives as well as reduced hospital stays and many other measures of human misery. Their effect on transmission has not been as strong as hoped, but anyone who says they never reduced transmission is ignoring multiple peer reviewed publications. I've had three rounds - two before a capsid finally got me and one after. And I won't be hesitant for more rounds if that's where the data points. I also expect we will have many other mRNA vaccines and therapeutics in the future - and that's an area I'm happy to contribute to. So again, if you think any COI disqualifies someone, I'm disqualified in your book.
Okay, let's get to the data. What has been transmitted to the world are the results of sequencing vaccine samples and some follow-up work confirming one of the striking findings. There has been at least one prior sequencing of vaccines released publicly, but it didn't go to great sequencing depth. Kevin and crew hammered the samples, performed metagenomic assembly and they looked for more than just the expected vaccine - they found expression vector. The notes are being pumped out regularly, so I am highlighting the ones I read in depth but I suggest interested parties scan the site regularly.
A quick sketch of how mRNA vaccines are produced. A DNA template is synthesized to encode the antigen and cloned in an E.coli plasmid vector. Huge quantities of plasmid are produced and then linearized by digestion with a restriction enzyme. In vitro transcription (IVT) from a promoter in the construct drives creation of a linear transcript. The transcription reaction has a uridine analog such as pseudouridine replacing uridine, because RNAs lacking uridine are less immunostimulatory. That's core to both production schemes; details differ -- BioNTech's process creates the 5' cap by including a cap analog in the IVT whereas Moderna uses an enzyme later (the one my employer figured out how to make much more efficiently).
DNA from the vector - or the E.coli host - is undesirable for two reasons. First, DNA has the potential to be immunostimulatory. Second, the precise content of the DNA might be undesirable to deliver. That's an important point: regulatory agencies have set limits on total DNA content, but given that the potential risk from DNA could vary by what that DNA is, such a measurement is in my opinion inadequate given that we have means to plumb what specific DNA is present.
One big surprise to me is that the BioNTech/Pfizer vaccine is using a dual E.coli-mammalian expression vector (and worse, structural variation in it!). In my opinion - and you are free to dispute this point - that is bad design. The odds of that extra bit being a problem are likely small, but they are unknown - and if you can eliminate a possible bad actor at nearly no cost, why not? Could this replicate in human cells? Good question - but why not just take the question off the table? There may even be a metabolic benefit to having a bit less DNA in a high copy number plasmid. Having a single construct for vaccine production and cell culture studies maybe was a convenience, but given the trivial cost of subcloning or just resynthesizing it, that's not a good excuse.
I'll go out on a limb and stake an even harder line: for an application such as this, you should be treating your complete vector system as an engineered artifact that should be precisely engineered. Every base in a vector for making human mRNA or DNA therapeutic agents should have a purpose; there should be nothing that is just carried along because "it's always been there". The Saturn V didn't have odd bits because somebody was reusing something and couldn't be bothered to engineer it out. Why in molecular biology do we not routinely custom engineer our systems but instead drag along unwanted parts?
A second big surprise is that the sequence data implied circular vector still in the vaccine - and in later work the Anandamide crew proved they can clone out vector by antibiotic selection. So the restriction digestion didn't go to completion. When producing at these huge scales, that's hard to guarantee -- but there are ways to do it. The most obvious is to not rely on a unique restriction site, but instead pack multiple into the vector backbone - the odds of every one of several sites going unhit are smaller than the odds of a single site going unhit. And since this vector has a functional antibiotic resistance marker, if you can't tweak that marker to hide the key restriction site in it, then at least put a restriction site between the promoter and the coding sequence, so that even most linear fragments won't have a live antibiotic resistance element. Life finds a way! - but don't make it easy.
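To put rough numbers on the multiple-site argument: if each restriction site independently escapes cutting with probability p, a plasmid stays circular only when every site escapes, which happens with probability p^n for n sites. The sketch below is a back-of-envelope illustration only. The independence assumption and the example escape probability are mine, not measured values from any manufacturing process.

```python
def fraction_uncut(p_escape: float, n_sites: int) -> float:
    """Fraction of plasmid molecules left fully uncut (still circular).

    Assumes each site escapes digestion independently with the same
    probability p_escape. Both the independence and the numbers used
    below are illustrative assumptions.
    """
    if not 0.0 <= p_escape <= 1.0:
        raise ValueError("p_escape must be a probability between 0 and 1")
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return p_escape ** n_sites


# Hypothetical per-site escape probability of 1 in 1,000.
for n in (1, 2, 3):
    print(f"{n} site(s): {fraction_uncut(1e-3, n):.1e} of plasmids stay circular")
```

In practice the sites would not be perfectly independent (a molecule that is inaccessible to the enzyme may be inaccessible at every site), so the real gain is likely smaller than p^n suggests.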
These also raise the question of FDA and EMA (and other regulatory agencies) requirements around release testing. There is a standard for contaminating DNA, but only in total quantity and not in content. Anandamide claims the logic for where the threshold is set is murky -- if that is true then that logic must be surfaced. But the even bigger question (and yes, posed by the Anandamide piece) is whether there should be standards on what DNA can be found at a specified level of detection in a lot of vaccine. Any DNA is undesirable, but perhaps some particular sequences are less desirable. Anandamide also found an apparently empty mammalian expression vector in one of the Moderna samples.
Anandamide also brings up questions around the fidelity of the transcription reaction when faced with uridine analogs. This is technically extremely hard to measure - hard but probably not impossible - because all high accuracy sequencing systems require DNA and reverse transcriptases inject their own error into the mix. So most authors report a combined IVT+RT error rate, though I just spotted a case where the authors deluded themselves into thinking they had a pure IVT error rate (reality is the errors, calculating from their own numbers, are probably about evenly split between IVT and RT, but we don't know that precisely nor the variance in either component).
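A small sketch of why the combined number can't be split without outside information: very different IVT/RT splits yield the same observed rate. The model is a simplification I am assuming (independent per-base errors, ignoring the rare case where an RT error reverts an IVT error), and every rate below is hypothetical, not taken from any paper.

```python
def combined_error_rate(e_ivt: float, e_rt: float) -> float:
    """Per-base error rate seen after IVT followed by RT.

    Assumes the two steps err independently and that a second error
    never restores the original base (a simplification).
    """
    return 1.0 - (1.0 - e_ivt) * (1.0 - e_rt)


def ivt_rate_given_rt(observed: float, e_rt: float) -> float:
    """Back out the IVT rate, but only if the RT rate is known independently."""
    return 1.0 - (1.0 - observed) / (1.0 - e_rt)


# Three hypothetical splits that are nearly indistinguishable downstream.
for e_ivt, e_rt in ((1.0e-4, 1.0e-4), (1.9e-4, 0.1e-4), (0.1e-4, 1.9e-4)):
    observed = combined_error_rate(e_ivt, e_rt)
    print(f"IVT={e_ivt:.1e} RT={e_rt:.1e} -> observed={observed:.3e}")
```

The second function is the point: the IVT component only becomes recoverable once there is a separate, trustworthy measurement of the RT component.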
One other finding in the preprint is an interesting surprise, though to me just a datapoint. Pfizer's bivalent booster may have full length versions of both the original spike sequence and the Omicron version; Moderna's appears to have full length for original spike (which is what was found by the earlier sequencing) but a truncated version of the spike for the Omicron. Now one caution I have here is I haven't looked carefully at the assemblies and reads (once again, wishing I had some interns I could throw at problems like this) and some of the assembly results are a bit surprising given how similar some of these sequences are -- that one would get two contigs with very high identity rather than many smaller ones. That's probably some of the magic of Megahit, the assembler used in the preprint and one I haven't used for many years. But it also could be misleading, if Megahit is basically making reasonable guesses of paths through the de Bruijn graph but those are not deterministic paths.
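To illustrate the worry about non-deterministic paths, here is a toy de Bruijn graph built from two made-up sequences that differ at two positions further apart than the k-mer size. The graph alone then admits four end-to-end paths, only two of which are real. This is a sketch under my own assumptions: the sequences are invented, and it does not model the coverage information or other heuristics a real assembler such as Megahit would bring to bear.

```python
from collections import defaultdict


def de_bruijn_edges(seqs, k):
    """Map each (k-1)-mer node to the set of successor nodes seen in any sequence."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_paths(graph, start, end):
    """Spell out every path from start to end (assumes the graph is acyclic)."""
    spelled = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end:
            spelled.append(seq)
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, seq + nxt[-1]))
    return spelled


# Two invented variants differing at positions 5 and 10 (1-based).
variant_a = "ATGCAGGTCATTAG"
variant_b = "ATGCTGGTCCTTAG"

graph = de_bruijn_edges([variant_a, variant_b], k=4)
for path in sorted(spell_paths(graph, "ATG", "TAG")):
    label = "real" if path in (variant_a, variant_b) else "chimera"
    print(path, label)
```

With a k-mer long enough to span both differences, or with reads that link them, the chimeric paths disappear, which is why checking the reads behind near-identical contigs matters.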
What is a more weighty question is whether sequencing should be part of the lot release process for mRNA therapeutics, and if so with what parameters and protocols and what cutoffs? Any such process will almost certainly only be realized if required by regulatory agencies, as there are many disincentives for manufacturers to explore the product. And I would reiterate that there must be definitive guidelines around what is actionable -- what if found would cause a lot to be rejected or potentially reworked. Setting limits must also be within bounds of reason; a limit of detection set too low would require sequencing to such depth as to add a substantial cost to the final product.
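For a feel of how the limit of detection drives sequencing depth, here is a rough calculation of how many reads are needed to see at least one read from a contaminant present at a given fraction of the library. The assumptions are mine and deliberately optimistic: every read is an independent draw, a single read counts as detection, and library prep introduces no bias. A real protocol would need several supporting reads, so actual depth would be higher.

```python
import math


def reads_needed(fraction: float, confidence: float = 0.95) -> int:
    """Reads required to observe at least one contaminant read.

    fraction   -- contaminant share of sequenced molecules (hypothetical)
    confidence -- desired probability of seeing one or more such reads
    Solves (1 - fraction) ** n <= 1 - confidence for the smallest integer n.
    """
    if not (0.0 < fraction < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("fraction and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fraction))


# Illustrative limits of detection, not proposed regulatory thresholds.
for fraction in (1e-3, 1e-6, 1e-9):
    print(f"fraction {fraction:.0e}: about {reads_needed(fraction):,} reads")
```

Required depth scales roughly as 1/fraction, so each thousandfold tightening of the limit costs about a thousandfold more sequencing per lot.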
But these are discussions that should be had. For example, though not really a sequencing question: what should the limit on viable plasmid be? The concerns for a non-mammalian vector would be primarily around pumping more antibiotic resistance markers into humans. It is murky how much of this goes from injection site to microbiome, so maybe no standard is needed. But that itself is a worthwhile discussion and an area to think about new experiments to inform such regulation.
Or, should there be stricter limits on functional DNA at all? When producing at scale, things which work well in typical research reactions can become major headaches - but that isn't a reason to completely discard various approaches to keeping the DNA out, whether via binding the DNA to a solid surface or destroying it with enzymes after RNA generation.
Perhaps an easier standard to agree on would cover sequences which just should not show up in a therapeutic mRNA lot, again with some defined limit-of-detection. For example, any given product should not have sequences coding for other proteins. Right now mRNA vaccines are the only such products, but I fully expect (and welcome!) the idea of our pharmacopoeia having many mRNA agents. It is likely therefore for multiple products to be manufactured in the same facility, and ensuring a lack of cross-contamination would enhance product safety and public confidence. Similarly, no known human pathogens should be detectable and probably no human DNA.
Pharmaceutical regulatory agencies have often set standards requiring very high technical excellence in analytical methods. Particularly when agents are delivered to large numbers of individuals, there is a strong public interest in the safety of these agents and in consistent quality. Ultimately, it is in the pharma industry's best interest as well, reducing the risk of disasters that taint all pharmaceuticals. The emergence of a new class of therapeutics - mRNA delivery - and one which is amenable to an entirely new class of analytical methods not appropriate to small molecules or even most biologics - should engender a sincere debate about what new technical standards are appropriate.
It's worth noting that because Kevin is a fierce critic of the vaccines, he has motivation to probe deeply. There's a lot of questionable analysis of SARS-CoV-2 sequences out there (and some that is absolutely mind-blowingly awful), but the "skeptics" do sometimes find intriguing stuff, such as contamination of metagenomic entries with SARS-CoV-2 and related sequences (e.g. in rice data; the first Anandamide piece listed above discusses at least one other case). One need not subscribe to the interpretations put forward to note that these suggested examples require follow-up by a broad bioinformatics community. Or if nothing else, attacking such cases with different approaches is a great student project concept.
Oh, one last thought. This work by the Anandamide folks is a reminder that if you are using recombinant methods to make a product, there really is no keeping your sequences secret. There have long been rumors that people have sequenced various recombinant enzyme preparations and successfully reconstructed the underlying expression vectors. There are ways to tamp this down and those make good sense when contaminating DNA is a serious nuisance, such as kits for amplifying single cells or metagenomics preparation kits. But realistically, every year our ability to detect the trace DNA in a manufacturer's reagent is increasing faster than any improvements in removing DNA. The sequences of the mRNA vaccine production systems should have been made public long ago, because their "secret" was never going to stay that way.
[2023-03-21 11:55 - fixed all the And-and errors in anandamide]
[2023-03-25] For those interested in the literature on COVID-19 vaccine effectiveness, here is a Cochrane review of RCTs of the vaccine - this won’t include other approaches based on health statistics.