Wednesday, June 24, 2026

E.coli Doesn't Like Diets Either, At Least Genomic Ones

Another interesting preprint that fits several recent themes - genome minimization, discontinuation of products - comes from a Japanese group that reports significantly reducing the genome of E.coli.  They tried for 1.7Mb but ended up settling - for now - for a 2.3Mb genome.  That's only about half the size of a wild-type E.coli K-12 genome (roughly 4.6Mb), but not at all a small genome.

One note about this paper is that they lean hard into computing analogies, on synthetic biology as code.  This is particularly captured in a sentence in the abstract: "Our platform consists of integrated development environment (IDE) and runtime environment (RTE)".  Their intended design is a "kernel genome".

One hiccup the group encountered is that they were originally replicating very large chunks of the genome in vitro using technology from the Japanese company OriCiro.  OriCiro had developed a method to reconstitute a very sophisticated bacterial replication system in vitro, purely with recombinant proteins.  So not at all PCR or LAMP - this is full replication of DNA circles of 100 kilobases or more, complete with error correction and topoisomerases.  A really cool step towards booting up a full cell from defined parts - and you could buy it!  I was part of a cadre at The Strain Factory that proposed acquiring OriCiro, but that didn't happen.  Instead, they were purchased by Moderna - which always seemed a bit odd given that OriCiro was great for replicating huge plasmids and Moderna needs only small ones.  Perhaps it was a good fit - it definitely eliminates the issue of endotoxins.  But the major consequence is that Moderna wasn't interested in continuing externally-facing commercial operations, and so the kit is no longer on the market.  The preprint authors had their own recipe for the parts, but this is not a feasible approach for many - particularly if they are worried about Moderna's legal team (any of them have 10X Genomics on their resume? That would be a very big red flag!).

The design started with the already reduced genome DGF-298W, which is only 3Mb.  The attempted reduction to 1.7Mb was conservative in many ways, attempting to retain the existing relative positions and orientations of retained genes and operons.  As they note, this also enables debugging with RecA-assisted recombination against the DGF-298W genome.

The kernel genome is designed to be finally assembled inside a host, after which shattering of the host genome can be triggered by expressing the restriction enzyme AvrII.  This requires scrubbing all recognition sites for AvrII, CCTAGG, from the kernel genome.  An interesting hitch is that the E.coli 16S rRNA sequence has CCUAGG in it, and so the 16S rRNA was edited to remove the site but retain the same secondary structure.  Similarly, tRNA-Glu genes also have AvrII sites in them, so versions were designed to remove this sequence yet retain full affinity for EF-Tu.
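To make the site-scrubbing bookkeeping concrete, here is a minimal Python sketch of scanning a circular sequence for AvrII sites.  It is an illustration only, not the authors' pipeline: the function names are my own, and the "expected count" helper assumes uniformly random base composition, which real genomes do not have.

```python
import re

AVRII_SITE = "CCTAGG"  # AvrII recognition sequence, as given in the post


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str = AVRII_SITE, circular: bool = True) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`.

    CCTAGG is its own reverse complement, so scanning one strand is enough.
    With circular=True, sites that span the origin are reported too.
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so matches across the junction appear.
    search = seq + seq[: len(site) - 1] if circular else seq
    # A lookahead pattern reports overlapping matches as well.
    pattern = re.compile(f"(?={re.escape(site)})")
    return [m.start() for m in pattern.finditer(search) if m.start() < n]


def expected_random_sites(genome_length: int, k: int = 6) -> float:
    """Rough expected count of one k-mer, ASSUMING uniform random bases."""
    return genome_length / 4**k


if __name__ == "__main__":
    toy = "TAGGAAAACCTAGGTTTTCC"  # hypothetical toy sequence, not real data
    print(find_sites(toy))
    print(expected_random_sites(1_700_000))
```

Under that crude uniform-composition assumption, a 1.7Mb design would carry about 415 copies of any given 6-mer, which gives a feel for how much recoding "scrubbing all sites" can imply.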

All of the cloning was done in E.coli, skipping the yeast Transformation-Associated Recombination (TAR) steps which have often been used for large genome rewriting projects.  Yet another trick here was to have AvrII sites flanking some pieces along with a recognition sequence for the E.coli replication termination protein Tus - by including Tus in reactions it was possible to mask AvrII sites to prevent them from being digested.  Eventually this gave 6 megachunks, ranging from 206 to 423kb in size.  These were assembled using a phage integrase, but within the host genome.  The assembled kernel genome was then excised using Cre-Lox recombination.

Except the original design didn't really work when set on its own - no colonies.  So RecA-mediated recombination was used to identify sections of the RERE6 genome which, if added back, would enable viability.  Finally a genome was achieved, REGE-229, with about 7 large segments of RERE6 recombined into the kernel design.  That is viable - if perhaps a bit slow, with a doubling time of 108 minutes, about 5 times that of wild-type E.coli.

So this is another reminder that there is still much about the basics of genomes we don't understand.  Why didn't the original design work?  Gemini uncovered Pelagibacter ubique (HTCC1062), a free-living oceanic gram negative bacterium that clocks in at 1.7Mb.  There are insect endosymbionts that are much smaller yet still have the double membrane scheme, but P.ubique seems like a good benchmark - why can't we get down to that size?  That is, after all, the size this group was aiming for.

Of course, one issue is that editing genomes is only vaguely analogous to editing code bases.  Imagine how much slower coding would go if you were only allowed to overwrite old code with new using homologous recombination, or had to carefully track specific 6-mers in your code and use them for patching - but in doing so might break other code with a stray copy?  The tools for genome manipulation have advanced mightily in the last quarter century, yet still can feel like trying to manipulate tools inside a glovebox - with a remotely-controlled set of manipulators working the glovebox and honey smeared on all the optics.  Okay, a clunky simile to try to illustrate a clunky methodology.  

Having only a limited number of required undesigned segments is exciting.  While debugging designed genomes is never easy, in theory each segment can now be attacked in parallel.  The group performing this work has extensive experience designing "guest chromosomes", so each offending segment could be offloaded to a guest and then sliced-and-diced to create combinatorial libraries that drop out different combinations of undesigned DNA.  The software metaphor breaks down here again - who has debugged their code by building a combinatorial collection of different versions that leaves out different sets of code blocks?  And making different code blocks is cheap; designing combinatorial dropout libraries of DNA much less so - though it is definitely in the realm of feasibility.
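To give a feel for how quickly such a dropout library grows, here is a small Python sketch that enumerates the variants for one offending segment.  The block names are hypothetical placeholders, and this only counts designs - it says nothing about how the group would actually build them.

```python
from itertools import combinations


def dropout_designs(blocks, max_dropped=None):
    """Yield (kept, dropped) tuples for every way of dropping 1..max_dropped blocks.

    `blocks` are labels for sub-blocks of one undesigned segment; order is preserved
    in `kept` so each design keeps the original relative arrangement.
    """
    blocks = tuple(blocks)
    if max_dropped is None:
        max_dropped = len(blocks)
    for k in range(1, max_dropped + 1):
        for dropped in combinations(blocks, k):
            kept = tuple(b for b in blocks if b not in dropped)
            yield kept, dropped


if __name__ == "__main__":
    # Hypothetical sub-blocks of a single segment that had to be added back.
    segment = ["blockA", "blockB", "blockC", "blockD"]
    for kept, dropped in dropout_designs(segment, max_dropped=2):
        print("keep:", kept, "drop:", dropped)
```

With n blocks and no cap on how many are dropped there are 2^n - 1 variants, so even 10 blocks per segment means 1,023 constructs; capping the number of simultaneous dropouts is one way to keep the library buildable.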

A recent publication from the Church lab on debugging designed genomes offers some other approaches - it's in my virtual stack to read & ideally I'll get to it very soon.
