Tuesday, June 23, 2026

Shredding Genomes: Establishing Plausible Deletability

There's an interesting new preprint from Jay Shendure's lab introducing Shred-Seq, a Cas3-driven approach to generating large deletions across a large eukaryotic genome.  By creating diverse collections of deletions, Shred-Seq enables identifying hidden non-coding sequences of functional importance.

I wrote recently of Craig Venter's efforts to first synthesize and then minimize a Mycoplasma genome.  Establishing what genes could be shed relied largely on transposon insertions to disrupt coding sequences, which works well in a densely-packed bacterial genome.  But with an expansive mammalian genome, we'd like to understand which stretches carry important regulatory signals and which are just accumulated clutter - and since we lack great tools for systematically identifying non-coding signals, that's not easy.

The Shred-Seq preprint has a bunch of interesting ideas and also shows results from different iterations of the concept.  I'm going to give a quick overview that will certainly miss some aspects and won't respect the careful history of how the technology developed - please read the preprint!

First, there are the "beacons" - constructs that will enable generating deletions and reading them out in pools.  Each beacon contains GFP, which enables sorting initial transformants for successful beacon introduction.  GFP will also be the initiation point for Cas3-driven deletion.  Each beacon also has a randomly-generated barcode, so that different beacons can be distinguished in downstream data.

There are many ways to map insertions - such as transposons - within a pool of genomes containing such insertions.  I've proposed several for here at the strain factory, mostly because I haven't loved some of the existing methods.  Shred-Seq picks yet another one - each beacon has two phage promoters in it (originally both T7, later one T7 and one SP6) which face each other across the barcode.  Extracted DNA can be transcribed with the appropriate polymerase to generate a long transcript going into the adjacent genomic DNA, which can then be converted to cDNA and either read directly with long reads or PCR-amplified for short reads.  Importantly, T7 transcription can also be performed in methanol-fixed and rehydrated cells, enabling single cell RNA-Seq to also read the synthetic transcripts from the beacon.

Beacons are delivered into cells via transposon or lentivirus and then positive cells can be sorted by their GFP signal.  Most cells will have a single transposon, though at some frequency cell replication causes a second hop.  DNA can then be extracted from the pool, transcribed with phage polymerase(s), cDNA made, and the resulting sequences used to map the location of insertions.  Critically, the post-sorting population is "bottlenecked" - the library size is artificially reduced before the cells are grown out.  The value of this will show up below.

Deletions are then made by introducing the required machinery - Cas3 plus the components of the Cascade complex (Cas5, Cas7, Cas8 and Cas11).  The Cas3 guide RNA targets GFP, causing deletions to radiate from it in one direction.  Since each beacon location went into the Cas3 deletion process as multiple cells - the progeny of the original bottlenecked cells - many different deletions (aka an allelic series) are created from each beacon.  This population can now be sorted for loss of GFP fluorescence.

Beacons within the extracted DNA from the post-sorting population can now be activated with the appropriate phage polymerase, with the cDNA sequences now marking out the boundary of the deletions and tying them to beacon barcodes.  This can also be performed after rounds of growth - providing "pre-selection" and "post-selection" snapshots of deletion abundances.  If a particular barcode drops out during growth, then some critical element has been removed by that deletion. And since these deletions range from 0.1 kb to over 300 kb, the allelic series for a beacon can flag a critical region distant from the original insertion using very efficient sequencing from pools.
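The pre- versus post-selection comparison can be sketched in a few lines of Python.  This is a minimal illustration and not the preprint's analysis: the read counts, the (barcode, deletion length) keys, the pseudocount and the -2 log2 cutoff are all assumptions I made up for the example.

```python
import math


def log2_fold_changes(pre_counts, post_counts, pseudocount=1.0):
    """Return {deletion_id: log2(post_freq / pre_freq)} for ids seen pre-selection.

    Counts are converted to frequencies so that different sequencing depths
    in the two snapshots do not masquerade as selection. The pseudocount
    (an assumption, not from the preprint) keeps complete dropouts finite.
    """
    n = len(pre_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * n
    post_total = sum(post_counts.values()) + pseudocount * n
    result = {}
    for key, pre in pre_counts.items():
        post = post_counts.get(key, 0)
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        result[key] = math.log2(post_freq / pre_freq)
    return result


def depleted(pre_counts, post_counts, threshold=-2.0):
    """Sorted ids whose log2 fold change is at or below the (arbitrary) threshold."""
    lfc = log2_fold_changes(pre_counts, post_counts)
    return sorted(key for key, value in lfc.items() if value <= threshold)


# Hypothetical counts keyed by (beacon barcode, deletion length in bp).
pre = {("BC1", 1_000): 100, ("BC1", 50_000): 100, ("BC2", 2_000): 100}
post = {("BC1", 1_000): 120, ("BC2", 2_000): 110}

print(depleted(pre, post))  # [('BC1', 50000)]
```

Here the long deletion from beacon BC1 vanishes after growth while the short one from the same beacon persists, which is the pattern that would point at something important lying between 1 kb and 50 kb from that beacon.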

Of course, if you can't get an initial insertion near an element there won't be any deletions covering it.  Three different introduction methods were tested - PiggyBac and Sleeping Beauty transposons as well as lentivirus.  Each favored open chromatin, but with different biases within that.  Lentiviruses favored actively transcribed regions - and particularly introns - and PiggyBac enriched for transcription start sites and enhancers.  Sleeping Beauty had the least bias - but also the greatest number of multiple insertions, which is undesirable.  Both transposons tended to land in intergenic regions.  One whole subline of work could be identifying additional introduction methods, engineering to alter these biases, or treatments (HDAC inhibitors to open up more chromatin?) to ameliorate them.

There's a large section exploring Cas3 deletion properties in terms of chromosomal features.  One component of this is generating deletions in the other direction from the same beacons; I don't see this detailed in the preprint, but a guide RNA targeting the other strand of GFP should work.

Generally the deletion lengths follow a log-normal distribution, but a shift in the distribution before and after growing for multiple cell divisions can highlight functional elements.  An example given is a deletion series starting 32kb away from WDR3 - pre-selection, 39% of the deletions in haploid HAP1 cells overlapped the gene, but post-selection none did.
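The "fraction of deletions overlapping a gene" number is easy to compute once each deletion is expressed as an interval running out from its beacon.  Here's a rough Python sketch under my own assumptions: the coordinates, deletion lengths and function names below are invented for illustration (only the 32kb beacon-to-gene distance echoes the WDR3 example), and deletions are modeled as starting exactly at the beacon.

```python
def deletion_interval(beacon_pos, length, direction):
    """Half-open interval (start, end) removed by a deletion radiating from the beacon.

    direction is +1 for deletions extending to higher coordinates, -1 for lower.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if direction == 1:
        return (beacon_pos, beacon_pos + length)
    return (beacon_pos - length, beacon_pos)


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(beacon_pos, lengths, direction, feature):
    """Fraction of a beacon's deletions that reach into the feature interval."""
    if not lengths:
        return 0.0
    hits = sum(
        overlaps(deletion_interval(beacon_pos, length, direction), feature)
        for length in lengths
    )
    return hits / len(lengths)


# Hypothetical beacon with a gene starting 32 kb downstream of it.
beacon = 100_000
gene = (132_000, 160_000)
pre_lengths = [500, 5_000, 20_000, 40_000, 90_000]
post_lengths = [500, 5_000, 20_000]  # the gene-hitting deletions dropped out

print(fraction_overlapping(beacon, pre_lengths, 1, gene))   # 0.4
print(fraction_overlapping(beacon, post_lengths, 1, gene))  # 0.0
```

A drop from a substantial pre-selection fraction to zero post-selection is the signature described above for an element the cells can't grow without.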

Analyses were performed in both haploid and diploid HAP1 cell lines - apparently if you grow out HAP1 long enough it will become diploid.  That isn't how the diploid beacon set was created - they started with diploid HAP1 lines - but it did confuse me at first because I didn't know about this phenomenon and thought, as the name suggests, they were always haploid.  Diploid HAP1, as expected, showed a much higher tolerance to deletions and most beacons showed a log-normal distribution of deletion lengths even after selection, whereas haploid HAP1 often deviated from log-normal after selection (this analysis omitted beacons that showed deviation from log-normal pre-selection).  Interestingly, analysis of 173 beacons from haploid HAP1 bounds the fraction of the genome that is dispensable for growth of haploid cells at between 50% and 96%.  This clearly can be refined with a deeper set of beacons, but does suggest that for just growth much of the genome could be cleared out.  Of course, there is a lot more to human biology than just replicating cells!
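For a feel of what "deviating from log-normal" could mean in practice, here's a small standard-library Python sketch.  I don't know what test the preprint uses; this one fits a normal distribution to the log of the deletion lengths and reports a Kolmogorov-Smirnov style distance, which is my own stand-in.  The synthetic lengths and the choice to truncate the long deletions are assumptions for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev


def fit_lognormal(lengths):
    """Return (mu, sigma) of log(length); needs at least two distinct positive lengths."""
    logs = [math.log(x) for x in lengths]
    return fmean(logs), stdev(logs)


def ks_statistic(lengths):
    """Largest gap between the empirical CDF of log-lengths and the fitted normal CDF.

    Small values mean the lengths look log-normal; larger values mean they do not.
    Because mu and sigma are estimated from the same data, standard KS p-value
    tables do not apply, so treat this as a relative score only.
    """
    mu, sigma = fit_lognormal(lengths)
    dist = NormalDist(mu, sigma)
    logs = sorted(math.log(x) for x in lengths)
    n = len(logs)
    d = 0.0
    for i, x in enumerate(logs, start=1):
        cdf = dist.cdf(x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Synthetic "pre-selection" lengths: evenly spaced quantiles of a log-normal.
reference = NormalDist(9.0, 1.0)
pre_lengths = [math.exp(reference.inv_cdf((i + 0.5) / 200)) for i in range(200)]

# Synthetic "post-selection" lengths: everything longer than the median is lost,
# mimicking long deletions that hit something essential.
post_lengths = [x for x in pre_lengths if x < math.exp(9.0)]

print(round(ks_statistic(pre_lengths), 3), round(ks_statistic(post_lengths), 3))
```

The truncated set scores noticeably worse than the full one, which is the kind of shift a haploid beacon sitting near an essential element would be expected to show.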

There are many additional directions this work could lead to.  Generating a deletion map for haploid HAP1 cells could identify many non-coding elements that control core cellular processes.  As the preprint notes, replacing sorting with selections could enable greater numbers of beacon-bearing cells to be generated and processed.
 
My good buddy Gemini (not a horoscope believer, but that is my sign!) says there exist human embryonic stem cell (ESC) lines which are haploid, as well as mouse haploid ESC lines.  So the idea of implanting beacons and then differentiating these lines isn't completely absurd.  That could extend the Shred-Seq approach to interesting questions of development or cell type specialization - which deletions affect T-cell activation, or even just prevent differentiation into certain cell types?  Creating a haploid deletion map of mouse or other species would help map which elements are conserved across the mammalian radiation.

