AGBT Agriculture was a really nice meeting - lovely venue in the Arizona desert, a meeting small enough that you felt you could (but weren't compelled to) introduce yourself to everyone, good food, great scientific content, and everyone super-friendly. I came for the workshops on Sunday, and there were two on graph genomes / pangenomes, which turned out to be a theme throughout the meeting - not every talk or paper mentioned these topics, but a huge fraction did. The agriculture world - both plant and animal (hmmm, no mushroom talks that I can recall) - is completely sold on the utility of graph genomes. Ideally these would be really large ones, not only capturing the complete genomic diversity of a species but also layering in information from related species, so that the graph can answer questions about when particular genomic features arose in evolution.
On a related note, this group is very fond of telomere-to-telomere (T2T) genomes - there was very little questioning of whether they are a good idea. Some plant folks remarked that it's still hard to achieve T2T in some of the higher ploidy plants, but that was still their goal. Many of these T2T efforts were throwing the kitchen sink at the problem - multiple long read technologies, short reads, Hi-C. One clever trick - not easy to repeat - was a horse project that built a T2T genome for a mule. The horse and donkey genomes are sufficiently diverged that they could show the resulting assemblies had zero detectable haplotype switch errors - every chromosomal contig had either all horse kmers or all donkey kmers.
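The kmer check in the mule example can be illustrated with a toy sketch. This is not the project's actual pipeline, which wasn't described in detail; the function names, the tiny sequences, and the choice of k=4 are all made up for illustration. A real analysis would use much longer kmers, handle reverse complements, and tolerate a small error rate rather than demanding zero hits from the other parent.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_contig(contig: str, horse_only: set[str],
                    donkey_only: set[str], k: int) -> str:
    """Label a contig by which parent-specific kmers it contains.

    'mixed' means kmers from both parents appear on the same contig,
    which is what a haplotype switch error would look like.
    """
    found = kmers(contig, k)
    has_horse = bool(found & horse_only)
    has_donkey = bool(found & donkey_only)
    if has_horse and has_donkey:
        return "mixed"
    if has_horse:
        return "horse"
    if has_donkey:
        return "donkey"
    return "unassigned"


# Hypothetical toy "parental genomes" (real ones are billions of bases).
K = 4
horse = "ACGTACGGTTAC"
donkey = "TTGACCATGCAA"

# Keep only kmers unique to one parent.
horse_only = kmers(horse, K) - kmers(donkey, K)
donkey_only = kmers(donkey, K) - kmers(horse, K)

for name, contig in [
    ("contig_1", horse[:8]),
    ("contig_2", donkey[:8]),
    ("contig_3", horse[:8] + donkey[:8]),  # simulated switch error
]:
    print(name, classify_contig(contig, horse_only, donkey_only, K))
```

A clean assembly in this scheme is one where no contig comes back as "mixed".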
As an aside, this fondness for T2T and long read genomes was not reflected in the sponsor list - neither Pacific Biosciences nor Oxford Nanopore had any representatives at the conference, let alone a sponsorship. Are they now taking this community for granted and deciding that battling each other over it is a low-growth strategy?
Where there was some disquiet was in building the graph genomes and querying them. There was a great talk on the panagram fast query tool for graph genomes. But what worried some of the workshop participants is the often long compute time required to build a graph genome - sometimes exceeding a week (or maybe even two). The favorite approaches essentially perform an all-vs-all comparison of the inputs, which is a clear scaling problem.
One person I spoke with who knows these algorithms was unworried. Any experts are welcome to chime in. But if we start building graphs of large allopolyploid genomes of interest such as wheat and layer in not only all the known varieties but wild relatives, those are going to be huge.
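To put rough numbers on the all-vs-all worry: if every input genome is compared against every other, the number of pairwise comparisons grows quadratically with the number of genomes. The sketch below only counts pairs; it says nothing about how long each comparison takes, and real graph builders may use heuristics that avoid some of this work. The function name and the genome counts are arbitrary choices for illustration.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Number of unordered genome pairs in a full all-vs-all comparison."""
    return n_genomes * (n_genomes - 1) // 2


# Arbitrary example sizes, from a small panel to a large collection.
for n in (10, 100, 1000):
    print(f"{n:>5} genomes -> {pairwise_comparisons(n):>7} pairs")
# Pair counts: 45, 4950, 499500
```

Going from 10 to 1000 genomes is a 100-fold increase in inputs but roughly a 10,000-fold increase in pairs, which is why adding wild relatives on top of all known varieties raises eyebrows.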
Genomics has always had a tight connection with computing technology, particularly in the high throughput "next generation sequencing" era. If someone had perfected the concept of a 454 sequencer in 1980, just calling the bases would have been a challenge - and storing all the images. One compute person I know thought BLAST would fade away before this century, but the sequencing kept ramping up and BLAST is still around. The NIH plot everybody loves to hate showed what a boon advances in sequencing technology have been for data generation - but now we are awash in data. Will Moore's law continue to save us? That whole space faces challenges around the capital cost to build the chip fabrication facilities to achieve even higher densities - and at some point transistors will hit an actual atomic limit on their size.
In an audience participation section at one workshop, a number of ideas were thrown out. Perhaps graph genomes will require careful decisions about what to include - taking a hit on full utility for the sake of being manageable. Perhaps quantum computing will save us - I know so little about what isn't hype in that space that I can't judge this at all. Has anyone published on quantum algorithms for graph genomes? It looks like there is one paper/preprint, but the quantum compute power isn't much yet, is it? Others worried that all the effort in silicon design is being sucked up by GPUs and such to support machine learning, starving more general compute. I did find papers on building graphs with the assistance of GPUs.
Of course, mostly the same problems are faced by human geneticists, and the growing number of long read human genomes is driving graph construction technology. The human genome is smaller and less complex than some plant genomes, but that community will likely continue to track ahead of the agricultural (and other) communities due to more intense funding.
Gila woodpecker offspring peering from a nest near my room
A different Gila woodpecker nest - a reasonable prior for any tall cactus is to find one