Sunday, October 31, 2010

Plenty of Genomes are Still Fair Game for Sequencing

I've been grossly neglecting this space for an entire month with only the usual excuses -- big work projects, a lot of reading, etc. None good enough. Worst of all, as usual, it's not that I haven't composed possible entries in my head -- they just never get past my fingertips.

Tonight is the night most associated with pumpkins, and an earlier highlight was attending the Topsfield Fair, where the pictured specimen was on display. Amazing as it is, it fell nearly 15 pounds shy of the world record. If you want to try to grow your own, every year the variety which has dominated the winners can be purchased. Nature isn't all though; champion pumpkin growing requires a lot of specialized culture ranging from allowing only a single fruit to set to injecting nutrients just upstream of that fruit.

Sometime in recent memory, GenomeWeb noted some other blogs discussing whether there are any truly remarkable genome sequencing projects left. That got me pondering what makes a species very interesting to sequence. Now, both of the bloggers mentioned clearly were not fond of either "K" genome project -- the 1,000 humans or 10,000 vertebrates. There were also some potshots taken at the "delicious or cute" genomes concept. One suggested that no interesting metazoa ("animals") are left.

So, what does make an interesting genome? Well, I can think of several broad categories. I'll try to throw out possible examples of each, though to be honest I wouldn't be surprised if some of these genomes are sequenced or nearly so -- it's very hard to keep track of complete genomes these days!

First, which I think would resonate with those two critical articles, would be genomes with interesting histories -- genomes that might tell us stories purely about DNA. This was the bent of these papers I refer to. In particular, they were thinking of many of the unicellular eukaryotes which are the result of multiple endosymbiont acquisition / genome fusion events. But, I would definitely throw into this category a particular group of animals: the bdelloid rotifers, which have gone without recombination for a seeming eternity. Of course, to really understand that genome, you'd need to also sequence one of the less chaste rotifers.

Another hugely interesting class of genomes would be those that shed light on development and its evolution (evo-devo). In particular, there are a lot of arthropod genomes yet unsequenced -- from what I've noted it appears that most sequenced arthropods are either disease vectors, agricultural pests or economically important (plus, of course, the model Drosophila). Even so, I'd guess there are not many more than a dozen complete arthropod genomes so far -- quite a paucity considering the wealth of insects alone. And, if I'm not mistaken, mostly insects and an arachnid or two have gone fully through the sequencer -- where are all the others? By the way, I'd be happy to help with sample prep for the Homarus americanus genome!

Another huge space of genomes worth exploring is those where we are likely to find unusual biochemistry going on. Now, a lot of those genomes are bacterial or fungal, but there are also an awful lot of advanced plants that have interesting & useful biochemical syntheses.

All that said, I find it odd that some don't see the import and utility of sequencing many, many humans and a lot of vertebrates also. It is important to remember that a lot of funding is from the public, and the public considers many of these other pursuits less important than making medical advances. It is easy for those of us in the biology community to see the longer threads connecting these projects to human health or just the importance of pursuing curiosity, but that doesn't always sell well in public.

An optimistic view is that all the frustrated sequencers should hunker down and patiently wait; data generation for new genomes is getting cheaper by the minute, with short reads to fill out the sequence and ultra-long reads to replace physical mapping. A more conservative view holds that bioinformatics & data storage will soon dominate the equation, which might still make it hard to get lots of worthy genomes sequenced.

Personally, I can't stroll a country fair without wanting to sequence just about everything I see on display -- the chickens that look like Philadelphia Mummers, the two yard long squash, bizarrely shaped tomatoes -- and of course, the three quarter ton plus pumpkins.


Titus Brown said...


I'm kind of shocked that you look at the tree of metazoa and conclude that - hey! we need more Arthropod genomes!

Ecdysozoa and chordates as a whole are severely overrepresented, in fact. We have only a few lophotrochozoa, even if you include the unpublished ones. We need more!

Only a few nonchordate deuterostomes have been sequenced, too: sea urchins and some related echinoderms, as well as one or two hemichordates (Saccoglossus kowalevskii). If we want to know more about the origin of chordates and vertebrates, that's where we should look.

Note that neither the lamprey nor hagfish genomes are available yet (although we're working on lamprey quite feverishly). What, you think all vertebrates have jaws?

Very few plants have been sequenced, and they're going to be a real bitch because of the repeat content.

Hey, and microbes! The Tree of Life is not well covered at the microbial level, even with all the bacteria and archaea that have been sequenced in recent years. See the GEBA project for an example of the impact that some judiciously chosen genomes can have on protein discovery.

We've got another decade of genome sequencing ahead of us before we can say that we're even close to well sampled. People who claim otherwise just don't know their biology.


Anonymous said...

Of course there are a lot of interesting genomes which are still waiting for sequencing. In my experience, I can say that many plants of agricultural relevance (such as wheat) are still far from being sequenced; their genomes are huge and highly repetitive. However, their genomes could be extremely useful since these plants feed the world.

Anonymous said...

Genome sequencing is getting much cheaper. At UCSC, there is an attempt to sequence the campus mascot (the banana slug) with essentially no funding.

Keith Robison said...

Nice to see some good feedback.

Titus: Definitely you have some good suggestions. I had a jawless fish tagged as done; I guess it still has a way to go. The 10K genome project should cover a lot of important unsequenced vertebrates -- lots of good targets there.

I'll fully admit I don't know my invertebrates well. Arthropods are obvious to me since there are so many forms & it is easy to see some interesting developmental questions there (starting with all those appendages but also different styles of metamorphosis and the acquisition of flight).

And yes, I agree there are still a lot of interesting fungi, bacteria and whatnot worth sequencing -- particularly to discover novel biochemistry. Just imagine all the complex metabolite biosynthesis operons that must still lie in the Streptomyces alone!

S. Pelech - Kinexus said...

Most of the species of organisms on this planet have not even been identified, and it is possible that the genomic sequences of organisms could become the chief criterion for their classification. This should drive the sales of DNA sequencing machines and employ armies of gene sequencing technicians for another few decades at least. A few might argue that, with the mass extinction of many species on our planet due to human activity, we have a moral obligation to do this more quickly. The sequencing of genomes of life forms from other worlds, if they have genomes, will most certainly also keep the gene sequencing stakeholders happy when we run out of organisms on this planet. Intriguingly, even the concept of an interplanetary "Noah's" ark for the preservation of life from Earth to distant worlds when we ruin this one may not even require a broad selection of living animals, plants and microbes, but just their genomic sequences and the capacity to produce synthetic life.

With the drop in the cost of gene sequencing, all kinds of fanciful ideas will abound. Just because we can undertake certain activities does not mean that we should do so now. The main issue I have with the currently proposed gene sequencing efforts is not so much the direct costs but rather the lost opportunity to make sense and use of the genetic and proteomics information that we already possess. With increasingly reduced funding for biomedical researchers trained and engaged in enzymology, metabolism and basic biochemistry, our capacity to actually benefit from the genomics legacy we already have will diminish rather than improve.

Paul Morrison said...

My brother used to enter pumpkins in the Topsfield Fair back when 800 pounds was a big one. When they started soaking seeds in colchicine to increase ploidy he called it quits.

I agree completely and more. The pumpkin, the lobster, the weird bug in a South American jungle. Where are we getting the new discovery that will impact on human health? From one of those outlier genomes no doubt.

We might have to hunker down until the analysis and storage of genome data get out of kindergarten, where we seem to be stuck for the time being. But we'll fix it.