Monday, May 14, 2007

What's left to sequence?

The era of complete organismal genome sequences is just over a decade old. Somewhere on another blog (alas, I cannot find the page) there was a rumor that the National Center for Human Genome Research is no longer soliciting white papers for genome sequencing, but rather the funding focus will shift to standard funding mechanisms. If true, this may mark the end of the era of genome sequencing for the sake of understanding genome sequences.

The initial sequences had quite a rush of excitement to them, because they were blazing new territory. Even before the E. coli genome was finished, early releases of it revealed exciting details of genome organization. Haemophilus gave the first picture of a complete genome, and Mycoplasma genitalium a hint at a minimalist genome. Saccharomyces, Caenorhabditis and Homo each represented a huge new milestone in genome complexity. As genomes came pouring out, it became harder and harder to track what had and had not been completed, and harder still if you expanded the list to include projects well underway. Since it wasn't among my specific professional needs, my personal tracking grew very sloppy.

It's gotten to the point where, when a new genome is announced, I'm surprised less by its actual content than by the claim that it is novel. What, we hadn't sequenced one of those yet? If the genome sequencing era is entering a plateau, what has been overlooked?

Of course, a key question is what makes a genome interesting to sequence. Obviously, if the organism is important to you, then seeing its genome sequenced is important. For example, our household's next generation of omicist will be quite disappointed to learn that his beloved Ailuropoda melanoleuca hasn't made the cut. Similarly, had the new Massachusetts biotech proposal come in the era of genome sequencing excitement, perhaps its centerpiece would have been the sequencing of local favorites Homarus americanus and Gadus morhua (favorites, that is, with drawn butter and beer batter, respectively).

So one criterion for genome sequencing has obviously been those species which are very important to a lot of people, or at least a lot of biologists. Another has been genome novelty, or the probability that the genome will on its own tell a very interesting genomic story, in terms of organization, content or evolutionary history. And another has been the use of genomes for cross-comparison. In this last category, it could be argued that the mammalian space is pretty well covered now.

But was anything overlooked? Asking the question against the criteria above, unfortunately, exposes my ignorance in a number of fields. For example, what is the most serious (perhaps measured by worldwide deaths) bacterial pathogen yet unsequenced? How many genera of bacteria or archaea have a currently culturable member but no member sequenced? What is the most important (using economic value as a proxy) industrial microorganism that has not yet revealed its genomic secrets?

Even if I stick to eukaryotes, I find my understanding lacking. Despite a lifetime of gardening & outdoors exploration, I'll confess to being mostly ignorant in the plant arena. Of course, a lot of economically relevant plants have been tough going due to high repeat content (with maize as the poster child for this issue) or have large genomes with high ploidy (many important wheat varieties are hexaploid or octaploid). Despite this, I will propose below one plant that seems to have been missed. Fungi are another area where I am more ignorant than knowledgeable (luckily, fungal genomes have their own blog!).

Furthermore, as one might hope, the people who think about this stuff for a living have covered most of the obvious candidates. Clearly it would be useful to round out our sampling of chordates with a jawless fish and a cartilaginous fish, and indeed a lamprey and a shark are in progress. The remaining orders of mammals, particularly the monotremes (egg-layers), are getting their due. Early animal evolution, with all sorts of interesting questions about the appearance of genes relevant to complex developmental processes, is being covered with a sponge and a hydra. Lichen fungus? Covered! Flatworm -- yes! Earthworm -- no, but a leech covers the segmented worms.

I fully expect that someone will point out half or more of these as actively being sequenced, but here is my list of possible oversights. I tried to hit Google with a reasonable number of searches, but there's always one more to do. Furthermore, I hope -- no, I challenge -- readers of this to make cases for other species.
  • Euglena is the original taxonomic paradigm-busting organism, neither clearly plant nor animal. It is now believed (last I heard) to be an ancient symbiotic fusion of a trypanosome-like organism and a photosynthetic species. There has been an EST project, but apparently no full genome sequence.

  • Genome sequencing has been applied to a number of insects and at least one tick, but this hardly covers the amazing variety of arthropods. Certainly it would be interesting from an evo-devo angle to have some more. Millipede would be an obvious one, and of course (as noted above) many crustaceans are economically important.

  • Many plants rely on symbiotic fungi on their roots to extract various nutrients from the soil; one (or a few) of these would give a window into what is entailed in this particular symbiosis. Another fungus that might be interesting would be the one cultured by leaf-cutter ants.

  • Dodder is one weird-looking plant, because it lacks photosynthesis. I've periodically stumbled across patches of the stuff, and it looks more like a human creation or the spinnings of a deranged extraterrestrial spider than a plant -- tangles of orange thread-like tendrils. Dodder's chloroplast genome was sequenced quite a while back, and I'm surprised the nuclear genome hasn't followed -- parasites always seem to have quirky genomes and a lot to say (by comparison) about their non-parasitic relatives.

  • Finally, there is what I wish I had thought to propose for the 454 sequencing contest: a rotifer genome project. Rotifers are tiny invertebrates found ubiquitously in standing water. What is striking is that one of the two large subdivisions of rotifers, the Bdelloids, show every evidence of avoiding any sexual reproduction or recombination over vast evolutionary time. This long-term clonality has fascinating effects on the genome -- each copy of each gene in this diploid genome is evolving on its own trajectory. For comparison, at least one sexual rotifer species should be sequenced.


neilfws said...

Somewhere on another blog (alas, I cannot find the page)
This post at Evolgen

The Genomes Online Database is probably the most comprehensive survey of what's being sequenced.

Keith Robison said...

Thanks -- I'm not sure how I didn't find GOLD, but it does say how long I've been out on the sidelines of this stuff.

Jonathan Badger said...

How many genera of bacteria or archaea have a currently culturable member but no member sequenced?

Too many to count. But it is feasible, and arguably interesting, to ask how many bacterial or archaeal phyla have a culturable member but no member sequenced.

The bacterial Tree of Life project is sequencing eight representatives from bacterial phyla that haven't had a sequenced representative yet. Yes, I'm involved in the project, but you did ask...

Keith Robison said...

Thanks -- of course that's what I was asking about!

I heard E.O. Wilson speak about the Encyclopedia of Life project & the number he gave for known bacterial species seemed small (too small, but who am I to question EOW?), so I guessed they would lump into a few hundred genera, many of which had a member sequenced. So much for back-of-the-envelope estimation!

Anonymous said...

The genomes of a couple mycorrhizal fungi are currently being sequenced.