With valuable information emerging from the 1000 Genomes Project (human) and now a proposal for a 10,000 vertebrate genome project, it's well past time to expose to public scrutiny a project I've been spitballing for a while, which I now dub the 10,201 genomes project. Why that number? Well, first, it's bigger than the others. Second, it's 101 squared.
Okay, perhaps my faithful assistant is swaying me, but I still think it's a useful concept, even if for the time being it must remain a gehunden experiment. All kidding aside, the goal would be to sequence the full breadth of caninity, with the primary focus on elucidating the genetic machinery of mammalian morphology. In my biological world, that alone would justify such a project once the price tag comes down to a few million. With some judicious choices, some fascinating genetic influences on complex behaviors might also emerge. And yes, some of this might feed back into useful medical advances, though honesty requires admitting that this is likely to be a long and winding road. When we claim that every project will impact medicine, the claim loses its value for the projects where it is genuinely warranted.
The general concept would be to collect samples from multiple individuals of every known dog breed, paying attention to important variation within breed standards. It would also be valuable to collect well-annotated samples from individuals who are not purebred but exhibit interesting morphology. For example, I've met a number of "labradoodles" (Labrador retriever x poodle), and they exhibit a wide range of sizes, coat colors and other characteristics, precisely the fodder for such an experiment. Similarly, it is said that the same breed from geographically distant breeders may be quite distinct, so it would be valuable to collect individuals from far and wide. Going beyond domesticated dogs, it would also be useful to sequence all the wild canid species. With genomes at $1K a run, this would make good sense. Of particular interest among non-dog genomes would be the fox lines that, over just half a century, have been bred into one very docile line and a second line selected for aggressive tendencies.
What could we realistically expect to find? One would expect novel genes, such as the one found in short-legged breeds, to leap out. Presumably regions that have undergone selective sweeps would be detectable as well, and could be linked to traits. A wealth of high-resolution copy number information would certainly emerge.
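One common way to spot candidate sweep regions is to look for stretches of unusually low genetic diversity, since a sweep drags linked variants to high frequency along with the selected allele. The sketch below illustrates that idea only under loudly stated assumptions: it takes hypothetical per-site allele frequencies for biallelic SNPs pooled across a single breed, uses fixed non-overlapping windows counted in sites, and flags windows whose mean expected heterozygosity 2p(1-p) falls below an arbitrary fraction (here 25%) of the genome-wide window average. The function names and the threshold are illustrative inventions, not a published pipeline; real sweep scans use richer statistics and must account for breed demography such as bottlenecks.

```python
from statistics import mean


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


def window_diversity(freqs, window):
    """Mean expected heterozygosity in consecutive, non-overlapping windows of `window` sites.

    Trailing sites that do not fill a complete window are ignored.
    """
    return [
        mean(expected_heterozygosity(p) for p in freqs[i:i + window])
        for i in range(0, len(freqs) - window + 1, window)
    ]


def flag_sweep_candidates(freqs, window, fraction=0.25):
    """Return indices of windows whose diversity is below `fraction` of the mean window diversity.

    `fraction` is a hypothetical, arbitrary cutoff chosen for illustration.
    """
    div = window_diversity(freqs, window)
    if not div:
        return []
    baseline = mean(div)
    return [i for i, d in enumerate(div) if d < fraction * baseline]


if __name__ == "__main__":
    # Toy data: diverse sites everywhere except a stretch of near-fixed alleles.
    toy = [0.5] * 20 + [0.02] * 10 + [0.5] * 20
    print(flag_sweep_candidates(toy, window=10))
```

On the toy data the third window (index 2), where alleles are nearly fixed, is the only one flagged. Linking such a region to a trait would then require comparing breeds that differ for that trait.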
Is it worth funding? Well, I'm obviously biased. But the 10,000 vertebrate genome proposal has already kicked up some dust among those disappointed that the genomics community has not shown "an inordinate fondness for beetles" (only one sequenced so far). Genome sequencing is going to get much cheaper, but never "too cheap to meter". De novo projects will always be inherently more expensive because of their heavier informatics requirements: the first annotation of a genome is highly valuable but requires extensive effort. I too am disappointed that arthropods haven't been sampled more broadly, and it's hard to imagine folks in the evo-devo world being happy about this either.
It's hard for me to argue against sequencing thousands of human germlines to uncover valuable medical information, or tens of thousands of somatic cancer genomes for the same reason. But even so, I'd hate to see that crowd out funding for filling in more of the tree of life. Still, do we really need 10,000 vertebrate genomes in the near future, or 10,201 dog genomes? If the price of sequencing only 5,000 additional vertebrates is gaining 5,000 diverse invertebrates, that trade seems hard to argue against. Depth vs. breadth will always be a challenging call, but perhaps breadth should be favored a bit more, at least once I'm funded for my ultra-deep project!