Monday, December 14, 2009

Panda Genome Published!

Today's big genomics news is the advance publication in Nature of the giant panda (aka panda bear) genome sequence. I'll be fighting someone (TNG) for my copy of Nature!

The panda is the first bear (and alas, someone is already mistakenly claiming otherwise in the Nature online comments) and only the second member of Carnivora (after dog) with a draft sequence. Little in the genome sequence suggests that pandas have abandoned meat for a nearly all-plant diet, other than an apparent knockout of the taste receptor for glutamate, a key component of the taste of meat. So if you prepare bamboo for the pandas, don't bother with any MSG! But pandas do not appear to have acquired enzymes for attacking their bamboo, suggesting that their gut microflora do a lot of the work. So a panda microbiome metagenome project is clearly on the horizon. The sequence also greatly advances panda genetics: only 13 panda genes had previously been sequenced.

The assembly is notable for being composed entirely of Solexa data, drawn from libraries with a mixture of insert lengths. One issue touched on here (and I've seen commented on elsewhere) is that the longer mate pair libraries have serious chimaera issues. They were not trusted to simply be fed into the assembly program, but were carefully added in a stepwise fashion (stepping up in library length) during later stages of assembly. It will be interesting to see what the Pacific Biosciences instrument can do in this regard -- instead of trying to edit out the middle of large inserts by enzymatic and/or physical means, PacBio apparently has a "dark fill" procedure of pulsing unlabeled nucleotides. This leads to islands of sequence separated by signal gaps of known duration, which can be used to estimate distance. Presumably such an approach will not have chimaeras, though the raw base error rate may be higher.

I'm quite confused by their Table 1, which shows the progress of their assembly as different data was added in. The confusing part is that it shows progressive improvement in the N50 and N90 numbers with each step -- and then much worse numbers for the final assembly. The final N50 is 40 kb, which is substantially shorter than dog (close to 100 kb) but longer than platypus (13 kb). It strikes me that a useful additional statistic (or actually set of statistics) for a mammalian genome would be to calculate what fraction of core mammalian genes (which would have to be defined) are contained on a single contig (or for what fraction you will find at least 50% of the coding region in one contig).
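For anyone who wants to play with these statistics, here is a minimal Python sketch of N50/N90 and of the gene-containment fraction suggested above. It is my own illustration, not the method used in the paper: the function names, the toy contig lengths and the `gene_coverage` layout (coding length plus coding bases per contig for each gene) are all assumptions made up for the example.

```python
def nx(lengths, x=50):
    """Nx statistic: the largest length L such that contigs of length >= L
    together contain at least x% of all assembled bases (x=50 gives N50)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until x% of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length


def fraction_genes_in_one_contig(gene_coverage, min_fraction=1.0):
    """Fraction of genes with at least `min_fraction` of their coding bases
    on a single contig.

    gene_coverage (hypothetical layout) maps
        gene name -> (coding_length, {contig name: coding bases on that contig})
    """
    if not gene_coverage:
        return 0.0
    hits = 0
    for coding_length, per_contig in gene_coverage.values():
        best = max(per_contig.values(), default=0)
        if best >= min_fraction * coding_length:
            hits += 1
    return hits / len(gene_coverage)


# Toy data, invented for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(nx(contig_lengths, 50))  # 70
print(nx(contig_lengths, 90))  # 30

genes = {
    "geneA": (900, {"contig1": 900}),                  # entirely on one contig
    "geneB": (1200, {"contig1": 500, "contig2": 700}),  # split, 58% on contig2
    "geneC": (600, {"contig3": 200, "contig4": 150}),   # badly fragmented
}
print(fraction_genes_in_one_contig(genes))        # one of three genes
print(fraction_genes_in_one_contig(genes, 0.5))   # two of three genes
```

The hard part in practice is not this arithmetic but producing the per-gene coverage table, which would require agreeing on a core gene set and aligning it to the contigs.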

While the greatest threat to the panda's continued existence in the wild is habitat destruction, it is heartening to find out that pandas have a high degree of genetic variability -- almost twice the heterozygosity of people. So there is apparently a lot of genetic diversity packed into the small panda population (around 1600 individuals, based on DNA sampling of scat).

BTW, no, that is not the subject panda (Jingjing, who was the mascot for the Beijing Olympics) but rather my shot from our pilgrimage last summer to the San Diego Zoo. I think that is Gao Gao, but I'm not good about noting such things.

(update: forgot to put the Research Blogging bit in the post)
Li, R., Fan, W., Tian, G., Zhu, H., He, L., Cai, J., Huang, Q., Cai, Q., Li, B., Bai, Y., Zhang, Z., Zhang, Y., Wang, W., Li, J., Wei, F., Li, H., Jian, M., Li, J., Zhang, Z., Nielsen, R., Li, D., Gu, W., Yang, Z., Xuan, Z., Ryder, O., Leung, F., Zhou, Y., Cao, J., Sun, X., Fu, Y., Fang, X., Guo, X., Wang, B., Hou, R., Shen, F., Mu, B., Ni, P., Lin, R., Qian, W., Wang, G., Yu, C., Nie, W., Wang, J., Wu, Z., Liang, H., Min, J., Wu, Q., Cheng, S., Ruan, J., Wang, M., Shi, Z., Wen, M., Liu, B., Ren, X., Zheng, H., Dong, D., Cook, K., Shan, G., Zhang, H., Kosiol, C., Xie, X., Lu, Z., Zheng, H., Li, Y., Steiner, C., Lam, T., Lin, S., Zhang, Q., Li, G., Tian, J., Gong, T., Liu, H., Zhang, D., Fang, L., Ye, C., Zhang, J., Hu, W., Xu, A., Ren, Y., Zhang, G., Bruford, M., Li, Q., Ma, L., Guo, Y., An, N., Hu, Y., Zheng, Y., Shi, Y., Li, Z., Liu, Q., Chen, Y., Zhao, J., Qu, N., Zhao, S., Tian, F., Wang, X., Wang, H., Xu, L., Liu, X., Vinar, T., Wang, Y., Lam, T., Yiu, S., Liu, S., Zhang, H., Li, D., Huang, Y., Wang, X., Yang, G., Jiang, Z., Wang, J., Qin, N., Li, L., Li, J., Bolund, L., Kristiansen, K., Wong, G., Olson, M., Zhang, X., Li, S., Yang, H., Wang, J., & Wang, J. (2009). The sequence and de novo assembly of the giant panda genome Nature DOI: 10.1038/nature08696
