Thursday, July 12, 2007

David Copperfield's Favorite Database

An interesting paper in BMC Bioinformatics led me to a database I hadn't heard of, and one which is very unusual. Most databases grow over time, often exponentially. This is a database intended to disappear.

The database is ORENZA, a database of orphan enzyme activities. These are enzyme activities which have been described in the literature, but not yet linked to a cloned protein. In other words, it is a big punchlist for our understanding of metabolism. This is the mirror image of all those lists of ORFs lacking known function out there; this is the list of identified functions lacking known ORFs.
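The mirror-image idea can be put in a few lines of Python. This is only a toy sketch, not how ORENZA is actually built: the function names, the `activity_to_genes` mapping, and the placeholder labels (`activity_A`, `orf_1`, and so on) are all made up for illustration.

```python
# Toy illustration of the two "orphan" lists. All names and data here are
# hypothetical placeholders, not real ORENZA records or EC numbers.

def orphan_activities(described, activity_to_genes):
    """Activities described in the literature with no gene linked to them."""
    return sorted(a for a in described if not activity_to_genes.get(a))

def orphan_orfs(orfs, activity_to_genes):
    """ORFs that are not linked to any described activity."""
    annotated = {g for genes in activity_to_genes.values() for g in genes}
    return sorted(set(orfs) - annotated)

described = ["activity_A", "activity_B", "activity_C"]
orfs = ["orf_1", "orf_2", "orf_3"]
# activity_A has a cloned gene; activity_B and activity_C do not.
activity_to_genes = {"activity_A": ["orf_1"], "activity_B": []}

print(orphan_activities(described, activity_to_genes))  # functions lacking ORFs
print(orphan_orfs(orfs, activity_to_genes))             # ORFs lacking functions
```

Each time an activity gets linked to a gene, the first list shrinks, which is why this is a database meant to disappear.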

I have found one puzzle in the paper which has me scratching my head; I wish a reviewer had insisted on an explanation. In the list of validated orphans, one entry is for EC (Heparosan-N-sulfate-glucuronate 5-epimerase), an enzyme I claim no mental familiarity with (though apparently I routinely take advantage of this activity). The note for it says
"Involved in the biosynthesis of heparan sulfate, which binds proteins to modulate signaling events in embryogenesis. Mouse gene knock-out results in late lethal phenotype"

Huh????? How do you knock out a gene for an orphan enzyme? Indeed, there would seem to be a paper describing the cloned mouse gene in J Biol Chem from 2001. The protein seems to be annotated with the activity in UniProt. I'm clearly missing something here -- perhaps only the bacterial activities are orphans?

If I were behind ivy-covered walls, I would see this as a grand opportunity for projects for advanced undergraduate students in biochemistry / molecular biology / systems biology and so forth. Assign each student a bunch of activities from ORENZA and have them prepare a report on what is known about them. If the students can propose a good candidate, then beaucoup extra credit!

It is unlikely that many of these will be deorphaned by literature searches alone; biochemical slogging will be required. An interesting approach was just published in Nature in which an ORF was assigned a biochemical function by first experimentally determining its three-dimensional structure (via a structural genomics effort) and then bombarding it computationally with various small molecules. Successful docking of a number of adenine analogs gave a short list of candidate substrates and even a possible reaction. That latter trick is neat: by docking compounds that represent high-energy (transiently present) intermediates, the possible reaction can be guessed. In this case, the ORF was successfully shown to be a deaminase for several adenosine-like molecules (including adenosine itself).

Since the crystal structure had already been determined, solving the structure with one of the docked compounds bound was tractable, and the result was an excellent match to the docking prediction. The authors performed further docking to propose extending this annotation to 78 eubacterial and archaeal ORFs.
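The shortlisting step amounts to ranking candidates by docking score and keeping the best few. Here is a minimal sketch of that idea only; it is not the authors' pipeline. The `Docked` type, the `shortlist` function, the compound labels, and the scores are all invented for illustration, and I am assuming the common convention that a lower (more negative) score means a better predicted fit.

```python
from typing import NamedTuple

class Docked(NamedTuple):
    name: str
    score: float  # assumed convention: lower = better predicted fit

def shortlist(results, top_n=3):
    """Return the names of the top_n best-scoring docked candidates."""
    ranked = sorted(results, key=lambda r: r.score)
    return [r.name for r in ranked[:top_n]]

# Made-up placeholder candidates and scores, not values from the paper.
results = [
    Docked("intermediate_W", -4.1),
    Docked("intermediate_X", -9.7),
    Docked("intermediate_Y", -7.3),
    Docked("intermediate_Z", -8.8),
]

print(shortlist(results, top_n=2))
```

The shortlist is only a set of hypotheses; as in the paper, the candidates still have to be tested at the bench.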

There is a nice bit at the end describing some of the conditions that helped this effort to succeed and how general or specific they are. For example, the ORF in question belonged to a large enzyme family by sequence similarity, which narrowed the list of candidate reactions. Your commonplace ORF-that-looks-like-nothing-but-ORFs won't be helped by that. Also, the enzyme did not undergo gross structural rearrangements on binding substrate, a phenomenon that would certainly confound this approach. The enzyme also functioned on well-characterized metabolites; enzymes that work on uncharacterized compounds may remain mysteries. However, even with these caveats, this approach is likely to yield further fruit, particularly since the structural genomics projects are really cranking out the structures.
