The new Science has an extremely impressive paper tackling the problem of orphan enzymes. Due primarily to Watson-Crick basepairing, our ability to sequence nucleic acids has shot far past our ability to characterize the proteins they may encode. If I want to measure an RNA's expression, I can generate an assay almost overnight by designing specific real-time PCR (aka RT-PCR aka TaqMan) probes. If I want to analyze any specific protein's expression, it generally involves a lot of teeth gnashing & frustration. If you're lucky, there is a good antibody for it -- but most times there is either no antibody or one of unknown (and probably poor) character. Mass spec based methods continue to improve, but still don't have an "analyze any protein in any biological sample anytime" character (yet?).
One result of this is that there are a lot of ORFs of unknown function in any sequenced genome. Bioinformatic approaches can make guesses for many of these, and those guesses often concern enzymatic activity, but a bioinformatic prediction is not proof and the predictions are often quite vague (such as "hydrolase"). Structural genomics efforts sometimes pull in additional proteins whose sequence didn't resemble anything of known function, but whose structure has enzymatic characteristics such as nucleotide binding pockets. One or two such structures have been de-orphaned by virtual screening, but these are a rarity.
Attempts have been made at high-throughput screening of enzyme activities. For example, several efforts have been published in which cloned libraries of proteins from a proteome were screened for enzyme activity. While these produced initial papers, they've never seemed to really catch fire.
The new paper is audacious in providing an approach to detecting enzyme activities and subsequently identifying the responsible proteins, all from protein extracts. The key trick is an array of golden nano anglerfish -- well, that's how I imagine it. Like an anglerfish, the gold nanoparticles dangle their chemical baits off long spacers (poly-A, of all things!). In reverse of an anglerfish, the bait complex glows after it has been taken by its prey, with a clever unquenching mechanism activating the fluorophore and marking that a reaction took place. But the real kicker is that, like an anglerfish, the nanoparticles seize their prey! Some clever chemistry around a bound cobalt ion (which I won't claim to understand) results in linking the enzyme to the nanoparticle, from which it can be cleaved, trypsinized and identified by mass spectrometry. 1676 known metabolites and 807 other compounds of interest were immobilized in this fashion.
As one test, the researchers separately applied extracts of the bacteria Pseudomonas putida and Streptomyces coelicolor to arrays. Results were in quite strong agreement with the existing bioinformatic annotations of these organisms, in that the P. putida extract's pattern of metabolized and not metabolized substrates strongly coincided with what the informatics would predict and the same was true for S. coelicolor (with a P<5.77^-177 for the latter!). But agreement was not perfect -- each species catalyzed additional reactions on the array which were absent from the databases. By identifying the bound proteins, numerous assignments were made which were either novel or significant refinements of the prior annotation. Out of 191 proteins identified in the P. putida set, 31 hypothetical proteins were assigned function, 47 proteins were assigned a different function and the previously ascribed function was confirmed for the remaining 113 proteins.
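To put those P. putida numbers in proportion, here is a quick back-of-the-envelope tally. It is only my own arithmetic on the three counts quoted above, not anything taken from the paper's analysis, and the category labels and variable names are mine.

```python
# Counts quoted in the post for the 191 proteins identified from P. putida.
# The labels are illustrative wording, not terms from the paper.
outcomes = {
    "hypothetical protein assigned a function": 31,
    "assigned a different function": 47,
    "previous annotation confirmed": 113,
}

total = sum(outcomes.values())  # should match the 191 proteins reported

# Share of each outcome among all identified proteins.
for label, count in outcomes.items():
    print(f"{label}: {count}/{total} = {count / total:.1%}")

# Proteins whose annotation was newly assigned or changed.
revised = (outcomes["hypothetical protein assigned a function"]
           + outcomes["assigned a different function"])
print(f"new or changed annotation: {revised}/{total} = {revised / total:.1%}")
```

In other words, roughly two in five of the captured proteins ended up with a new or different annotation, while about three in five simply had their existing annotation confirmed.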
Further work was done with environmental samples. However, given the low protein abundance from such samples, these were converted into libraries cloned into E. coli, and the extracts from these E. coli strains were then analyzed. Untransformed E. coli was used to estimate the backgrounds to subtract -- I must confess a certain disappointment that the paper doesn't report any novel activities for E. coli, though it isn't clear that they checked for them (but how could you not!). The samples came from three extreme environments -- one from a hot, heavy-metal-rich acidic pool, one from oil-contaminated seawater and a third from a deep-sea hypersaline anoxic region. From each sample a plethora of enzyme activities were discovered.
Of course, there are limits to this approach. The tethering mechanism may interfere with some enzymes acting on their substrates. It may, therefore, be desirable to place some compounds multiple times on the array but with the linker attached at different points. It is unlikely we know all possible metabolites (particularly for strange bugs from strange places), so some enzymes can't be de-orphaned this way. And sensitivity issues may make it hard to find some enzyme activities if very few copies of the enzyme are present.
On the other hand, as long as these issues are kept in mind this is an unprecedented & amazing haul of enzyme annotations. Application of this method to industrially important fungi & yeasts is another important area, and certainly only the bare surface of the bacterial world was scratched in this paper. Arrays with additional unnatural -- but industrially interesting -- substrates are hinted at in the paper. Finally, given the reawakened interest in small molecule metabolism in higher organisms & their diseases (such as cancer), application of this method to human samples can't be far behind.
Ana Beloqui, María-Eugenia Guazzaroni, Florencio Pazos, José M. Vieites, Marta Godoy, Olga V. Golyshina, Tatyana N. Chernikova, Agnes Waliczek, Rafael Silva-Rocha, Yamal Al-ramahi, Violetta La Cono, Carmen Mendez, José A. Salas, Roberto Solano, Michail M. Yakimov, Kenneth N. Timmis, Peter N. Golyshin, & Manuel Ferrer (2009). Reactome array: Forging a link between metabolome and genome. Science, 326 (5950), 252-257. DOI: 10.1126/science.1174094