Monday, November 06, 2006

From biochemical models to biochemical discovery


An initial goal of genome sequencing efforts was to discover the parts lists for various key living organisms. A new paper in PNAS now shows how far we've come in figuring out how those parts go together, and in particular how discrepancies between prediction and reality can lead to new discoveries.

E. coli has been fully sequenced for almost 10 years now, but we still don't know what all the genes do. A first step would be to see whether we can explain all known E. coli biology in terms of genes of known function. If we can, the remaining genes are either for biology we don't yet know about or for fine-tuning the system beyond the resolution of our models. But if we can't, then there are cellular activities we know about but haven't yet mapped to genes.

This is precisely the approach taken in Reed et al. First, they have a lot of data on which media E. coli will grow in, thanks to a common assay system called Biolog (a PDF of the metabolic plate layout can be found on the Biolog website -- though curiously marked "Confidential -- do not circulate"!). They also have a quantitative metabolic model of E. coli. Marry the two and some media that support growth cannot be explained -- in other words, E. coli is living on nutrients that, according to the model, it "shouldn't" be able to use.
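
The comparison itself is simple to sketch. The snippet below is only an illustration, not the paper's method or data: the medium names and growth calls are made-up placeholders, and `unexplained_growth` is a hypothetical helper. It lists the media where growth is observed but the model predicts none.

```python
# Illustrative sketch only: all names and values below are made-up
# placeholders, NOT data from Reed et al. or from a Biolog plate.

# Observed growth calls per medium (hypothetical).
observed_growth = {
    "medium_A": True,
    "medium_B": True,
    "medium_C": False,
    "medium_D": True,
}

# Growth predicted by the metabolic model for the same media (hypothetical).
predicted_growth = {
    "medium_A": True,
    "medium_B": False,
    "medium_C": False,
    "medium_D": False,
}


def unexplained_growth(observed, predicted):
    """Return media where growth is observed but the model predicts none.

    Media absent from the predictions are treated as "no growth predicted".
    """
    return sorted(
        medium
        for medium, grows in observed.items()
        if grows and not predicted.get(medium, False)
    )


gaps = unexplained_growth(observed_growth, predicted_growth)
print(gaps)
```

Each medium in the resulting list marks a place where the model is missing something the cell can actually do.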

Such a list of unexplained activities is a set of assays for finding the missing parts of the model, and deletion strains of E. coli provide a route to identifying which genes plug the gaps. If a given deletion strain fails to grow in one of the unexplained growth-supporting media, then the gene deleted in that strain is probably the missing link. The list of genes to test can be kept small by choosing candidates based on the model -- if the model is missing a transport activity, then the initial efforts can focus on genes predicted to encode transporters. Similarly, if the model is missing an enzymatic reaction, one can prioritize possible enzymes. The haul in this paper was to assign functions for 8 more genes -- a nice step forward.
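
The screening logic can be sketched in the same spirit. Again, this is only an illustration under stated assumptions: the gene names, predicted classes and growth results are invented placeholders, and `prioritize` and `gap_fillers` are hypothetical helpers, not anything from the paper.

```python
# Illustrative sketch only: gene names, predicted classes and growth results
# are made-up placeholders, NOT data from Reed et al.

# Genes of unknown function with a predicted class (hypothetical).
candidate_genes = {
    "gene1": "transporter",
    "gene2": "enzyme",
    "gene3": "transporter",
    "gene4": "unknown",
}

# Whether each tested deletion strain still grows in one unexplained
# growth-supporting medium (hypothetical results).
deletion_strain_grows = {
    "gene1": True,
    "gene3": False,
}


def prioritize(genes, missing_activity):
    """Shortlist genes whose predicted class matches the activity the model lacks."""
    return sorted(g for g, cls in genes.items() if cls == missing_activity)


def gap_fillers(shortlist, knockout_grows):
    """Return tested genes whose deletion abolishes growth in the medium."""
    return [g for g in shortlist if g in knockout_grows and not knockout_grows[g]]


shortlist = prioritize(candidate_genes, "transporter")
hits = gap_fillers(shortlist, deletion_strain_grows)
print(shortlist, hits)
```

A hit here is only a probable assignment, as in the text: the deletion result says the gene is needed for growth in that medium, not exactly what it does.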

It is sobering how much of each sequenced genome is still of unknown function, even in very compact genomes. Integration of experiment and model, as illustrated in this paper, is our best hope for closing that gap.
