Friday, November 03, 2006

Phosphopallooza.

Protein phosphorylation is a hot topic in signal transduction research. Kinases can add phosphate groups to serines, threonines & tyrosines (and very rarely histidines), and phosphatases can take them off. These phosphorylations can shift the shape of the protein directly, or create (or destroy) binding sites for other proteins. Such binding events can in turn cause the assembly or disassembly of protein complexes, trigger the transport of a protein to another part of the cell, or lead to the protein being destroyed by the proteasome (or protect it from that fate). This is hardly a comprehensive list of what can happen.

Furthermore, a large share (by some estimates 1/5 to 1/4) of the pharmaceutical industry's efforts, including those at my (soon to be ex-) employer Millennium, is targeting protein kinases. If you wish to drug kinases, you really want to know what the downstream biology is, and that starts with what your kinase phosphorylates, when it does so, and what events those phosphorylations trigger.

A large number of methods have been published for finding phosphorylation sites on proteins, but by far the most productive have been mass spectrometric ones (MS for short). Using various sample workup strategies, cleverer-and-cleverer instrument designs, and better software, the MS folks keep pushing the envelope in an impressive manner.

The latest Cell has the latest leap forward: a paper describing 6,600 phosphorylation sites (on 2,244 proteins). To put this in perspective, the total number of previously published human phosphorylation sites (by my count) was around 12,000 -- this one paper has found more than half as many as were previously known! Some prior papers (such as these two examples) had found close to 2,000 sites.

Now some of this depth came from many MS runs -- but that in itself illustrates how this task is getting simpler; otherwise so many runs wouldn't be practical. The multiple runs also were used to gather more data: looking at phosphorylation changes (quantitatively!) over a timecourse.

One thing this study wasn't designed to do is clearly assign the sites to kinases. Bioinformatic methods can be used to make guesses, but without some really painful work you can't really make a strong case. And if the site doesn't look like any pattern for a known kinase -- good luck! There really aren't great methods for solving this (not to say there aren't some really clever tries).
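To give a flavor of what a bioinformatic "guess" looks like, here is a minimal sketch in Python. The three motifs are simplified textbook consensus patterns that I am assuming purely for illustration, the function name guess_kinases is made up, and nothing here comes from the paper. Real predictors use far richer models, and a motif match is a hint, not an assignment.

```python
import re

# Simplified textbook consensus motifs (illustrative assumptions only).
# Each pattern is wrapped in a lookahead so overlapping matches are found,
# and group 1 captures the phospho-acceptor residue.
MOTIFS = {
    "PKA-like (R-R-x-S/T)": re.compile(r"(?=RR.([ST]))"),
    "CK2-like (S/T-x-x-D/E)": re.compile(r"(?=([ST])..[DE])"),
    "Proline-directed (S/T-P)": re.compile(r"(?=([ST])P)"),
}


def guess_kinases(sequence, site):
    """Return names of motifs whose acceptor residue sits at 1-based `site`."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            # match.start(1) is the 0-based index of the acceptor residue.
            if match.start(1) + 1 == site:
                hits.append(name)
                break
    return hits


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(guess_kinases("MKRRASPEEDLL", 6))
    print(guess_kinases("AAAASAAAA", 5))  # no motif fits: "good luck!"
```

Note that the toy serine matches all three patterns at once, which is exactly why a motif hit alone can't make a strong case for any one kinase.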

Also interesting in this study is the low degree of overlap with previous studies. While the reference set they used is probably quite a bit smaller than the 12K estimate I give, it is still quite large -- and most sites in the new paper weren't found in the older ones. There are in excess of 20 million Ser/Thr/Tyr in the proteome, and many are probably not phosphorylated, but a reasonable estimate would certainly be that north of 20K are.
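The overlap comparison itself is simple set arithmetic once each site is written as a (protein, position) pair. A minimal sketch, assuming that representation; the function name overlap_summary and the toy data are hypothetical and are not taken from the paper or any reference set:

```python
def overlap_summary(new_sites, reference_sites):
    """Compare two collections of (protein, position) phosphosites."""
    new_set = set(new_sites)
    ref_set = set(reference_sites)
    shared = new_set & ref_set   # sites seen in both studies
    novel = new_set - ref_set    # sites seen only in the new study
    fraction_novel = len(novel) / len(new_set) if new_set else 0.0
    return {
        "shared": len(shared),
        "novel": len(novel),
        "fraction_novel": fraction_novel,
    }


if __name__ == "__main__":
    # Made-up identifiers and positions, for illustration only.
    new = [("PROT_A", 10), ("PROT_A", 25), ("PROT_B", 7), ("PROT_C", 99)]
    ref = [("PROT_A", 10), ("PROT_D", 3)]
    print(overlap_summary(new, ref))
```

In practice the hard part is upstream of this: making sure both studies number residues against the same protein sequence, or the same site will look like two different ones.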

For drug discovery, the sort of timecourse data in this paper is another proof-of-concept for the idea of discovering biomarkers for your kinase using high-throughput MS approaches (another case can be found in another paper). Pushing for so many sites raises the number of candidates substantially, which matters because many of the sites found aren't modulated in an interesting way, at least in terms of pursuing a biomarker. This is noted in Figure 3 -- for the same protein, the temporal dynamics of phosphorylation at different sites can be quite different.

However, it remains to be seen how far into the process these MS approaches can be pushed. Most likely, the sites of interest will need to be probed with immunologic assays, as previously discussed.
