Wednesday, July 01, 2009

Gene Expression from A-Z

I was playing with the data from an early RNA-Seq paper just to have a general idea of what such data looks like and to check out some favorite genes. It was also an exercise in learning the latest Spotfire -- I had Spotfire back at MLNM but it's been over 2 years and a completely new interface was rolled out.

An easy way to find favorite genes and compare them across the three tissues (brain, liver, muscle) is to set up a trellis plot with expression as the y-axis and the gene name as the x-axis, and then use the filtering tools to find my genes. Of course, it's hard to avoid looking at the overall plot -- and picking out some fortuitous patterns.

What immediately jumps out are the three semi-blank vertical zones (in the original you can convincingly spot a fourth, very thin one; it's vaguely there in the PNG shown here). What are these? Take a guess before reading below.


The big one consists entirely of genes starting with "Olf" -- the olfactory receptors. This is a large subfamily of type I G-protein coupled receptors (GPCRs) whose discovery netted a Nobel Prize. In general, these are expressed solely in the olfactory epithelium, but a little more on that later.

The thin line to the left of it has genes starting with Mirn -- microRNAs, which this particular sequencing effort wasn't very tuned for. The next one to the left has genes starting with Ig -- immunoglobulin genes. Since B-cells are not one of the samples, low expression there is no shocker. The very thin line to the right of the Olf cluster, which you might not see, is made up of genes that all start with Vr1 -- the vomeronasal receptors, another set of specialized GPCRs involved in pheromone recognition.

Of course, especially with an interactive display, you can find other patterns. A block of genes starting with Mrp has very similar, high expression in all three tissues -- the mitochondrial ribosomal proteins. A clump enriched for names starting with Psm shows a similar pattern -- the proteasome subunits.
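
For anyone without Spotfire handy, the same name-prefix patterns can be roughly checked without a plot. Below is a minimal Python sketch, not what I actually did: it groups genes by the first three letters of their name and reports, per prefix, how many genes there are, the median expression, and the fraction falling below a cutoff. The function name `prefix_summary`, the `low_cutoff` value, the gene names, and all the numbers in `toy` are made up for illustration and are not taken from the paper's data.

```python
from collections import defaultdict
from statistics import median


def prefix_summary(expression, prefix_len=3, low_cutoff=1.0):
    """Summarize expression by gene-name prefix.

    `expression` maps gene name -> {tissue: expression value}.
    Returns {prefix: (gene_count, median_expression, fraction_low)}.
    """
    groups = defaultdict(list)
    for gene, by_tissue in expression.items():
        # Collapse the tissues into one per-gene value (simple mean).
        mean_expr = sum(by_tissue.values()) / len(by_tissue)
        groups[gene[:prefix_len]].append(mean_expr)

    summary = {}
    for prefix, values in groups.items():
        # Count genes whose mean expression is below the (assumed) cutoff.
        low = sum(1 for v in values if v < low_cutoff)
        summary[prefix] = (len(values), median(values), low / len(values))
    return summary


# Hypothetical values purely for illustration; not from the paper.
toy = {
    "Olfr1": {"brain": 0.0, "liver": 0.0, "muscle": 0.2},
    "Olfr2": {"brain": 0.1, "liver": 0.0, "muscle": 0.0},
    "Mrpl1": {"brain": 40.0, "liver": 35.0, "muscle": 45.0},
    "Mrps2": {"brain": 30.0, "liver": 30.0, "muscle": 30.0},
}

summary = prefix_summary(toy)
for prefix, (n, med, frac_low) in sorted(summary.items()):
    print(prefix, n, round(med, 2), frac_low)
```

Prefixes with a high fraction of low values would correspond to the blank vertical zones, and prefixes with a high median to the dense blocks. A fixed three-letter prefix is a crude choice; it would lump unrelated genes together just as readily as the eye does.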

I don't recommend spending a lot of time doing this analysis -- the visual cortex is too good at picking up patterns & clearly gene names were not picked to make this a great way to find biology. But it is mildly fascinating.

One further note. While the Olf cluster has a lot of low expression, it isn't devoid of expression (below; ignore the sides, as I'm still learning quite how to set the boundaries precisely in SF). Furthermore, some of the same genes are seen in all three samples. Now, this could be erroneous due to improper fragment mapping or some other transcriptionally active gene that overlaps these, but I think we should also be open to the idea that some of the olfactory receptors may have been co-opted for other purposes. After all, if there is a battery of diverse proteins with a spectacular range and sensitivity for different compounds, why wouldn't some be used for something other than exploring the environment?
