The question that occurred to me is how similar dogs and humans are at precisely the positions where oncogenic mutations happen in humans. Dogs get cancer too -- I've known two dogs lost to cancer -- but of course their pattern of environmental insults is likely to be different. But how different are the actual hotspots for mutations? In other words, can we expect to see the same oncogenic mutations in dogs as in humans, or do species-specific differences prevent this?
So I pulled a mutation table out of one of the TCGA solid tumor publications and slurped out all missense or nonsense mutations with 10 or more occurrences (an arbitrary cutoff). That gave me 64 different recurrent mutations (at 45 distinct positions), which included a lot of familiar faces: APC, BRAF, IDH1/2, KRAS and others. I aligned those proteins to dog and pulled out the human and dog codons. Ran into my first snag there -- the table used Ensembl transcripts without version numbers, and that accession for BRAF now has a bizarre 40 amino acid insertion in it (it is NOT COOL to have a protein translation change wildly between accession versions). Once a hack was put in for that, I had my 45 codon pairs mapped.
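For anyone wanting to repeat the filtering step, here is a minimal sketch of that kind of filter, not the script I actually used. The tuple layout, the class labels and the GENE_A/GENE_B/GENE_C rows are all made up for illustration; a real TCGA table would need its own parsing. It also shows why the number of recurrent mutations can exceed the number of positions: one position can carry more than one recurrent change.

```python
from collections import Counter

MIN_OCCURRENCES = 10                      # the arbitrary cutoff from the text
KEPT_CLASSES = {"missense", "nonsense"}   # hypothetical class labels


def recurrent_mutations(calls, min_occurrences=MIN_OCCURRENCES):
    """Count each (gene, position, ref_aa, alt_aa) and keep the recurrent ones.

    `calls` is an iterable of (gene, position, ref_aa, alt_aa, mutation_class)
    tuples, one per observed mutation call (an assumed layout).
    """
    counts = Counter(
        (gene, pos, ref, alt)
        for gene, pos, ref, alt, cls in calls
        if cls in KEPT_CLASSES
    )
    return {mut: n for mut, n in counts.items() if n >= min_occurrences}


# Toy input only: placeholder genes and invented counts.
calls = (
    [("GENE_A", 12, "G", "D", "missense")] * 12
    + [("GENE_A", 12, "G", "V", "missense")] * 10
    + [("GENE_B", 58, "R", "*", "nonsense")] * 11
    + [("GENE_B", 80, "R", "*", "nonsense")] * 3    # below the cutoff
    + [("GENE_C", 5, "L", "L", "silent")] * 20      # wrong class
)

recurrent = recurrent_mutations(calls)
positions = {(gene, pos) for gene, pos, _, _ in recurrent}
print(len(recurrent), "recurrent mutations at", len(positions), "positions")
```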
Three quarters of the codons -- 34/45 (76%) -- are identical. There are two cases in which the amino acid in dog is different from the one in human -- both in ZNF814 and both non-conservative changes of Asp to Gly. Curiously, both positions have Asp to Glu mutations in TCGA, which would be a conservative change. So maybe those aren't real.
That leaves 9 cases where dog uses a different synonymous codon than human. The original idea was to look for cases in which codon choice changes which amino acids are reachable by a single mutation. There are two cases in which the human codon can be mutated in one step to the stop codon seen in the TCGA data but the dog codon cannot. CDKN2A has R58 and R80 encoded by CGA in human, but in both cases they are CGC codons in dog. So human can mutate by a C-to-T transition to TGA (stop), whereas the same change in dog gives TGC (Cys), and no other single substitution in CGC yields a stop.
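The CGA versus CGC claim is easy to check mechanically. The following is a small sketch (the helper name `single_step_products` is mine, and the standard genetic code is assumed) that lists everything a codon can become after one nucleotide substitution.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_step_products(codon):
    """Return {amino acid or '*': [mutant codons]} for all 9 single substitutions."""
    reachable = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            reachable.setdefault(CODON_TABLE[mutant], []).append(mutant)
    return reachable


human = single_step_products("CGA")  # human CDKN2A R58 / R80 codon
dog = single_step_products("CGC")    # dog codon at the same positions

print("human CGA can reach stop:", "*" in human, human.get("*"))
print("dog CGC can reach stop:  ", "*" in dog)
print("dog CGC first-base C>T gives:", CODON_TABLE["TGC"])
```

Running the same function over every codon pair is one way to flag the cases where synonymous codon choice changes the set of one-step outcomes.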
The one other codon choice which might alter the dog mutation spectrum is at TP53 codon 273: it is CGG (Arg) in human, which can go to TGG (Trp) by a C-to-T transition. The dog codon AGG could also mutate to TGG, but that requires an A-to-T transversion rather than a transition, so it is less likely.
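To make the transition/transversion distinction concrete, here is a short sketch (function names are my own) that classifies the single-base difference between two codons. A transition stays within purines (A, G) or within pyrimidines (C, T); a transversion crosses between them.

```python
PURINES = {"A", "G"}


def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt bases are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def describe_change(from_codon, to_codon):
    """Return (1-based position, 'X>Y', type) for codons differing at one base."""
    diffs = [
        (i, a, b) for i, (a, b) in enumerate(zip(from_codon, to_codon)) if a != b
    ]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    pos, ref, alt = diffs[0]
    return pos + 1, f"{ref}>{alt}", substitution_type(ref, alt)


print("human CGG -> TGG:", describe_change("CGG", "TGG"))
print("dog   AGG -> TGG:", describe_change("AGG", "TGG"))
```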
In all the other cases the third position makes no difference to the amino acid produced by the mutation: PIK3CA Arg 88 mutates to Gln, and that's just as probable with human CGA (to CAA) as with dog CGG (to CAG). EGFR Gly 553 mutates to Val; that's unaffected by the choice of third-position nucleotide.
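Both examples can be verified against the standard genetic code. This sketch (helper name is mine) swaps only the second base and translates the result, showing that the third base does not change the outcome in either case.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def mutate_second_base(codon, new_base):
    """Replace the middle base of a codon, leaving positions 1 and 3 alone."""
    return codon[0] + new_base + codon[2]


# PIK3CA Arg 88: a second-position G>A gives Gln from either Arg codon.
arg_to_gln = {
    codon: CODON_TABLE[mutate_second_base(codon, "A")] for codon in ("CGA", "CGG")
}
print(arg_to_gln)

# Gly to Val: a second-position G>T gives Val from every Gly codon.
gly_codons = [codon for codon, aa in CODON_TABLE.items() if aa == "G"]
gly_to_val = {
    codon: CODON_TABLE[mutate_second_base(codon, "T")] for codon in gly_codons
}
print(gly_to_val)
```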
This, of course, covers only a limited amount of the possible ways differences in the local sequence environment might influence mutation. I also didn't attempt to look at multi-amino acid changes such as the recurrent activating in-frame deletions in EGFR.
I could imagine this approach being developed into an exercise for mid-level bioinformatics students. The tools involved aren't particularly difficult and the concept is straightforward, which makes it well suited to teaching. Different students could be assigned different non-human organisms to analyze, letting the class see how phylogenetic distance is reflected in a specific biological context.
I will sign off now, having woven a small tale -- I had hoped to find a warp in codon usage in creatures that woof, but it turns out that dog won't hunt.