Okay, a bit late with this as it came out in Science about a month ago, but it's a cool paper & illustrates a number of issues I've dealt with at my current shop. Also, in the small-world department, one of the co-authors was my eldest brother's roommate at one point.
The genetic code uses 64 codons to code for 21 different symbols -- 20 amino acids plus stop. Early on this was recognized as implying that either (a) some codons are simply not used or (b) many symbols have multiple, synonymous codons, which turns out to be the case (except in a few species, such as Micrococcus luteus, which have lost the ability to translate certain codons).
Early on in the sequencing era (certainly before I jumped in) it was noted that not all synonymous codons are used equally. These patterns, or codon bias, are specific to particular taxa: the codon usage of Escherichia is different from that of Streptomyces. Furthermore, it was noted that there is a signal in pairs of successive codons; that is, the frequency of a given codon pair is often not simply the product of the two codons' individual frequencies. This was (and is) one of the key signals which gene-finding programs use to hunt for coding regions in novel DNA sequences.
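To make that codon-pair signal concrete, here's a minimal sketch (not any gene finder's actual scoring function) that compares each pair's observed frequency to what you'd expect if successive codons were chosen independently; ratios far from 1.0 indicate pair bias:

```python
from collections import Counter

def codon_pair_bias(cds):
    """Observed/expected ratio for each codon pair in a coding sequence,
    where 'expected' assumes successive codons are independent."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    pairs = list(zip(codons, codons[1:]))
    codon_freq = Counter(codons)
    pair_freq = Counter(pairs)
    n_codons, n_pairs = len(codons), len(pairs)
    bias = {}
    for (a, b), count in pair_freq.items():
        # Expected count of pair (a, b) under independence
        expected = (codon_freq[a] / n_codons) * (codon_freq[b] / n_codons) * n_pairs
        bias[(a, b)] = count / expected
    return bias

# Toy sequence -- real codon-pair scores are computed over whole genomes
for pair, ratio in codon_pair_bias("ATGGCTGCTGAAGCTGAATAA").items():
    print(pair, round(ratio, 2))
```

A real implementation would also condition on the encoded amino-acid pair, since amino-acid composition itself skews the raw pair counts.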
Codon bias can be mild or it can be severe. Earlier this year I found myself staring at a starkly simple codon usage pattern: C or G in the 3rd position. In many cases the C+G codons for an amino acid had >95% of the usage. For both building & sequencing genes this has a nasty side-effect: the genes are very GC rich, which is not good (higher melting temp, all sorts of secondary structure options, etc).
Another key discovery is that codon usage often correlates with protein abundance; the most abundant proteins show the greatest hewing to the species-specific codon bias pattern. It further turned out that the tRNAs matching highly used codons tend to be the most abundant in the cell, suggesting that frequent codons optimize expression. Furthermore, it could be shown that in many cases rare codons could interfere with translation. Hence, if you take a gene from organism X and try to express it in E. coli, it would frequently translate poorly unless you recoded the rare codons out of it. Alternatively, expressing additional copies of the tRNAs matching rare codons could also boost expression.
Now, in the highly competitive world of gene synthesis this was (and is) viewed as a selling point: building a gene is better than copying it as it can be optimized for expression. Various algorithms for optimization exist. For example, one company optimizes for dicodons. Many favor the most common codons and use the remainder only to avoid undesired sequences. Locally we use codons with a probability proportional to their usage (after zeroing out the 'rare' codons). Which algorithm is best? Of course, I'm not impartial, but the real truth is there isn't any systematic comparison out there, nor is there likely to be one given the difficulty of doing the experiment well and the lack of excitement in the subject.
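The "probability proportional to usage, after zeroing out the rare codons" approach can be sketched in a few lines. The usage table below is made up for illustration (real tables are species-specific, e.g. from a codon usage database), and the cutoff is a hypothetical parameter:

```python
import random

# Hypothetical usage fractions for leucine codons -- illustrative numbers,
# not a real organism's table
LEU_USAGE = {"CTG": 0.50, "CTC": 0.20, "TTG": 0.13, "CTT": 0.10,
             "CTA": 0.04, "TTA": 0.03}
RARE_CUTOFF = 0.05  # codons below this usage are never emitted

def pick_codon(usage, cutoff=RARE_CUTOFF):
    """Sample a codon with probability proportional to its usage,
    after dropping codons rarer than the cutoff."""
    kept = {c: f for c, f in usage.items() if f >= cutoff}
    codons, weights = zip(*kept.items())
    return random.choices(codons, weights=weights)[0]

# Recode a run of leucines; CTA and TTA never appear
print([pick_codon(LEU_USAGE) for _ in range(5)])
```

Sampling rather than always taking the top codon keeps the output sequence varied, which helps avoid repeats and other undesired motifs when synthesizing the gene.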
Besides the rarity of codons affecting translation levels, how else might synonymous codons not be synonymous? The most obvious is that synonymous codons may sometimes have other signals layered on them -- that 'free' nucleotide may be fixed for some other reason. A more striking example, oft postulated but difficult to prove, is that rare codons (especially clusters of them) may be important for slowing the ribosome down and giving the protein a chance to fold. In one striking example, changing a synonymous codon can change the substrate specificity of a protein.
What came out in Science is codon rewriting, enabled by synthetic biology, on a grand scale. Live virus vaccines are just that: live, but attenuated, versions of the real thing. They have a number of advantages (such as being able to jump from one vaccinated person to an unvaccinated one), but the catch is that attenuation is due to a small number of mutations. Should these mutations revert, pathogenicity is restored. So, if there were a way to make a large number of mutations of small effect in a virus, then the probability of reversion would be low but the sum of all those small changes would be attenuation of the virus. And that's what the Science authors have done.
Taking poliovirus they have recoded the protein coding regions to emphasize rare (in human) codon pairs (PV-Min). They did this while preserving certain other known key features, such as secondary structures and overall folding energy. A second mutant was made that emphasized very common codon pairs (PV-Max). In both cases, more than 500 synonymous mutations were made relative to wild polio. Two further viruses were built by subcloning pieces of the synthetic viruses into a wildtype background.
Did this really do anything? Well, their PV-Max had similar in vitro characteristics to wild virus, whereas PV-Min was quite docile, failing to make plaques or kill cells. Indeed, it couldn't be cultured in cells.
The part-Min, part-wt chimaeras also showed severe defects, and some also couldn't be propagated as viruses. However, one containing two segments of engineered low-frequency codon pairs, called PV-MinXY, could be propagated but was greatly attenuated. While its ability to make virions was only slightly impaired (perhaps one tenth the number), more strikingly, about 100X as many virions were required for a successful infection. Repeated passaging of PV-MinXY and another chimaera failed to alter the infectivity of the viruses; the strategy of stabilizing attenuation through a plethora of small mutations appears to work.
When my company was trying to sell customers on the value of codon optimization, one frustration for me as a scientist was the paucity of really good studies showing how big an effect it could have. Most studies in the field are poorly done, with too few controls and only a protein or two. Clearly there is a signal, but it was always hard to say "yes, it can have huge effects". In this study of codon optimization writ large, though, codon choice has enormous effects.