Tuesday, April 21, 2009

Is Codon Optimization Bunk?

There is a very interesting paper in Science from a week ago which hearkens back to my gene synthesis days at Codon. But first, some background.

The genetic code (at first approximation) uses 64 codons to encode 21 different signals; hence there are some choices as to which codon to use. Amino acids and stop can have 1, 2, 3, 4 or 6 codons in the standard scheme of things. But those codons are rarely used with equal frequency. Leucine, for example, has 6 codons; some are rarely used and others often. Which codons are preferred and disfavored, and the degree to which this is true, depends on the organism. In the extreme, a codon can actually go so out of favor it goes extinct & can no longer be used, and sometimes it is later reassigned to something else; hence some of the more tidy codes in certain organisms.
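For anyone who wants to check the 64-codons-for-21-signals bookkeeping, here is a minimal sketch in Python. It builds the standard code from the usual TCAG-ordered string and counts how many codons each signal gets; the variable names are just mine.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per signal (20 amino acids plus stop).
codons_per_signal = Counter(CODON_TABLE.values())

# Number of signals that have 1, 2, 3, 4 or 6 codons.
degeneracy_classes = Counter(codons_per_signal.values())

print(len(CODON_TABLE), "codons,", len(codons_per_signal), "signals")
for n_codons in sorted(degeneracy_classes):
    members = sorted(s for s, n in codons_per_signal.items() if n == n_codons)
    print(n_codons, "codon(s):", "".join(members))
```

Leucine, serine and arginine land in the 6-codon class, while methionine and tryptophan are the two singletons.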

A further observation is that the more favored codons correspond to more abundant tRNAs and less favored ones to less abundant tRNAs. Furthermore, highly expressed genes are often rich in favored codons and lowly expressed ones much more likely to use rare ones. To complete the picture, in organisms such as E.coli there are genes which don't seem to follow the usual pattern -- and these are often associated with mobile elements and phage or have other suggestions that they may be recent acquisitions from another species.

A practical application of this is to codon optimize genes. If you are having a gene built to express a protein in a foreign host, then it would seem apropos to adjust the codon usage to the local dialect, which usually still leaves plenty of room to accommodate other wishes (such as avoiding the recognition sites for specific restriction enzymes). There are at least four major schemes for doing this, with different gene synthesis vendors preferring one or the other:

  • CAI Maximization. CAI is a measure of usage of preferred codons; this strategy tries to maximize the statistic by using the most preferred codons. Logic: if these are the most preferred codons, and highly expressed genes are rich in them, why not do the same?

  • Codon sampling. This strategy (which is what Codon Devices offered) samples from a set of codons with probabilities proportional to their usage in the organism, after first zeroing out the very rare codons and renormalizing the table. Logic: avoid the rare ones, but don't hammer the better ones either; balance is always good

  • Dicodon optimization. In addition to codons showing preferences, there's also a pattern by which adjacent codons pair slightly non-randomly. One particular example: very rare codons are very unlikely to be followed by another very rare codon. Logic: an even better approach to "when in Rome..." than either of the two above

  • Codon frequency matching. Roughly, this means look at the native mRNA and its uses of codons and ape this in the target species; a codon which is rare in the native should be replaced with one rare in the target. Logic: some rare codons may just help fold things properly
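To make the first two schemes concrete, here is a rough sketch in Python. Everything specific in it is an assumption for illustration: the usage fractions in `USAGE` are invented (not taken from any real organism), the `rare_cutoff` value is arbitrary, and the function names are mine. It is not the code any vendor actually ran. The `cai` function uses the usual definition (geometric mean of each codon's usage relative to the best synonymous codon), simplified in that it does not skip single-codon amino acids.

```python
import math
import random

# Hypothetical usage table: fraction of each amino acid's codons.
# Numbers are invented for illustration; only three amino acids are covered.
USAGE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13,
          "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "M": {"ATG": 1.00},
}

def cai_max_codon(aa):
    """CAI maximization: always pick the single most used codon."""
    return max(USAGE[aa], key=USAGE[aa].get)

def sampling_table(aa, rare_cutoff=0.05):
    """Codon sampling: zero out very rare codons, then renormalize.

    rare_cutoff is an arbitrary illustrative threshold.
    """
    kept = {c: f for c, f in USAGE[aa].items() if f >= rare_cutoff}
    total = sum(kept.values())
    return {c: f / total for c, f in kept.items()}

def design(protein, strategy, rng=None):
    """Back-translate a protein string with the chosen strategy."""
    rng = rng or random.Random()
    codons = []
    for aa in protein:
        if strategy == "cai_max":
            codons.append(cai_max_codon(aa))
        elif strategy == "sampling":
            table = sampling_table(aa)
            choice = rng.choices(list(table), weights=list(table.values()))[0]
            codons.append(choice)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return "".join(codons)

def cai(dna):
    """Geometric mean of codon weights, weight = usage / best synonym's usage."""
    weight = {
        codon: freq / max(table.values())
        for table in USAGE.values()
        for codon, freq in table.items()
    }
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return math.exp(sum(math.log(weight[c]) for c in codons) / len(codons))

maximized = design("MLKL", "cai_max")
sampled = design("MLKL", "sampling", random.Random(0))
print(maximized, round(cai(maximized), 3))
print(sampled, round(cai(sampled), 3))
```

The CAI-maximized design always scores 1.0 by construction, while the sampled design scores at or below that and never uses the codon that fell under the cutoff. Dicodon optimization and frequency matching would need a pair table and the native sequence respectively, so they are left out of this sketch.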

A related strategy worth mentioning is the use of special expression strains which express extra copies of the rare tRNAs.

There is a lot of literature on codon optimization, and most of it suffers from the same flaw. Most papers describe taking one ORF, re-synthesizing it with a particular optimization scheme, and then comparing the two. One problem with this is the small N and the potential for publication bias (do people publish less frequently when this fails to work?). Furthermore, it could well be that the resynthesized design changed something else, and the codon optimization is really unimportant. A few papers deviate from this plan & there has been a hint from the structural genomics community of surveying their data (as they often codon optimized), but systematic studies aren't common.

Now in Science comes the sort of paper that starts to be systematic:

Coding-Sequence Determinants of Gene Expression in Escherichia coli
Grzegorz Kudla, Andrew W. Murray, David Tollervey, and Joshua B. Plotkin
Science 10 April 2009: 255-258.

In short, they generated a library of GFP variants in which the particular codon used was varied randomly and then expressed these from a standard sort of expression vector in E.coli. The summary of their results is that codon usage didn't correlate with GFP brightness (expression), but that the key factor is avoidance of secondary structure near the beginning of the ORF.
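For a feel of what a library like that looks like on paper, here is a small Python sketch that scrambles synonymous codons while leaving the protein untouched. The assumptions are mine: each codon is redrawn uniformly from its synonyms (the paper's actual randomization scheme isn't described here), and `seed_orf` is a short arbitrary fragment for illustration, not the construct from the paper. It says nothing about secondary structure; that would need a folding tool.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the signal they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def translate(dna):
    """Translate an in-frame DNA string to one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def random_synonymous_variant(dna, rng):
    """Redraw every codon uniformly from its synonyms (assumed scheme)."""
    return "".join(
        rng.choice(SYNONYMS[CODON_TABLE[dna[i:i + 3]]])
        for i in range(0, len(dna), 3)
    )

rng = random.Random(2009)
seed_orf = "ATGAGCAAAGGCGAAGAACTGTTT"  # illustrative fragment only
library = [random_synonymous_variant(seed_orf, rng) for _ in range(5)]

for variant in library:
    # Every variant differs at the DNA level but encodes the same peptide.
    print(variant, translate(variant))
```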

It's a good approach, but a question is how general is the result. Is GFP a special protein in some way? Why do the rare tRNA-expressing strains sometimes help with protein expression? And most importantly, does this apply broadly or is it specific to E.coli and relatives?

This last point is important in the context of certain projects. E.coli and Saccharomyces have their codon preferences, but if you want to see an extreme preference, look at Streptomyces and its kin. These are important producers of antibiotics and other natural product medications, and it turns out that the codon usage table is easy to remember: just use G or C in the 3rd position. In one species I looked at, around 95% of all codons followed that rule.
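Checking that rule on any ORF is a few lines of Python. This is a minimal sketch; the function name `gc3_fraction` and both example sequences are made up for illustration and are not real Streptomyces genes.

```python
def gc3_fraction(orf):
    """Fraction of codons whose third position is G or C."""
    usable = len(orf) - len(orf) % 3  # ignore any trailing partial codon
    codons = [orf[i:i + 3].upper() for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)

# Made-up sequences encoding the same short peptide two ways.
high_gc_orf = "ATGGCCGACCTGCGCACCGGCTGA"
low_gc_orf = "ATGGCTGATTTACGTACAGGTTAA"

print(gc3_fraction(high_gc_orf))  # 7 of 8 codons end in G or C
print(gc3_fraction(low_gc_orf))   # only the ATG ends in G or C
```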

This has the effect of making the G+C content of the entire ORF quite high, which engenders further problems. High G+C DNA can be difficult to assemble (or amplify) via PCR and it sequences badly. Furthermore, such a limited choice of codons means that anything resembling a repeat at the protein level will create a repeat at the DNA level, and even very short repeats can be problematic for gene synthesis. Long runs of G's can also be problematic for oligonucleotide synthesizers (or so I've been told). From a company's perspective, this is also a problem because customers don't really care about it and don't understand why you price some genes higher than others.

So, would the same strategy work in Streptomyces? If so, one could avoid synthesizing hyper-G+C genes and go with more balanced ones, reducing costs and the time to produce the genes. But someone would need to make the leap and repeat the Kudla et al. strategy in some of these target organisms.

Wednesday, April 15, 2009

Sequencing's getting so cheap...

Here's a decidedly odd gedanken experiment which illustrates what next-gen sequencing is doing to the cost.

A common way of deriving the complete sequence of a large clone is shotgun sequencing -- the clone is fragmented randomly into lots of little fragments. With conventional (Sanger) sequencing these fragments are cloned, clones are picked and each clone sequenced. By using a universal primer (or more likely primer pair; one read from each end), a lot of data can be generated cheaply.

If you search online for DNA sequencing, a common advertised cost is $3.50 per Sanger read. This probably doesn't include clone picking or library construction, but we'll ignore that. Read lengths vary, but to keep the math simple let's say we average 500 nucleotide reads, which from my experience is not unreasonable, though very good operations will routinely get longer reads.

So, at that price and read length it's $7.00 per kilobase of raw data. For shotgunning, collecting 10X-20X coverage is quite common and likely to give a reasonable final assembly, though higher is always better. At 10X coverage, that means for each 1Kb of original clone we'll spend $70.00.

Suppose we have an old cosmid -- which is about 50Kb of DNA including the vector. So to shotgun sequence it with Sanger sequencing, if building & picking the library were free, would be around $5200 for 15X coverage. Pretty cheap, right?
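The Sanger arithmetic above can be written out as a tiny Python calculation. The price and read length are the ones quoted in this post; the function name and the choice to round up to whole reads are mine.

```python
import math

PRICE_PER_READ = 3.50   # advertised dollars per Sanger read
READ_LENGTH_BP = 500    # assumed average read length

def sanger_cost(target_bp, coverage):
    """Dollars to shotgun target_bp at the given fold coverage."""
    raw_bp = target_bp * coverage
    reads = math.ceil(raw_bp / READ_LENGTH_BP)  # whole reads only
    return reads * PRICE_PER_READ

print(sanger_cost(1_000, 1))     # per kilobase of raw data
print(sanger_cost(1_000, 10))    # per kilobase of clone at 10X
print(sanger_cost(50_000, 15))   # the 50Kb cosmid at 15X
```

The cosmid comes out at $5250 for 1500 reads, in line with the "around $5200" figure.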

Except, for a measly $4700 you can have next gen sequencing of it (and that actually includes library construction costs). 680Mb of next gen sequencing -- or roughly 13,600X coverage of a 50Kb cosmid. Indeed, if you left the E.coli host DNA in you'd still have well in excess of 100X coverage of E.coli plus your cosmid. So if you had multiple cosmids, you could actually get them sequenced for the same price, assuming you can distinguish them at the end (or they just assemble together anyway)!

Sequencing so cheap you can theoretically afford 99% contamination! Yikes!
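Here is the back-of-envelope version of the contamination claim in Python. The 680Mb yield and 50Kb cosmid are from the post; the E.coli genome size of roughly 4.6Mb is my added assumption, as is the simplification that reads fall evenly per base pair with one cosmid copy per genome (a multi-copy cosmid would only do better).

```python
NEXTGEN_YIELD_BP = 680_000_000  # yield quoted in the post
COSMID_BP = 50_000              # cosmid size, vector included
ECOLI_GENOME_BP = 4_600_000     # assumption: approximate E.coli genome size

# Total DNA in the sample if the host genome is left in.
combined_bp = ECOLI_GENOME_BP + COSMID_BP

# Fold coverage of host plus cosmid, assuming uniform sampling per base.
combined_coverage = NEXTGEN_YIELD_BP / combined_bp

# Fraction of the sequenced DNA that is host "contamination".
host_fraction = ECOLI_GENOME_BP / combined_bp

print(f"{combined_coverage:.0f}X coverage of host plus cosmid")
print(f"{host_fraction:.1%} of the sample is host DNA")
```

Under those assumptions the host makes up about 99% of the sample and everything still gets well over 100X.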

Of course, it's unlikely you'd really want to be so profligate. Rather than resequence E.coli, you could pack a lot of inserts in. But it does underline why Sanger sequencing is quickly being relegated to a few niches (for example, when you need to screen clones in synthetic biology projects) & the price of used capillary sequencers is reputed to be going south of $30K.

Sunday, April 05, 2009

Two Myeloma Patients

TNG and I closed out the ski season a week ago. It's some great time together, but it also ends up being at times a bit of a solitary activity, leaving lots of time to think. Sometimes it's when he's in a lesson, but in general skiing is contemplative for me. It needs to be; if I think too hard about my technique I end up crashing spectacularly. I guess when it comes to skiing, I'm a Taoist.

Ideally, I'm thinking about beautiful scenery or admiring TNG's developing technique. But other thoughts invariably intrude, and more than a few times I find myself pondering multiple myeloma, as on a ski trip last year I met the second myeloma patient I ever knew.

For the last several years at Millennium, myeloma occupied a lot of my time. Because myeloma was the first disease where Millennium found success, this was natural. It was also two pronged. One goal was to better understand Velcade in myeloma to further develop the drug in that disease, such as going for first line treatment. But it was also seen as an important opportunity to learn how the drug works, so that intelligent decisions could be made about other cancers.

At quarterly company meetings there were often myeloma patients onstage to tell their story. One that particularly stuck in my mind was an oncology nurse who developed the disease, tried Velcade and almost immediately switched to something else; she experienced the full brunt of peripheral neuropathy while on Velcade and could not tolerate it. In some ways this seems like a curious choice to inspire your troops, but it did exactly that. We had done good things, but needed to do better. And most people came out of those meetings pretty charged up.

However, these were big presentations on stage, not face-to-face meetings. Even though I occasionally got to rub shoulders with some of the clinical giants of the field, I never met any patients. Not surprising, but somewhat noteworthy.

Last year we were away in New Hampshire for a ski weekend & I struck up a conversation with a group in the lobby. Somehow, it arose that one of their number had cancer, and I couldn't help but ask what sort & it turned out it was myeloma. As is common, someone who should have been enjoying their golden years was instead faced with this dread disease.

Myeloma arises in most, if not all, cases when a DNA rearrangement occurs within a cell which creates antibodies. Certain rearrangements are necessary for the correct creation of antibodies; these alterations lie at the heart of the system for creating a wide array of antibodies to defend against a wide array of invaders. But sometimes the cut-and-paste glues the wrong two things together, and that can drive a myeloma. Myeloma shows up most commonly late in life. Perhaps this is because the switching machinery loses its edge as life goes on, or perhaps it is just that eventually the wrong number comes up on the immunologic dice.

My chance meeting in that lobby was particularly poignant as it had not been long before that I had met my first myeloma patient, and that was no random stranger. Every year growing up the family would travel west to see my grandparents in Kentucky, and in one direction or the other we would stop by my aunt and uncle in Ohio. My cousins are much older than I, so it was often just my aunt & uncle and my family. With no children to play with, I didn't play a lot of board games there. But I had a lot of fun, as my uncle took me to the Reds or his garden patch or to see a train. He'd murder me in croquet. He took me to the print shop at his high school & showed me how to print up a bunch of notepads. In later years, I'd feel humble after failing to explain to him what I did for a living, realizing I had slipped deep into the land of jargon. And he'd try to convince me that no bumpkin from Avon could have written those plays; much more likely they came from the Earl of Oxford.

Eventually, I flew the nest and I no longer saw them on an annual schedule, but he never missed a family wedding and I even made it to one family reunion. I'd avidly read his Christmas letter to catch up with the rest of the clan. Of course, you couldn't believe everything in it, as he was a notorious prankster. Yes, those birthday checks with the crazy name were real ("Fifth Third Bank" -- who's going to believe that?), but he had not been truthful about his WW2 service -- the Army probably doesn't even have dedicated mess kit repair units. No, he actually was a decorated signalman. Only once did he tell a story that didn't happen stateside; it is more than a little guilt for me that I can't remember any details. It wasn't that I wasn't listening, but somehow it didn't stick.

So it really hit home when I found out that this great man, who had given so much to me and others (he was recorded weekly reading for the blind), had been diagnosed with myeloma. It seemed a bit ironic that now that I had a strong personal motivation, I was no longer working in the field. But I did have a long phone chat with him & tried to be useful, though he had been well briefed by his doctor and there wasn't a lot for me to do. I mentioned things like stem cell transplants, and he remarked that he was eighty-four, and while he wasn't going to give up, there were limits to what he would do; life quality was important.

A goal of modern oncology is to have a patient die with their disease, not of their disease. I do not know how to score this case. About a month and a half before our ski trip a cerebral hemorrhage felled my uncle. Was this myeloma's fault? Thalidomide's? Or a not unlikely result for an elderly American in generally good shape? We cannot cheat death forever, and something must end life. On the other hand, in no way could myeloma be given a free pass -- it certainly gave him undeserved misery near the end.

About a month and a half after the ski trip, I attended a very nice memorial service for him, where dozens of his former students turned out to testify how he had changed their lives. We learned things we never knew about him (he played the tuba?) and remembered the good times.

Whenever I think about myeloma now, I can't help but remember him. I also remember that patient I met in the hotel, and sometimes I still can feel the wetness of his parting friendly gesture on my hand. I didn't ask what medication he was on, but I can assume it wasn't Velcade or Revlimid. Might he have been on thalidomide? If so, do standard poodles need to go through STEPS?

Saturday, April 04, 2009

Too Many Closings

These are dire economic times, with the signs all around. In the town where I live, several stores have closed in the town center -- and my favorite imported goodies store has frighteningly bare shelves & a nearly empty cheese cooler. I fear the worst.

On a much bigger scale, today's Boston Globe carried a headline that the same newspaper may close unless significant labor concessions are made by its unions, confirming previous speculation that the Globe was hemorrhaging money from its owner the New York Times. This week marked another round of cutbacks in the newsroom, and it seems about every 6 months or so another redesign occurs to attempt to hide (and cope with) the shrinking number of pages.

One of those redesigns has been the elimination of a separate business section, with instead the business section contained within the Metro section -- they are physically, but not logically merged. And as many may know, yesterday's had an obituary for my recent employer, which also noted the recent or imminent demise of several other biotechs.

It should be noted that the Globe seems a tad slow on the news. Codon, of course, unloaded the majority of its staff two weeks ago. Okay, nobody squealed loudly. It is a bit more striking that the Globe article stated that the ultimatum to the unions had been delivered Thursday -- how could nobody at the newspaper have been tipped off to that?

The possibility of losing the Globe is very sad to me, as I truly have newspaper in my blood. No, I don't mean my family has a history of careers in the industry (though we do seem to dabble in it); I mean I've been reading the newspaper since I can remember, so I've certainly assimilated a good deal of it into my cellular structures! I too dabbled in the industry, delivering for one paper (which ended operations shortly after I quit) and doing high school sports photography and reporting for two others (one of which also appears to be bust). I also edited my high school's newspaper, so I can take a tiny claim to once being an ink-stained wretch (is it possible to be stained with bits?). All through college and beyond, I've always had a subscription to the daily paper. While various deficiencies in local delivery have in recent years tested my loyalty, I still subscribe. Perhaps not much longer -- but not by my choice.

I do want to allay any concerns that this particular enterprise might be headed for a similar fate. Fear not, dear readers! While revenue has stayed completely flat, in these times that must be considered an accomplishment. The Omics! Omics! balance sheet remains out of the red -- as it always has. And just think -- any future revenue would mean infinite revenue growth!

Wednesday, April 01, 2009

New DNA Service Makes Dates -- Via Dogs!

Ever notice how a couple sometimes resemble the family pet? A new startup company believes that this is the secret to dating success, and that DNA typing is a way to guarantee romantic bliss.

Date My Dog's DNA will test both your DNA and your dog's and then apply proprietary computer algorithms to find your perfect match. Dogless individuals can also be typed, though they will only be matched with someone who has registered their dog in the service.

Why should this work? According to President and CEO Jack Russell, our choice of dog is driven by fundamental personality traits. By examining the DNA, traits can be matched between dog and human. "While it is useful for purebred canines, the real power comes with mixed breeds, as you may not realize which tendencies you are keying in to", says Russell. "Just imagine", he continues, "all the painful breakups due to date-dog incompatibilities; we believe we can prevent most of these".

Can the technology be put to other uses? Vice President for Marketing K. Charles Cavalier suggests that once pre-conception DNA screening becomes routine, they plan to move into this area. Would this be eugenics hidden behind a wagging tail? Replies Cavalier: "We think each couple will choose very differently. For example, if you have two border collies you might enjoy a bright but hyperactive child. On the other hand, if you have a bloodhound you might prefer a quiet, contemplative child who likes to observe the world." Continues Cavalier: "We think parent-child bonding is critical to a child's mental and social development. You've already bonded with your dog; why not leverage that bond into a better one with your child?"

Seed funding for the company has been provided by the Kaltnassnase Fund.