Thursday, October 29, 2009

My Most Expensive Paper

Genome Research has a paper detailing the Mammalian Gene Collection (MGC), and if you look way down on the long author list (which includes Francis Collins!) you'll see my name there along with those of two Codon Devices colleagues. This paper cost me a lot -- nothing in legal tender, but a heck of a lot of blood, sweat & tears.

The MGC is an attempt to have every human & mouse protein coding sequence (plus more than a few rat) available as an expression clone, with native sequence. Most of the genes were cloned from cDNA libraries, but coding sequences which couldn't be found that way were farmed out to a number of synthetic biology companies. Codon decided to take on a particularly challenging tranche of mostly really long ORFs, hoping to demonstrate our proficiency in this difficult task.

At the start, the attitude was "can-do". When it appeared we couldn't parse some targets into our construction scheme, I devised a new algorithm that captured a few more (which I blogged about cryptically). It was going to be a huge order which would fill our production pipeline in an expansive new facility we had recently moved into, replacing a charming but cramped historic structure. A new system for tracking constructs through the facility was about to be rolled out that would let us finally track progress across the pipeline without a human manager constantly looking over each plasmid's shoulder. The delivery schedule for MGC was going to be aggressive but would show our chops. We were going to conquer the world!

Alas, almost as soon as we started (and had sunk huge amounts of cash into oligos) we discovered ourselves in a small wicker container which was growing very hot. Suddenly, nothing was working in the production facility. A combination of problems, some related to the move (a key instrument incorrectly recalibrated) and another whose source was never quite nailed down, forced a complete halt to all production activity for several months. That soon meant MGC was going to be the only trusty source of revenue -- if we could get MGC to release us from our now utterly undoable delivery schedule.

Eventually, we fixed the old problems & got new processes in place and pushed a bunch of production forward. We delivered a decent first chunk of constructs to MGC, demonstrating that we were for real (but still with much to deliver). Personnel were swiped from the other piece of the business (protein engineering) to push work forward. More and more staff came in on weekends to keep things constantly moving.

Even so, trouble was still a constant theme. Most of the MGC targets were large constructs, which were built by a hierarchical strategy. That meant the first key task was to build all the parts -- and some parts just didn't want to be built. We had two processes for building "leaves", and both underwent major revisions and on-the-fly process testing. We also started screening more and more plasmids by sequencing, sometimes catching a single correct clone in a mountain of botched ones (but running up a higher and higher capillary sequencing bill). Sometimes we'd get almost-right pieces, which could be fixed by site-directed mutagenesis -- yet another unplanned cost in reagents & skilled labor. I experimented with partial redesigns of some builds -- but with the constraint of not ordering more costly oligos. Each of these pulled in a few more constructs, a few more delivered -- and a frustrating pile of still unbuilt targets.
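
For readers who haven't seen a hierarchical build, here is a toy sketch of the general idea in Python. It is not Codon's actual process: the function names, the fixed leaf size and the simple pairwise joining are all assumptions made for illustration, and real assembly depends on overlaps or restriction sites that this ignores. The point is only that every leaf has to exist before anything above it in the tree can be attempted.

```python
def split_into_leaves(sequence, leaf_size):
    """Cut a target into consecutive 'leaf' fragments of at most leaf_size bases.

    Hypothetical helper: real leaves would be designed around oligo and
    overlap constraints, not cut at fixed positions.
    """
    return [sequence[i:i + leaf_size] for i in range(0, len(sequence), leaf_size)]


def assembly_rounds(leaves):
    """Join neighbouring parts pairwise, round by round, until one part remains.

    Returns every level of the build tree, starting with the leaves.
    An unpaired part at the end of a round is carried up unchanged.
    """
    rounds = [list(leaves)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        next_level = ["".join(current[i:i + 2]) for i in range(0, len(current), 2)]
        rounds.append(next_level)
    return rounds


# Toy target; a real MGC construct would be thousands of bases long.
target = "ATGGCCAAGCTTGGATCCGAATTCTAA"
leaves = split_into_leaves(target, 5)
rounds = assembly_rounds(leaves)

for level, parts in enumerate(rounds):
    print(f"level {level}: {len(parts)} part(s)")
```

In this picture, one leaf that refuses to build stalls every part above it, which is why a handful of stubborn pieces could hold up an entire construct.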

Even when we had all the parts built, the assembly of them to the next stage was failing at alarming rates -- usually by being almost right. Yet more redesigns requiring fast dancing by the informatics staff to support. More constructs pushed through. More weekend shifts.

In the end, when Codon shut down its gene synthesis business -- about 10 months after starting the MGC project -- we delivered a large fraction of our assignment -- but not all of it. For a few constructs we delivered partial sequences for partial credit. It felt good to deliver -- and awful to not deliver.

Now, given all that I've described (and more I've left out), I can't help but feel a bit guilty about that author list. It was decided at some higher level that the author list would not be several miles long, and so some sort of cut had to be made. Easily 50 Codon employees played some role in the project, and certainly there were more than a dozen for whom it occupied a majority of their attention. An argument could have been easily made for at least that many Codon authors. But, the decision was made that the three of us who had most shared the project management aspect would go on the paper. In my case, I had ended up the main traffic cop, deciding which pieces needed to be tried again through the main pipeline and which should be directed to the scientist with magic hands. For me, authorship is a small token for the many nights I ran SQL queries at midnight to find out what had succeeded and what had failed in sequencing -- and then checked again at 6 in the morning before heading off to work. Even on weekends, I'd be hitting the database in the morning & night to find out what needed redirecting -- and then using SQL inserts to redirect them. I realized I was on the brink of madness when I was sneaking in queries on a family ski weekend.

Perhaps after such a checkered experience it is natural to question the whole endeavor. The MGC effort means that researchers who want to express a mammalian protein from a native coding sequence can do so. But how much of what we built will actually get used? Was it really necessary to build the native coding sequence -- which often gave us headaches in the builds from repeats & GC-rich regions (or, as we belatedly discovered, certain short runs of G could foul us up)? MGC is a great resource, but the goal of a complete catalog of mammalian genes wasn't realized -- some genes still aren't available from MGC or any of the commercial human gene collections.

MGC also torture-tested Codon's construction processes, and the original ones failed badly. Our in-progress revisions fared much better, but still did not succeed as frequently as they should have. When we could troubleshoot things, we could ascribe certain failures to almost every conceivable source -- bad enzymes, a bad oligo well, failure to follow procedures, laboratory mix-ups, etc. But an awful lot could not be pinned to any cause, despite investigation, suggesting that we simply did not understand our system well enough to use it in a high-throughput production environment.

I do know one thing: while I hope to stay where I am for a very long time, should I ever be looking for a job again I will avoid a production facility. Some gene synthesis projects were worse than MGC in terms of demanding customers with tight timelines (which is no knock on the customers; now I'm that customer!), but even with MGC I found it's just not the right match for me. It's no fun to burn so much effort on just getting something through the system so that somebody else can do the cool biology. I don't ever want to be in a situation where I'm on vacation and thinking about which things are stalled in the line. Some people thrive in the environment; I found it draining.

But, there is something to be said for the experience. I learned a lot which can be transferred to other settings. That which doesn't kill us makes us stronger -- MGC must have made me Superman.


James said...

Sounds unfortunately like the discussions I remember from undergrad about whether it made more sense to go straight to grad school or work as a tech.

Doing the exciting work is way more fun than doing the grunt work to enable other people to do exciting research.

Hope your present employment lets you spend more time doing breakthrough stuff yourself.

Anthony said...

Just curious. What was the budget for this project? Would be interesting to know how much the easy first 1000 ORFs cost vs the difficult last 1000.

Keith Robison said...

I'm not sure what we actually charged for the MGC project, but in those days we tended to quote around $0.75/bp for short, easy stuff and $1.50-$2.00 for the long, hard stuff. Most of MGC would have been the latter.

The short, easy category probably averaged 1.5Kb, so perhaps just over $1K/gene. The long, hard stuff had a lot of 3-5Kb stuff (and some longer), so a lot of genes at as much as $10K each.
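
As a rough check on that arithmetic, here is a minimal Python sketch. The function name is made up for illustration, and the inputs are just the approximate per-base quotes and lengths mentioned above, not actual prices charged to MGC.

```python
def synthesis_cost(length_bp, price_per_bp):
    """Estimated cost in dollars for one gene at a flat per-base price."""
    return length_bp * price_per_bp


# Short, easy category: about 1.5 kb at about $0.75/bp.
easy = synthesis_cost(1500, 0.75)

# Long, hard category: 3-5 kb at $1.50-$2.00/bp.
hard_low = synthesis_cost(3000, 1.50)
hard_high = synthesis_cost(5000, 2.00)

print(f"easy gene: ${easy:,.0f}")
print(f"hard gene: ${hard_low:,.0f} to ${hard_high:,.0f}")
```

A flat per-base price is itself a simplification; as noted above, the rate went up with length and difficulty.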

If you Google for "gene synthesis", some places are as low as $0.35/bp -- but I'll bet that's only for really easy (i.e. codon optimized) sequences. Without getting actual quotes, it's hard to know what you'll get charged -- and until it shows up you don't know when (or if) it does. Most things will show up, but it's still tricky to figure out how hard it will be to make something.

Anthony said...

Thanks. I was just wondering if everyone would have been better off if Codon Devices had been selling clones to scientists directly. Currently MGC clones go for ~$700 each. Cheaper to order a full length cDNA and PCR clone it. Buying a few is OK but in large numbers not really justifiable.

Anthony said...

P.S. Correction. $700 is the price for ORFome clones which is not necessarily the same as MGC clones (but probably the same when synthesized).

Keith Robison said...

Well, you've hit the crux of the question. MGC has cloned or built all these genes so you can get them essentially overnight -- but a lot of money went into building them & there is expense to maintaining the collection.

Alternatively, you can try to clone things yourself or have them built, but that involves some uncertainty of success and a delay in your project. Indeed, the whole reason we were building clones is that these had failed to be PCR cloned in multiple attempts.

So, is it worth investing in having a ready-to-go bank of mammalian ORFs, or is it really better to pay as you go, especially if you think that synthetic biology will get continually cheaper?

Anthony said...

My strategy these days is to PCR from a polyA cDNA prep, and if I don't get it (usually anything over 1000bp) I order something. I was considering buying the FANTOM3 collection recently ($30,000 for 100,000 cDNAs vs the same amount for 50 ORFs), but really, like you say, if synthesis gets cheaper every year it's difficult to justify the investment. The individual clones will likely get cheaper too.