Genome Research has a paper detailing the Mammalian Gene Collection (MGC), and if you look way down on the long author list (which includes Francis Collins!) you'll see my name there along with those of two Codon Devices colleagues. This paper cost me a lot -- nothing in legal tender, but a heck of a lot of blood, sweat & tears.
The MGC is an attempt to have every human & mouse protein coding sequence (plus more than a few rat) available as an expression clone, with native sequence. Most of the genes were cloned from cDNA libraries, but coding sequences which couldn't be found that way were farmed out to a number of synthetic biology companies. Codon decided to take on a particularly challenging tranche of mostly really long ORFs, hoping to demonstrate our proficiency in this difficult task.
At the start, the attitude was "can-do". When it appeared we couldn't parse some targets into our construction scheme, I devised a new algorithm that captured a few more (which I blogged about cryptically). It was going to be a huge order which would fill our production pipeline in an expansive new facility we had recently moved into, replacing a charming but cramped historic structure. A new system for tracking constructs through the facility was about to be rolled out that would finally let us track progress across the pipeline without a human manager constantly looking over each plasmid's shoulder. The delivery schedule for MGC was going to be aggressive but would show our chops. We were going to conquer the world!
Alas, almost as soon as we started (and had sunk huge amounts of cash into oligos) we discovered ourselves in a small wicker container which was growing very hot. Suddenly, nothing was working in the production facility. A combination of problems, some related to the move (a key instrument incorrectly recalibrated) and another whose source was never quite nailed down, forced a complete halt to all production activity for several months -- which soon meant that MGC was going to be the only trusty source of revenue -- if we could get MGC to release us from our now utterly undoable delivery schedule.
Eventually, we fixed the old problems & got new processes in place and pushed a bunch of production forward. We delivered a decent first chunk of constructs to MGC, demonstrating that we were for real (but still with much to deliver). Personnel were swiped from the other piece of the business (protein engineering) to push work forward. More and more staff came in on weekends to keep things constantly moving.
Even so, trouble was still a constant theme. Most of the MGC project consisted of large constructs, which were built by a hierarchical strategy, which meant the first key task was to build all the parts -- and some parts just didn't want to be built. We had two processes for building "leaves", and both underwent major revisions and on-the-fly process testing. We also started screening more and more plasmids by sequencing, sometimes catching a single correct clone in a mountain of botched ones (but running up a higher and higher capillary sequencing bill). Sometimes we'd get almost-right pieces, which could be fixed by site-directed mutagenesis -- yet another unplanned cost in reagents & skilled labor. I experimented with partial redesigns of some builds -- but with the constraint of not ordering more costly oligos. Each of these pulled in a few more constructs, a few more delivered -- and a frustrating pile of still unbuilt targets.
Even when we had all the parts built, the assembly of them to the next stage was failing at alarming rates -- usually by being almost right. Yet more redesigns requiring fast dancing by the informatics staff to support. More constructs pushed through. More weekend shifts.
In the end, when Codon shut down its gene synthesis business -- about 10 months after starting the MGC project -- we had delivered a large fraction of our assignment, but not all of it. For a few constructs we delivered partial sequences for partial credit. It felt good to deliver -- and awful to not deliver.
Now, given all that I've described (and more I've left out), I can't help but feel a bit guilty about that author list. It was decided at some higher level that the author list would not be several miles long, and so some sort of cut had to be made. Easily 50 Codon employees played some role in the project, and certainly there were more than a dozen for whom it occupied a majority of their attention. An argument could easily have been made for at least that many Codon authors. But the decision was made that the three of us who had most shared the project management aspect would go on the paper. In my case, I had ended up the main traffic cop, deciding which pieces needed to be tried again through the main pipeline and which should be directed to the scientist with magic hands. For me, authorship is a small token for the many nights I ran SQL queries at midnight to find out what had succeeded and what had failed in sequencing -- and then checked again at 6 in the morning before heading off to work. Even on weekends, I'd be hitting the database morning & night to find out what needed redirecting -- and then using SQL inserts to redirect it. I realized I was on the brink of madness when I was sneaking in queries on a family ski weekend.
Perhaps after such a checkered experience it is natural to question the whole endeavor. The MGC effort means that researchers who want to express a mammalian protein from a native coding sequence can do so. But how much of what we built will actually get used? Was it really necessary to build the native coding sequence -- which often gave us headaches in the builds from repeats & GC-rich regions (or, as we belatedly discovered, certain short runs of G could foul us up)? MGC is a great resource, but the goal of a complete catalog of mammalian genes wasn't realized -- some genes still aren't available from MGC or any of the commercial human gene collections.
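For illustration only, here is a minimal Python sketch of the kind of screen that could flag such trouble spots (GC-rich stretches and short runs of G) in a target sequence before a build. It is not Codon's actual tooling: the function names, the 50-base window, the 70% GC cutoff and the five-G run length are all assumptions chosen for the example, and repeats are not checked at all.

```python
import re


def gc_fraction(seq):
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_hard_regions(seq, window=50, gc_cutoff=0.70, min_g_run=5):
    """Return GC-rich windows and runs of G in seq.

    All thresholds are hypothetical defaults, not values from the MGC work.
    Only the given strand is scanned; a run of C here is a run of G on the
    opposite strand and would need a second pass to catch.
    """
    seq = seq.upper()
    gc_windows = []
    # Slide a fixed-size window one base at a time; a sequence shorter
    # than the window is scored once as a whole.
    for start in range(max(len(seq) - window + 1, 1)):
        frac = gc_fraction(seq[start:start + window])
        if frac >= gc_cutoff:
            gc_windows.append((start, round(frac, 2)))
    # Each run is reported as (start position, run length).
    g_runs = [
        (m.start(), m.end() - m.start())
        for m in re.finditer("G{%d,}" % min_g_run, seq)
    ]
    return {"gc_windows": gc_windows, "g_runs": g_runs}
```

Calling `flag_hard_regions(orf)` on a coding sequence returns the 0-based start positions of any flagged windows and G runs, which could then be used to rank targets by expected difficulty.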
MGC also torture-tested Codon's construction processes, and the original ones failed badly. Our in-progress revisions fared much better, but still did not succeed as frequently as they should have. When we could troubleshoot things, we could ascribe certain failures to almost every conceivable source -- bad enzymes, a bad oligo well, failure to follow procedures, laboratory mix-ups, etc. But an awful lot could not be pinned to any cause, despite investigation, suggesting that we simply did not understand our system well enough to use it in a high-throughput production environment.
I do know one thing: while I hope to stay where I am for a very long time, should I ever be looking for a job again I will avoid a production facility. Some gene synthesis projects were worse than MGC in terms of demanding customers with tight timelines (which is no knock on the customers; now I'm that customer!), but even with MGC I found it's just not the right match for me. It's no fun to burn so much effort on just getting something through the system so that somebody else can do the cool biology. I don't ever want to be in a situation where I'm on vacation and thinking about which things are stalled in the line. Some people thrive in the environment; I found it draining.
But there is something to be said for the experience. I learned a lot which can be transferred to other settings. That which doesn't kill us makes us stronger -- MGC must have made me Superman.