Tuesday, May 12, 2026

Personal Reflections on Craig Venter: Expressed Sequence Tags

Venter's big Expressed Sequence Tag paper - the tags were soon known as ESTs - came out in June during my final summer as an intern; I photocopied it at my internship. I looked it up, and it had only a bit over 600 sequences - a small beginning to what would become an industry-wide exercise. That was a momentous time for me, as only the previous December had I pivoted from thinking I would get a degree in experimental plant molecular biology to focusing on computational genomics. Venter was not the first person to feed an RNA library into a DNA sequencing workflow without prior screening; I believe Gregor Sutcliffe holds that milestone. But Venter did it on a much larger scale and with much more flash. He pitched it as a way to skim the cream of the human genome when the official Human Genome Project was just getting going. And as would characterize much of his career, the response of others in the community was as much his contribution as his own work.
Venter proposed patenting the sequences, which caused him to butt heads with the then chief of the U.S. Human Genome Project effort, James Watson - as well as with NIH Director Bernadine Healy. Venter would leave the NIH with venture backing, be shouldered out of Human Genome Sciences by William Haseltine, and go on to found The Institute for Genomic Research, aka TIGR.

The idea that someone might lock down the human genome's contents within the walls of a private intellectual property garden set many to opposition - and not necessarily successfully. An early effort funded, if I recall correctly, by the French charity Genethon roared out with great elan - and then great embarrassment. At least some of the cDNA libraries they had chosen had used yeast DNA as carrier during library preparation. If you were screening by hybridization, then few probes would light up any accidentally cloned yeast material. But with just sequencing en masse, the yeast was sequenced along with everything else. Of course, in those days there wasn't a full yeast genome yet, so the extent of the problem wasn't realized as quickly as it would be today.

The U.S. pharmaceutical company Merck became very concerned that their freedom-to-operate would be constrained by gene patents. So they funded the Washington University Genome Center in St. Louis to launch its own huge EST effort, with data immediately deposited in GenBank. By constantly generating potential prior art, the Merck-WashU data torrent would reduce the risk that Merck would discover some important disease gene and then find Craig Venter or William Haseltine holding the keys to the gate for pursuing it. Or anyone else.

After all, just about everyone in the genomics field started huge EST programs. Mining out interesting hits at Millennium was much of my bread-and-butter in the late 1990s. It turned out a high school friend was my opposite at Incyte, and a future colleague was in a similar role at CuraGen. All of these companies filed enormous numbers of patents - I know I had around 140 going at one time. We claimed internally we just wanted freedom to operate, but realistically this worked much as ICBMs and MAD do - if we could cudgel an opponent with our patents and they with theirs, hopefully nobody would dare launch a lawsuit.

Interestingly, important things sprang from these companies that had essentially nothing to do with ESTs. Human Genome Sciences latched onto ibrutinib, which eventually became a blockbuster in others' hands (read For Blood and Money for the whole story). Incyte would go through an insect-like metamorphosis, shedding its genomics operations in California and taking the treasury, ticker symbol and name to Delaware to focus on kinases with significant success. Millennium used the money raised in the genomics bubble of '99 - at one point we had a market capitalization in excess of companies such as Biogen that had serious products - and stumbled into Velcade. CuraGen's legacy was to spin out 454 and the first commercial massively parallel sequencing instrument.

Millennium ended up abandoning most of the patents on ESTs. There's a whole cluster on ACE2, which we pursued for many indications, never imagining what it would become notorious for - being an important viral entry receptor. A few others were issued. But I'm unaware of anyone ever actually suing for infringement based on an EST patent - even before the Myriad decision from the Supreme Court that probably iced the whole area (I have a general prior that anyone lacking a Doctorate in Law who claims to understand the implications of Myriad is deluding themselves).

The intellectual property issues also crimped the value of the databases that companies such as Human Genome Sciences and Incyte were selling. I had a colleague who mentioned that at one point he was the only SmithKline Beecham scientist allowed to look at the HGS database - SKB management was terrified of protracted battles over whether a given drug target had come from HGS or from SKB, so they all but shut down the exclusive access to HGS they had paid so much for. At Millennium we finally licensed Incyte out of fear that what we had from public sources and our own efforts might miss some crucial drug targets - and then spent a lot of effort tracking whether we had found something with Incyte or not. That could get complex - since we were attempting to assemble ESTs into possible transcripts, a given transcript might be part Incyte and part other - but was the non-Incyte portion sufficient to make the call that led to the target becoming interesting?
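
To make that bookkeeping problem concrete, here is a minimal sketch of how one might tally which parts of an assembled transcript rest on licensed ESTs and which on other sources. It is only an illustration, not a reconstruction of what we actually ran at Millennium: the `EstHit` record, the `provenance_summary` function, the source labels and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstHit:
    """One EST placed on an assembled transcript (hypothetical record)."""
    source: str  # illustrative labels such as "incyte", "public", "in_house"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def covered_positions(hits, sources):
    """Return the set of transcript positions covered by hits from `sources`."""
    covered = set()
    for hit in hits:
        if hit.source in sources:
            covered.update(range(hit.start, hit.end))
    return covered


def provenance_summary(hits, licensed=frozenset({"incyte"})):
    """Count bases supported only by licensed ESTs, only by others, or by both."""
    all_sources = {hit.source for hit in hits}
    licensed_pos = covered_positions(hits, licensed)
    other_pos = covered_positions(hits, all_sources - licensed)
    return {
        "total_bases": len(licensed_pos | other_pos),
        "licensed_only": len(licensed_pos - other_pos),
        "other_only": len(other_pos - licensed_pos),
        "shared": len(licensed_pos & other_pos),
    }


# Made-up layout: a licensed EST bridges two non-licensed ones.
hits = [
    EstHit("public", 0, 400),
    EstHit("incyte", 300, 550),
    EstHit("in_house", 500, 900),
]
summary = provenance_summary(hits)
print(summary)
```

In this made-up layout only 100 of the 900 bases depend solely on the licensed EST, yet those 100 bases are what join the two halves into one transcript. A count like this can say where the sequence came from, but not whether the non-licensed portion alone would have flagged the target.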

There was also the circus of companies such as Human Genome Sciences, Incyte, and perhaps some others who are escaping my memory, continually pushing the human gene estimate upwards. Starting with the 50K number, which was apparently just an off-the-cuff remark by Wally Gilbert that became textbook material (shades of "print the legend!"), each press release seemed to ratchet it up: 75K, 100K, 150K, 200K. Of course, this all looked silly when the genome project came in around 22K.

I know on Incyte's side - I never saw HGS' database - they had a clever push in a breadth-first strategy. After a certain point in time, Incyte stopped running the Sanger sequencing to its full read length. We often ran to 600 or 700 bases, but IIRC the Incyte libraries were capped at 250 bases. Which meant they could turn their sequencers over more often, generating even more reads. Since reads, and the possible numbers of genes they represented, had become bragging points, it was a very rational strategy from that standpoint. Especially since all the efforts had mined out the low-hanging fruit, so it was harder and harder to find novelty. Of course, shorter reads are less likely to have good protein signatures and are harder to assemble correctly, but since nobody was auditing Incyte's gene count claims, the latter might be seen as a feature, not a bug.

So ESTs were important peeks at the genome, which allowed many drug discovery projects to launch sooner - though finding any successes from those programs is a very challenging (and perhaps impossible) task. The EST revolution which Venter unleashed helped fuel the genomics bubble of the late 90s. A month's worth of EST sequencing in the largest labs of the 90s would not even be a rounding error on modern RNA-Seq, but ESTs are its intellectual forebears.
