Thursday, June 11, 2026

Craig Venter Reflections: Human Genome

Few bombshells have hit genomics like the day in May 1998 when Craig Venter and Applied Biosystems announced the launch of Celera Genomics to sequence the human genome by a complete shotgun approach. A couple of colleagues at Millennium had just announced they were leaving to form a contract research organization which would perform high-throughput sequencing on demand - they were certainly rocked back on their heels (though not blown over; Orion Genomics remains successful to this day with a focus on palm oil genomes). The public project was also rattled - Celera was promising to deliver a private genome much sooner. In some ways, it was the EST brouhaha all over again - except with important twists.
The big question was whether the approach could even work. If you shotgun sequenced the human genome, would you be able to computationally assemble a useful draft, or would you end up with a hopeless tangle of nonsense and tiny sequence islands? One of the leading lights of genome assembly at the time, Phil Green, declared it to be computationally intractable - and in a paired piece Eugene Myers declared it was doable. Myers would go to Celera to test his ideas.

The public project was a multinational consortium resting on a complex political foundation. By this time I think there were no longer voices calling to skip sequencing and just build a set of physical, cytogenetic, and recombination maps - though perhaps some of those were left. Many who had urged waiting to sequence just didn't see the cost of full-scale sequencing as worth the scientific haul; others thought that the project should focus on developing radical new sequencing technologies to drop the cost.

The Celera announcement certainly goosed the public project's sequencing phase, though the project would stick to sequencing BACs and PACs which had been physically mapped into a "golden path" across each chromosome. Something which has received renewed attention is that the original plan to ensure the golden path roughly equally sampled a set of anonymous donors was inadvertently ignored, resulting in most of the public genome coming from a single donor.

Along with the question of whether it was possible to computationally assemble a whole-genome shotgun sequence of a genome as complex and repeat-rich as human, there was also the matter of patience. With the public clone-by-clone effort run at many sequencing factory labs across the globe, bits of the human genome came into complete focus on literally a daily basis. With whole-genome shotgun data collected over many years - even with the raft of automated sequencers ABI sold their child Celera, throughput was still limited - there really wasn't any unified look at the genome for years. Plus, you couldn't actually test that key question of "will it assemble" until the whole dataset was generated.

To provide an intermediate test, Celera sequenced the Drosophila melanogaster genome. There were public projects for Saccharomyces yeast and Caenorhabditis worms, but not fruit flies. While only about a twentieth the size of the human genome, it was still about 30 times larger than many of the bacterial genomes that had been shotgun sequenced. Failure on fruit fly would cast deep shadows on the human effort; success would boost confidence - and success it was.

How would the public project respond? Well, as noted, primarily by goosing the pace of the human project and all but erasing any "let's slow down and wait for better tech" talk. The public project also declared that mouse would be sequenced by a whole-genome shotgun.

Eventually, both sides were pulled together for a joint announcement of completion, a compromise which satisfied no one. That grudge match may well have contributed to the absence of a Nobel Prize for the Human Genome Project. It would be interesting, if it is possible (I assume it is), to get both the original public draft and the original Celera genome (which, to nobody's surprise, turned out to be Venter's) and compare them to the best available references today. Which assembly had more misassemblies? Which was more contiguous? And so forth. Celera relied on getting length constraints from lambda clones and pUC clones - metersticks far shorter than the ultralong libraries used today to resolve the complexity of human genomes. The public assembly had those BAC clones to provide constraints often several hundred kilobases in length - plus the overlap between BACs providing further information. How much did this improve things?

It's also the case, which I covered in my counterfactuals a decade ago, that the public project was providing many additional types of data that Celera wasn't generating, which anchored the public assemblies to cytogenetic, physical, and recombination maps. But how much did users at the time really care about that?  

It would also be interesting to learn from anyone who had access to Celera's genome database at a research institute or biopharma how useful it was. I know someone who was at SmithKline Beecham when they had licensed Human Genome Sciences' EST database, and who was actually the only person allowed to look at it - the company had paid a lot for the access and then got the shakes over whether they'd be paying excessive royalties on all their future drugs. We didn't license Celera's database at Millennium, but were frantically mining the public data as it emerged to try to squeeze out any possible drug targets that hadn't shown up in all the EST data we had - at that point we had public data, our own data, and the Incyte database.

Interestingly, Celera's main lasting contribution to human biomedical science had little to nothing to do with their genome database. It was a compound discovered in Celera's own drug discovery labs against a well-known target identified long before Celera's founding: Bruton's tyrosine kinase, aka BTK. That compound would become ibrutinib and a megablockbuster - but for completely different owners. The book For Blood and Money tells the story in great detail. When Quest Diagnostics bought Celera in 2011, that compound had long since gone to another owner.

We now live in an era where a complete novice can generate a much better assembly of their personal genome than any available in 2001, with only their personal kitchen as a laboratory.  Venter's vision of shotgun sequencing the genome has become the only way any genome is solved.

But long before that, Venter and much of his band had moved on from Celera to a new nonprofit, the J. Craig Venter Institute - and a new challenge. And that will be tomorrow's topic.












