Thursday, November 17, 2016

HGP Counterfactuals, Part 7: Wrapping Up

It's been interesting revisiting a bunch of now-ancient history of the Human Genome Project with the goal of exploring other possibilities.  I started by considering the entire concept of alternative histories, then reviewed the construction of physical maps, the strategies considered for sequencing the clones comprising the minimum spanning map of the genome, and the actual sequencing technologies employed.  I then considered scenarios in which no HGP is launched or the project is given a much smaller budget and forced to focus on technology development.  Tonight, I'll close this out by trying to summarize some of the ideas that came out of this process, along with some further thoughts on the whole exercise.  I'll also touch on the two megaprojects to which the HGP is often compared, the Manhattan Project and Project Apollo.

Walking back through "the forgotten maps" was useful for reminding myself what an important role the various physical maps played, and how critical these maps were to ensuring the correctness of the final genome and to anchoring that genome to the practical maps used by geneticists (cytogenetic and genetic maps).  It also revisited the biggest setback for the HGP, the discovery that Yeast Artificial Chromosome (YAC) maps would not be able to supply sequence-ready clones due to the tendency of Saccharomyces to recombine between repeats.  But most of all, it was a reminder that the majority of these technologies were not particularly scalable or easily replicated with other organisms.  If every new organismal genome required a similar effort, we'd have very few such complete genomes.

I assigned blame for this, and later for the narrowing of sequencing technologies to automated fluorescent Sanger alone, to the pressure to produce results.  The human genetics community that had fought for the project wanted tangible results quickly.  Mapping technologies were really distinct from sequencing technologies, and indeed many genome centers appeared to be either mapping-focused or sequencing-focused.

Mapping-focused labs, particularly at the time, would have quibbled with my claim that these technologies were not relevant to sequencing.  These groups tended to favor the many-levels-deep hierarchical sequencing strategy I deride as "map into the ground", generating minimum-spanning sets at each level so as to minimize the amount of actual sequence generation required to attain the desired final coverage.

But falling sequencing costs made all those levels of cloning and mapping clearly cost-ineffective.  Given that the goal was to sequence a genome, driving down the cost of actual sequence generation should always have been the priority.

The idea that network effects and production demands killed sequencing innovation originated with George Church in a conversation late in my graduate student career, when it was clear that fluorescent Sanger instruments were going to win out over his multiplex method.  There was almost certainly additional room for improvement of the multiplex method, but with only one lab pursuing it versus dozens and dozens of fluorescent labs, even multiplex's strongest proponent could see which way the wind was blowing.

The no-HGP scenario requires a large leap of imagination; this isn't a case of "what if a singular, binary event had been reversed".  I really went into that exercise with no idea where it would lead; it was a bit of a Eureka! moment when I hit upon the idea that Craig Venter's genomics career would have stayed on largely the same course.  It's not that he was aloof from the public project; if I remember correctly, his lab generated one of the first segments of the human genome longer than 100 kilobases.  Smaller-scale network benefits for fluorescent sequencing might have slowed his pace, but I don't believe they would have stopped him.  Perhaps the one possible stop, which I didn't explore, would have been if the conditions leading to Celera hadn't appeared until the awful fall of 2001, when the stock market crashed.  Even before then, genomics stocks had started cooling off.  My assumption that there still would have been a hot market for genomics stocks even without a genome project is based on Walter Gilbert nearly launching a company in the late 1980s, halted only by a different stock market crash.

For the piece imagining an HGP restricted to funding technology development, I probably strayed a bit far into trying to dredge up all the technologies that were considered (and almost certainly missed a few).  This was a favored strategy in the Church lab, but certainly not how the HGP proceeded.  It wasn't until I was mostly through writing it that the level of risk involved occurred to me; there was a danger that at the end of ten or fifteen years no radical technology would have emerged able to do the job better than the way the project was actually done.  By that point, a private genome would have been available.

The other half of that consideration is complex.  On the one hand, most of the ideas that were floated and then frozen during the 1990s found new life later on.  Among the demonstrated methods conceived around the early days of the genome project are optical maps, sequencing-by-synthesis and nanopore sequencing.  Other technologies, such as electron microscopy, are seeing renewed interest.  Sequencing-by-hybridization didn't pan out, but it seeded interest in the k-mer algorithms used for short-read assembly.

On the other hand, very few of these are capable of working without a reference genome.  Pacific Biosciences is the one proven sequencing technology capable of de novo mammalian genome assembly to a reasonable quality.  Oxford Nanopore's Clive Brown is generating MinION datasets of his own genome that may soon add a second technology to that list.  BioNano Genomics can map a human genome to high resolution in a matter of days; combined with PacBio sequence data, a truly de novo reference genome can be constructed.  This is much harder with short-read technologies such as Illumina, though with add-on technologies such as 10X, iGenomX and Dovetail quite good de novo genomes can be built.

There's also the question of the degree to which these technologies were enabled by developments outside the field, such as improvements in computing power, electronics and imaging technology.  I covered some of these in my previous piece on "Why Next-Gen Now", though I slipped and forgot to consider the computing angle (James Hadfield flagged that one).

A corollary to this is that without a good, public reference genome, the interests and aims of companies would almost certainly have shifted.  Short-read sequencing might have emphasized RNA-Seq even more.  Mapping technologies might have received even more interest, and ideas for extracting sequence from these maps might have been driven to fruition.

I started down this road in an attempt to grapple with Kuman Thangdu's questioning of the value of the HGP.  He was not suggesting, as he emphasized, that it hasn't had impact, but rather asking whether that impact has been sufficient to justify the enormous expense of the project (i.e., its Return on Investment, or ROI).  In the past, when facing such challenges, I've usually focused on the numerator of the ROI equation.  Here I tried to think a bit about the denominator, though somewhat indirectly.

Scientists don't tend to think explicitly in terms of ROI, but it was implicit in many of the arguments against the project.  After all, comparing options via ROI is really an opportunity cost argument: can I get more bang for my buck?  HGP proponents engaged in horrible hype to try to get support, at least at times.  I do remember the caveat that it might take another fifty years to understand the genome, but all too often there were promises that huge changes in healthcare would rapidly accrue.

It hasn't been for lack of trying.  In terms of changing medical practice, the HGP should have been seen as a bet with a reasonable, but not guaranteed, chance of success.  Numerous factors have kept that bet from paying off quite as hoped.  Much of the reality is that the genetic components of complex diseases have turned out to be enigmatic, and they have rarely proven amenable to intervention with drugs.  This problem of druggability continues to bedevil efforts, though many (including my employer) continue to chip away at expanding the range of gene findings that can be converted into therapies.

It is important to remember that the HGP was given a very large budget; the emphasis was on cutting costs enough to get the project done in the given timeframe, not on creating a reproducible and scalable method to sequence many human genomes.  The Manhattan Project and Project Apollo were run on similar schemes, with Apollo, like the HGP, having an explicit deadline.  I've heard it claimed that on Apollo the watchword was "the only thing you can waste is time"; the deadline ruled all.  Manhattan didn't have a specific deadline, but it was, at least initially, believed to be a head-to-head race with the Axis powers.  An interesting fact I learned only recently: the cost of the Manhattan Project to build the atomic bomb was less than the cost of the project to develop the aircraft that carried those weapons, the B-29 Superfortress.

Still, the genome has had some impressive wins for understanding biology, and I remain confident that many of these will translate into improved patient outcomes.  I've long been a fan of cancer genomics, which I feel is the best way to understand the key genetics of tumors in the context of actual patients.  In a similar vein, I strongly believe rare Mendelian diseases are each an important clue to normal function.  I'm always intrigued by how many rare Mendelian traits, particularly tumor syndromes, have profound effects on specific tissues despite the gene products being ubiquitously expressed.  Rapid, inexpensive genome sequencing made chasing ultra-rare Mendelian traits practical; low-cost exome and genome sequencing of huge cohorts is making it possible to find genetic knockouts across entire human populations.

As I noted in another recent piece, having a reference genome sequence radically changes how information is organized and the types of experimental approaches that are feasible.  For example, even if super-long, super-accurate sequencing became available, Illumina machines would still have lots to do, as researchers keep devising ways to use their power to convert various types of non-sequence information, such as ribosome occupancy, transcription factor binding, RNA structure and many more, into sequence tags.  Because of all these effects, it becomes increasingly hard to find drugs whose development wasn't somehow significantly touched by the genome project.  The effects of having a complete human genome have permeated biology.  And the ability to rapidly generate reference genomes from any organism has extended a similar benefit to almost every organism anyone is interested in.

Well, that's it for now.  Somehow I pulled off a semi-coherent seven-part series on seven consecutive days.  It did help that for the first four days I cobbled together a draft of the following day's piece.  I hope you've found this interesting and a bit provocative, and as always, contrary opinions are most welcome in the comments section.
