Monday, November 14, 2016

HGP Counterfactuals, Part 4: Sequencing Tech Landscape Circa 1992

In this series leading up to a pair of Human Genome Project alternative histories, I've been warming up with a trio of analyses of the technology landscape.  At first I couldn't decide on the order to post these in, but then it was clear to me: a progression of scale from physical maps of the genome, to how to organize the sequencing of the BACs, and now to the actual sequencing technologies which were in play.  In particular, there was a very rapid evolution from 1992 (when I was first deeply exposed to this angle by attending Hilton Head) to 1997 (when I defended, but also by when the outcome was clear).  In this time period, a large number of possible options essentially compressed to a single one: automated fluorescent dideoxy sequencing.  That outcome was not clear in 1992.

I'm doing this from memory; unfortunately it is difficult to double-check most of these remembrances from anything online.  Most of the technologies that were presented as promising at the 1992 Hilton Head meeting vanished without leaving any strong record in the published literature.  The technologies could be organized along several axes, but I'll divide them into Sanger methods, Maxam-Gilbert methods and exotics.

Both Sanger and Maxam-Gilbert methods rely on electrophoresis to separate a ladder of DNA by size in an electric field.  Amazingly, with the correct conditions it is possible to engineer this so that molecules differing by a single nucleotide are resolved from each other, at least within certain size ranges. In a slab gel, a paper-thin acrylamide gel is used for the separation.  These gels were cast between two huge (in some cases 1 meter in length) glass plates which had to be utterly clean; the process for cleaning them was once disastrous for me as an undergraduate.  Casting the gels was a skilled, manual practice.  Finally, in the original versions of this, one plate was carefully removed without disturbing the gel, and the gel was dried and then placed next to photographic film.  The film was exposed by the radioactivity incorporated in the sequencing reactions.  That was the sequencing state-of-the-art circa 1990.

The two methods differed in how the ladder was generated.  Sanger methods use a polymerase to extend from an oligonucleotide primer.  Each reaction contained a small amount of chain-terminating dideoxy nucleotides; this would generate a ladder of fragments in each reaction ending at a specific nucleotide.  Read the four reactions in four separate lanes, and one could reconstruct the complete sequence.   Maxam-Gilbert sequencing used chemistry to break the DNA in four base-specific ways (or at least in ways which provided the necessary information to reconstruct the sequence).  The original chemistries were quite nasty (e.g. hydrazine), and that reputation never went away, but by the early 1990s nothing more dangerous than dilute hydrogen peroxide was used.  In the traditional version, the end of one strand would be labeled with radioactivity.
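To make the ladder-reading step concrete, here is a toy sketch in Python; the lanes and band positions are entirely invented for illustration, and real base-calling of course worked from band patterns on film rather than tidy integers:

```python
# Toy model of reading a four-lane dideoxy (or Maxam-Gilbert) ladder.
# Each lane maps a base to the set of fragment lengths showing a visible band.
def read_ladder(lanes):
    """Walk up the ladder one band at a time and call a base at each rung."""
    longest = max(max(sizes) for sizes in lanes.values() if sizes)
    calls = []
    for length in range(1, longest + 1):
        hits = [base for base, sizes in lanes.items() if length in sizes]
        # Exactly one lane should have a band at each length; 'N' marks trouble.
        calls.append(hits[0] if len(hits) == 1 else "N")
    return "".join(calls)

# Four hypothetical reactions, one per terminating base.
lanes = {"A": {1, 4, 7}, "C": {2, 6}, "G": {3, 8}, "T": {5}}
print(read_ladder(lanes))  # -> ACGATCAG
```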

You can find much more detailed descriptions of these processes in Wikipedia or many biology textbooks, though Maxam-Gilbert has faded from most textbooks.  I go through these details only because each of the methods that was proposed deviated from these baselines but still followed largely along the same lines.  The approaches that broke with them completely, which I here call exotics, won't be touched on today; I'll explore those a bit in the second counterfactual.  I justify this by the fact that none of these methods made significant contributions; most made no contribution at all (it may well be that none did; I'm just not confident enough to say so).

One approach to solving the problem of scaling up the existing methods would be the "no change" option.  This was the plan Fred Blattner at the University of Wisconsin came up with: he would sequence the E.coli genome using radioactive Sanger sequencing.  To make things more interesting, he'd do this as coursework for undergraduates.  When progress by this approach proved slow, James Watson notoriously had an ABI fluorescent Sanger instrument shipped to Wisconsin.  According to the lore, Blattner simply refused to open his "gift", though eventually he switched over (I think it was as two other groups were closing in on lab E.coli strain genomes).  The approach wins points for keeping capital costs low, as those glass plates and power supplies really weren't expensive, but it offered little hope for substantial reductions in the cost of sequencing.

Two groups were trying twists on Maxam-Gilbert sequencing.   Instead of taking an individual DNA and end-labeling it prior to the destructive chemistry, these methods took a pool of unlabeled DNA and digested it with a restriction enzyme.  After running the chemistry and gels, the DNA from the gel was transferred onto the surface of a membrane and fixed there with UV light.  By hybridizing a radioactive probe adjacent to a restriction site, a sequencing ladder starting with the restriction cut would be revealed.  Strip the probe and hybridize with a new one: a new ladder appears!  Hence, these methods leveraged the chemistry and the slow electrophoresis steps over many templates.

Walter Gilbert had a lab at Harvard using a method which applied this approach to entire genomes.  This method, developed by George Church while a graduate student, was termed "genomic sequencing" (a moniker it did not succeed in monopolizing).  Wally's group was going to sequence Mycoplasma capricolum, a bacterium with a small genome (though later Craig Venter would pick an even smaller Mycoplasma genome for his group's second bacterial genome).  The genomic DNA would be split into aliquots, each aliquot digested with a different restriction enzyme and then sent through chemistry, electrophoresis and blotting. With knowledge of some starting sites (generated on an ABI fluorescent instrument), the blots could be probed to light up a ladder.  Searching within that sequence for one of the other restriction enzymes' sites would yield another probe, which would be synthesized and used to reveal yet another ladder.  This cycle would repeat until the desired sequence was fully revealed.  Gilbert also hoped to replace radioactivity with chemiluminescence.
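As a cartoon of that walking cycle, here is a hedged Python sketch; the restriction-site list, the probe length and the hybridize() callback are stand-ins I've invented purely to illustrate the loop:

```python
# Cartoon of the "genomic sequencing" walk: each read is scanned for a
# restriction site, and the sequence just past that site becomes the probe
# that reveals the next ladder.  All names and parameters are hypothetical.
RESTRICTION_SITES = ["GAATTC", "GGATCC", "AAGCTT"]  # e.g. EcoRI, BamHI, HindIII

def next_probe(read, probe_len=20):
    """Return a probe anchored just past the first restriction site found."""
    for site in RESTRICTION_SITES:
        idx = read.find(site)
        if idx >= 0 and idx + len(site) + probe_len <= len(read):
            start = idx + len(site)
            return read[start:start + probe_len]
    return None  # a "desert" read: no usable site, so this walk stalls

def walk(first_probe, hybridize, max_steps=50):
    """Chain reads together, each read seeding the probe for the next."""
    probe, reads = first_probe, []
    for _ in range(max_steps):
        read = hybridize(probe)          # probe the blot, read the revealed ladder
        if not read:
            break
        reads.append(read)
        probe = next_probe(read)
        if probe is None:
            break                        # stalled: a fresh start point is needed
    return reads
```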

This approach had several key advantages.  In particular, the need to clone, pick, grow, prep, sequence and track multitudes of small insert clones was eliminated.  As the sequencing hopscotched from read to read, the initial assembly was trivial; just overlap a read with the read that yielded the probe for generating that read. 
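Here is a minimal sketch of that hopscotch assembly, assuming each new read shares its leading bases with the tail of the read that spawned it (the sequences and the minimum-overlap cutoff are invented):

```python
# Toy "hopscotch" assembly: splice each read onto its parent at the longest
# suffix/prefix overlap.  A minimal sketch with made-up sequences.
def merge(parent, child, min_overlap=6):
    for k in range(min(len(parent), len(child)), min_overlap - 1, -1):
        if parent.endswith(child[:k]):
            return parent + child[k:]
    raise ValueError("overlap too short to call confidently")

print(merge("GATTACAGATTACA", "GATTACATTTT"))  # -> GATTACAGATTACATTTT
```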

On the other hand, the problems were extensive as well.  Should a repeat be encountered, a probe would reveal the union of all sequencing ladders containing the probe.  Conversely, the method might stumble on a long "desert" sequence lacking any of the needed restriction sites, or on one that by perverse chance contained all of them in close proximity. Probe synthesis was expensive and slow; the slow part meant it was critical that a sequencing project have lots of start points so that there was always another round of probes waiting to hybridize.  A catch with Maxam-Gilbert reactions is that they can be read either 5' to 3' (as Sanger reads always are) or 3' to 5' ("backwards"); woe to ye who loses track of which orientation the probe was picking up.  Finally, and probably the worst problem, each hybridization was relatively inefficient: only one readable ladder would appear on each probing.

Meanwhile, George Church and collaborators (at Collaborative Research Inc, later renamed Genome Therapeutics) were pursuing a twist called multiplex sequencing.  This method generates short insert clones, but the vectors come in twenty different versions, with each version flanking the insert with a different pair of tags.  Let's call these vectors A-T; more might have been possible, up to some limit of signal-to-noise.  The tags in turn are flanked by rare restriction sites. Make twenty libraries, one in each of the vectors.  Make pools with one clone of each vector type.  Now subject those pools to restriction digestion, chemical treatment, electrophoresis and blotting.  By probing the blot serially with the 40 multiplex probes, each end of each clone could be read out.  So long as the rare restriction site wasn't in an insert, the process would be very efficient; every lane should have useful data along its entire length.
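To show the bookkeeping behind that serial probing, here is a hedged sketch; the tag names, the FakeBlot stand-in and its probe() method are all hypothetical, and a real blot would of course yield band patterns rather than strings:

```python
# Schematic of multiplex deconvolution: twenty vectors (A..T), each flanking
# its insert with a distinct tag pair, so one pooled lane is read out by
# probing the same blot once per tag (20 x 2 = 40 probings).
VECTORS = [chr(c) for c in range(ord("A"), ord("T") + 1)]
TAGS = {v: (f"{v}_left", f"{v}_right") for v in VECTORS}

class FakeBlot:
    """Stand-in for a membrane: maps (pool, tag) to the ladder that probe reveals."""
    def __init__(self, ladders):
        self.ladders = ladders
    def probe(self, pool_id, tag):
        return self.ladders.get((pool_id, tag), "")

def deconvolve(blot, pool_id):
    """Strip and re-probe one blot with each of the 40 tag probes in turn."""
    reads = {}
    for vector, (left_tag, right_tag) in TAGS.items():
        reads[vector] = (blot.probe(pool_id, left_tag),   # one end of one clone
                         blot.probe(pool_id, right_tag))  # the other end
    return reads

# Usage with a made-up pool containing a read for vector A's left tag only.
blot = FakeBlot({("pool1", "A_left"): "ACGTACGT"})
print(deconvolve(blot, "pool1")["A"])  # -> ('ACGTACGT', '')
```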

A vivid memory from my first or second Hilton Head meeting was our group leader presenting the initial results from using the multiplex method to sequence a set of Mycobacterium cosmids.  A focus of the talk was the efficiency of the process.  At some point, perhaps during the official Q&A, a key member of another team charged the microphone and angrily suggested that we were downplaying the labor involved -- look at all the names on the slide!  Of course, that list included a bioinformatics grad student who hadn't been involved in sequence generation, as well as a lot of people who had touched the project but weren't invested in it full-time.  In truth, multiplex sequencing was quite efficient in personnel.

The only other twist on sequencing technology I remember was a group from the University of Utah which intended to adapt the multiplex method to Sanger sequencing.  All the same steps, just substituting Sanger dideoxy termination reactions for the Maxam-Gilbert destructive chemistry.  Alas, that approach never seemed to take off.

The rest of all the groups were using ABI sequencers, as far as I can remember.  Or perhaps a few were using other companies' similar instruments.  One of the professors at Delaware had tried out DuPont's entry and proclaimed it a dud, but several other companies were competing in the space.

I wish I had some old Hilton Head program books to go by, because it would be interesting to track the falling away of the other methods.  I'd guess that by 1995 only the Church/Genome Therapeutics group was using anything other than fluorescent Sanger instruments (and we weren't zealots; a tech and I filled in a bunch of gaps in the E.coli genome using PCR and Sanger; some of those gaps were actually negative in length, just that the overlaps were too short to confidently call).  By the time I defended in 1997, Genome Therapeutics was starting to switch to ABI instruments.

What happened?  I'll offer two explanations: the press of getting things done and network effects.

Getting things done was a key imperative at every sequencing center.  Most were led by biologists, who had interesting pet projects to tackle.  For example, the University of Oklahoma was sequencing a BCR-ABL fusion and Lee Hood's group was sequencing the T-cell receptor locus.  Many groups were sequencing various model organisms on a cosmid-by-cosmid basis, such as Saccharomyces or Schizosaccharomyces or Caenorhabditis.  This led to a pressure to generate results quickly, both for their own intellectual efforts and to prove that all this genome sequencing was worth doing.  Plus they had informatics groups that needed exercising.  These are all good things, but they produce a huge pressure to generate results now.  Immediate results are not compatible with technology development.  They could also create distractions; sequencing Mycoplasma by the genome-walking method required starting points, whereas a library of cosmids (or BACs) would have the starting points already known (the vector ends).  So something even slightly novel, such as multiplex Sanger, would need to show results quickly or soon be dropped.  Or, if not dropped, the ABI camel could nose into the tent: first at a low level to feed those demanding data, then more as those demands grew, and before long why keep wasting time on the development stuff when ABI is working?  The demand for data generation would be expected to overwhelm all but the most dedicated technology development program.

The other driver I suspect is network effects, the benefits derived from large installed bases.  If some small improvement or insight on multiplex sequencing arose, it came from the Harvard/GTC team; nobody else was running the platform.  But dozens of labs around the world, and not just genome centers, were running fluorescent sequencers.  Even split across the different platforms, that's a huge effect.  Some quirk seen in an ABI instrument in St. Louis would soon be checked against every other ABI in the world.   ABI also generated an enormous warchest to plough into instrument and chemistry improvements, such as dye terminators and capillary instruments.  Further improvements in the platform led to more sales and further increases in the benefits emerging from the network.

I honestly did think through this whole series; this dive into the actual history was not just a stalling tactic.  Thinking over how things were done has helped clarify my thinking about what might happen in an alternative timeline.  In particular, the problems of network effects and getting things done will play significant roles in my history gaming.



