Friday, December 09, 2016

Siddhartha Mukherjee's The Gene, An Intimate History & the Crafting of Scientific Stories

Back in 2011, I read and reviewed Dr. Siddhartha Mukherjee's book on the history of cancer therapy, The Emperor of All Maladies.  I liked the book, but as is my character I also listed some criticisms.  It was a very pleasant surprise to one day discover an email from Dr. Mukherjee engaging me on my points.  A real author, writing me!  Fast forward to this fall, and I felt some inexplicable inertia about reading his new book, The Gene, An Intimate History.  This time he drove the process forward, asking if I'd like to read and review the book and, if so, whether his publisher could send me a copy.  Wow!  Having just finished the book, here goes the full review.

Mukherjee begins each chapter with well-chosen epigraphs, and there are numerous references throughout to literature and popular culture.  While I was pondering how to tackle this review, a portion of one of my favorite books occurred to me as a useful, albeit imperfect, guide.

"You like to tell true stories, don't you?" he asked, and I answered, "Yes, I like to tell stories that are true."
Then he asked, "After you have finished your true stories sometime, why don't you make up a story and the people to go with it?
Only then will you understand what happened and why."

As part of my own storytelling, I've deliberately left out the lines just before and just after this passage from Norman Maclean's A River Runs Through It.  The line immediately afterwards could constitute a spoiler (it is close to a statement of the novella's theme), but more importantly both set up the emotional punch of the section.  But I'm trying to discuss how and why we construct stories in a much less emotional sphere, so I have omitted them.

Maclean's book is a work of fiction, but based on his own life.  I once found an online article from a newspaper in the region where the book is set, and one of the commenters was aghast that Maclean had transposed some ugly events from Chicago to Montana.  When we know a subject well, we may object to deviations from the story we expect, even if they are appropriate and serve a greater aim.  Re-telling a story always changes it, but when done well those changes help the reader to understand.

As I said, this analogy is imperfect; I am in no way suggesting Mukherjee has done anything but report facts and reasonable opinions.  His is a work of non-fiction.  But in condensing a vast amount of genetic knowledge and history into just under 500 pages, he was forced to choose what to say and what to leave out: how to order the information, what to juxtapose and what to separate.  All for the goal of trying to impart to the reader an understanding of "what happened and why".  Just as I am attempting in this piece to extract bits from Mukherjee's book to illustrate the what and why of my reaction to the book.

We try to think of science as being strictly true to facts, and it should be, but the reality is that the same facts will be written very differently in a paper's introduction, a comprehensive review of a topic or a historical review.  Those last two will also be executed very differently if writing for a Trends journal versus one of the Annual Reviews series; having more or less space for a topic is a key consideration in how to treat it.  So the challenge as a reviewer is to assess how well the author has organized and selected what to report, given the constraints of the chosen medium.

 So much time and so little to do. Wait a minute. Strike that. Reverse it. 
Willy Wonka and the Chocolate Factory

Mukherjee has chosen a very ambitious task: to educate a reader in what they need to know to understand the modern field of human genetics, particularly when it comes to our ability to predict health and disease and the possibilities in front of us for gene therapy and reproductive selection.  Mukherjee early on declares some areas that simply won't be touched, such as genetically modified organisms (GMOs) in the environment.  Indeed, once we leave Mendel we leave plant genetics behind for the remainder of the book.  He is going to cover a lot of ground without delivering an unreadable doorstop.  He isn't interested in exploring or settling historical disputes; only a few are even hinted at.  Several of the examples in his coverage of biotechnology have been the subject of epic legal battles over patents (cloning, insulin, erythropoietin, with CRISPR obviously now heating up); none of that is here.  Nor is this a textbook, though it would make excellent core reading for a non-majors course.  Instead, it is intended to be read from end to end.  Reading it won't lead to mastery of the topic, as there will be many holes, but rather to a good basic understanding of many of the key topics.

What does he cover?  A wide range of topics are touched on, though some very lightly.  The book starts with an exploration of ancient ideas of how human form arises during embryonic development.  Before we know it, we're watching Darwin and Mendel develop their works (I've discussed Mukherjee's approach to Mendel in a previous post).  This is one of the portions of the book with the deepest focus on particular individuals, and also one of the few where Mukherjee switches back and forth between separate narrative threads (doing so very well).

But before long we're off to fly genetics which leads into population genetics.  Other chapters tackle the cracking of the genetic code, development of molecular cloning technology, the early days of sequencing, developmental biology, the genome project, transgenic mammals and many other topics. These are for the most part very lean, but well-built.  This never feels hurried; there is just enough color and background in each chapter to round out the main narrative.  One approach Mukherjee uses to streamline his text is to place some topics in on-page footnotes; I'll return to this topic later in the review.

All of this is framed by Mukherjee's very personal interest in the topic of genes and complex traits.  Three of his paternal relatives suffered from debilitating mental illness, so the question of how heritable such illness is stays very, very near the discussion.  He also ponders the different trajectories of his identical twin aunts.  Mukherjee covers many topics in the influence of genes on behavior, particularly the mechanisms by which genes drive our physical sex and also the evidence for genes influencing our sexual preferences.

Mukherjee also covers much of the historical and social background to the genetic science.  The rise and fall of eugenics is covered, exposing the many errors and contradictions.  Mukherjee also covers the rise of reproductive technologies in their intersection with genetic science: the emergence of prenatal screening and ultimately pre-implantation screening.  

Throughout, Mukherjee emphasizes that genes and alleles are not inherently good or bad; only by knowing the environmental context can we assess the impact of a gene.  Some mutations are disastrous in all contexts, but many more may be harmful in some contexts and harmless (or even beneficial) in others.  There is, as he states repeatedly, no sharp divide between nature and nurture: their intersection is omnipresent.

There are sadistic scientists who hurry to hunt down errors instead of establishing the truth - Marie Curie

While reading the book, I compiled a set of quick notes (Evernote is good for this), though I didn't start until deep into the fruit fly section.  Given my personal proclivities, only a few were notes of praise; what I like I like as a whole.  Many of the notes flagged topics or events that seemed not to be covered, but many were later erased as Mukherjee had simply tackled them somewhere other than where I first expected them.  Again, there are options and choices in how to organize a scientific story, and Mukherjee has made his own personal selections.  A related sort of note flagged various footnotes, suggesting that their content either was too terse or belonged in the text.

I am happy to say that there are very few issues of fact that I will quibble with.  The unfortunate misplacement of the white gene on the Drosophila Y-chromosome rather than X could be a simple, one character typo.   He strays a bit into anachronism by describing Dorothy Hodgkin's Nobel status in the context of Rosalind Franklin being unusual as a woman scientist but not unprecedented; Hodgkin wouldn't receive her award until two years after Watson and Crick took theirs.  Shotgun sequencing doesn't ever involve anything resembling a shotgun.  Introns can be huge, as Mukherjee states, but aren't (as his text suggests) typically hundreds of kilobases.   More common were more nuanced concerns, which some might say aren't worth haggling over.

For example, one fascinating aspect of development which Mukherjee touches on is symmetry breaking in embryos, the problem of going from a single egg to an embryo that has defined axes.  He discusses in detail the case of Drosophila, in which the mother sets up very specific molecular gradients within the egg.  So in a sense, in this system symmetry never actually exists to be broken; asymmetric females generate asymmetric eggs.  This is almost certainly not how mammals break symmetry, but I feel the text in The Gene might lead readers to think it is.

Another example where I would split hairs: Mukherjee suggests that an endosymbiotic origin of the mitochondria is held by "some scientists".  A quarter century ago I wrote a long term paper on the topic, and the molecular evolutionary evidence was very solid; I'm unaware of anything seriously casting doubt on the theory.  

But if I had reviewed this in manuscript form, I think I would have spent more time arguing for expanding certain topics.  Longer footnotes, footnotes moved to the text, totally new paragraphs added.  Now, I'm well aware that length is a dangerous quantity in a book for the lay public.  One could try to cut, but I found few obvious places to snip.  Once or twice a topic seems to be introduced in detail in two different places, but that is a rarity. As I said before, this is a very lean work.  Perhaps the few highly speculative pages on origin-of-life work could be excised as far from the main narrative.

Thinking of the length constraint, most of my notes I would nix.  It's a shame that Barbara McClintock gets only a brief footnote, or that there isn't a bit more on Erwin Chargaff. I've thought it interesting that James Watson was dating Ernst Mayr's daughter around the same time Watson was tangling with Mayr over the future of genetics, but not everyone thinks that is important color. Many of my favorite experiments are absent (Seymour Benzer's nucleotide-resolution recombination maps) or just a quick footnote (Hershey-Chase; I have a thing for experiments run with kitchen hardware).  The RNA Tie Club is mentioned, but not the whimsy that each member was assigned a nucleotide or amino acid.

But I think the ones I would fight for in particular would be items that reinforce that science takes detours and that human wisdom has limits.  An example of this is the omission of Watson and Crick being unable to fit the DNA structure, until Jerry Donohue pointed out they were using the wrong form (tautomer) of guanine in their models.

Or even something simple: Mukherjee mentions apoptosis, but gives the reader no hints how to pronounce it.  Given that working scientists still argue this point, it would again illustrate in a humorous way some of the smaller challenges in science.

Or another bit from that era: Crick and his "comma-less" code hypothesis and its undoing by the "uncles and aunts" experiment.  Or Gamow and his theories that the amino acids would directly fit into the codons, so all that was needed was more modeling à la Watson and Crick.  Mukherjee may have streamlined a bit too far here, eliminating the intellectual excitement and confusion that reigned at the time.

Perhaps the small fact that is most missed is a real doozy, at least to me.  Fred Sanger was turned down on his grant application for sequencing a protein, because the reviewers did not believe proteins had a defined sequence!  That's a huge conceptual leap made in a very short time, and an important complement to the leap to considering that "boring" DNA could possibly be the information carrier, which Mukherjee covers well.

One more in this vein: in the section on BRCA1's relevance to breast and ovarian cancer, Mukherjee calls out the common misconception that these can only be inherited from the mother.  I wish he had pointed to the studies that have suggested this misconception is (or at least at times has been) rife within the medical community.

In a few places, I wish there were a few more ties back to earlier text to help connect concepts for the lay reader.  We've already run through Beadle and Tatum's one gene, one enzyme hypothesis when we get to Garrod and maple syrup urine disease; it would be good to reinforce here that Garrod's result prefigured Beadle and Tatum.  Similarly when briefly discussing homeobox genes, the opportunity to link these back to Ed Lewis' work is missed.  Crick's prescient idea that information might flow "backwards" from RNA to DNA is mentioned, but not revisited when reverse transcription is described several chapters later.

Several more of my notes suggest expansion of topics just touched on, often necessitating moving a footnote into the main text.  For example, Mukherjee footnotes neutral theory without using the term.  A longer exposition would have fit well into his theme that alleles can only be considered in the context of the environment; many alleles are simply neutral.  The misconception that most mutations are deleterious is pervasive, particularly among people who are vocal in expressing their misunderstandings of evolutionary theory.

The treatment of mammalian sex chromosomes is split into two separate installments, each of which I wish had gone a little longer.  The section on gene content never mentions the pseudoautosomal region, which allows the Y to pair with the X.  In the section on X-inactivation within the chapter on epigenetics, the existence of genes that escape inactivation isn't mentioned.  To me, these aren't just chromosomal trivia, but interesting complications arising from nature.

Mukherjee ignited a small firestorm this spring by publishing a New Yorker article on epigenetics.  Largely, he became a magnet for antagonists within that messy field to attack their opponents.  In The Gene, delivered to the publisher before that kerfuffle, he certainly handles epigenetics very cautiously and decries the hype in the field.  Still, while I don't feel properly armed to confront Mark Ptashne (who was one of the critics in the blow-up), I'm surprised Mukherjee didn't marshal more of the evidence that histone marks have very real effects on gene expression.  Certainly oncology, Mukherjee's specialty, has seen the successful or promising development of histone deacetylase inhibitors and histone methylase inhibitors.  There's also the evidence that certain cancer mutations ultimately operate by inducing changes in histone marks.

Perhaps more surprising is that Mukherjee doesn't cover the known examples of alleles for which imprinting alters their expression in a clinically relevant manner: alleles whose phenotype depends on which parent the allele is inherited from.  Again, this would reinforce his message that there is no simple path from allele to phenotype.  He does cover the story of the "Dutch winter", the horrific famine inflicted on Holland at the close of World War II, which appears to have left an echo across generations.  That's a striking example on a grand scale, but I do feel that including the small-scale syndromes that are somewhat better understood would help drive home the point that information does appear to be transmitted between generations, though not strictly in A, T, C and G.

In a similar vein, I often found that my "he didn't mention X" notes were balanced by what Mukherjee did mention.  With his power as author, he selected the exemplars for concepts, which then shifts the question to whether he picked the right ones.  Are the ones I leapt to too well-worn?  When illustrating the potential for heterozygotes to have an advantage when homozygotes express a terrible disease, he explores whether cystic fibrosis mutant alleles might confer some protection against cholera.  The more commonly cited examples are the beta hemoglobinopathies (sickle cell and beta thalassemia) or other genes expressed in red blood cells, for which the disease alleles show a striking geographical overlap with malaria.   Textbooks particularly love this example since it ties to an illustration of that overlap; Mukherjee eschews images (other than some information flow diagrams) so that edge isn't present.

Most times I understand why text is in a footnote, but there are a few odd examples.  In particular, once a footnote in a smaller font takes up a third or more of a page, perhaps it really belongs in the text.  So a long footnote on alternative schemes of sex determination in reptiles and fish (which, alas, doesn't mention that some fish don't always keep one sex their entire life) seems like it should have made the main text.  Perhaps even more surprising is a huge footnote late in the book specifically around Mukherjee's framing device of his family's history of mental illness.

Perhaps the most surprising omission is any coverage of the fact that many humans are perfectly happy despite having two defective copies of specific genes.  This again would drive home the message that "broken" genes aren't necessarily bad.  We all, along with our ape cousins, possess broken copies of the gene required for vitamin C synthesis.  If you have type O blood, then you lack any functional allele for a specific glycosyltransferase.  Perhaps most striking is the emerging story on PCSK9, in which null mutants may be better fit to live with Western diets.

Mukherjee briefly mentions the concept of "junk DNA" a few times, but never ventures boldly into this minefield.  I wish he had, as there are some interesting observations often lost in the dust-ups in that space.  For example, consider the notorious Fugu pufferfish, which if prepared incorrectly can kill the diner.  It functions with a genome far smaller than many other fish, indeed only about 10% the size of the human genome.  Yet it is clearly a complex creature, and furthermore in many cases Fugu genes are laid out with the same intron architecture as human genes, simply with smaller introns.

Depending on your tastes, you may find other topics lacking in coverage.  Absent is coverage of the issue of genetic discrimination or its potential problems in areas such as health insurance or employment. Given that Mukherjee is interested in so many other social impacts of genetics, such as how reporting discovery of a "gay gene" can change attitudes towards homosexuality, this is surprising.  The fact that prenatal genetics is being used illicitly for sex selection in India and China is mentioned; the controversial topic of whether to ban abortions based on genetic testing is not.  Nor is there mention of the fears of some with genetically defined conditions, such as Down syndrome or achondroplasia, that their community will ultimately be extinguished by prenatal testing.

What we should admire is the acute fulfillment of the unspoken assumptions, the smooth harmony of the whole activity, which only become evident in the final success - Carl von Clausewitz

I hope that all these points don't dissuade you from reading the book or recommending it to friends (or enemies).  None of these flaws or omissions are fatal, and the book as a whole is very good.  I certainly would never have had the stamina to research it or write it.  If you are interested in a specific era of genetic history, then there is likely to be a better book to cover that era: Invisible Frontiers for the race to clone insulin or The Eighth Day of Creation for the solving of the genetic code.  But I am unaware of a book for the lay public which covers so well the grand sweep of biology and genetics.

A final thought.  It would certainly be unfair to complain that the book didn't cover something that happened after publication.  But it is informative to ponder how easily such events would drop in.  The idea of writing a human genome from scratch was contentious news this summer, splitting the synthetic biology community.  While this is not currently technically feasible, it would follow many of Mukherjee's other examples of technologies first envisioned, then executed in remarkably short order (cloning, transgenics, gene therapy).  If this had come out just before the penultimate draft of the book, I suspect Mukherjee would have been able to nearly effortlessly insert it.  That is to me the ultimate test of a scientific story told on a still changing field, and The Gene passes it.
