Wednesday, February 28, 2007

Tinker, Tailor, Soldier, Gene.

In very old movies (or so it is said), the good guys wore white hats and the bad guys black ones. This made it easy to sort out who was who and keep track of everyone's allegiances. In introductory classes, simplifications enable nice simple classifications: these are the enzymes of glycolysis, these are the Krebs cycle, etc.

The yearning for such simplicity remains, even in the face of far more complex situations. We want to have clean lists of pro-apoptotic or anti-apoptotic genes, but the reality is that some proteins play all the angles. Depending on the situation, their splice form, their translational state, etc., the protein can push a cell either towards or away from death. Similarly, we would like to classify proteins as either tumor promoting or tumor preventing and somehow alter the balance in the cancer patient's favor, often by inhibiting the pro-tumor protein with a small molecule or antibody therapeutic.

One such set of molecular targets is the Aurora kinases. Aurora was originally identified in the fruit fly and shown to be a serine/threonine protein kinase (an enzyme that adds a phosphate to the hydroxyl group on Ser or Thr). Later, three Aurora homologs were identified in mammals, and the monikers Aurora A, B & C were eventually settled on. Aurora B and C are nearly identical, which has advantages since the exact role for C has been argued about a bit; from a pharmaceutical standpoint B and C are the same target, because it is probably impossible to generate a compound that can distinguish them.

A number of pharmaceutical companies, including my past employer, have small molecules in development targeting either AurA or AurB/C or both; various arguments exist as to which specificity makes the most sense, and the question can probably only be sorted out in the clinic. Vertex & Merck are, the last I heard, the most advanced with their compound (in Phase II), and in a stroke of luck it turns out to hit another therapeutically interesting target, Jak2.

A lot of good biology has suggested that Aurora should make a good cancer target. Some of this is from various cell culture and xenograft (human tumor cells implanted in mice) studies, but there is also good evidence from clinical samples. Polymorphisms (e.g. this report) and amplifications (e.g. this one) in aurora genes have been observed in many human tumor samples. The fact that Merck/Vertex have taken their compound into Phase II suggests that they saw activity in Phase I, as only desperate biotech companies push a Phase I oncology drug candidate forward without some hint of efficacy.

An abstract in the most recent issue of Cancer Cell is intriguing food for thought. The report describes finding reduced Aurora A levels in several tumor models, with the reduction coming via gene silencing or deletion. Since many tumors are null for p53 function by hook or by crook, the idea that p53- tumors might benefit from lower Aurora A activity casts a bit of doubt on Aurora A-targeting drugs, which might well include Aurora B-targeting drugs, as the two proteins are too similar for a compound to truly avoid touching Aurora A to some degree.

It will be interesting to find out whether in the natural progression of some human tumors there is a phase where Aurora A is tumorigenic but a later phase where it is tumor suppressing. Perhaps, like many a spy in a Le Carre novel, AurA's reward for a job well done is a carefully arranged ambush when it has outlived its usefulness.

Tuesday, February 27, 2007

Which Biotech Ben?

As a follow-up to yesterday's post about bioeconomics, those huge pools of red ink: where did the money go? After all, nobody was shredding historic portraits for mouse bedding or running high throughput screens on treasury note extracts. The money went somewhere, but where? If the biotech industry has lost a lot of money, who was making out on the deal?

Employee compensation tends to be a big ticket item, so a lot went to the biotechies (especially the executives). Scientific supply houses obviously took a good sized slice of the pie. Insurance companies get their cut. Real estate is always expensive, plus the gazillion refittings of office and lab space, which means a cut for the construction industry. Outside law firms and financial advisors (deals! deals!) make out well too. Throw in the office supply houses, computer suppliers, travel agencies, catering firms, etc. Academia did well on licenses. It would be interesting to see a tally and figure out which industry did the best on biotech.

Of course, a big chunk of the money went to other firms in the sector -- some as suppliers (Invitrogen, ABI), but more than a little to other money-losing biotechs. For example, my previous employer had deals with Incyte, Xoma, Immunogen and others. I'm sure each of them had further deals with other cash burners in the sector. Some poor dollars might have worn out going from one red ink generator to another.

If the bills were animated beings rather than lifeless paper, then what figure would be on the $100 bill? Would it be Dr. Benjamin Franklin, thrilled to be going from one exciting scientific endeavor to another? Or would it be Poor Richard, constantly complaining "Has no biologist ever heard 'A penny saved is a penny earned'?"

Monday, February 26, 2007

The Bio Economy

Derek Lowe has another post about the Biotech industry's glorious pool of red ink. A number of the comments are useful to think about -- perhaps most industries go through such a long boot phase, but we forget because we watch them when they are established (and often in decline).

Thinking about biotech losses reminded me of the parallels that are often drawn between the human world and the cellular world. The molecule adenosine triphosphate, or ATP, is often described as the currency of the cellular world. This is because it is the most common driver of reactions in the cell. There are other such molecules, such as ion gradients & ATP's cousin GTP, but ATP is far-and-away the most common currency -- the dollar of the molecular economy.

There is a key difference between human currency and molecular currency, one that I confess I never can quite convince myself I understand on the human side. While there are occasional money changing operations in the cell, such as using ATP to regenerate the other nucleotide triphosphates, when ATP is used to drive a reaction the currency is consumed. While sometimes that energy sets up another process, many times the chain of payment ends there. When a kinase phosphorylates a protein, there is no regain of that ATP when the phosphate is kicked off by a phosphatase. When nucleotide triphosphates are used to build DNA or RNA, that energy is spent for good.

But in the human world, our dollar bills don't crumble each time we use them. I bought a pizza tonight; the pizza shop will in turn pay its suppliers and employees, who will spend the money again and again and again. Nowadays the bulk of my transactions are purely electronic (such as the pizza purchase), so there is no money to crumble. The economic equivalent of entropy has never been obvious to me; the economy seems too close to a perpetual motion machine to be believed.

There are a few other parallels though. For example, glycolysis initially requires an investment of ATP to yield far more ATP; you need money to make money! Some molecules use clever barter strategies to avoid needing to deal with currency -- for example, DNA topoisomerases perform bond swapping maneuvers so that they can rejoin the DNA they have broken without requiring any further ATP. Money makes the world go 'round; ATP makes bacterial flagella go 'round.

Gotta go -- as they say, time is ATP!

Sunday, February 25, 2007

Challenging writing assignment

An interesting idea -- try writing this in a new manner my internet site specific. Easy? I wish! Very stilted sentences -- perhaps can lessen with practice. Limits apparent in references: Cell? Science? -- Yes! Fine English scientific paper display? Negative! Erratic grammar style? Perhaps. Self name? First and middle; lacking familial. Site name? Nein! Events? Iraq fits. Gagarin fatherland? Nyet! My allegiance fails as well.

Despite my trials, life wins with it.

Can reader infer pattern?

Challenge: Increase length and clarity vis-a-vis my attempt.

Friday, February 23, 2007

The First Tree

I had the opportunity yesterday to visit Boston's Museum of Science & there were two special treats in store for me. First, a magnificent display of the late Bradford Washburn's mountain photography. Right next door was an exhibit on the life and impact of Charles Darwin.

The Darwin exhibit focuses on his life but also touches on why his theory is so central to modern biology. There are some of his actual notebooks (or facsimiles). On one of these, I believe an original, is the first evolutionary tree -- a sketch by Darwin early in his contemplations. At the top, in jubilant exclamation, is "I think!". What an understatement!

Tuesday, February 20, 2007

Tightening the Border?

Biology is a complex subject and it is sometimes very difficult to properly track one's ignorance of the topic. If you aren't aware of that, then sometimes a remarkable result isn't quite as remarkable.

Yesterday's news contained an item reporting that Genentech's Avastin, an antibody targeting Vascular Endothelial Growth Factor (VEGF), a protein which stimulates the growth of new blood vessels (angiogenesis), shows promise for treating gliomas, a deadly type of brain tumor. A little bit of the newswire item read:

An estimated 18,000 people are diagnosed with gliomas in the United States each year, according to the American Cancer Society.

They are difficult to treat because many drugs cannot reach the brain.

What this item failed to point out is that Avastin is precisely one of those drugs which wouldn't be expected to cross into the brain!

A defining characteristic of cells is a surrounding membrane made of lipids, fat-loving molecules. Embedded in these lipids are proteins. Molecules can enter cells by two routes: either going directly through the lipid layer or being transported across by specific proteins. Lipophilic ("fat loving") molecules go easily through the membrane, but hydrophilic ("water loving") molecules go through slowly if at all on their own. In particular, anything very large or significantly charged will not pass through the cell membrane without the help of a specialized protein, a transporter.

Transporters can be classified a few different ways. Passive transporters consume no energy to move their cargo, and hence can only move things down a concentration gradient. Active transporters can consume energy to move things against a concentration gradient. Exchangers (antiporters) swap one thing for another, such as sodium ions for potassium ions, and can transport one against a concentration gradient -- so long as the other is moving down a concentration gradient. Symporters move two compounds at once in the same direction.

The junctions between cells in a tissue are usually somewhat leaky, and so compounds transiting from one compartment (say the inside of the intestine) to another (such as the interior of a capillary) can either go through the cells or around them. Many drugs are actively transported, hitching a ride on some transporter that mistakes them for their proper cargo. But others simply diffuse between the cells or across the cells separating the two compartments.

The brain is a different story altogether: a system of super-membranes and tightly welded interfaces ("tight junctions") between cells provides a strong barrier. In general, anything which gets across this Blood Brain Barrier (BBB) is moved by active transporters in the cell membranes. Large proteins, and antibodies in particular, are something the BBB keeps out.

When we are healthy, the BBB is clearly a good thing, protecting the brain from stray chemicals that might harm its delicate workings. Many drugs that would otherwise harm the brain are excluded by the BBB, which means the drug can be safe to use. But when we have brain disease, the BBB becomes a serious challenge, as many important drugs will not cross it. And again, antibodies are high on the list of excludees.

I was lucky enough to attend ASCO last summer, the unbelievably large yearly U.S. confab of clinical oncologists. There are dozens of things happening simultaneously, so you can never attend everything you want to. One session I did attend was on brain tumors, and there an interesting fact came out: in many brain tumors, the BBB becomes less functional in the neighborhood of the tumor. Indeed, at least one group was injecting patients with an imaging agent which normally doesn't cross the BBB and trying to correlate the ability to light up the brain with clinical outcome.

So this suggests the explanation for the curious anomaly which the newswire item overlooked. Antibodies shouldn't work in brain tumors, but perhaps Avastin works precisely because the brain tumor changes the rules -- and because Avastin may be working exactly where the rules have changed. Even with partial BBB functional breakdown, most of the tumor may still be inaccessible to many chemotherapeutic agents. However, right where the BBB is breaking down may be where angiogenesis is active (indeed, this may be part of the driver of the breakdown) -- and so Avastin, by targeting this very process, can function. The exception to one rule works precisely because of another exception to the same rule!

Thursday, February 15, 2007

A Novel Wooden Shoe

It is increasingly clear that many pathogens do not simply assault our bodies, but truly attempt to infiltrate them and co-opt normal cellular processes to their ends. A number of recent papers have described kinase-based strategies employed by Plasmodium and Toxoplasma.

Protein kinases are critical players in cellular signal transduction. By attaching a phosphate group to a serine, threonine or tyrosine (and occasionally, at least as known to date, histidine) they can radically change the local structure and charge distribution of the substrate protein. This can in turn trigger intramolecular rearrangements or alter protein-protein interactions, which can cascade in numerous ways. Tapping into this network is potentially a devastating way to sabotage the host's cellular machinery.

Kinases are opposed in their action by protein phosphatases, which knock the phosphates off the proteins. In most cases, kinase and phosphatase can easily interchange the states of the protein; there can be infinite back-and-forth so long as the protein sticks around. A few cases have been found where re-phosphorylation is prevented by glycosylation of a residue -- the same side chain oxygen can't have both at once. But glycosylation can also be reversed, so this so-called yin-yang regulation simply adds a third possible state. A quick PubMed search reveals at least one reported example of host phosphatases being manipulated by a pathogen, albeit indirectly.

The newest Science adds a new pathogen sabot to those uncovered previously. Instead of merely stripping the phosphate off and leaving an exposed hydroxyl for re-phosphorylation, in this example the Shigella protein is a phosphothreonine lyase, an enzyme which actually modifies the side chain, abstracting both the phosphate and a hydrogen and leaving behind a double bond (see Figure 3D). Hence, the kinases so altered cannot be re-phosphorylated -- they are now permanently disabled. Since such dead proteins have the ability to tie up functional binding partners, acting as a dominant negative, this is potentially a very effective strategy. Also striking is the fact that a phospho-amino acid lyase activity had not been described previously. It is difficult to believe this is the last novel biochemical stratagem we will find in a pathogen's playbook.

Wednesday, February 14, 2007

Clearing the Gene Patent Thicket

The gene patent issue, which I addressed once before, continues to boil. Derek Lowe has two good back-to-back posts (with another anticipated) on the topic, triggered by a Michael Crichton OpEd piece in the NY Times. A few weeks back there was another opinion piece in the Sunday NY Times, which Hsien Hsien Lei has covered over at Genetics & Health (NY Times articles require free registration).

There are really two classes of concerns, and concern holders, in the debate. At the one end you have the Crichtons and many others who feel that any sort of patenting of genes is improper and immoral. At the other you have a lot of people (such as myself) who believe certain gene patents are appropriate, but that there is a lot of confusion generated by the legacy of past gene patents.

I can understand some of the concern of the Crichton camp. It is true that genes are natural monopolies -- in general, one can't invent around them easily if the goal is genetic testing. On the other hand, some of Crichton's complaints are simply those that are generally levied against any intellectual property protection in biomedicine: that it creates 'unnecessary' costs and unequal access to lifesaving information. But, as last week's approval of the MammaPrint microarray-based breast cancer diagnostic test reminded us, private companies do bring important health innovations to market. Without premiums for the investors to cover the very high risk of failure, such innovations might never reach market.

As an aside, the issue of failure in biotech is nicely covered in another of Lowe's postings, though Xoma? What pikers! Only 0.75B gone through in 25 years -- that's only 0.03B/year. I'm pretty sure a certain company in Cambridge burned off closer to 1.5B in about 12 years, and Celera must have done even better than that in terms of bucks per year.
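For anyone who wants to check the bucks-per-year arithmetic, here is a minimal sketch. The figures are just the rough ones quoted above (the second is a recollection, not an audited number), and the `burn_rate` helper is a name made up for illustration.

```python
# Rough figures quoted in the post, in billions of dollars and years.
# The second entry is a recollection, not an audited number.
losses = {
    "Xoma": (0.75, 25),
    "unnamed Cambridge company": (1.5, 12),
}

def burn_rate(total_billions, years):
    """Average loss per year, in billions of dollars."""
    return total_billions / years

for name, (total, years) in losses.items():
    print(f"{name}: {burn_rate(total, years):.3f}B/year")
```

On these numbers the Cambridge company was burning cash roughly four times as fast as Xoma.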

There are some other issues to consider in this space. If patent law is altered to exclude gene patents, will it exclude multigene tests? If I make a small change in a protein therapeutic, à la Aranesp, is that patentable? Are other purified preparations of natural products, such as natural-product derived pharmaceuticals, still patentable?

For those of us who feel that gene patents are appropriate, but under well defined restrictions, the current situation is clearly a mess. During the genomics gold rush, companies flooded the patent office with applications. The general assumption was that these patents would probably be worthless -- but that nobody could take the chance that the courts & Patent Office would decide otherwise. Until one was litigated, nobody knew how things stood -- and nobody felt they could afford to wait around and potentially find themselves naked. From the regulatory ambiguity of the time sprang a gazillion patents. The paralegals used to book me for an hour at a time just to sign patent forms -- since I wrote the software that tagged things as 'worth' patenting, I was a co-inventor or sole inventor on many dozen applications. Most of my applications are dead, but there is a horrible mess out there.

Now these patents would just be irritating if they only gave fodder to writers, but they impose a real cost on society. I was at a Celtics game recently with a friend and a bunch of his buddies, several from his law firm. One specialized in biotech law and was quite confident that none of those genome era patents would hold up under legal assault. But it is that very risk of litigation that hangs a cloud over everything. If you are working on these genes, prudence says that you must review all of those patents, and perhaps worry about them even though they are junk. The same sort of uncertainty that led to these patents continues to make them a problem.

So, I would like to make the following proposal. It won't interest the 'gene patents are evil' crowd, but I will claim it would make good public policy. An organization should be set up and funded with the goal of retiring mass numbers of the gold rush patents. At regular intervals, the organization would hold a Dutch auction to buy up blocks of patents. You couldn't sell them individual patents, only large batches. Once purchased, the organization would have the patents cancelled (if that isn't available in the current law, then that would require some legislation). Or, the organization would somehow be a legal black hole for the patents, forbidden to ever sell them or defend them in court. Not only would the regular auctions slurp in patents, but they would establish a market value for the patents -- and so profitable companies might just donate blocks of patents instead of selling them to reap tax benefits.

The last thing one would want to do is create more incentive for junk patents. The regular auctions would be capped so that these patents would be selling for cents on the dollar spent to get the patents in the first place. Only patents of a certain age range would be taken, perhaps nothing younger than 5 years old. Nobody's going to make a profit on this, but for companies stuck with lots of essentially worthless patents, this is free money. But because it is delivering a value to society, by reducing the overhead imposed by all those patents, I would argue it is a worthwhile expenditure.

This approach wouldn't solve the junk patent problem, and it clearly wouldn't address the patents that biotech executives think do have value. The controversial ones will all fall in that category, as they are controversial precisely because they can transfer money to entrepreneurs. Public debate about patenting is healthy & appropriate, but let's think carefully about unexpected consequences.

Tuesday, February 13, 2007

Valentine's Reading

Since tomorrow is Valentine's Day, I was going to suggest a good book appropriate to the date. As is often the case, that book suggests some others in a chain until we finally get back to another book appropriate for the day, though unfortunately for that very reason it is not a good book.

The first book presents a small challenge. While I would never consider myself a prude, its title could potentially cause filters everywhere to flag this site as unsuitable for the younger set (I'm sure a lot of elementary school kids read the site fanatically). But, I hate to be one to change content, especially in a book's title. There's nothing actually pornographic about the book, except the cover -- but only if you have six legs & antennae. So, I will write out the title, but you will need to translate one word.

The book is Olivia Judson's Dr. Tatiana's TCTGAANNN Advice to All Creation. The book is structured as a series of letters to an advice column, letters from various creatures perplexed by misadventures in their love life. Fish who wake up a different gender, mice who are sure their mates are cheating on them, etc. While the schtick could have worn thin, I enjoyed it throughout. She uses a lot of humor, but also details the myriad of reproductive strategies found across the animal world (if I remember correctly, some bacteria slip in near the end). Since reading the book, I can't help but read a story on a novel strategy and think: That would make a great Dr. Tatiana letter. I also get warm inside pondering the notion that Dr. Tatiana should be required high school biology class reading. On the one hand, the students might actually want to read the book! Even better would be the reaction of certain folk, who would be having a hard time deciding whether to be more upset about the S word or about the E word sprinkled throughout (Evolution).

Judson turns out to be the daughter of Horace Judson, whose The Eighth Day of Creation is another must read. Eighth Day describes three of the major early thrusts of molecular biology: the assault on the nature of DNA and the genetic code, the quest to understand gene regulation and the first solving of protein structures. I won't claim it is a small book (686 pages -- and not a large typeface!) or light reading, but in many places you can begin to feel the excitement those pioneers felt as they pushed forward and some of the outsized personalities of the scientists. Some biotech books capture this: Invisible Frontiers (about the early days of recombinant DNA work & the race to clone insulin) and The Billion Dollar Molecule (about the founding of Vertex Pharmaceuticals) would fall into that category; two books I read more recently (and have forgotten the titles) failed miserably -- just the facts ma'am (which has something to do with my forgetting the titles).

Eighth Day is the work of a professional author and will weigh down your backpack. For a lighter touch, both physically & intellectually, try James Watson's The Double Helix. It is, of course, a memoir and Watson was willing to say outlandish things. The opening line is a classic: "I have never seen Francis Crick in a modest mood". I got to meet Watson two summers ago at a scientific meeting (it is a great sadness I never got to meet Crick) and he is just as verbally audacious in person. But again, it does give some feel for the excitement of the time and how high feelings ran.

But finally, please DON'T read Watson's sequel, Genes, Girls, and Gamow: After the Double Helix. Perhaps with a good editor it could have been boiled down into something enjoyable to read, but I'm not sure there would be enough left. Watson spends far too much time on his social life -- and particularly his love life (egad! it's in the title!). Valentine's Day or not, the last thing I want to read is an expanded version of anyone's little black book, even that of one of the towering figures of 20th century science.

Sunday, February 11, 2007

New Cancer Mutation Survey

Tonight's Advance Online Publication section of Nature Genetics contains a new study with an enormous author list (including three former colleagues of mine at Millennium) which surveys 238 oncogenic mutations in 1000 tumor samples from 17 types of cancer. This is a big study, but it should be kept in mind that this is the warm-up for grander schemes.

Alas, Nature Genetics isn't cheap & I don't have access to an electronic subscription, so I haven't read the paper. But from the Abstract, Tables & Figures (JavaScript link on the Abstract page), Supplementary Items and Nature Genetics' blog entry, one can get the gist of the story.

First, some foundation. It is important in this context to think of cancer as an evolutionary disease. Many cells acquire mutations, but only those that acquire mutations that lead to the loss of appropriate growth controls can lead to cancer. Fully progressing to a tumor requires multiple mutations, almost certainly in a stepwise fashion; the odds against all happening simultaneously are too high. Presumably one mutation gives the cell a small advantage & it proliferates. A second favorable mutation within that population leads to a new winner, which proliferates again. And so on, until a full fledged tumor arises -- and then it continues further to select for more and more aggressive variants. Chemotherapy or radiation therapy adds new selective pressures, which now enhance or reduce the fitness of various mutants. It is likely that at all times the tumor is really a population of cells with different genotypes, with constant selection for the more 'fit' (i.e. more likely to kill the patient).
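To get a feel for why stepwise beats simultaneous, here is a toy calculation. It is only a sketch under loud assumptions: the mutation rate, the number of hits needed and the clone size are all made-up illustrative values, and the model ignores nearly all real tumor biology (cell death, variable fitness, timing).

```python
def simultaneous_odds(p, k):
    """Chance that a single cell picks up all k required mutations at once."""
    return p ** k

def stepwise_odds(p, k, clone_size):
    """Chance of k successive hits when each winner first expands to clone_size cells.

    Each step succeeds if at least one cell in the expanded clone
    acquires the next mutation.
    """
    per_step = 1.0 - (1.0 - p) ** clone_size
    return per_step ** k

p = 1e-7             # hypothetical chance of one specific mutation per cell
k = 4                # hypothetical number of mutations needed
clone_size = 10**6   # hypothetical size each clone reaches before the next hit

print(f"simultaneous: {simultaneous_odds(p, k):.3g}")
print(f"stepwise:     {stepwise_odds(p, k, clone_size):.3g}")
```

With these invented numbers the stepwise route comes out more than twenty orders of magnitude more likely, which is the whole point of proliferation between hits: each expansion buys many more lottery tickets for the next mutation.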

There are many mutations that can contribute to cancer, but this paper concentrated on point mutations which activate oncogenes. There are several reasons for this focus. First, high throughput technologies exist for screening point mutations, whereas translocations can be complicated to screen (because their molecular details may be quite different between examples). Second, there tend to be a small number of possible activating mutations in oncogenes, whereas there are many ways to inactivate a tumor suppressor. The false negative rate (calling a gene's function normal when it is in fact abnormal) is therefore going to be much lower for oncogenes.

One focus of the paper is apparently searching for oncogenic mutations that either frequently co-occur or seem to be mutually exclusive. This is summarized in Figure 2. Why would you find such associations?

Frequently co-occurring mutations suggest that they are in some way cooperative. For example, if the tumor can result if two pathways are turned on, but not either one alone (a molecular AND gate), then an expectation is that mutations activating both pathways would frequently co-occur.

On the other hand, mutually exclusive oncogenic mutations would suggest participation in the same pathway -- if one is turned on, you don't need the other one too. For example, if they are in two branches which converge, and activation of either one will create a tumor (a molecular OR gate), then it is unlikely that both will occur. Another case would be for one mutation to be upstream of the other; if the effect of both mutations together is the same as either one alone in activating the pathway, then there would be no selective pressure for both.
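One way such associations could be scored is sketched below. I haven't read the paper, so this is not necessarily the authors' method: it is a plain one-sided Fisher's exact test on a 2x2 table of tumors, written with only the standard library (`math.comb` needs Python 3.8 or later). The function names and the example calls are made up for illustration.

```python
from math import comb

def contingency(calls_a, calls_b):
    """Count tumors by mutation status: (both, a_only, b_only, neither).

    calls_a and calls_b are equal-length lists of 0/1 flags, one per tumor.
    """
    pairs = list(zip(calls_a, calls_b))
    both = sum(1 for a, b in pairs if a and b)
    a_only = sum(1 for a, b in pairs if a and not b)
    b_only = sum(1 for a, b in pairs if b and not a)
    neither = len(pairs) - both - a_only - b_only
    return both, a_only, b_only, neither

def fisher_one_sided(both, a_only, b_only, neither):
    """Return (p_cooccur, p_exclusive) from the hypergeometric distribution.

    p_cooccur: chance of at least this many double mutants if independent.
    p_exclusive: chance of at most this many double mutants if independent.
    """
    n = both + a_only + b_only + neither
    n_a = both + a_only
    n_b = both + b_only

    def pmf(x):
        return comb(n_a, x) * comb(n - n_a, n_b - x) / comb(n, n_b)

    lowest = max(0, n_a + n_b - n)
    highest = min(n_a, n_b)
    p_cooccur = sum(pmf(x) for x in range(both, highest + 1))
    p_exclusive = sum(pmf(x) for x in range(lowest, both + 1))
    return p_cooccur, p_exclusive

# Invented calls for 20 tumors: gene A mutated in the first ten, gene B in the last ten.
gene_a = [1] * 10 + [0] * 10
gene_b = [0] * 10 + [1] * 10
table = contingency(gene_a, gene_b)
p_cooccur, p_exclusive = fisher_one_sided(*table)
print(table, p_cooccur, p_exclusive)
```

A small p_exclusive hints at an OR-gate (or upstream/downstream) relationship, a small p_cooccur at an AND gate. With a couple of hundred mutations there are many pairs to test, so any real analysis would also need a multiple test correction.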

Supplementary Figure 3 shows nicely how which gene is mutated strongly depends on the tumor type. This is a well-known phenomenon, but cannot be said to be well understood: why are specific tumor types so driven by particular mutations? In the tumor suppressor world it can be even more stark: why do mutations in BRCA1, which encodes a protein critical for proper DNA maintenance in every cell type, lead to tumors primarily in female reproductive tissues? This is a general phenomenon: there is a long list of tumor suppressors which have been discovered through very tissue-specific cancer syndromes yet are parts of central cellular machinery shared by all cells.

Supplementary Figure 4 gives an overview of how much of the cancers are explained, at least in part, by the mutations surveyed, and Figure 1 shows how much of each tumor is explained by each mutation. The 3D figure has some merits, but personally I would have tried to combine Figure 1 with supplementary figures 3 & 4 in one combined figure (3 might be a stretch, but S4 would fit nicely placed next to the gene axis).

For example, 100% of the pancreatic cancers surveyed had at least one mutation. Given that mutations in KRAS are very common in pancreatic cancer, this isn't totally surprising. 75% of the cases of polycythemia vera (PV), a leukemia-like condition, were explained; if you go back to Supplementary Figure 3, the JAK2 column is all marked PV. This is a known association, and perhaps one of the more explainable ones (JAK2 is a key regulator of differentiation in the cell lineage that goes haywire in PV).

On the other end of the scale, only 1% of kidney or prostate cancer had a mutation. So in these tumors, something else is going on. In both cases, it is likely that mutations in tumor suppressors explain many of the cases; both tumor types are known to often be mutated in certain suppressors. There is also a big middle: 36% of breast, 50% of colorectal, 32% of lung, etc. Again, in some cases mutated tumor suppressors may be at least part of the story, but there may be other oncogenes unexamined by this study playing as well.

The future promises many more such studies. There are other technologies which can type many thousand point mutants at a time (though there may be other trade-offs; I'm not an expert on this). Ultimately, many investigators want to just sequence away; pilot studies have already been published (if you have a Science subscription, there is a nice letters firefight in the current issue on the topic). But that is a ways away; even with $1000 genomes, the mixed genetic nature of any tumor will make life challenging. But in the meantime, one can expect to see more studies such as this one, but with more mutations and more tumors. More mutations should help fill in the gaps, whereas more tumors would allow much deeper probing of co-occurring and mutually exclusive mutations, as well as detect rare mutation-tumor pairings (such as those listed in Table 1).

Saturday, February 10, 2007

Am I Repeating Myself? (corrected)

Having committted to the Week of Science I feel, well, committed. Every day a post must go out. With other presssures & poor sleep habits, the danger of quality slippping is always present. Last night, I got paraaranoid I had already done a post on multiple test corrrrrection.

There are other risks. The Director of Ergonomics (though most people still refer to her by her previous position, Director Of Genomics) sometimes feels inspired to contribute. Since the keyboard isn't exactly optimized for her digits, the result can be a mess. If I leave the computer and then come back, I might overlooook some rather strange output.

Losss of sleep might also lead to non-sequiturs, especially with careless or accidental cut-and-paste operations. With other pressures & poor sleep habits, the danger of quality slipppping is always present.

I do write well, I think. Think! I, well write. Do I? My ambitions might not be those of Teddy Roosevelt -- "A man, a plan, a canal: Panama!" -- but I do try to do things well. There's always the risk I will have writers block & nothing to say. Was the first sentence ever spoken "Madam, I'm Adam!"?

Welcome to the crazy world of repeats: your genome is full of them. When we talk about repeats there are really a bunch of related but different phenomena.

Simple repeats are the repetition of a simple pattern -- or is that repetitititition of a simple pattttttttern? Mono, di, tri, tetra & so on patterns of nucleotides are in large arrays. These probably originate through various slippage mechanisms during DNA copying -- the polymerase loses track of where it is and copies the same stretch multiple times. Simple nucleotide repeats have been used extensively as genetic markers (though SNPs have largely supplanted them), since there is variation across populations in the number of repeats AND one can measure the length of these repeats in a PCR assay. Repeats are also phenotypically relevant -- changes in repeat number can alter what is coded. The most spectacular, and devastating, versions of these are the trinucleotide repeat expansions in diseases such as Huntington's Disease. Over several generations, the normal repeat of around a dozen CAG can expand to over a thousand -- with devastating effects -- somehow the string of glutamines generated by that are a problem.

Repeats can be direct or inverted. Direct repeats are just that -- direct repeats direct repeats direct repeats. Palindromes, such as the Panama and Adam bits above, are the closest analog to inverted repeats that we have in language, but there is a difference. When a biologist talks palindromes, they mean that the forward and reverse strands are the same -- the reverse complement of GATC is GATC.
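The difference between a word palindrome and a biologist's palindrome is easy to check in a few lines. Here is a minimal Python sketch; the function names are my own for illustration, not from any particular library.

```python
# Watson-Crick pairing for the four DNA letters.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def is_biological_palindrome(seq):
    """True when a sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

print(reverse_complement("GATC"))          # GATC
print(is_biological_palindrome("GATC"))    # True
print(is_biological_palindrome("GATTAG"))  # False
```

Note that GATTAG reads the same backwards as text, yet fails the biological test, because its reverse complement is CTAATC.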

Other repeats are dispersed throughout the genome. Some of these are functional: in order to generate a lot of ribosomal RNA, there are multiple rRNA genes distributed across the genome (often in tandem arrays). Others are remnants, pieces of genes that have been copied. This is sometimes through reverse transcription of mRNA, so the new copy lacks introns and possesses a poly-A region (a simple repeat!). Most of these copies are non-functional, but occasionally the new copy can be expressed. Those that do not acquire a new job decay from mutations, until they are no longer recognizable.

Another class of repeated sequences are various transposable elements, genetic sequences which make copies of themselves. Some of these, such as Alu elements, are present in gazillions of copies. Now each copy is not identical, as most were replicated long ago and have since acquired mutations. But, certain hallmarks remain. Some of these elements replicate through an mRNA mechanism, but there are also purely DNA transposons which copy themselves within the genome.

Still other repeated elements are copies of other genomes -- such as pieces of the mitochondrial genome which have been copied over. This is the same process which moved genes out of the mitochondria and into the nucleus, though no genes have retained function after the move in a long, long time. There are also oodles of copies of defunct retroviruses.

Because of Watson-Crick basepairing, repeats can do all sorts of things. In a messenger RNA, an inverted repeat structure (palindromes) can lead to a hairpin structure, as the first repeat binds to the second repeat. Recombination between repeats, if not paired carefully, can create even more interesting effects; in the diagram below, start at 1a, trace through to the x and then end at 2b; then do the same from 2a to 1b. The - characters are for display and do not represent genes.

1a >AB-CCD> 1b
2a >ABCC-D> 2b



Voilà! One repeat has expanded, while the other has shrunk.
Even more spectacular results can ensue if the repeats are on different chromosomes




-- now you have translocations of pieces of one chromosome to another. Suppose S contained the centromere of one chromosome and D the centromere for the other one. Centromeres are required for correct chromosome segregation during cell division, and only one per customer. Now one chromosome has two & will be yanked apart; the other has zero and will be lost.

Here's what I get for making jokes about making mistakes: a mistake. My inverted repeat example really shows direct repeats.

Recombination between direct repeats on the same chromosome can also be radical:

+--+ recombine



Now we've lost DE -- if those were critical genes, or worse the centromere (required for cell division), things are going to go haywire.

Okay, here is the correct version for an inverted repeat

>ABCDEcF> top strand
<abcdeCf< bottom strand

which can be rewritten as single strands as




Let's hope the repair is better than the problem!

All of these effects can also happen on a micro scale. Direct or inverted repeats cloned into E.coli can often trigger recombination or select for deletion of the repeats (E.coli contains far fewer repeats & is less tolerant of them), turning your beautiful plasmid into genetic hash.

There are worse things than to repeat yourself, and there are lots of ways to repeat yourself. But I'll still worry about repeating myself: With other pressssssures & poor sleep habits, the danger of quality slippping is always present...

Friday, February 09, 2007

Testing! Testing! Testing! Guaranteed method to win the lottery!

I have a guaranteed way to win the lottery, which in my excessively generous spirit I will share with the world. Each and every one of you (of legal age) will be able to go out tomorrow and buy a winning ticket. That thrill of winning, can you taste it already?

Here is the recipe, and it is simple beyond belief. Most states have a game where you pick three digits (i.e. a number between 0 and 999), with a nightly drawing for those three digits, usually televised. Before the drawing, go to your local lottery outlet. Make some friendly banter with the person at the counter, then loudly announce to everyone present that you are about to purchase a winning ticket. Now proceed to buy 1000 tickets, covering the complete sequence of values from 000 to 999. Then sit back, tune into the nightly broadcast, and enjoy your good fortune.

Wait? What's that? You wanted to actually win MONEY? Well, since such games pay off at best at 50 cents on the dollar, you are now out at least $500. But you did have a winning ticket.
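For the skeptics, the arithmetic can be sketched in Python. The $1 ticket price is my assumption (the scheme above doesn't state one), and the payout is set to the best case of 50 cents on the dollar.

```python
TICKET_PRICE = 1.00        # assumed: dollars per ticket
PAYOUT_PER_DOLLAR = 0.50   # best case quoted above
N_COMBINATIONS = 1000      # every value from 000 through 999

def cover_every_number():
    """Return (cost, winnings, net) for buying one ticket per combination."""
    cost = N_COMBINATIONS * TICKET_PRICE
    # Paying back 50 cents per dollar wagered across all 1000 equally likely
    # numbers means the single winning ticket returns half the total stake.
    winnings = PAYOUT_PER_DOLLAR * N_COMBINATIONS * TICKET_PRICE
    return cost, winnings, winnings - cost

cost, winnings, net = cover_every_number()
print(f"spent ${cost:.0f}, won ${winnings:.0f}, net ${net:.0f}")
```

You are certain to hold the winner, and just as certain to lose half your stake.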

Many people can see through the scheme before it is executed, but it is surprising how often papers are published which fall prey to exactly the same problem. In statistics it is known as the Multiple Test problem, and dealing with it are various attempts at Multiple Test Correction.

Suppose I have a typical human microarray experiment using the Affymetrix U133 chip. I have just bought about 47,000 lottery tickets -- er, I mean probesets. Given that the number of independent samples you have are much, much fewer than this, there is a probability that you will see 'regulated' expression between your two sample populations purely by chance. Farther downstream, you take your gene list derived from your data and throw it against GO or some other set of other gene lists. You've again bought thousands of lottery tickets.
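To put rough numbers on how many tickets that is, here is a small Python sketch. It assumes an uncorrected cutoff of 0.05 and, for the second function, fully independent tests; both are my simplifications for illustration, not properties of any real array experiment.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests passing alpha by chance when nothing is real."""
    return n_tests * alpha

def chance_of_at_least_one(n_tests, alpha=0.05):
    """Probability of one or more chance hits, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

n_probesets = 47_000  # approximate probeset count quoted above
print(f"{expected_false_positives(n_probesets):.0f} probesets expected to pass by chance")
print(f"{chance_of_at_least_one(20):.2f} chance of a false hit in just 20 tests")
```

With 47,000 probesets and nothing truly regulated, roughly 2,350 would still clear an uncorrected 0.05 cutoff.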

Now this is not to say these analyses are worthless; array experiments are not lotteries -- if you are careful. Good software will incorporate multiple test corrections that make your P-values less impressive sounding to account for running so many tests, just like the state lottery doesn't pay 1:1 on lottery tickets. But be careful when you roll your own stuff, or if you don't pay attention to the correction options in someone else's package. Plus, even with multiple test correction you still have junk in your results -- but you now have an estimate of how much junk.

The simplest multiple test correction is the Bonferroni correction. You simply multiply your raw P-values by the number of tests you've run. So if you ran your gene lists against 1000 categories, your impressive 0.0005 P-value goes to a quite uninteresting 0.5. Bonferroni is brutal to your results, but is a very conservative correction. It is assuming independence of your tests, which in some cases is correct -- but more often not in a genomics experiment.
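Bonferroni is about as short as a correction can get. A minimal Python sketch (the function name is mine; capping at 1.0 is the usual convention so the result still reads as a probability):

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each raw P-value by the number of tests, capped at 1.0."""
    if n_tests is None:
        n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# The example from the text: one impressive raw P-value, 1000 categories tested.
print(bonferroni([0.0005], n_tests=1000))
```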

However, what if your tests really aren't independent? Some of those GO categories are supersets of others, or at least strongly overlap. Some genes are strongly co-regulated no matter what the situation. Whenever there is correlative structure in the data, Bonferroni is potentially overkill.

There are a large number of other methods out there, and new ones still seem to pop up along with new terms such as Family Wise Error Rate (FWER) and False Discovery Rate (FDR).

I'm no statistician, but I've actually coded up one FDR estimation approach: Benjamini-Hochberg. Here's the recipe:

  1. Pick your alpha cutoff (0.05 is common)
  2. Sort your P-values and rank them, smallest to largest.
  3. Calculate a new cutoff for each test, with the cutoff equal to alpha x r/t, where r is the rank and t is the number of tests you ran. So the cutoff for the top P-value is alpha/t (r=1) and for the worst is alpha (because r=t).
  4. Starting at the bottom of your list (the worst P-value), work upward until you find a P-value that is at or below its cutoff. That test and every test ranked above it are your significant hits, even if some of them individually miss their own cutoffs. Because you start at the largest P-value & work toward the smallest, it is known as a step-up procedure.

Benjamini-Hochberg has an intuitive appeal: your best value is compared against the most stringent cutoff, but after all it is your best P-value. The second one won't be quite so good, but it goes against a slightly harder standard since you've already looked at the best test. By the time you get to your last test, you are at your original alpha -- but you are now looking at your worst P-value.
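Here is a minimal Python sketch of the recipe, not the code I originally wrote. It follows the standard step-up form of Benjamini-Hochberg: find the largest rank r whose P-value is at or below alpha x r/t, and call everything up to that rank a hit. That is not always the same answer as stopping at the first P-value that misses its cutoff when scanning down from rank 1. The function name and example P-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices (into p_values) of the tests called significant."""
    t = len(p_values)
    # Rank the tests from smallest to largest P-value, remembering positions.
    order = sorted(range(t), key=lambda i: p_values[i])
    # Find the largest rank whose P-value is at or below alpha * rank / t.
    last_hit = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / t:
            last_hit = rank
    # Everything ranked at or above that point is a hit.
    return sorted(order[:last_hit])

# Rank 2 (0.03) misses its cutoff of 0.025, but ranks 3 and 4 pass theirs,
# so all four tests are called significant.
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.04]))  # [0, 1, 2, 3]
```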

A final thought. The alpha cutoff for most drug trials is 0.05, or you expect to see such results by chance about 1 in 20 times. Some trials do much better than this, but there are plenty of trials published which are right in this neighborhood. Exercise for the student: explain how this is different than buying multiple lottery tickets.

Thursday, February 08, 2007

New fangled DNA sequencing

In one of his Week of Science posts PZ Myers over at Evolgen discusses how shotgun genome sequencing works (there is also a great primer on sequencing over at Discovering Biology in a Digital World). He explicitly covers 'traditional' fluorescent Sanger sequencing & avoids (in his words) "some new fangled techniques". I'll take the bait & give a sketch of the leading class of new fangled methods.

For a quick review, there are three general steps to Sanger sequencing (and the Maxam-Gilbert method that was developed around the same time, but has faded away).

  1. Prepare a sample containing many copies of the same DNA molecule. These will usually be molecules cloned in an E.coli plasmid or phage, but could be from PCR using defined primers.
  2. Generate from that pool a set of DNA fragments which all start at the same place and end on a specific nucleotide; this can either be in 4 separate pools or you have a different color tag denoting which letter the fragment ends in
  3. Sort the fragments by size using electrophoresis. The temporal or spatial sequence of these fragments (really the same thing, but your detection strategy probably uses one or the other) gives you the sequence of the nucleotides.

Each of these steps has serious issues affecting throughput & cost

  1. Cloning in bacteria is labor intensive, whether that labor is humans or robots. It takes time & space to grow all those bacteria, and more time, effort & materials to get the DNA out and ready for sequencing
  2. This step actually isn't so bad, except it implies the next step...
  3. Electrophoresis requires space, even if you miniaturize it; when you image your sequencing ladder, you aren't getting many basepairs per bit of image. DNA doesn't always migrate predictably; hairpins and other secondary structures can distort things

A host of companies are working on new approaches. Most next generation sequencing technologies fit the general description below, a class of methods known as sequencing by synthesis. The deviations from the description and some of the variation described provide the opportunities for the many players. Only one commercial instrument, developed by 454 Corporation & marketed by Roche Molecular Systems, has published major data (as far as I know; corrections welcome). George Church's lab has published data with their kit (do-it-yourself types can build one of their own with their directions), which a commercial entity (now part of ABI) is attempting to package up and improve.

  1. Fragment your DNA into lots of little pieces & put standard sequences (linkers) onto the ends of the sequences.
  2. Isolate individual DNA molecules, but without going into bacteria. Instead:

  3. Mix dilute DNA with beads, primers (which recognize the linkers) & PCR mix. By mixing these with oil in the right way, you can turn each bead into its own little reaction chamber -- a layer of buffer (with all the goodies) over the bead and encapsulated by the oil. Many beads will have attracted no DNA molecules, and some more than one. Both of these will be filtered out later.
  4. You can now amplify these individual DNAs without the products contaminating each other. One strand of the amplified DNA is stuck to the beads.
  5. Prepare a population of beads which each originated with a single DNA molecule
  6. Strip off the oil, used up PCR buffer, and the strand not fixed to the bead. Each bead now contains a population of DNA molecules all originating from a single starting molecule.

  7. Pack the beads into a new reaction chamber which is also an imaging device. You now have 400K to millions of beads on a slide.
  8. Anneal a short oligo primer to the sequences -- probably a primer binding
  9. Interrogate the DNA one position at a time to find out what nucleotide is present.

The details, and deviations, are what set the methods apart. For example, some methods require four steps to interrogate a position -- once for each nucleotide. In effect, the system 'asks' each molecule 'is the next nucleotide an A?' In other schemes, the question is in effect 'which is the next nucleotide?' -- all four are read out simultaneously. For many of the schemes which need to ask four times per nucleotide, the different strands may not be read synchronously. For example, if the first query is 'A' and the second 'C', then all sequences starting with 'A' are read on the first query -- and any which started 'AC' will have a C read on the second query.
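The asynchronous reading in the four-queries scheme is easier to see in a toy simulation. This Python sketch is a deliberate simplification: it treats the input as the sequence being read out (ignoring that the synthesized strand is really the complement), assumes perfect chemistry, and uses names of my own invention.

```python
def read_by_flows(template, flow_order="ACGT", n_flows=8):
    """Simulate four-queries-per-position sequencing of one template.

    Each flow offers a single nucleotide; the strand extends only while its
    next unread base matches. Returns a (flow_base, bases_added) pair per flow.
    """
    position = 0
    signals = []
    for flow in range(n_flows):
        base = flow_order[flow % len(flow_order)]
        added = 0
        # A run of the same letter is consumed within a single flow.
        while position < len(template) and template[position] == base:
            position += 1
            added += 1
        signals.append((base, added))
    return signals

print(read_by_flows("ACGT", n_flows=4))  # [('A', 1), ('C', 1), ('G', 1), ('T', 1)]
print(read_by_flows("TTAC", n_flows=4))  # [('A', 0), ('C', 0), ('G', 0), ('T', 2)]
```

After the same four queries, one bead has read four bases and the other only two, which is exactly the loss of synchrony described above.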

The enzyme running the interrogation is either DNA polymerase or DNA ligase. Most schemes use polymerase, but others use ligase with a set of special degenerate oligo pools (collections of short DNAs in which some positions have fixed sequence and others can be any of the four nucleotides).

Detection schemes vary. Here are some of them

  • 454 uses pyrosequencing, which takes advantage of the fact that when polymerase adds a nucleotide to DNA, it kicks off a pyrophosphate. A clever coupled series of enzymatic reactions can cause the pyrophosphate to trigger a light flash. Since pyrophosphate looks the same no matter which nucleotide it comes from, pyrosequencing is inherently a four-queries per position method.
  • Many methods use fluorescently labeled nucleotides, with the fluorescence read by a high-powered microscope. You then need to get rid of that fluorescence, or it will contaminate the next read. One option is to blast it with more light to photobleach (destroy) the label. For ligation based sequencing, the labeled oligo can be stripped off with heat. One group uses clever nucleotides where the fluorescent group can be chemically cleaved.
  • Some groups are using some very clever optical strategies to watch single polymerases working on single DNAs; these methods claim to be able to sequence single native DNAs with no PCR

Some other things to look for in descriptions of sequencing by synthesis schemes

  1. What's the read length? Short reads can be very useful, but for de novo sequencing of genomes long is much better.
  2. How many reads per run? For applications such as counting mRNAs, this may be much more important than read length. Read length x #reads = total bases per run, which can be astounding. 454 is claiming in their ads 400Mb per run, and Solexa (now Illumina) is shooting for 1Gb per run. Since Solexa reads are about 1/16th the length (roughly 25 vs 400), that means Solexa is packing a lot more beads.
  3. Can it reliably read runs of the same letter (homopolymers) -- some methods have trouble with these, others do quite well.
  4. What is the accuracy of individual reads? For some applications, such as looking for mutations in cancer genomes, the detection sensitivity is directly determined by the read accuracy. Other applications are not as sensitive, but it is still an important parameter
  5. Can the method generate paired end reads (one read from each end of the same molecule)? This is handy for many applications (such as de novo sequencing), and essential for others.
  6. Run time. How long is a cycle & how many cycles per run? For some applications, it may be better to get another sample going than to run out the full read length (Incyte used this approach to great effect in generating their EST libraries)
  7. Cost to buy? -- well, if you want to own one.
  8. Cost to operate? -- well, if you want to do more than just brag you own one
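The "read length x #reads" relationship in item 2 can be turned around to estimate reads per run. A quick Python sketch using only the rough figures quoted above (advertised or target numbers, not measured ones):

```python
def reads_per_run(bases_per_run, read_length):
    """Number of reads implied by total bases per run and read length."""
    return bases_per_run // read_length

reads_454 = reads_per_run(400_000_000, 400)      # 400Mb at ~400 bases per read
reads_solexa = reads_per_run(1_000_000_000, 25)  # 1Gb at ~25 bases per read

print(reads_454)                  # 1000000
print(reads_solexa)               # 40000000
print(reads_solexa // reads_454)  # 40
```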

One interesting trend is a gradual escalation of the benchmark for showing that your next generation technology is for real. While many groups first publish a paper showing the basic process, real validation comes with sequencing something big. 454 & Church sequenced bacterial genomes at first, but Solexa reported (at a meeting, not yet in print) using a few runs of their sequencer to run a human X-chromosome.

Another way to look at it is this: Celera & the public project spent multiple years generating shotgun sequence drafts for the human (and later mouse, dog, opossum, chimp ... and now horse) genomes. To get respectable 12X coverage of a 4Gb genome, you need 48Gb of data -- or about 2 months of running a Solexa machine or 4 months on the 454 (if 1 run/day -- I'm not sure of the run times). YOW!
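The back-of-envelope arithmetic, as a Python sketch. The one-run-per-day pace is the same guess made above, and the per-run yields are the advertised or target figures, so treat the output as order-of-magnitude only.

```python
def runs_needed(genome_size, coverage, bases_per_run):
    """Whole runs required to reach the requested fold coverage."""
    total_bases = genome_size * coverage
    return -(-total_bases // bases_per_run)  # ceiling division on integers

GENOME = 4_000_000_000  # the 4Gb figure used in the text
COVERAGE = 12

solexa_runs = runs_needed(GENOME, COVERAGE, 1_000_000_000)  # 1Gb per run
runs_454 = runs_needed(GENOME, COVERAGE, 400_000_000)       # 400Mb per run

print(solexa_runs)  # 48
print(runs_454)     # 120
```

At one run a day that is 48 days on the Solexa and 120 days on the 454, in line with the "about 2 months" and "4 months" above.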

Of course, there are other technologies competing in the next generation space as well. Most are still sketchy but intriguing (I heard one briefly presented this evening). The current round of sequencing-by-synthesis technologies are expected to bring the price of a human genome down to a few $100K, so to reach the oft-quoted goal of $1K genomes either a lot of technological evolution will be needed -- or another technological revolution.

Wednesday, February 07, 2007

Killer Co-evolution

To close up, for now, the story I've been spinning about cancer stroma from Day 1 & Day 2 of the Week of Science, I'll address a question which may have been prompted by yesterday's item about the symbiotic relationship between tumor and cancer stroma: How does this arise? What drives the stroma into being Benedict Arnold, and what chance is there to bring it back?

At the end of last year a paper came out in PNAS that looks at this question in a clever way. A challenge for studying cancer's interaction with its surrounding cellular environment is that it is very difficult to separate the two. How can you ever be sure you are looking at pure tumor or pure stroma?

The paper solves this problem by having the tumor come from one species and the stroma from another. Mouse xenograft models were built by injecting human tumor cell lines into immunodeficient mice. After tumors formed, the tumors were excised and then disaggregated into individual cells, and these cells sorted by flow cytometry. The tumor cells have higher DNA content than the mouse cells, so a DNA stain can sort one from the other. DNA from the mouse cells was then subjected to copy number analysis.

Copy number analysis is quite the rage these days, both for oncology and for looking at normal variation in the human genome. Most papers use array comparative genomic hybridization, or array CGH, to analyze copy number variation. This paper uses the closely related method ROMA, which differs in some key details but at a very high level is very similar. In short, the fragments from the genome are probed against a microarray which has markers spaced across each chromosome; by measuring the signals (and applying a lot of corrections, still being worked out), one can infer copy number changes ranging from complete losses of chromosome pieces to extreme amplifications.

ROMA provides another layer of filtration of the human tumor cells from mouse stromal cells, as the array probes shouldn't hybridize well cross-species. Normal tissue samples from the mice were used to normalize any murine copy number polymorphisms.

From seven tumors a number of genomic alterations were observed. This reinforces previous suggestions that the tumor stroma is co-evolving with the tumor, and that these changes are permanent since the genome itself is being altered. Two genes were observed to change copy number in models built from different tumor lines, while some other genes repeated in tumors built from the same line. However, no gene was universally observed to change copy number with the same cell line, suggesting that there are multiple co-evolutionary paths for successful tumor stroma.

This paper is just a crack into the field. In particular, they did not try to correlate their results with human clinical samples. The sample size here is very small, with only a few types of tumor lines tried. The functional roles of the altered genes were not explored. It is virtually a certainty (though I have no inside info) that such studies are ongoing -- especially since the lab involved has done all three of these in other papers. Of particular interest will be to better understand the mechanism of cancer stromal cell derangement. Is it purely an evolutionary selection for living near a tumor, or is the tumor somehow actively participating in the derangement by triggering mutagenic mechanisms or providing key survival signals?

A normal role for fibroblasts is to repair wounds, and hence the formation of tumor stroma may represent a repair attempt by the body which is co-opted by the tumor. Previous gene expression studies have identified a 'wound response signature' which is correlated with clinical outcome. Interestingly, the two genes reported to be the drivers of this signature did not show up in the ROMA analysis. This also suggests another line of experiment: do these mouse stromal cells exhibit the clinical signature?

Evolution, ecology & medicine all woven together -- it would be purely fascinating, if it weren't so deadly serious.

Tuesday, February 06, 2007

Killer Symbiosis

Okay, it's now day two of Omics! Omics!'s attempt to contribute to the Just Science Week of Science blog effort; some sort of feed mischief prevented yesterday's original post on cancer cellular ecology & reserve post on re-energizing E.coli from showing up there. Here's a new attempt to get through.

One of the actors I introduced in the cancer cellular ecology post is cancer stroma. This is essentially scar tissue built up of fibroblasts which have been recruited to the tumor site. Is this simply a benign bodily response? A border wall being erected by one side to exclude the other? Well, there is evidence that tumor stroma is far from a passive player, but rather an active ally of the tumor.

First though, we need to review some biochemistry. The metabolism of sugar to energy in our cells requires a complex pathway of events. The system can be divided into three basic subsystems: glycolysis, the citric acid (or Krebs) cycle, and the respiratory electron transport chain. The handoff from glycolysis to the citric acid cycle is the molecule pyruvate. However, this three stage system requires oxygen; in the absence of oxygen energy production halts at glycolysis. Which means the cell needs to deal with all that pyruvate, or the system stalls.

The first biotechnologists discovered that yeast has a very interesting solution to this problem: it converts the pyruvate to ethanol. But in animals, the endpoint is instead lactic acid. It is lactic acid buildup that gives you a burning sensation from tired muscles. Obtaining energy solely from glycolysis is far less effective than going the whole scheme (about 6X if I remember correctly), so animals tend to reserve it for special occasions -- such as quick bursts of muscle activity or other occasions when oxygen can't get to the cells fast enough. As my freshman bio prof pointed out, this leads to the white meat vs. dark meat dichotomy of chickens: chickens walk all the time, so the legs operate in the aerobic regime, whereas the wings are for short flights and operate anaerobically. The oxygen storage protein myoglobin contains oxygen & gives the dark meat its characteristic color.

A German doctor named Otto Warburg made an interesting observation in the 1920's that tumors routinely rely on glycolysis for their energy production. This Warburg effect, for which he was awarded the Nobel prize, has stood the test of time, though its cause is still debated. Why would tumors choose to rely on an inefficient pathway for energy production? While tumors are often hypoxic, the Warburg effect is not dependent on hypoxia. One interesting possibility was published last year (and may not yet be free): the tumor suppressor protein p53 may be a key regulator of the glycolysis vs. respiration switch. p53 is inactivated through mutation or protein destabilization in many, many tumors, and switching away from respiration may be the price to be paid for getting rid of the local cop.

A paper from a year ago (free!) laid out a fascinating story using immunohistochemistry (IHC). This is a technique which uses antibodies as stains for specific proteins on slices of tissue. With IHC, you see the cellular architecture in high resolution, overlaid with the distribution of one protein. Run a bunch of antibodies on closely related samples (such as successive slices from the same biopsy), and you can build up a picture of how multiple proteins are distributed. What this study reveals is a vicious symbiosis between tumor and stroma.

What the paper shows (the key figure summarizes things neatly) is a splitting of the conventional three-step pathway between tumor and stroma. The tumor expresses a key set of genes to optimize glycolysis. This starts with a glucose transporter (GLUT1) to bring the sugar in, followed by upregulation of an enzyme to convert pyruvate to lactate (LDH5) and a transporter capable of exporting the lactate from the cell (MCT1). A key enzyme for funneling pyruvate into the citric acid cycle (PDH1) is shut off, with a key negative regulator of PDH1 (PDK1) upregulated.

The tumor stroma cells have a complementary set of regulations. Their glucose transport system is essentially shut down, but they instead upregulate a protein to import lactate (MCT1 again; it works both ways), another to convert lactate back into pyruvate (LDH5) and again upregulate the PDH1 gateway to the citric acid cycle. The stroma cells are also turning up their carbonic anhydrase genes, which reduce the acidity generated by the tumor cells' glycolysis.

These changes are not observed in normal fibroblasts, nor are most of them present in the normal endothelial cells in the tumor's vicinity (GLUT1 is upregulated in the tumor-associated blood vessel endothelial cells). Clearly what is happening is a division of metabolic labor between tumor and stroma. The stroma is not a passive bystander, but an active quisling.

[forgot to tag this; updating]

Monday, February 05, 2007

Catching some rays

Alas, due to a disagreement between myself & Blogger as to how to timestamp things (and Blogger won), my initial post for the Week of Science has so far failed to show up there, despite my attempts to unstick it. From lemons shall come lemonade, in this case an extra post.

A key goal of synthetic biology is to transfer completely novel traits to organisms; doing so demonstrates an understanding of the trait and perhaps might be useful for something, though the latter is hardly essential.

Last week's PNAS advance publication site has a neat item (alas, requiring subscription for the text) showing how simple it can all be. By adding a single protein to E.coli the biology of E.coli is changed in a radical way.

There is something uniquely pleasurable about sitting in a sunbeam, since the plain glass window takes out the nasty UV. The Omics! Omics! director of ergonomics (responsible for preventing repetitive stress injuries by ensuring regular breaks from the keyboard) certainly likes a good sunbeam. But while the sunbeam provides me warmth and pleasure, it can't feed me.

E.coli is normally the same way; it knows not what to do with sunlight. But by stealing one little gene from another microorganism, Carlos Bustamante's group has changed that. Certain single celled organisms called archeans use light to drive a pump protein called bacteriorhodopsin. The pumping action creates a gradient of hydrogen ions, a gradient which can drive useful work. By moving bacteriorhodopsin to E.coli, the bacterium acquires the ability to generate energy from light of the correct wavelength. Indeed, because a respiratory poison was present the bacteria are now dependent on light for any energy production -- and since E.coli has a flagellum which can propel it, this energy production can be observed as light-dependent swimming.

As mentioned before, the paper requires a subscription, but the supplementary material does not. You can watch a movie of a tethered E.coli responding to light, with the false coloring indicating the wavelength being shown (red for stop, green for go of course!). Furthermore, the individual tethered cells can be treated like little machines and the forces they generate measured; from this, important details of the molecular machinery can be worked out. Showy, yet practical!

[trying to push this through the Week of Science system too]

Cancer Cellular Ecology

This post kicks off my contribution to Just Science 2007. While it won't be a theme held to strictly, many of my planned entries will be about, or touch on, cancer.

Carl Zimmer had an excellent recent post on the evolutionary aspects of cancer; here I will take a stab at the cellular ecology of cancer. It is both a fascinating topic on its own, and something which later posts this week will refer back to. For this last reason, this post will also be sprinkled with teaser references to posts which will show up later in the week.

I don't remember when I first heard about cancer, but it was at a tender age. I don't remember the context either. My paternal grandmother succumbed to leukemia long before my parents met, at a time when leukemia was considered incurable -- though within a few years the first effective chemotherapeutic agents would appear. Or perhaps it was hearing about the boy a street over who died of childhood leukemia. Most certainly I knew by 2nd grade, as that is when the kindly custodian at my elementary school died of cancer. So sometime I heard the word, and I was not one to withhold questions.

The answer I got that first time, and for many times later, is that cancer is part of the body gone haywire, an uncontrolled & chaotic growth that eventually crowds out normal tissue. One analogy is that of a weed which takes over the garden. It's a very simple answer -- suitable for an inquisitive elementary school student -- and it's also the model that long held sway. But the modern view is much more complex. That complexity extends the mystery of cancer, but also offers new opportunities for treating it.

First off, the modern view is of a tumor with some internal complexity; we now believe that many, and perhaps all, malignant tumors have at least two classes of cells: cancer stem cells (more later this week) and the bulk of the tumor. But furthermore, the tumor recruits other cells to assist it. Depending on the tumor type, these could include endothelial cells to build new blood vessels (a process called angiogenesis), fibroblasts which become the tumor stroma, and immune cells which may be co-opted to provide useful signals to the tumor. There's a fascinating story of the intersection of tumor, endothelial cells & stroma that will follow later. Other interactions may depend on the tumor type.

For example, take the interaction of multiple myeloma & the bone marrow. Normal bone marrow contains a host of different cell types & interacts with the nearby bone. Normal bone is maintained by a healthy balance between two opposed cellular factions: osteoclasts which break bone and osteoblasts which build bone. Both names are derived from Greek, with osteoclast being my favorite because of its onomatopoetic root 'clastos' (to break).

Multiple myeloma results from the derangement of a plasma cell, the final stage in B-cell development (free review). B-cells are the antibody producing cells, and go through a complex series of transformations. A unifying theory of B-cell malignancies (B-cell leukemias, B-cell lymphomas & multiple myeloma) is that each represents a cell leaving the tracks at a certain stage of B-cell development. Myeloma represents the derangement of the final stage, a plasma cell, whose normal job is to secrete large quantities of a single antibody. One of the clinical hallmarks of multiple myeloma is the overabundance of a single antibody species in the blood. An even more devastating effect is bone destruction; on some of the X-rays the patient's skull literally looked like Swiss cheese.

This clinical sign has a relatively straightforward cellular explanation: myeloma cells increase the number and activity of osteoclasts. This benefits the myeloma cells because osteoclasts secrete various growth factors favorable to myeloma cells. Myeloma cells reciprocate both by stimulating mature osteoclasts and by encouraging the common progenitor of osteoclasts and osteoblasts to more often mature into osteoclasts. Myelomas may also send hostile signals to osteoblasts, encouraging them to commit suicide. A new garden metaphor appears: myelomas cultivate their surroundings & fertilize their soil.

Many of the active drugs for myeloma, including my former employer's drug Velcade, may work both by targeting the myeloma cells and by targeting these interactions with their cellular microenvironment. If a drug can block the stimulation of osteoclasts, or attenuate the inhibition of osteoblasts, then useful clinical benefits might ensue -- such as reduced bone damage but also perhaps dampening the stimulatory signals from the osteoclasts to the myeloma cells.

Many existing drugs may target other cellular ecological interactions in tumors. In particular, many drugs may antagonize angiogenesis, the formation of new blood vessels to feed the tumor. The first drug targeting angiogenesis specifically, Avastin, appeared about two years ago, and more will undoubtedly appear.

An important implication of this sort of thinking is questioning the very way we study cancer. Much early stage research is based on tumor cell lines in culture -- cell lines which do not have any cellular partners present. Mouse models using human cell lines may not recapitulate these cellular interactions, as the mouse and human cells may be communicating inefficiently due to different molecular accents. Even looking at humans, pharmacogenomics studies which extract signal only from tumor may be missing much of the picture. In addition, most cancer models and many early stage cancer clinical trials are based on reductions in the size of the tumor as the figure-of-merit. Therapies targeting the cellular ecology might not produce rapid, drastic changes in tumor volume. New models must be (and have been) developed & validated, and they are often more complex and take longer or more resources to execute. Complexity breeds complexity -- and we have no idea how many more complexities we will encounter.

But in the end, we have no choice. We can try to make simple models of what cancer is, but if you are trying to treat a complex disease you need an appropriately complex model. But if you're trying to explain cancer to a second grader, perhaps you need to fall back on the 'cancer is like a weed' explanation.

[Note: Blogger has an odd way of timestamping posts you save as drafts, so I'm adding this note to try to get this popped into the Week of Science]
[updating again to try to push this through]

Thursday, February 01, 2007

How do you sharpen a keyboard?

I've signed up for the Just Science 2007 blogging effort. The idea is for a bunch of science blogs to each post at least once a day for a week on purely scientific topics -- no bashing on pseudoscience/anti-science, no movie reviews, etc.

It's an interesting idea & of course a bit of promotion of this site. But it is also a bit of an obligation -- my pattern has been to post 4-5 times during the workweek, occasionally giving myself the night off if too many other things intrude (and indeed, I'm scheduled to give a seminar next week & I'm overdue on an academic collaboration & etc). No bailing this time: seven days, seven posts -- balanced against the day job & a normal family life.

Back in my student days, also known as 60% of my existence so far, there were ways to prepare for a big scholastic event. I never sharpened quills or picked out fresh nibs, but sharpening pencils was a time-honored tradition to prepare for critical tests. No electric for me -- only a hand-cranked model has the proper tactile & auditory bouquet to satisfy! For a brief while, a big writing assignment might mean finding our freshest typewriter ribbon & a nice stack of clean paper. Later this morphed into a good dot matrix ribbon & microperf paper. High school & college meant a return to pencil sharpening, so those #2's would be in top shape for the PSAT, SAT or GREs. Not only were the rituals practical (a dull #2 does NOT work on those #$#$$ circles as well), but they were a way to burn off a little nervous energy.

But for this, what do I do? Buy some high-end (Martha Stewart?) packets for my router? Break out a box of high energy electrons? Get a reserve bottle of ether for my Ethernet? Modern technology has ruined all the old rituals!

Well, there's always Mom's old standby -- get a good night's sleep. Perhaps that's just what I'll start on now...

Which is Which?

Gregor Mendel was a genius and gave us some simple rules to describe inheritance. However, a large part of subsequent genetics can be viewed as reconciling those simple rules with a greater biological reality -- by adding lots of complexity. For example, Mendel posited genes which assort entirely independently. This was one of the first rules to be modified with the discovery of linkage. Mendel posited a simple recessive and dominant system, but blended inheritance (such as red flowers x white flowers = pink flowers) showed up. And so on, and so on. If you tried to fully rewrite Mendel's simple rules, they would look like something a lawyer cooked up ("in section 3 subpart B we define a segregation distorter gene...").

An early molecular interpretation of Mendel is that each gene codes for a protein (Beadle & Tatum) and variant alleles code for variant proteins (Pauling). These were again powerful simple principles which remain very useful, but have again undergone a lot of complexification.

In a species such as ours with two alleles for nearly every gene (minus the sex chromosome genes in males), an interesting question is whether the same amount of each allele is made. A good guess in biology is to guess the more complex case, and indeed that is reality: while in many cases the two alleles generate the same amount of mRNA, that isn't always the case. One of the initial observations of this was to explain the odd inheritance of certain conditions, in which the phenotype depends on which parent a particular allele is inherited from (again deviating from Mendel!). Differential marking ("imprinting") of DNA depending on which parent it is from leads to differential expression.

A new paper in Nucleic Acids Research (free!) uses SNPs in a clever way to extend this beyond imprinting. A particularly nice twist is that not only do they demonstrate differential expression of two alleles, but they use that information to map out some of the regulatory sequences which are driving the difference.

The basic idea, which has been published previously, is to develop assays for an mRNA of interest that can differentiate single nucleotide polymorphisms (SNPs) that vary between the two alleles. SNPs are a common form of genetic variation, and most are probably functionally irrelevant -- which is why they are so common, since there isn't selective pressure to ditch them.

Once they had these assays in hand, they used them on various cancer cell lines to find messages with differential expression between alleles. Then they looked upstream of the gene for SNPs which overlapped predicted binding sites for transcription factors, proteins which regulate the generation of mRNA for the gene. Finally, they tested these sites for binding to the predicted transcription factor. In eight cases they successfully identified SNPs that alter transcription factor binding.
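For the computationally inclined, the core question (is one allele expressed more than the other?) can be sketched in a few lines of Python. This is only an illustration, not the paper's method: it assumes the allele-specific assay yields simple counts for each allele in a heterozygous sample, and the function name `allelic_imbalance` and the use of an exact binomial test against a 50:50 expectation are my own choices.

```python
from math import comb

def allelic_imbalance(ref_count, alt_count):
    """Return (ref_fraction, two-sided exact binomial p-value vs. 50:50).

    Hypothetical helper: assumes each count is an independent observation
    of one allele's transcript; real assays need calibration and replicates.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("need at least one observation")
    # Probability of every possible ref count if both alleles are equally expressed
    probs = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[ref_count]
    # Two-sided test: add up all outcomes no more likely than the observed one
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return ref_count / n, min(p_value, 1.0)

if __name__ == "__main__":
    fraction, p = allelic_imbalance(70, 30)
    print(f"ref allele fraction = {fraction:.2f}, p = {p:.3g}")
```

A small p-value flags a message worth a closer look upstream for regulatory SNPs; the sketch is meant for modest counts (up to a few hundred) and ignores technical bias between the two allele-specific assays.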

This sort of information is particularly relevant to understanding cancer. One hallmark of cancer is a reduced ability to properly replicate the genome, with the result that mutations occur at a much higher rate. Some cancers are even in part due to the loss of key DNA integrity maintenance systems and have been shown by sequencing to be chock full of mutations. If you have two alleles of a gene and one promotes cancer growth (or resistance to an anti-cancer drug), then expressing more of that allele will be beneficial to the tumor. A small elevation in expression of a tumorigenic allele could make a big difference -- and might escape notice if just looking at bulk expression levels. The converse could apply for a tumor suppressor -- a negatively-acting regulatory SNP, perhaps combined with other negative regulatory mechanisms such as methylation, might reduce a tumor suppressor mRNA level below that required to keep cancer growth in check.

Transcription factor binding site SNPs are not the only way SNPs might alter mRNA abundance -- another recent paper showed a SNP which left the coded protein unchanged ("synonymous SNP") but reduced the stability of the mRNA. And the challenge of predicting the effects of SNPs on proteins was reinforced recently with the first identification of natural SNPs which are synonymous but still succeed in altering protein structure and function.

It is the curse -- and wonder -- of biology that there are no simple rules, or even simple exceptions. The fun part is figuring out how to leverage all those exceptions into tools to explore other facets of biology, as the mRNA SNPs -> transcription factor sites paper did.