Thursday, December 27, 2007

A Vertex by the sea?

If one thinks like a builder, it is not difficult to scan the prime biotech zone in Cambridge and see it filling up at some point. I remember bicycling to the Harvard Medical School in the 90's past an empty zone with just a couple of lone buildings; those buildings are now thickly surrounded, save some parkland. There are still some parking lots that might be made over, but in general there isn't a lot of free space left. Some single-story buildings might go (someone must be eyeing the boarded-up saloon up the street from Novartis), but not those close to residential land -- which is a lot of them -- and there isn't much room up. Between a general Cantabrigian disdain for high rises & fire department restrictions on where labs can go, up is not a great option for biotech.

Throughout the zone there are also other uses for what space there is. MIT owns much of the land, and must be wondering whether it will be hemmed in. Urban planning has shifted toward favoring a variety of uses, and so some of the new development in the zone has gone to residences, hotels & shops -- a good thing, too! Hopefully ways will be found to preserve some of the grittier older businesses, the car repair shops & such that are so convenient. But space must be found, or the biotech industry will stagnate.

There is a lot of open space to the far east, where once a large railroad yard sat in the nether lands between Cambridge and Charlestown. New buildings are springing up there & the developers have already advertised in biotech real estate sections.

However, others are thinking of a really big conceptual leap. In a Globe article before Christmas it was revealed that Vertex is contemplating moving its entire operation to new buildings to be constructed at Fan Pier, a waterfront area just off downtown Boston which is becoming a magnet for development. Better road connections thanks to the Big Dig, a new transit line, a new federal courthouse, and the new convention center have drawn other businesses, such as restaurants and hotels.

It's not hard to see the attraction of the place. Walkable to downtown Boston and a short walk to the transit hub (intercity & commuter train, bus, subway) at South Station. Within sight (across the water, traversed by a tunnel) of the airport.

The obvious uses of this space were offices (particularly legal ones; the courthouse is next door) -- but biotech? I wouldn't have thought of it, but somebody did. It's a bold move, one to announce that Vertex has arrived as a FIPCO (Fully Integrated Pharmaceutical Company). The developer has already started acquiring permits for buildings suitable for lab space. And the location has other perks -- nearby Red Line access to the Harvard & MIT campuses, so it's almost like being in Cambridge. The new transit line doesn't yet go many places, but if a proposed tunnel is dug it could connect to the Longwood Hospitals area.

Despite all the hubbub in Cambridge, Boston itself doesn't host much biotech. I think there is some incubator-type space over in Charlestown and maybe some bits elsewhere, but mostly the main city plays a subsidiary role. Remote sites such as a derelict state hospital have sometimes been proposed, but nothing much has happened -- perhaps this could jump-start other unconventional locations for biotech in the Hub of the Universe.

Thursday, December 20, 2007

Getting my hands wet

My undergraduate days were full of laboratory work. I got involved in student research as an undergraduate, spent a memorable summer being unsuccessful at DNA sequencing, and enrolled in a new, very laboratory-intensive program. Between my mishaps in my coursework & my mishaps in my independent work, eventually it was suggested (and I didn't need much convincing) that maybe I could combine my hobby of computer programming with my focus on biology in some useful way.

After that, labs were on the radar intermittently. My department at Harvard insisted I do one 'wet' rotation, and so I spent a very enjoyable spring in a fly lab, mostly sorting flies & mutagenizing some but also mapping some transposon insertions by in situ hybridization. I also was a teaching assistant, and one year that meant running lab sections. My committee once seemed on the verge of insisting I do some lab work & I started cooking up a suitable experiment involving in vivo footprinting of DNA binding factor sequence preferences, but nothing ever happened.

At Millennium I failed to follow up on a golden opportunity, one that I rue to this day. Soon after I joined, one of the senior Mass Spec guys invited me to consider spending some time learning the equipment. Now, I had used a 'toy' mass spec as a high school intern to find the leak in some high vacuum equipment (successfully, which led to figuring out that you-know-who had inadvertently put it there!) -- the spec is built into the instrument & you run a Pasteur pipet hooked to a helium tank around all the suspect spots -- when you see the helium spike, you've found the leak. Plus, my father worked on electronics for a mass spec that got a one-way trip into the Jovian atmosphere, so I had an interest in the things. Alas, I never quite followed through.

Later, it was suggested that I learn to run the HTS screening robotics at Millennium, and I had started making concrete plans to do so when I got the pink slip (alas, they don't actually print the letter on pink paper!).

So, it is with some pride I can report I actually did some lab work yesterday. After several polite invitations from our headmistress of sequencing, I spent a bit of time observing & doing some 'scut work' -- having asked for as much as they dared give me. I labeled plates, recorded them into the Laboratory Information Management System (LIMS) & later helped load & unload the robot preparing them. I sealed the plates (with a polite admonition that I was being too gentle with the roller). I also got to load the thermocyclers, select the right program & then unload them when done.

Not exactly the solo conquering of megabases that my imagination whipped up, but still very informative. You don't just pour DNA in one end of the sequencer and get data out the wires at the other end; there's a lot of manual tracking & care involved, even in a highly automated modern sequencing laboratory. It actually turns out that the process I observed will be upended in the near future for a much less hands-on, much higher throughput one. And, I couldn't help thinking of the various next-gen technologies, where you just do away with all those 96 or 384 well plates & instead run much larger numbers of sequencing reactions in a much smaller area.

It's all very different from when I was trying to use Sequenase, single-channel pipettors, radioactive 35S & slab gels nineteen years ago. And one difference in particular relates to the title of this entry -- no water baths! You don't actually get your hands wet anymore -- nice dry thermocyclers do all the incubating instead.

Monday, December 17, 2007

The Nine Circles of Basel

Human society has long had a fascination with large human enterprises. Complex governmental systems, with ministers and sub-ministers, seem to appear in the literature of each civilization (at least the few I have sampled). Confucius's Analects are largely about how to properly govern. Those nine circles of Dante's Inferno don't just run themselves, but clearly have a hierarchy of devils managing them. Throughout history, churches, governments and the military have had organized systems of managers and sub-managers. While these are often the bane of our existence, we do occasionally get such joys as the novel Catch-22 or the movie Brazil -- and the terrors of 1984. (Does some professor somewhere teach a course in the Literature of Bureaucracy?)

According to the few sources I have read, modern corporate bureaucracy can largely trace its origins to the boom of railroads in the mid-1800's. Railroads required organization, or at a minimum freight would rot in the wrong place and at worst expensive rolling stock would be destroyed in collisions (the loss of the passengers & crew being not much concern of the railroad barons). Many of the American railroads were run, particularly after the American Civil War, by men with strong military experience, and so organized structures grew. And grew. And grew.

The Wall Street Journal carries an item (which, as the one silver lining to the Murdoch takeover, is free) that a key focus of Novartis' announced job cuts is to eliminate bureaucracy. Dr. Vasella, the CEO, was shocked (shocked!) that a mid-level manager in one Novartis division had 6 levels of employees below him, and believes this is too many.

I used to joke at Millennium that I was routinely being demoted. This was obviously so, as the number of people between myself and Mark Levin kept growing. When I joined, with around 250 employees, there were two people interposed, but at times there were at least four or five. I reported to a group leader, who reported to the informatics head, who reported to the technology head, who reported to the CSO, who reported to Mark. At various times the higher levels would shift, but like the beach, which shifts constantly yet somehow stays the same, the reshuffling rarely changed my routine existence. We'd get invited to different meetings, my impertinent emails would irritate different superiors, but overall it took some digging to find the real changes -- and by the time you did, the next reorganization would be upon us.

Six levels at first glance seems like a lot. If each manager had 10 reports, then that's a million employees, right? Much more than the employment of Novartis (calculated from another report at around 100K). Vasella has apparently decreed that no division shall have more than six layers of reporting (hmm, with a few layers at HQ, that would be how many levels total? :-). Suffering from the sclerotic plaques of bureaucracy? Apply the statin of restructuring!

I'm no fan of bureaucracy -- I have a particular talent for botching official forms -- but I hope that Vasella & his staff think carefully about the unintended side-effects of such a crusade. Too many layers stifle innovation. But too few may have consequences as well.

My jaundiced history of corporate organization leaves out some of the other drivers of layers. Yes, 6 layers should be sufficient to support 1 million employees -- if each manager has precisely 10 reports. Even that many is too many for some Human Resource experts' tastes, but more importantly sometimes a manager should have fewer. Someone might be a great manager of three but horrible at seven. Or you really do need 3 mass spectroscopists managed by one senior one -- but that's it. Also, some layers of management are a way to train & retain valued employees.
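To make the arithmetic concrete, here's a toy calculation (a sketch in Python; the spans of control are invented) of how quickly headcount capacity collapses as the span shrinks:

    # Toy calculation: maximum headcount below the top of a hierarchy with
    # `layers` levels of reporting and a uniform span of control `span`.
    def capacity(layers, span):
        return sum(span ** i for i in range(1, layers + 1))

    for span in (5, 7, 10):
        print(span, capacity(6, span))
    # span  5 ->    19,530
    # span  7 ->   137,256
    # span 10 -> 1,111,110

Six layers supports a million employees only at a uniform span of ten; at a more human span of five, the same depth supports fewer than 20K, well short of a Novartis-sized payroll.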

At first thought, the main danger of a blanket edict is that the organization will be adjusted to fit the management dictum, not the other way round. Procrustes as organizational expert. While that is certainly not a foreign mode of operation for large companies, it is hardly what you want to encourage! Second, the organization will be stifled in new ways: sorry, we can't enlarge this successful organization without blowing it up, as we've hit our depth chart limit. Furthermore, the asterisks are likely to start rolling out -- and with them additional warpings of the original goal. Summer interns -- they don't count. Full-time contractors -- nah. Outsourcing -- well, of course not! And as routes to avoid limits are found, they will be used -- whether they make overall sense or not.

Is there an alternative? It's hard to say. Like most management mantras this paring, if applied judiciously, might be a good thing. Loosening up rules to minimize how far purchasing requests must percolate upwards is good. Identifying where additional layers are not adding expertise but only inertia is good -- but inertia tends to be in the eye of the beholder (or perhaps, bestopper). Legal requirements such as Sarbanes-Oxley don't exactly encourage a free hand either -- the shareholders like to know how you're spending their money, and if not them then Capitol Hill.

Of course, one solution is to work in a very small company. Then there can't be too many layers between you and the top, unless the company has a completely linear organizational structure! That's not to say that small companies don't face the same human & informational challenges or solve them easily, but too many layers of managers tends to be low on the worry list.

Thursday, December 13, 2007

Corporate DNA

Hsien-Hsien is frequently asking 'What's in your DNA?', and a correspondent of mine recently pointed out that "it's in our DNA" is becoming a bit of a corporate cliche. Indeed, there is at least one TV advertising campaign on those lines, though the ad writers would be disappointed to find out I can't name the company.

Now, I'm not always responsive to suggestions for blog posts, but since the writer shares both my mitochondrial & Y-chromosome genotypes, I was more easily swayed.

The question posed is this: what do companies asking this really mean, or more specifically what might it mean that they don't intend (very Dilbert-esque). Presumably they are trying to make a statement about deeply embedded values, but what does it really mean to have something in your DNA? For example, do they mean to imply:

  • A lot of our company is unfathomable to the human mind

  • There's a lot of redundancy here

  • Often we often repeat ourselves often repeatedly, often repeating repetitiously.

  • We retain bits of those who invade our corporate DNA, though with not much rhyme or reason

  • A lot of pieces of the organization resemble decayed portions of other pieces of our organization

  • Some pieces of our organization are non-functional, though they closely resemble functional pieces of related organizations

  • Most of our organization has no immediate impact on routine operations, or emergency ones

  • Most of our organization has no immediate obvious purpose, if any

  • Our corporate practices are not the best that could be designed, but rather reflect an accumulation of historical accidents



Now, many of these statements may well be true about a given company, but is that what you really want to project?

What's in my company's DNA? Well, that's easy -- it's what the customer ordered! :-)

Monday, December 03, 2007

What Cell Is This, Who Laid To Rest?

With the holiday season upon us, I break out my seasonal music. So it is only fitting, while I ponder various versions of Greensleeves, that the NIH is finally starting to lay down the law about the identity of cell cultures.

Cell cultures are great, but the problem is most of the cells pretty much look the same -- especially when growing in a culture dish. Now, I'm sure folks expert in the field will say they can distinguish many of them even without a microscope, but the fact remains that errors have frequently occurred and more than a few published studies are wrong because their cell culture wasn't what they thought it was. At my previous posting we burned a lot of effort on one large dataset that turned out to be useless for just this reason.

The prod in this case was an open letter by a number of researchers, and the response will apparently be to encourage referees to downgrade grant proposals and such which do not authenticate their cultures. 'bout time.

At Millennium we had licensed the Ingenuity database, a spectacular collection of biomedical facts culled from the primary literature (neither perfectly correct nor comprehensive, but amazing nonetheless). When gearing up for a big experiment on cell line X, we might try to pull all the knowledge of the database derived from experiments on cell line X -- the level of detail which Ingenuity provides. Of course, some of these would be contradictory -- and I found other cases where two published experiments were claimed to be precisely the same, but with very different results. The letter cites estimates that 20% of cell cultures are the wrong thing, which might explain a few of these.

Particularly in cancer research, this lack of a critical control is downright stupid. Nowadays, SNP chips are a somewhat pricey but powerful cell typing method. These will clearly disambiguate mouse from human (you won't get much signal from the wrong species!), but can also probe deletions & copy number variation (though not balanced translocations). Given that most tumor cell lines are pretty fouled up in the DNA operations & maintenance department, expecting these cell lines to be stable is pretty unreasonable. Particularly if the experimenter is deliberately selecting for something (such as adherent growth, or growth in the absence of specific factors, or such), checking for major changes just makes sense -- and will also catch sneaky invaders who might take over such a culture.
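As a cartoon of how such typing can catch an imposter, here's a minimal sketch (marker names, genotypes & the threshold are all invented) of checking a culture's SNP calls against a reference fingerprint:

    # Minimal sketch of SNP-fingerprint authentication; markers, genotypes
    # and the pass threshold are invented for illustration.
    def concordance(observed, reference):
        shared = [snp for snp in observed if snp in reference]
        matches = sum(observed[snp] == reference[snp] for snp in shared)
        return matches / len(shared)

    reference = {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "CT"}
    culture = {"rs1": "AA", "rs2": "AG", "rs3": "AG", "rs4": "CT"}

    score = concordance(culture, reference)
    print(f"{score:.0%} concordant")  # 75% -- far below a true match
    if score < 0.95:  # a real panel would use many more SNPs
        print("flag: culture may not be the line it is claimed to be")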

Science published a good news article on the topic last winter -- alas, it will require subscription access.

Sunday, December 02, 2007

Metastasis research deficit?

The Boston Globe had an article starting on the front page (briefly free, then will require $$) titled 'Critics blast slow progress on cancer: Say costly drugs do little to extend lives'. As with most newspaper articles the piece is short on hard data and longer on quotes. The most data-rich element is a graphic on the front page comparing 5-year survival rates for isolated vs. metastatic cancers (colon 89.8% vs. 10.3%; prostate: 100% vs. 11.9%; lung: 49.1% vs 3.0% & breast: 98.0% vs. 26.7%).

There are two data items of interest here. First, an author who is writing a book on the 'dysfunctional' cancer research industry claims that 0.5% of federal research dollars have gone to studies of metastasis. The other is that 92% of cancer drugs entering human testing fail to make it to market.

If that 0.5% number is correct, it is most unfortunate. Early detection is great, but there needs to be more focus on preventing metastases and treating them once they occur. There is a quote from Judah Folkman that some promising initial results have been seen using angiogenesis inhibitors to prevent metastasis, but they are clearly very small studies.

A key point that the article hammers on is that cancer researchers have constantly been promising that cures were around the corner, and yet that hasn't been realized. It cites the 36-year war on cancer, which was promoted as ending cancer by the bicentennial in 1976. More recently, now-FDA commissioner Andrew von Eschenbach proposed eradicating cancer by 2015.

What the article fails to explore is that we can't really know if we are about to have a sharp turnaround. As the article states, the long term survival mark is 5 years -- which means we won't know if the new drugs of 2007 had a real impact until pretty close to 2012. It also fails to explore the idea that extending lives by 'a few months' may be the initial signal, but that further optimization may extend that -- or that drugs are initially tested in desperately ill populations, where the deck may be highly stacked in the tumor's favor. In earlier patient populations, more notable gains may be practical.

Personally, I think that proposing to eradicate cancer by some time in the very near future is a recipe for disaster: again, given that 5-year survival is the key benchmark, eradicating cancer by 2015 would either mean (a) nearly perfect early detection [for cancers where that is nearly synonymous with a cure] and/or (b) eradicating cancer with the drugs in current late-stage testing, since only they could hit the clinics in a big way by 2010 so that 5-year survival could be measured by 2015. The former is just not realistic, and there's no great buzz from the industry that something is there to fulfill role b. Instead, researchers should set more reasonable expectations based on what is realistic. New tools for exploring cancer genomes & personalizing treatment will (IMHO) start making an impact -- but not for 5-10 years as they are tuned & troubleshot.

One other interesting note: amidst bemoaning flat U.S. government support for cancer research, it is noted that various patient-driven organizations are pumping money in or setting up key resources (such as a myeloma tissue bank set up by the Multiple Myeloma Research Foundation). What is interesting, and will hopefully continue, is some of these private organizations trying to invest in proposals that are kind of 'out there'. According to the article, the Komen foundation (breast cancer) has announced plans to invest $600M "to find wild ideas that will break the mold". That's some serious money, and if it is used to fund the 'unfundable' it will probably mean a lot of money going for failures -- and a few spectacular advances. If patients are impatient with research progress, funding what the establishment doesn't is a good way to express that frustration -- and maybe make a huge difference.

Wednesday, November 28, 2007

The Incredible Shrinking Human Genome

When the human genome was still terra incognita (or, at least, our knowledge of the sequence was something like my view of the world sans my glasses, which are oft mistaken for bulletproof glass), a key question was how many genes were present. It was widely cited by textbooks that the number was somewhere in the 50K-70K range, or perhaps even 100K, and some of the gene database companies such as Incyte and HGS and Hyseq were gleefully proclaiming the number much higher (just think what you are missing without our product!). The number wasn't unimportant. If you had some estimate of what fraction of genes might be good targets for drug development, then the total number of drug targets was dependent on your estimate of the number of genes -- and drug targets were saleable -- and patentable.

At some point, a clever chap at Millennium decided to try to pin down these estimates. First he went for the textbook numbers, which everyone thought were well reasoned from old DNA melting curve experiments estimating the amount of non-repetitive DNA. Surprisingly, he was unable to find any solid calculation converting one to the other -- for all his searching, it appeared that the human gene estimate had appeared spontaneously like a quantum particle.

Using some other lines of thinking (I actually have a copy of his neat document somewhere, though technically it is a Millennium secret -- nothing just ages out of confidentiality. Silly, isn't it?), he argued from estimates of the gene content of yeast and from what had been found in C.elegans to a new estimate. Now, I couldn't find the flaw in his logic but I couldn't quite get myself to accept the estimate. It was preposterous! Only 30K genes for human?

Well, of course the estimate came in even further south of there. And a new paper from the Broad has nipped that down to nearly 20K even. Alas, the spectacularly endowed Broad wasn't munificent enough to publish with the Open Access option for PNAS, so until I make another pilgrimage to the MIT Library I'm stuck skimming the abstract, supporting materials & GenomeWeb writeup.

In some sense, the analysis is inevitable. It's hard to look at one genome and get an accurate gene estimate, but with so many mammalian genomes it gets easier -- and this paper apparently focused on primate genomes, which we have an amazing number of already. It sounds like they focused on ORFs found in human mRNA data, which at least removes the exon prediction problem.
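For flavor, here's a bare-bones version of that ORF-finding step (forward frames only, longest ATG-to-stop; real annotation pipelines are vastly more careful):

    # Bare-bones ORF finder: longest ATG..stop reading frame in the forward
    # frames of an mRNA sequence. Real pipelines add start context, minimum
    # lengths, homology support & much more.
    STOPS = {"TAA", "TAG", "TGA"}

    def longest_orf(seq):
        best = ""
        for frame in range(3):
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            start = None
            for j, codon in enumerate(codons):
                if codon == "ATG" and start is None:
                    start = j
                elif codon in STOPS and start is not None:
                    orf = "".join(codons[start:j + 1])
                    if len(orf) > len(best):
                        best = orf
                    start = None
        return best

    print(longest_orf("GGATGAAATTTGGGTAGCC"))  # ATGAAATTTGGGTAG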

The paper has the usual caveats. The genome is finished -- but not so finished. Bits and pieces are still getting polished up, and while they are generally dull and monotonous a gene or two might still hide there (the GenomeWeb bit mentions 197 genes found since the 'completion' of the genome which were omitted). The definition of gene is always tricky, generally going along the lines of Humpty Dumpty in Through the Looking-Glass: 'When I use a word...it means just what I choose it to mean -- neither more nor less.' Gene here means protein-coding gene, to the exclusion of the RNA-only genes of seemingly endless flavor that pepper the genome.

The other class of caveat is very short ORFs -- and some very short ORFs do interesting things. For example, many peptide neurotransmitters are synthesized from short ORFs -- and these tend to evolve quickly, making it challenging to find them (I know, I tried in my past life).

Will this gene accounting ever end? The number will probably keep twiddling back and forth, but not by huge leaps barring some entirely new class of translational mechanism.

Speaking of genes & accounting, one of the little gags in Mr. Magorium's Wonder Emporium, a bit of movie fluff that is neither harmful nor wonderful, is a word derivation. The title character hires an accountant to assay his monetary worth, and promptly dissects the title: clearly it is a counting mutant. I find mutants more interesting than accountants, but both have their place -- and I never before realized that one was a subset of the other!

Tuesday, November 20, 2007

Gene Logic successfully repositions, Ore What?

Gene Logic today announced that Pfizer has filed a patent based on a Gene Logic drug repositioning effort. This would appear to be one of the most significant votes of confidence in such efforts by an outside partner.

Drug repositioning is the idea of finding new therapeutic uses for advanced compounds, particularly compounds which are very advanced but failed due to poor efficacy in the originally targeted disease. A number of companies have sprung up in this field -- the two I am most familiar with are Gene Logic and Genstruct -- and at least some large pharmas have in-house programs.

The reality is that many existing drugs ended up in therapeutic areas quite different from those they started in. Perhaps the most notorious case is Viagra, which was muddling along as an anti-hypertensive until an unusual side effect was spotted. Minoxidil similarly began as an anti-hypertensive until its side effect was noted. The route to some psychiatric medications began with anti-tuberculosis agents and antihistamines. I doubt that's a complete list.

Gene Logic is one of the original cohort of genomics companies and has been through many iterations of business plan. If memory serves, they were one of several companies originally built around a differential display technology, a way of obtaining mRNA signatures for diseases which predated microarrays. Gene Logic later became one of the major players in the toxicogenomics space, and as part of that effort built a large in-house Affy-based microarray effort. They built microarray databases for a number of disease areas (I've used their oncology database), built a sizable bioinformatics effort, and even acquired their own CRO.

However, none of that could quite be converted into a stream of gold, so over the last year or so the whole mess has been deconstructed, leaving behind the drug repositioning business which had begun as a unit of Millennium (which is one reason I'm familiar with it). They'll even be changing their name soon, to Ore Pharmaceuticals (presumably Overburden and Slag, while appropriate for the mining theme, did not last long in the naming queue).

While there is certainly historical precedent for repositioning, the question remains whether companies can make money doing it, and whether those companies will simply be the big pharmas or the gaggle of biotechs chasing after the concept. Depending on the company, some mixture of in vivo models, in vitro models and computational methods are used. One way to think of it is doing drug discovery, but with a compound which already has safety data on it. There is also extensive interest in the concept in the academic sector, which is a very good thing -- many drugs which may be repositionable have little or no patent life left, meaning companies will find it difficult to invest in them with any hope for a return.

Gene Logic / Ore has one repositioned drug which has gone through clinical trials, GL1001 (nee MLN4760). This is a drug originally developed by Millennium as an ACE2 inhibitor. Since I'm among the discoverers of ACE2, I tend to keep an eye on this one. Millennium gave it a whirl in obesity, but now Gene Logic has found a signal in inflammatory bowel disease in animal models.

That Pfizer bothered to file a patent is significant, as it triggered a milestone payment -- amount unspecified, but these are usually something interesting. But that is still a long way from starting a new trial -- that will be the real milestone, and whichever drug repositioning firm can claim that will really be crowing -- that is, until somebody actually gets a drug approved this way.

Friday, November 16, 2007

Docs eager to winkle wrinkles, slow to hole a mole

I'm no fan of the local TV news broadcasts & therefore rarely catch them. So it was quite by accident that I caught a story last night that is the sort to give one the shudders.

The station had two staffers call 20+ dermatologists. One staffer would state that she had a suspicious mole to be checked out, whereas the other staffer would call requesting an appointment for cosmetic Botox. Now, at some level the results shouldn't be surprising, since had the pattern been the opposite it wouldn't have made the local news. But what was striking was the size of the difference: in one case the mole would get an appointment several months in the future, but the same office would be willing to Botox away the next day. Yikes!

Perhaps more striking was the one doc who showed such a pattern who was interviewed on camera. She made no apologies and showed neither shame nor remorse. Her practice is no longer taking on new 'medical' patients, but is happy to accept new cosmetic ones. She did say that if the caller had been more persistent, perhaps they would have gotten an earlier appointment. Alas, the interviewer did not ask her to explain the ethics of the situation. It is not hard to imagine that many patients calling about a mole are on the fence as to whether to worry or not, and being given a long wait will push them back to complacency (hey, if the doctor's not worried about it why should I be?). Some small fraction of those persons may have early stage melanomas, with potentially lethal results from delay in removal.

It's not hard to guess at the driver of this issue: Botox is elective, probably not covered by insurance, and therefore patients will pay top dollar; mole screening is covered & governed by the immense pricing power of insurance companies.

Somewhere in the last week or so I saw an article commenting that increasing numbers of doctors from a wide variety of specialties are performing cosmetic procedures. A Pollyanna might think this would provide the competition to drive the dermatologists back to doing the important stuff, but more likely the corruption will just spread to more specialties. In high school I once switched dentists because it was impossible to get appointments at my long-time dentist, but I went running back after a few appointments when I realized the new guy opened every session with an inquiry as to whether I might want to change out my cap for a better color match.

A real pessimist might note that these new-fangled genetic tests are coming down the pike, that they may also be considered elective and not covered by insurance, and may represent another monetary siren tempting docs to neglect treating disease.

By coincidence, I had just watched the one movie I know of that opens with a monologue on ethics, Miller's Crossing. Caspar & the interviewed doc would be unlikely to have any arguments in that department.

Thursday, November 15, 2007

Trapping the wily VEGF

Blogging on Peer-Reviewed Research

A significant challenge in pharmacology is the correct dosing of the drug. "The dose makes the poison" is an adage in toxicology, but "the dose makes the drug" just as much. Too little drug and insufficient effect will occur; too much and the patient is likely to suffer from toxic side-effects.

Traditionally, drug dosages evolved completely empirically. Many drugs have profiles allowing very crude dosing -- "take two aspirins and call me in the morning" is remarkable advice, remarkable because it generally works. Other drugs, particularly in trials, are dosed by body size. This makes rough sense, as if you wish to obtain a certain concentration of drug the amount of body it will be diluted in should be taken into account.

Over time various influences on dosing have been realized. I ate a grapefruit today pondering whether one day this small pleasure will be forbidden to me; the metabolism of many drugs is altered by natural compounds present in grapefruit. Individual variation plays a major role as well, with some chemotherapy drugs at normal doses near-lethal to small fractions of the population, because those individuals metabolize the drug differently. Some drugs have notoriously narrow dosing windows: underdose a heart patient and they may have angina or other nasty events; overdose them and they can have nosebleeds which simply won't end.

It is hard enough to dose drugs for which there are decades of experience or which are relatives of drugs with long pedigrees. Dosing brand new agents with new activity profiles is far more difficult. Hence, there is a real need for compasses which could point the way to the correct dose.

VEGF is an important soluble signaling factor which stimulates angiogenesis, the formation of new blood vessels. Anti-angiogenesis agents have emerged as an important tool in oncology and also in the vision-robbing disease macular degeneration. VEGF can be targeted in a number of ways: the antibody drug Avastin (bevacizumab) directly binds VEGF, whereas multi-targeted ("dirty") kinase inhibitors such as Nexavar (sorafenib) and Sutent (sunitinib) knock out the cellular receptors for VEGF (which are tyrosine kinases) among their many targets.

VEGF-Trap is an investigational drug being developed by Regeneron, one of those feline biotech companies (9 lives!) which keep plugging along. VEGF-Trap is a pastiche of carefully chosen protein parts: pieces of two different human VEGF receptors plus a bit from a human antibody (IgG1) constant region.

In a new paper in PNAS (open access) the Regeneron folks show that VEGF-Trap forms stable, inert, monomeric complexes with VEGF which remain in circulation. By measuring the amount of free and VEGF-complexed VEGF-Trap in circulation they can measure VEGF levels and identify a dose which ensures that maximal trapping occurs. If insufficient drug is applied, then little or no free VEGF-Trap is detected.
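The dosing logic can be caricatured with a tight-binding toy model (all numbers invented): free trap only shows up in circulation once the dose exceeds the VEGF available to complex it.

    # Toy tight-binding model of the dosing logic: with 1:1, effectively
    # irreversible binding, free trap is detectable only above the dose
    # that saturates circulating VEGF. All numbers are invented.
    def titrate(trap_dose, vegf_level):
        complexed = min(trap_dose, vegf_level)
        free_trap = trap_dose - complexed
        return complexed, free_trap

    vegf = 3.0  # hypothetical circulating VEGF, arbitrary units
    for dose in (1.0, 2.0, 3.0, 4.0, 5.0):
        complexed, free = titrate(dose, vegf)
        verdict = "free trap seen: at or above saturation" if free > 0 else "no free trap: underdosed"
        print(f"dose {dose}: complex {complexed}, free {free} ({verdict})")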

One significant surprise, in both mice and humans, is that VEGF levels are higher than previously reported. Furthermore, VEGF levels do not differ greatly between individuals with cancer (human patients or xenografted mice) and those without. Human and mouse VEGF levels were very similar, when normalized for body mass. Maximal anti-tumor effects were observed in the mouse models at the dosing point where free VEGF-Trap was observed, suggesting that this method of VEGF measurement can guide dosing.

Can you do the same trick with bevacizumab? Not according to the paper: antibodies form multivalent complexes with their targets, and these complexes are removed from circulation by various mechanisms. Measurements of bound complex are therefore difficult and not informative.

During my previous job I got interested in whether VEGF, or other angiogenic mediators, might be useful for patient stratification. Several papers claimed that soluble angiogenesis factor levels were useful in predicting cancer outcome, but when I compared the measurements in the papers they weren't even on the same scale: the reported baseline measurements in normal individuals were completely different. It didn't invalidate the concept, but certainly prevented any useful synthesis of various papers.

John S. Rudge, Jocelyn Holash, Donna Hylton, Michelle Russell, Shelly Jiang, Raymond Leidich, Nicholas Papadopoulos, Erica A. Pyles, Al Torri, Stanley J. Wiegand, Gavin Thurston, Neil Stahl, and George D. Yancopoulos
VEGF Trap complex formation measures production rates of VEGF, providing a biomarker for predicting efficacious angiogenic blockade
PNAS published November 13, 2007, DOI: 10.1073/pnas.0708865104

Wednesday, November 14, 2007

Ring around the protein...

Blogging on Peer-Reviewed Research
One of the journals I monitor by RSS is Nucleic Acids Research, and the usual steady flow of new items has become a torrent, mostly about databases. Yes, the annual Database Issue is on its way and the Advance Access shows the signs. And, it's all Open Access.

Every year this issue grows and grows, and each year I skim through all the little niche databases. They may be small and esoteric, but somebody has a passion for that niche & that's great!

I've always liked oddities & anomalies in biology: the rules are useful, but the mess is fascinating. Somewhere in my undergraduate days I came across the fact that there were known examples of circularly permuted proteins, proteins whose final sequence is attained by moving a segment from the tail end around to the front. But somehow the existence of proteins whose mature form is a circle (via a post-translational step) had escaped me. Now that void is filled, as I can loop around to CyBase, a database of cyclic proteins and peptides.
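(For the string-minded, circular permutation reduces to rotation, and there's a classic trick for spotting one; a sketch with a made-up sequence:)

    # Circular permutation as rotation, plus the doubling trick:
    # B is a rotation of A if and only if B occurs within A + A.
    def rotate(seq, k):
        return seq[k:] + seq[:k]

    a = "MKLVINSEQ"   # made-up 'protein' sequence
    b = rotate(a, 4)  # "INSEQMKLV"
    print(b, len(a) == len(b) and b in a + a)  # INSEQMKLV True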

Why circlets? Well, one obvious advantage is two fewer free ends for proteases to make mischief of -- and many of these proteins are protease inhibitors. Indeed, the stability extends to other abuses, with the suggestion that these might make interesting scaffolds for drug development. Circles also make for attractive sequence profile displays. And not only does the database cover naturally cyclic proteins, but it has tools to help you design your own!

Conan K. L. Wang, Quentin Kaas, Laurent Chiche, and David J. Craik
CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering
Nucleic Acids Research Advance Access published on November 5, 2007.
doi:10.1093/nar/gkm953

Monday, November 12, 2007

Just how wrong is Marilyn vos Savant?

Marilyn vos Savant is a writer of a regular column in Parade magazine. These columns address many things, but often have interesting logic puzzles. Given that she is claimed to have the highest recorded IQ ever, whole sites have sprung up to find fault with her writings. Now, I'll confess I'm always looking for an angle -- and rarely finding one.

But this past Sunday, she gave me a bit of an opening in response to a question as to whether there are any beneficial viruses. Her response:
No. Bacteria are living one-celled microorganisms. By contrast, viruses aren’t alive in a strict sense: They are the ultimate parasites and cannot replicate without a host. They invade the cells of animals, plants and even bacteria, then either lie dormant or get into the driver’s seat and cause changes that disrupt normal cell functioning, the very essence of disease.


The first two sentences and most of the third are dead on the money: bacteria are unicellular organisms, viruses aren't considered alive & invade other cells, where they can lie dormant or immediately go crazy. However, that last bit is the clincher. Apparently Ms. vos Savant is unaware that in the bacterial world there are examples of viruses benefiting their host by bringing along useful genetic stuff. Diphtheria is one example, in which the toxin (which presumably helps the bacterial host) is encoded by a virus (phage).

Are there examples outside of bacteria? I don't know of any, but I'm hardly up on my viruses. Moreover, how would we know? Suppose there were viruses which were simply neutral (or nearly so), would we have ever detected them?

Also, in a broader sense some of those phage out there may be an important ecological control on bacterial nasties. So this could be another class of "beneficial" viruses.

Just because you are a parasite doesn't mean you're guaranteed bad!

Wednesday, November 07, 2007

Can you Flex your DNA?

One component of many employment benefit packages in the U.S. (I don't know about other countries) is a Medical Flexible Spending Account. At the beginning of the plan year you choose an amount to set aside from each paycheck which goes pre-tax into an account, the contents of which can be used to reimburse medical expenses. Depending on your tax bracket, this is equivalent to getting a 20-35% discount on your medical bills.

These accounts show the many fingerprints of bureaucracy. Contribution limits are stiff, often both a percentage of pay and an absolute maximum (currently $4K, I think). There are, as one might expect, curiosities to the restrictions on what the money can be used for. Health insurance premiums are not coverable, but co-pays and co-insurance are. Both the basic eyeglass frame and the Elton John special are generally reimbursable. A periodontal visit is reimbursable, but the dental floss which might prevent or moderate that visit is not. In general, you must save all your receipts and then send them in to the plan operator. Some offer a handy debit card that works with one of the credit card networks, but if you spend on anything out-of-bounds you'll need those receipts to sort it out -- and you may need those receipts in any case. Lots of paper to save.

But perhaps the most burdensome requirement is use-it-or-lose-it; any money not spent by the end of the plan year is forfeited to the plan operator. Over the last year or so the IRS has loosened this restriction to allow some overlap in plan years, but not all plans allow such. So, you must carefully plan your expenses or the whole benefit is lost -- or worse. And, should you wish a mid-course correction, that's generally not allowed -- you can't change your contribution level unless something big happens, such as the addition or subtraction of a family member (the former is hard to plan precisely, the latter should never be planned!).

So, around this time companies offering covered items start urging folks to check their account balances and spend them before they lose them. Eyeglass merchants are at the head of the line, but so are laser vision correction places.

Which leads to the title question: can you use MedFlex account funds to pay for DNA testing? I honestly don't know, and really nobody lacking a tax law specialty has any business answering the question. But, if I were running one of the personal genomics startups out there I'd be finding out the answer. Perhaps a precedent has been set for other diagnostic procedures not (yet?) well recognized to be of medical value, such as the full-body scans which were heavily marketed a bunch of years back. For if these funds are available, then that is a ready pot of money which might be spent. One big-ticket receipt sure wouldn't be a pain to submit, and if the companies were clever they could split the bill over two fiscal years (say one covering the scan and one the counseling) to enable two plan years to be charged with the expense. I don't know whether it would make good medical sense to have such a scan, but some folks on the fence might be swayed if they could see it as a bargain.

Tuesday, November 06, 2007

Lung Cancer Genomics

Blogging on Peer-Reviewed Research
A large lung cancer genomics study has been making a big splash. Using SNP microarrays to look for changes in the copy number of genes across the genome, the group looked at a large batch of lung adenocarcinoma samples. Note: the paper will require a Nature subscription, but the supplementary materials are available to all.

As with most such studies, there was some serious sample attrition. They started with 528 tumor samples, of which 371 gave high-quality data. 242 of these had matched normal tissue samples. All of the samples were snap-frozen, meaning the surgeon cut it out and the sample was immediately frozen in liquid nitrogen.

The sub-morphology of the samples is surprisingly murky; much of the text focuses on Non-Small Cell Lung Cancer (NSCLC), of which adenocarcinoma is the most common form, but the descriptions of the samples do not rule out other forms.

After hybridizing these to arrays, a new algorithm called GISTIC, whose full description is apparently in press, was used to identify genomic regions which were either deficient or amplified in multiple samples.
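GISTIC's details weren't yet public, but the flavor of such recurrence scoring can be caricatured (my sketch, emphatically not the published algorithm): weight each marker by how often it is altered and how strongly, then judge that score against a permutation background.

    # Caricature of recurrence scoring in the GISTIC spirit -- not the
    # published algorithm. Score each marker by amplification frequency
    # times mean amplitude; significance would come from permutations.
    def recurrence_scores(samples, threshold=0.3):
        # samples: per-sample lists of log2 copy-number ratios, one per marker
        n_markers = len(samples[0])
        scores = []
        for m in range(n_markers):
            events = [s[m] for s in samples if s[m] > threshold]
            freq = len(events) / len(samples)
            mean_amp = sum(events) / len(events) if events else 0.0
            scores.append(freq * mean_amp)
        return scores

    samples = [[0.1, 0.9, 0.8], [0.0, 1.2, 0.1], [0.2, 0.7, 0.9]]
    print(recurrence_scores(samples))  # middle marker wins: altered in every sample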

Many changes were found, which is no surprise given that cancer tends to hash the genome. Some of these changes are huge: 26 recurrent events involving alteration of at least half a chromosome arm. Others are more focused.

One confounding factor is that no tumor sample is homogeneous, and in particular there is some contamination with normal cells. These cells contribute DNA to the analysis and in particular make it more difficult to detect Loss-of-Heterozygosity (LOH), in which a region is at normal copy number but both copies are the same, such as both carrying the same mutated tumor suppressor.
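The dilution effect is easy to see with a little mixture arithmetic (a copy-neutral LOH example; numbers invented):

    # Why normal contamination obscures LOH: at a heterozygous site where
    # the tumor has lost the B allele (copy-neutral, so total copy number
    # matches normal), heterozygous normal DNA pulls the observed B-allele
    # fraction back toward 0.5.
    def observed_baf(purity, tumor_baf=0.0, normal_baf=0.5):
        return purity * tumor_baf + (1 - purity) * normal_baf

    for purity in (1.0, 0.8, 0.5, 0.3):
        print(f"tumor purity {purity:.0%}: observed BAF {observed_baf(purity):.2f}")
    # 100% -> 0.00, 80% -> 0.10, 50% -> 0.25, 30% -> 0.35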

Seven recurrent focal deletions were identified, two of which cover the known tumor suppressors CDKN2A and CDKN2B, inhibitors of the cell cycle regulatory cyclin-dependent kinases. The corresponding kinases were found in recurrently amplified regions; a neat but evil symmetry. Tumor suppressors PTEN and RB1 were also found in recurrent deletions. The remaining recurrent deletions hit genes not well characterized as tumor suppressors. One hits the phosphatase PTPRD -- the first time such deletions have been found in primary clinical specimens. Another hits PDE4D, a gene known to be active in airway cells. A third takes out a gene of unknown function, AUTS2.

In order to gain further evidence that these deletions are not simply epiphenomena of genomic instability, targeted sequencing was used to look for point mutants. Only PTPRD yielded point mutants from tumor samples, several of which are predicted to disable the enzymatic function of this gene's product.

On the amplification side, 24 recurrent amplifications were observed. Three cover known bad actors: EGFR (target of Iressa, Tarceva, Erbitux, etc), KRAS and ERBB2 (aka HER2, the target for Herceptin). Another amplification covers TERT, a component of the telomerase enzyme which is required for cellular immortality, a hallmark of cancer. Another amplification covers VEGFA, a driver of angiogenesis and part of the system targeted by drugs such as Avastin. Other amplifications, as mentioned above, target cell cycle regulation: CDK4, CDK6 and CCND1.

The most common amplification has gotten a lot of press, as it covered a gene not previously implicated in lung cancer: NKX2-1. A neighboring gene (MBIP2) was present in all but one of the amplifications, and so NKX2-1 was focused on. Fluorescent In Situ Hybridization (FISH), a technique which can resolve amplification on a cell-by-cell basis in a tissue sample, confirmed the frequent amplification of NKX2-1 specifically in tumor cells. Resequencing of NKX2-1, however, failed to reveal any point mutations in the tumor samples. RNAi in lung cancer cell lines with NKX2-1 amplification showed a reduction of a commonly-used tumor-likeness measure (anchorage-independent growth). This effect was not seen in a cell line with undetectable NKX2-1 expression, nor was it detected when MBIP2 was knocked down. Previous knockout mouse data has pointed to a key role for NKX2-1 in lung cell development. The protein product is a transcription factor, and the amplification of lineage-specific transcription factors has been observed in other tumors.

What will the clinical impact of this research be? None of the targetable genes which were amplified are novel, so this will nudge interest further along (such as in using Herceptin in select lung cancers), but not radically change things. Transcription factors in general have no history of being targeted with drugs, so it is unlikely that anything will come rapidly from the NKX2-1 observations. On the other hand, there will probably be a lot of work to try to characterize how NKX2-1 drives tumor development, such as to identify downstream pathways.

At least some of the press coverage has remarked on the price tag for this work & the surrounding controversy over the Cancer Genome Project that this represents. The claimed figure is $1 million, which does not seem at all outrageous given the large number of microarrays used (over one thousand, if I'm adding the right numbers) -- a few hundred dollars per microarray for the chip and processing is not unreasonable, and the study did a bunch more (analysis, sequencing, RNAi). If such a study were to be repeated at today's prices in the next 5 big cancer killers (breast, ovarian, prostate, pancreatic, colon), it means another $5M not spent on other approaches. In particular, the debate centers around whether the focus should be on more functional approaches rather than genomics surveys. As fond as I am of genomics approaches, it is worth pondering how else society might spend these resources.

It is also worth noting what the study didn't or couldn't find. A large number of known lung cancer relevant genes did not turn up or turned up only weakly. In particular, p53 is mutated in huge numbers of cancers but didn't really turn up here. The technique used will be blind to point mutants and also can't detect balanced translocations. Nor could it detect epigenetic silencing. If you want to chase after those, then it is more genomics -- which is probably one of the things that eats at critics, the appearance that genomics will never stop finding ways to burn money.

Weir, B.A. et al. Characterizing the cancer genome in lung adenocarcinoma. Nature, Advance Online Publication. doi:10.1038/nature06358

Wednesday, October 31, 2007

Scientific Easter Eggs

Blogging on Peer-Reviewed Research
Tonight, of course, is Halloween, one of the many holidays which in the U.S. has a serious sweet tooth. After taking Version 2.0 around for the tradition of gentle extortion on this day, I indulge in my own rituals -- listening to Saint-Saens & reading The Raven. It isn't exactly the right time, but the confectionery angle got me thinking about other sweet holidays, and then to Easter Eggs -- of the scientific kind.

There was a recent complaint in Nature about the growing shift of information from the printed versions of articles to the Supplementary Online Material (SOM). I can definitely sympathize -- as the writer complained, key details have been migrating to the SOM, meaning that sometimes you can't read the print version and really tackle it scientifically. In particular, Materials & Methods sections of many papers have been eviscerated, with the key entrails showing up in the SOM. Most of the point of my print subscriptions to Science & Nature is to be able to read them during my Internet-free commute. Worse, the SOM becomes an appendage in danger of being lost or misdirected -- such as in a recent manuscript I reviewed which showed up without the supplements.

For better or worse, editors & authors have shared interest in shifting things from print to the SOM. For editors, online is cheap. For authors, it is a way to cram more in to fixed paper size limits. Clearly some material (such as videos) can only go into SOMs, and lots of supporting data really does belong there.

In computer code, an Easter Egg is a hidden surprise -- if you know the right combination of keystrokes or commands or such, something interesting (and generally irrelevant to the program) will show up. I'm not sure I've actually ever seen one -- I'm generally too impatient to deal with such things, but I do recognize they exist. Granted, perhaps some of that programming effort would be better spent wringing a few more bugs out, but it is a way for coders to blow off steam.

I propose that a scientific Easter Egg is the inclusion in Supplementary Online Material of valuable scientific data which is peripheral to the main thrust of the paper, but is nevertheless a significant advance. Such events are probably rare, as it requires a certain mindset to bury a possible Minimal-Publishable-Unit in another paper's SOM, but on the other hand it beats something never being published -- and perhaps it is interesting to some but viewed as too minor to merit a paper.

I'll give you an example, from the Church lab. George has long been burying stuff in papers -- for example, one of the footnotes to the original multiplex sequencing paper declared that the technology was being used to shotgun sequence Salmonella typhi AND Escherichia coli! Alas, the project was ahead of the technology & never completed. But a much better Easter Egg is in the first large-scale polony sequencing paper (PDF; SOM). Supplementary Figure 2 is really an in-depth study of the site preferences of the Type IIS restriction enzyme MmeI -- driven by about 20K of sequencing examples. This is really a bit of restriction enzymology hiding in a sequencing paper. Because the enzyme is used in the method, it is relevant -- but not quite critical. The enzyme preferences are important because they could create biases in sequence sampling, but it is hardly the main point of the paper -- which is why it is in the SOM.

I'm sure there are even better examples out there. What is the most interesting tangential information you have seen in an SOM?

Shendure, J, Porreca, GJ, Reppas, NB, Lin, X, McCutcheon, JP, Rosenbaum, AM, Wang, MD, Zhang, K, Mitra, RD, Church, GM (2005) Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 309(5741):1728-32. DOI: 10.1126/science.1117389

Church, G.M., and Kieffer-Higgins, S. (1988) Multiplex DNA sequencing. Science 240: 185-188. DOI: 10.1126/science.3353714

Monday, October 29, 2007

One lap around

Officially, I started this blog a year ago yesterday, with the first post about science coming the next day.

At times, I wonder what possessed me to assign myself a regular writing assignment. But, it's definitely been rewarding. I've learned from comments & emails, made a lot of new connections, and maintained an incentive to read a lot of papers that aren't directly connected to my current professional duties. I've also gotten to indulge my fondness for wordplay and pun-filled headlines.

I thought I knew what I'd write about, and in general I've stuck to it, though I've certainly strayed periodically a bit outside of biotech & bioinformatics (or figured out tenuous links to them). I've also covered some topics more than I ever would have guessed: I had no idea when I started I'd write so much about dogs!

What might I change for the next year? I really should be more active in blog carnivals -- I miss the deadlines far more than I hit them, and have probably shown up in as many from editors being kind as from those I've submitted. I really should take some turns at editing a carnival edition. I also plan to try to join the Blogging on Peer-Reviewed Research bandwagon, though with diligence to only claim that icon when I have actually fully read the article (which dings all the papers where I have access only to the abstracts!).

Keeping posts on a regular schedule has been challenging, which makes me appreciate all the more folks like Derek Lowe who post intelligent writing like clockwork. Sometimes there are good excuses (internet-free vacations), but too often the writing gets put off until too late at night. I've also noticed a tendency to follow-up flurries of writing with droughts -- the week long marathon in February, for example, followed by a weak March. Need to work on that. 182 posts over a year's time -- averaging to one every other day. I thought I'd be higher than that, but maybe that really is the comfortable hobby level.

While the writing is a solo effort, getting readership has been helped by an army of others. GenomeWeb is nice enough to feature me regularly in their blog, and a number of individual bloggers have helped through blogrolls, carnival invites & cross-links. The DNA Network is now my primary blog read each day (plus Dr. Lowe).

I'm also surprised at the number of article ideas I've let sit on the shelf -- some dating back to near the beginning. Either post them or kill them!

Thanks to all for reading this. I hope I'll continue to earn your eyes for another year.

Saturday, October 27, 2007

Genomics Lemonade

One of the many attractions of the next generation sequencing techniques is that they eliminate the step of cloning in E.coli the DNA to be sequenced. Not only does this step add complexity and expense, but it also detracts from results. Shotgun sequencing attempts to reconstruct the genome from a large random sample of fragments, but there are some pieces of DNA which clone poorly or not at all in E.coli, skewing the sample. These regions have often required labor intensive, expensive targeted efforts to finish.

However, when life gives out lemons, some break out the sugar and glasses. A new paper in Science Express (subscription required) turns this phenomenon around in a clever way. All those failed clonings weren't nuisances, but experiments -- into what can be cloned into E.coli. And since horizontal transfer of genes is rampant in bacteria, it's an important phenomenon with relevance to medicine (virulence genes are often transferred). And on a huge scale: 246K genes from 79 species, using 1.8 million clones covering 8.9 billion nucleotides.

The first filter was to identify genes which rarely showed up in toto in plasmid clones, restricting attention to short (<1.5Kb) genes, since longer ones will rarely be complete in a short-insert clone. Now, common plasmid vectors replicate at multiple copies per cell, so even mildly toxic genes should be strongly selected against. To further refine the list, the authors also looked for evidence that these genes were underrepresented in long-insert clones, which typically use vectors that replicate at a few copies per cell.
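
To make the screen's logic concrete, here is a minimal sketch in C# -- my own reconstruction under simplifying assumptions (uniformly random insert positions, a plain Poisson test), not the authors' actual pipeline, and all the numbers below are placeholders:

    // Sketch: flag genes underrepresented among full-length ("in toto") clones.
    // Assumes inserts land uniformly at random; all figures below are made up.
    using System;

    class UnderrepScreen
    {
        // Probability a random insert of length insertLen fully contains the gene
        static double PContains(int geneLen, int insertLen, long genomeLen)
        {
            if (geneLen > insertLen) return 0.0;
            return (double)(insertLen - geneLen + 1) / genomeLen;
        }

        // Poisson probability of seeing 'observed' or fewer full-length clones
        static double PoissonCdf(int observed, double expected)
        {
            double term = Math.Exp(-expected), cdf = term;
            for (int k = 1; k <= observed; k++)
            {
                term *= expected / k;
                cdf += term;
            }
            return cdf;
        }

        static void Main()
        {
            long genomeLen = 4600000;            // an E.coli-sized genome
            int nClones = 50000, insertLen = 2000;
            int geneLen = 500, observed = 2;     // a short gene seen in toto only twice

            double expected = nClones * PContains(geneLen, insertLen, genomeLen);
            Console.WriteLine("Expected {0:F1} full-length clones, saw {1}; P = {2:E2}",
                              expected, observed, PoissonCdf(observed, expected));
        }
    }

A gene whose observed count falls far below expectation in both the high-copy and low-copy libraries is a toxicity candidate.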

No one gene was poison from every species, but the 'same' gene from closely related species was often trouble. Species related to E.coli often had more toxic genes, perhaps because these species already had promoters which could drive significant expression in E.coli. So, they took examples of two such genes (both encoding ribosomal proteins) from 31 species, placed them under the control of an inducible promoter, and showed much greater toxicity when the promoter was turned on; 15 randomly chosen control genes did not show toxicity.

What kind of genes transfer poorly? One major class comprises proteins of the ribosome, a class previously noted to be rare amongst genes thought to have been horizontally transferred. One possible inference is that the ribosome is a highly tuned machine, with excess components able to fit in but not fully function. Interestingly, the proteins in direct contact with ribosomal RNA were more likely to be in the toxic set.

Another test was to simply look at which E.coli genes can't be transferred into E.coli -- well, transferred from single copy in a wild-like strain to multi-copy in a lab strain. Such genes are probably toxic purely through dosage effects (such screens have been used to great effect in the past, e.g. this).

What's missing from the paper? Two quick questions came to my mind. First, how many of the genes are essential in E.coli? Second, what if you simultaneously knocked out the endogenous copy and expressed the foreign one -- would that lessen the toxicity?

There are other examples, ones I've had some connection to, of leveraging trouble into something interesting.

During the early 1990's, no genome sequencing was going fast enough for young, impatient folk, and E.coli's was no exception. At one Hilton Head Conference, there was loose talk of a 'schmutz' genome project -- we would go through all the unalignable reads from all the genome sequencing centers, figuring that a significant fraction were E.coli contamination & therefore might help fill in the E.coli genome. Alas, we never actually pushed forward.

When I was at Millennium in the late 1990's, we were mining a lot of EST data from our own libraries, from the public collections, and from the in-licensed Incyte databases. A constant minor nuisance was the presence of different contaminants in these collections, and at one point I had my group trying to clean this up. We could successfully identify a number of contaminants, which were sometimes very center-specific. For example, the Brazilian EST collections had contamination from the citrus (lemon?) pathogenic bacteria they were sequencing at the same time. I regarded this solely as a cleanup operation, and when we were done we were done -- but of course some people think more cleverly, & so I was chagrined to see a paper by George Church and company using this technique to associate bacteria and viruses with human disease.

All that writing has made me thirsty. Lemonade anyone?

Thursday, October 25, 2007

At long last, a 2nd GPCR crystal structure

G-protein coupled receptors, or GPCRs, are a key class of eukaryotic membrane receptors. Roughly 50% of all small molecule therapeutics target GPCRs. Vision, smell & some of taste use GPCRs. Ligands for GPCRs cover a wide swath of organic chemical space, including proteins, peptides, sugars, lipids and more.

Crystal structures are spectacular central organizing models for just about everything you can determine about a protein. Mutants, homologs, interactors, ligands -- if you have a structure to hang them on, understanding them becomes much easier. For drug development, a 3D structure can powerfully guide chemistry efforts, suggesting directions in which to build out a molecule or regions to avoid changing.

Because they are large, membrane-bound proteins with lots of floppy loops, GPCRs are particularly challenging structure targets. Efforts to build homology models relied on bacteriorhodopsin, which is not a GPCR but shares the seven-transmembrane topology of GPCRs. The first GPCR structure, of bovine rhodopsin, was finally published in 2000. Cow rhodopsin has a significant advantage in that large quantities can be purified from an inexpensive natural source: cow eyes.

Since then, published crystallography of GPCRs has been restricted to further studies on rhodopsin (e.g. this mutant study). Rumors of further structures at private groups would periodically surface, but given the lack of publications & the high PR value such a publication would carry, it seems likely these were just rumors. Now, after 7 years, the drought has ended with a flurry of papers around the structure of a beta adrenergic receptor, the target of beta blockers.

The papers share a number of co-authors but describe two different approaches to solving the GPCR crystallization problem. For the beta-2-adrenergic receptor, a key problem is a floppy intracellular loop. In the pair (here & here) of papers online at Science, the troublesome 3rd intracellular loop is largely replaced with T4 lysozyme, a protein which has been crystallized ad infinitum. In the Nature paper & a Nature Methods paper describing the method, the intracellular loop is stabilized with an antibody raised against it.

The abstracts hint that B2AR and rhodopsin are strikingly different in some important ways, underlining the need for multiple crystal structures for a family -- with only one, it is impossible to determine what is general and what is idiosyncratic. Indeed, one of the papers reports that published homology models of B2AR were more similar to rhodopsin than the new B2AR structure.

Will these new approaches herald a flurry of GPCR structures? Perhaps, but they hint at what a hard slog it may be. A host of additional challenges were faced, such as the crystals being so transparent it was hard to position them in the beam. Will each GPCR present its own challenges? Only time will tell.

Sanguine Thoughts

Sometimes in life, you just want to lie back and stare at the ceiling. Other times, you have no choice, which is how I found myself for a while last Sunday morning. I was lying on a simple bed, staring at the ceiling of a high school gymnasium, with tubing coming out of my right arm.

I hate needles. One of the many reasons med school was out for me is that I hate needles. I can eat breakfast while watching a pathology lecture, but I can't stand the sight of a needle going into human skin (nor a scalpel). My fear of needles was so severe I had to be partially sedated once for a blood draw, which was most unfortunate as I then couldn't scream properly when the nurse speared some nerve or another & nobody realized the agony I was in. Sticking myself as an undergraduate didn't help, though at least the needle was fresh and had not yet gone into the mouse.

So, a number of years ago I resolved to fight this irrational fear by confronting it in a positive manner, and I started to give blood regularly. For a while I was giving pretty much as often as was allowable, but in the last few years I've slipped and missed a lot of appointments. But the Red Cross still calls & I still get in a few times a year. And the needle phobia has been calmed from abject terror to tense dread, a marked improvement. Plus, I feel like I'm doing some good -- your odds of saving someone's life are certainly better for donating than for entering a career in drug discovery (though the latter has some huge tails -- a lucky few get to make an amazing impact).

Most blood drives are held in conference rooms, and so the ceilings aren't terribly interesting. Gym ceilings don't do much for me either. There is one memorable blood drive location I've been to: the Great Hall of the Massachusetts State House. But generally, it's iPod and random thoughts time.

This time, the iPod was giving me the right stuff (as in the soundtrack for the same), but my thoughts were roaming. Having done this a lot, one compares the sensations to previous times. For example, the needle going in had a little more burn than usual -- perhaps some iodine was riding in? The way out was even more disconcerting: a warm dripping on my arm! The needle-tubing junction had just failed, but things were rectified quickly (though I looked like an extra from M*A*S*H while I held my arm up).

But most of all, I remembered why I had come to this particular drive, with no thought of letting the appointment slip. This drive was in honor of five local children with Primary Immunodeficiency, and three of them are from a family we know well. Their bodies make insufficient immunoglobulins, leaving the patients vulnerable to various infections. Regular (sometimes as often as weekly) infusions of immunoglobulins are the treatment for this. Some causes of PI are known, but others have yet to be identified.

Given that this family has two unaffected parents and three boys all affected, my mind wanders in some obvious directions. That pattern is most likely due to the mutation lying on the X-chromosome, which sons inherit only from their mother. Given the new advances in targeted sequencing, for a modest amount of money one could go hunting for the mutation on the X -- perhaps a few thousand dollars per patient. Such costs are certainly within the realm of rather modest charity fund-raising, so will we see raffles-for-genomes in the future?
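
For the curious, the back-of-envelope likelihood math behind that hunch -- conditional probabilities only; a real analysis would weigh priors on the parental genotypes and much else:

    // How strongly do three affected sons of unaffected parents favor X-linkage?
    using System;

    class InheritanceOdds
    {
        static void Main()
        {
            // X-linked recessive, carrier mother: each son has a 1/2 chance
            double xLinked = Math.Pow(0.5, 3);     // = 1/8
            // Autosomal recessive, both parents carriers: each child has a 1/4 chance
            double autosomal = Math.Pow(0.25, 3);  // = 1/64
            Console.WriteLine("X-linked {0} vs autosomal {1}: {2}-fold",
                              xLinked, autosomal, xLinked / autosomal);  // 8-fold
        }
    }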

If such efforts are launched, will patients and their families be tempted to go largely on their own, bypassing conventional researchers -- and perhaps conventional ethical review boards? If anyone with a credit card can request targeted sequencing, surely there will be motivated individuals who would do so. Some, like the parent profiled in last week's Nature, will have backgrounds in genetics -- but others probably won't. Let's face it, with a little guidance or a lot of patient reading, the knowledge can be acquired by someone willing to learn the lingo.

As I was on my way out, the middle boy, who is 5, was heading outside with his mother to play. He looked up at me and said sincerely "Thank you for giving me your blood". Wow, did that feel good!

Friday, October 19, 2007

What do Harry Potter, Sherlock Holmes, Martha's Vineyard & Science Magazine have in common?

As the Harry Potter series went on, more and more of the characters' names telegraphed a key component of their natures. One of the most blatant is Sirius Black (spoiler alert), who turns out to be capable of transforming into a black dog (Sirius being the dog star). Black dogs show up elsewhere in literature: the hound of the Baskervilles is reported to be a huge black hound. On the Vineyard, there is a restaurant/bar, The Black Dog, whose apparel has spread around the globe with its black Labrador logo. Now joining the parade is Science (currently available in full in the Science Express prepublication section to subscribers only), with the identification of the gene responsible for black coat color in dogs, a locus previously known as K.

The new gene turns out to be a beta defensin, a member of a family known previously for its role in immunity. Dogs are unusual in having black coat color driven by a gene other than Mc1R and agouti. Mc1R is a G-protein coupled receptor (GPCR) and agouti encodes a ligand for it. Strikingly, beta defensins turn out to be ligands for Mc1R as well, closing the circle.

GPCRs constitute one of the biggest classes of targets for existing drugs, so one of the first tasks of anyone during the genome gold rush was to identify every GPCR they could. However, it is very difficult to advance a GPCR if it lacks a known ligand ("orphan receptor"), so drug discovery groups spent a lot of effort attempting to 'de-orphan' the GPCRs flowing from the genome project -- and very few had much luck. I haven't kept close tabs on the field for a few years, but it would seem there are still a lot of orphans left. Plus, from a physiological standpoint you don't just want to know 'a' ligand for a receptor but the full complement. This work is a reminder that new GPCR discoveries can come from a largely unanticipated angle.

It's been a huge year for dog genetics, and I've touched on a few items in this space. I suspect that someone really in tune with the field could easily fill a blog with it; I just catch the things in the front-line journals and the occasional stray from a literature or Google search. Much of the work this year has been on morphology, and there's still plenty to do. Many dog breeds have common abnormalities, and those are beginning to be unraveled as well -- and many will likely have relevance to human traits. One I stumbled on recently is the identification of a deletion responsible for a common eye defect in collies.

The really big fireworks will come when behavior genetics studies really fire up in dog. Some traits have been deliberately bred into particular breeds (think herding & hunting dogs) and others inadvertently (such as anxiety syndromes). Temperament varies by breed, and of course just about any dog is more docile than their wild lupine relatives. There will be lots of interesting science -- and probably more than a few findings that will be badly reported and misinterpreted in the popular press. Let's hope, for his sake, that James Watson keeps his mouth shut about any of it.

BTW, Lupin? -- another telegraphed character name. Fluffy, on the other hand, is not quite the name you'd expect on a gigantic three-headed dog. Alas, there's only one Fluffy mentioned, so it might not be possible to map the genes responsible for that!

Thursday, October 18, 2007

Chlamydomonas swims across the line

Last week's Science contained the publication of the Chlamydomonas reinhardtii genome, an old friend of mine from my undergraduate days. One thing I find particularly illuminating is how the focus of Chlamydomonas research has shifted.

Chlamydomonas has been studied for a long time, and was the system where the uniparental genetics of organelles was discovered. Chlamy has two flagella, and a lot of genetics on flagellar function had been performed in the system. But, in general it was viewed as a convenient model system for studying photosynthesis and nutrient uptake. If I remember reasonably well, in the late '80s the literature ran about 75:25 plant physiology to flagellar function, and the flagellar work was viewed as basic cell biology. Most publications were either in basic cell biology journals or plant journals, with the most notable paper in a flashy journal being the report of a separate basal body genome -- a finding which has not withstood the test of time.

Around the time I was graduating, it looked like interest in Chlamy might fade. Genetic transformation had finally been developed, but a new model plant had shown up: Arabidopsis. It had many of the desirable characteristics of Chlamy (such as packing a lot into a small space), its molecular genetic tools were being developed amazingly rapidly, & as a land plant (and a relative of some of kids' least favorite vegetables) it appeared more desirable.

Chlamy's two flagella make it unusual, as land plants and fungi lack flagella, so the genome paper, like some earlier papers, really pounces on this. Flagella have gone from merely being interesting cellular structures to interesting cellular structures with a lot of human disease relevance. Various taxonomic comparisons can identify genes as present in all flagellum-bearing species but in no non-flagellated ones, or as conserved in photosynthetic eukaryotes but universally absent from non-photosynthetic ones. Lots of good stuff there.
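
The comparison itself is just set logic. A hedged C# sketch (the gene-family assignments here are purely illustrative, not taken from the paper):

    // Phylogenetic profiling: gene families in every flagellated species
    // and in no non-flagellated one. Data below are toy placeholders.
    using System;
    using System.Collections.Generic;

    class FlagellarProfile
    {
        static void Main()
        {
            var families = new Dictionary<string, HashSet<string>>
            {
                { "Chlamydomonas", new HashSet<string> { "IFT88", "IFT52", "RBCS" } },
                { "Human",         new HashSet<string> { "IFT88", "IFT52" } },
                { "Arabidopsis",   new HashSet<string> { "RBCS" } }  // no flagella
            };
            string[] flagellated = { "Chlamydomonas", "Human" };
            string[] nonFlagellated = { "Arabidopsis" };

            // Intersect across all flagellated species...
            var candidates = new HashSet<string>(families[flagellated[0]]);
            foreach (string sp in flagellated)
                candidates.IntersectWith(families[sp]);
            // ...then drop anything seen in a non-flagellated species.
            foreach (string sp in nonFlagellated)
                candidates.ExceptWith(families[sp]);

            foreach (string gene in candidates)
                Console.WriteLine(gene);  // the flagellar candidates remain
        }
    }

The same subtraction run the other way -- conserved in photosynthetic eukaryotes, absent elsewhere -- pulls out the photosynthesis set.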

What next for the plant that swims? Googling & PubMed reveal interest in biofuels & bioremediation. Chlamydomonas is hot -- and going to stay that way.

Wednesday, October 17, 2007

When Personal Genomics is Very Personal

Anyone interested in personal genomics should hunt down the new Nature (available online at the moment) and read the story of Hugh Rienhoff, whose third child (a daughter) was born with a still-mysterious set of symptoms. Since her birth he has been bouncing around trying to get a diagnosis for her condition, which resembles Marfan's and a similar disorder called Loeys-Dietz.

Rienhoff was trained as a physician under Victor McKusick and helped start a genomics firm (DNA Sciences), so he was a bit primed for this. Remarkably, he has apparently set up his own PCR laboratory in his house so he can prepare candidate genes from his daughter's DNA for targeted sequencing -- the sequencing itself done by an unnamed contract research house. Alas, none of these searches has yet turned anything up.

His daughter's symptoms resemble those two syndromes, both of which involve TGF-beta signalling; that, plus the well-characterized role of TGF-beta signalling in muscle development and his daughter's muscular problems, recently led Rienhoff & her doctor to put the child on a high blood pressure medication which is suggested to reduce TGF-beta signalling and which helps in a mouse model of Marfan's.

The story is a good illustration of the promise -- and the complications -- of cheap DNA sequencing to identify the causes of rare diseases. Small-scale targeted sequencing hasn't worked out -- but given the large number of genes known to be involved in TGF-beta signalling, the odds were never wonderful. Perhaps a full genome scan, or targeted resequencing using one of the new array-based capture schemes, might find a strong candidate mutation -- some of the other TGF-beta related syndromes are dominant, so perhaps this one will be too, & comparing the daughter's scan to her parents' will single out the mutation. But the results might be inconclusive -- no strong candidates. Or perhaps a candidate is found because it is a de novo mutation in the child & is likely to have a major effect (non-synonymous substitution, truncating mutation, etc.), but lies in an utterly unstudied gene. At least that's something to go on, but not much.
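
That parent-child comparison is, at its core, simple set subtraction -- a minimal sketch (the variant identifiers are placeholders; a real pipeline must contend with coverage gaps and sequencing errors):

    // De novo candidates: variants in the child absent from both parents.
    using System;
    using System.Collections.Generic;

    class TrioFilter
    {
        static void Main()
        {
            var child  = new HashSet<string> { "chrX:1200:A>T", "chr7:551:G>C", "chr2:90:T>A" };
            var mother = new HashSet<string> { "chr7:551:G>C" };
            var father = new HashSet<string> { "chr2:90:T>A" };

            var deNovo = new HashSet<string>(child);
            deNovo.ExceptWith(mother);
            deNovo.ExceptWith(father);

            foreach (string v in deNovo)
                Console.WriteLine(v);  // only chrX:1200:A>T survives the filter
        }
    }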

The article touches on how patients with unusual clusters of symptoms often get lumped into 'dustbin' categories, syndromes whose common thread is an inability to assign the patients to another category. Personal genomics may be quite useful for cutting down on such diagnoses, as the genetic data may sometimes provide the compass to guide through the morass of symptoms. On the other hand, there will probably be whole new bins of genetic syndromes -- 'polymorphism in X with skeletal defects' -- again, it is something to go on, but they are almost guaranteed to pile up much faster than the experiments to sort them out can be run.

After reading the article, I can't help but hope that his daughter gets into one of the big sequencing programs, such as the recently announced Venter center 10K genome effort. There will be a lot to be gained by finding out the ordinary variation which makes each one of us different, but there should also be a bunch of slots reserved for patients for whom sequence results might, if they are lucky, give them some new options in life.

Tuesday, October 16, 2007

Innumeracy at the highest levels

I admire Richard Branson for his many entrepreneurial and adventuring efforts. I especially wish for the success of his spaceflight venture -- when Millennium changed travel companies a few years back, I put Virgin Galactic at the top of my carrier preference list. Maybe I can arrange a business trip in the future.

But it is clear that Branson isn't the one doing the engineering math -- or let's hope so. I happened to scan a Boston Herald at a restaurant tonight -- I'm no fan of the Herald, but I'm a compulsive enough reader that I'll skim it if it's free -- and saw that Branson had spoken before a business group in Boston. He is quoted as saying
You’ll go from (zero) to 4,000 miles an hour in 10 seconds - which will be quite a ride


Presumably Branson's gotten caught up in the thrill of the flight idea, but that's just ludicrous -- not that the Herald caught it. I haven't done such calculations since college physics, but with a little Excel help & my three best-remembered Imperial conversion factors (5280 ft/mile, 12 inches/foot & 25.4 mm/inch) and checking my memory of g in Wikipedia (remarkably, I remembered it!), the miles/h -> inches/hr -> mm/hr -> m/s chain puts that at 178.8 m/s^2 -- over 18g, sustained for ten full seconds! According to Wikipedia, the highest known G-force ever survived was 180+g, and that was a momentary spike in a race car accident; amusement park rides don't even pull 10g (according to the same entry). Given the flight profile of SpaceShipOne, which is the basic technology platform for Virgin Galactic, a more realistic profile -- 500 miles per hour to 4,000 over two minutes -- works out to about 13 m/s^2, a far more plausible 1.3g.
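
For anyone who wants to check me, the same arithmetic outside of Excel:

    // Sanity-check the quote: 0 to 4,000 mph in 10 seconds, expressed in g.
    using System;

    class FlightMath
    {
        // miles/h -> inches/h -> mm/h -> m/s
        static double MphToMs(double mph)
        {
            return mph * 5280 * 12 * 25.4 / 1000 / 3600;
        }

        static void Main()
        {
            const double g = 9.81;  // m/s^2
            double quoted = MphToMs(4000) / 10 / g;                      // ~18.2 g
            double plausible = (MphToMs(4000) - MphToMs(500)) / 120 / g; // ~1.3 g
            Console.WriteLine("As quoted: {0:F1}g; SpaceShipOne-like profile: {1:F1}g",
                              quoted, plausible);
        }
    }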

Such innumeracy is frequently present in media articles in one way or another. Given this poor foundation, how will we ever equip patients to intelligently use genomic profile information? Surely there will be many good, trained persons stepping into that void, and just as surely there will be plenty of hucksters and worse.

Monday, October 15, 2007

Nobel Silly Season

For a number of years now I think of early October as Nobel season. With the prizes often come two rounds of silliness.

The fun silliness is the Ig Nobel prizes. The humor is often juvenile, but they are an enjoyable poke at research on the fringe in one way or another. I've attended one ceremony, and it is worth doing once (more if you enjoy it the first time).

The ridiculous silliness involves various media reports treating the geography of science Nobel prize awards as some sort of barometer of the state of science in those regions. A year or so ago Nature was moaning over the lack of European laureates. I can't find a link, but this year the talk was about the lack of American science Nobels (no, Al's Peace Award doesn't count as science!) and the dominance of Europeans. This was particularly absurd since 2 of the 3 physiology awardees did their work at American universities! Here is what appears to be Smithies' first mouse knockout paper, and the institution listed is U Wisconsin. Capecchi's came from U Utah.

But even if all the Nobels went to researchers at Lilliput, that would be useless for judging the state of science anywhere. Nobels generally go for work done many years before -- so if they say anything, it would be about the state of science 1-2 decades ago -- and they are hardly useful for that. The Nobel prizes are great opportunities to learn about top notch research, but they are just an idiosyncratic sampler, not a representative sample.

Friday, October 12, 2007

National Wildlife Genomics

Visitors to our house are likely to quickly notice a recurrent theme in the decor, starting with a garden ornament and continuing throughout the house. Pictures, books, dog toys -- even a trash can, with a common two-color scheme. Or, for those who think that way, two non-colors. An inspection of The Next Generation's quarters will reveal the mother lode: melanoleuca run amok. The house bears a bi-color motif: a motif of bi-color bears. Yes, we pander to pandas!

It was therefore with interest that I saw (thanks to GenomeWeb!) an item from Reuters that the Chinese government is funding a project to sequence the panda genome prior to the 2008 Beijing Olympiad. Wild pandas are found only in China and are considered a national symbol & treasure.

A panda genome should be of great interest to evolutionary biologists, as the panda is a bit of an odd bear. Indeed, until the arrival of molecular systematics its affinity for bears was unclear, with alternate groupings putting pandas on their own or lumping them with the raccoons alongside the red pandas (which are not bears). With the lag time in updating libraries and such, the doubt about their taxonomy persists in many schools and many minds: TNG has already been tutored to defend the ursinity of Ailuropoda with the DNA argument. Pandas have adopted a nearly vegetarian lifestyle, consuming mostly bamboo -- and their digestive tracts probably haven't quite caught up to that change. Anatomical variations, such as the famous panda's "thumb", might also have detectable traces in the genome. Perhaps even some genetic drivers of their extreme cuteness will be identified!

However, if you were picking a bear to sequence for physiological insight, I'm not sure you'd pick pandas, as they don't hibernate, and hibernation is surely a fascinating topic. All those metabolic changes must leave an imprint on the regulatory circuits.

There is a clear solution to that. China is hardly the first country to sequence wildlife genomes identified with it: the Aussies have been hopping through the kangaroo genome. So perhaps the Canadians could go after the polar bear genome, so the world can have a good hibernating bear to compare with the non-hibernating panda.

What other genomes might be sequenced as a matter of national pride? Are the New Zealanders launching a kiwi genome project? An Indian tiger (or king cobra) project? A Japanese crane sequence? One almost yearns for the lost central European monarchies, as then we would find out the genes responsible for a double-headed eagle.

Thursday, October 11, 2007

Opus #173, Programming on the Dark Side (C#)

I had commented a while back that I was contemplating shifting my programming focus from Perl to another language. The existing code base is split between C# and Python, with more C# but with a lot of code I need to think about in Python. I gave both a bit of a trial and also took some suggestions, and did come to a decision.

Hands down, C# is my language.

Now, language choice is a personal matter, and I don't dislike Python -- at some point I'll write down more impressions -- but C# is a great match. I really do like a strongly typed language, both because it catches lots of silly mistakes at compile time rather than runtime and because the typing provides lots of cookie crumbs for trying to reason out someone else's code (or old code of your own). That could also make for a long separate post.

There are really three powerful things to like about C#. First, the language itself. While I can't claim to have figured everything out, for the most part I can't argue with it. Lots of powerful concepts and a general feeling of consistency (as opposed, for example, to Perl's kitchen sink collection of stuff).

Second, there are the .NET class libraries. There is an awful lot there covering many things you'd want to do, and again there is a reasonably strong sense of consistent design. Here I might find more to quibble over, but it generally hangs together.

Third, there is Visual Studio, a very slick integrated development environment (IDE). The help facility is very powerful for exploring the language, the error messages are generally good, and the ability to browse data in a running program is superb. Furthermore, you can perform a remarkable degree of editing on a running program -- there are many things not allowed, but a lot of runtime errors can simply be edited away and the program continued from where the exception occurred.

However, there is one key drawback to C# from a bioinformatics standpoint: you are not going with the crowd. There appear to have been at least two efforts to create bioinformatics libraries for C#, and both appear to have been stillborn. If you Google for "C# bioinformatics" or ".NET bioinformatics" you find stuff, but more idle talk than solid work. And I think there is an obvious reason for that.

All three of the legs are controlled, or at least perceived to be controlled, by the Emperor Gates. If you do click around some of the Google links, it's not hard to find disdainful comments about the perceived Microsoftity or Windowsosity of C#/.NET. There is an effort called Mono to port the whole slew over to UNIX boxes, but it's not clear it is perceived as more than a fig leaf. The name certainly isn't going to win friends among undergraduates -- "Have you gotten MONO yet?"

On the other hand, there is definitely corporate interest. Microsoft has been making increasing noises about bioinformatics, though perhaps focused further downstream than where I usually work. Spotfire, which is really useful for data exploration, reportedly provides a .NET API. Certainly during my interviews last year I saw C# books or heard mention of the language at many of the companies.

So, it's a locally packed but globally lonely world for a C# bioinformaticist. Luckily, it wasn't hard to build the critical tools I needed -- though I needed only a modest subset of what BioPerl, BioPython or BioJava would provide. However, there are some interesting ways to leverage those tool sets -- but that will have to be another subject for another time.
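
For a flavor of that modest subset, here's the sort of plumbing involved -- a bare-bones FASTA reader written from scratch for this post, not my actual working code:

    // Minimal FASTA reader: yields (header, sequence) pairs.
    // Illustrative only -- no validation, no alphabet checking.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    class Fasta
    {
        public static IEnumerable<KeyValuePair<string, string>> Read(string path)
        {
            string header = null;
            StringBuilder seq = new StringBuilder();
            foreach (string line in File.ReadAllLines(path))
            {
                if (line.StartsWith(">"))
                {
                    if (header != null)
                        yield return new KeyValuePair<string, string>(header, seq.ToString());
                    header = line.Substring(1).Trim();
                    seq.Length = 0;
                }
                else
                    seq.Append(line.Trim());
            }
            if (header != null)
                yield return new KeyValuePair<string, string>(header, seq.ToString());
        }

        static void Main()
        {
            foreach (KeyValuePair<string, string> rec in Read("example.fa"))
                Console.WriteLine("{0}: {1} bp", rec.Key, rec.Value.Length);
        }
    }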

Yet Another Far Out Sequencing Idea?

GenomeWeb carries the news that another little-known company, this time English, has thrown its hat into the Archon X-Prize ring.

Base4 Innovation has a website, but it's pretty sparse on details. A lot of cool buzzwords -- nanotechnology, single-photon imaging, direct readout of DNA -- but not much more to go on. $500/genome in hours is the target throughput (no mention of error bars on those estimates!).

One of the interesting things to observe as the genome sequencing field heats up is how many non-traditional entrants are being attracted. When the genome sequencing X-Prize was first announced, one of my immediate ponderings was to what degree the entrants would simply be the familiar names in genome sequencing, and which would be out of left field. If I had to place wagers, I would put the outsiders as longshots -- but that's very different than writing them off.

The first X-Prize was personally very exciting, as it would appear to offer a route to realization of a permanent dream -- and I don't have $20M lying around for a trip to the ISS (sizable donations towards that goal, however, will not be refused!). For less than the price of a decent house in the Boston area one will soon be able to get a short trip to sub-orbital space (isn't that what home equity lines were invented for?).

The original X-Prize, though, had a very straightforward goal: two flights to a certain altitude in a certain timeframe, with requirements as to how much of the vehicle was reused (okay, perhaps not so simple to state). The genome sequencing prize has goals that are (IMHO) much more ambitious and harder to define -- after all, the space prize went for replicating a 40-year-old feat with private money, whereas the genome sequencing prize demands going far beyond current capabilities in cost and speed.

The space X-Prize was won hands-down by one competitor, with nobody else anywhere close. Well, one competitor claimed until the last minute to be close, but it started to smell suspiciously like a publicity stunt for their main sponsor, an utterly shameless internet venture (in applied probability) which also paid streakers to run through the Torino Olympic ceremonies. Will the genome sequencing race also have a runaway entrant, or will it be a photo finish? Stay tuned.

Tuesday, October 09, 2007

This Old Genome

I recently stumbled across a paper proposing a set of mammalian genomes for sequencing to further aging research. A free version of the proposal can be found via this site. I had previously posted some ponderings about what the most interesting unsequenced genomes are, and this would be one focused take on that question.

Despite the fact that aging is clearly a process I will become familiar with, I'll confess a lot of ignorance about it. The paper lays out a good rationale for the mammals it chooses (though with its mammalian focus, it misses the opportunity to sequence the tortoise genome!).

This paper is also worth noting as something we will not see many more of. Not because there aren't plenty of interesting genomes to sequence, but because it won't be worth writing a paper about your plans to do so. Once genome sequencing becomes very cheap, a proposal to sequence a mammalian genome will become just a paragraph in a grant proposal at most, or more likely something mentioned only after the fact in an annual grant report. Certainly in the world of small genomes, such as bacteria, the trouble will be getting the samples to sequence, not the cost of sequencing.

On the other hand, even with really cheap genome sequencing, it will be a long time before all species are done -- even if some scientist has an inordinate fondness for beetle genomes!