Friday, December 22, 2006

My Year's End, My Millennium's End

Tonight will be my last post for 2006. The holiday week is a good time to clear one's head and focus on family and fun. Thank you all for taking a gander at this new blog -- it's been fun & I look forward to continuing. I have a bunch of topics scribbled down & a few dozen bookmarks of interesting papers.

Today was also my final day at Millennium. I am on the payroll through the end of the year, severance for a while next year, and various confidentiality agreements to my grave. But for me, this was the end. I turned in all my trappings (laptop, badge, security token), sent a farewell message, and took my last box out.

For ten years, Millennium has been a constant in my life, yet nothing has been constant at Millennium. Eight desks, six bosses (one twice), uncountable reorganizations (and bosses' bosses), a multitude of department/group names -- I once joked that the company portal should have a banner "Today is Monday & you report to XXX in department YYY" so you could keep track of it. Two CEOs, 5 CFOs (I think), 3 big mergers (defined as changing my daily life!), four big 'restructurings' -- one could invent dozens more measures for how MLNM exemplified creative destruction.

As one might expect, that meant a lot of different people. By my guess, somewhere between 5-10% of people who worked for Millennium the day I started will still be there after I leave. One back-of-the-envelope estimate is about 3K people who can claim to be Millennium alumni, with perhaps close to 2K who were in Discovery at some point.

It's been an amazing experience & quite an education. I will miss it, and it seems a lot to hope that the next career waystation can match that. Fare-thee-well Millennium.

Thursday, December 21, 2006

Nature's The Year In Pictures

Great science does not require great photography, but it never hurts. Nature has a very nice collection of scientific images from the year (free!). Only one is directly relevant to the usual topics here, an image of a microfluidic DNA sequencing chip, but they are all striking. Enjoy!

Wednesday, December 20, 2006

Blast from the past

In my senior year at Delaware I took a course in molecular evolution from a wonderful teacher, Hal Brown. Hal was the first (I'm pretty sure) to suggest that the resemblance of many enzyme cofactors to RNA was a glimpse at an earlier RNA-dominated biochemistry. Another interesting brush with history is that when at Berkeley he got Dobzhansky's desk. He's a great guy & was a wonderful professor.

I had Hal's class fall semester that year, and still was planning to continue my undergraduate research line of molecular biology in a plant system; my overnight mental conversion to a computational genomicist wouldn't occur until Christmas break. So I picked a term paper topic that sounded interesting (and was!), but in retrospect was a glimpse at my future.

Very few eukaryotes have a single genome, as most have membrane-bounded organelles with their own genomes. For mammals, these are the mitochondria, and in plants and algae there are both the mitochondria and chloroplasts. The fact that chloroplasts and mitochondria had their own genetics, which had quirks such as uniparental inheritance, had been known since the early 60's, but their origin had been hotly disputed. The original theory that they had somehow blebbed off from the nuclear genome had been challenged by Lynn Margulis' radical notion of organelles as captured endosymbionts. By the time I wrote my paper, Margulis' thesis had pretty much won out. But it still made a fascinating paper topic, especially when thinking about genomes.

The most fascinating thing about these organelle genomes is not that they have shed many genes they no longer need, since the nucleus provides those functions; the real kicker is that so many genes for their maintenance & operation are now found in the nucleus. Over evolutionary time, genes have somehow migrated from one genome to another. For metazoan mitochondria, the effect is striking: only a tiny number of genes remain (human mitochondrial DNA is <20kb>100Kb were either viruses or complete plant chloroplast genomes, so there was some real data to ponder -- my first flicker of genomics interest.

One strong argument for the endosymbiont hypothesis is an unusual alga with the appropriate name Cyanophora paradoxa, which has chloroplast-like structures called cyanelles -- but the cyanelles retain rudimentary peptidoglycan walls -- just like the cyanobacteria postulated to be the predecessors of chloroplasts. Another strong argument is that for a set of enzymes found both in organelle and cytoplasm, in most cases the organellar isozyme treed with bacterial enzymes (and in the right group: proteobacteria for mitochondrial enzymes, cyanobacteria for chloroplast ones). Even the exceptions are interesting, such as a few known examples where alternative transcripts can generate the appropriate signals to lead to cytoplasmic or organellar targeting.

I hadn't really kept up closely with the field after that (not for lack of interest: one graduate posting I considered was on a Cyanophora sequencing project at Penn State). So it was neat when I spotted a mini-review in Current Biology on the current state of things. What is interesting, and new to me, is that the Arabidopsis sequencing effort had revealed that nuclear-encoded proteins of chloroplastic origin covered a wide spectrum of metabolism and not just chloroplast-specific functions. What is interesting in the newer work is that Cyanophora does not share this pattern: here nuclear-encoded genes of chloroplast origin are strongly restricted to functioning in the chloroplast.

Things get even weirder in some other unicellular creatures, which executed secondary captures: they captured as endosymbionts eukaryotes which had already captured endosymbionts.

The same issue contains some back-and-forth arguing over the distinction between organelles and endosymbionts, which I don't care to take a stand on, but they illustrate another case I wasn't very familiar with: a sponge which has apparently taken in an algal boarder. If we can figure the mechanisms out & replicate them, the applications might enable some people to truly be 'in the green' and 'looking green around the gills' will take on a whole new meaning!

Tuesday, December 19, 2006

Next-Gen Sequencing Blips

Two items on next generation sequencing that caught my eye.

First, another company has thrown its hat in the next generation ring: Intelligent Bio-Systems. As detailed in GenomeWeb, it's located somewhere here in the Boston area & is licensing technology from Columbia.

The Columbia group last week published a proof-of-concept paper in PNAS (open access option, so free for all!). The technology involves using reversible terminators -- the labeled terminator blocks further extension, but then can be converted into a non-terminator. Such a concept has been around a long time (I'm pretty sure I heard people floating it in the early-90's) & apparently is close to what Solexa is working on, though Solexa (soon to be Illumina) hasn't published their tech. One proposed advantage is that reversible terminators shouldn't have problems with homopolymers (e.g. CCCCCC) whereas methods such as pyrosequencing may -- and the paper contains a figure showing the contrast in traces from pyrosequencing and their method. The company is also claiming they can have a much faster cycle time than other methods. It will be interesting to see if this holds out.

Given the very short reads of many of these technologies, everyone knows they won't work on repeats, right? It's nice to see someone choosing to ignore the conventional wisdom. Granger Sutton, who spearheaded TIGR's & then Celera's assembly efforts, has a paper in Bioinformatics describing an assembler using suffix trees which attempts to assemble the repeats anyway while assuming no errors -- but with a high degree of oversampling that may not be a bad assumption. They report significant success:

We ran the algorithm on simulated error-free 25-mers from the bacteriophage PhiX174 (Sanger, et al., 1978), coronavirus SARS TOR2 (Marra, et al., 2003), bacteria Haemophilus influenzae (Fleischmann, et al., 1995) genomes and on 40 million 25-mers from the whole-genome shotgun (WGS) sequence data from the Sargasso sea metagenomics project (Venter, et al., 2004). Our results indicate that SSAKE could be used for complete assembly of sequencing targets that are 30 kbp in length (eg. viral targets) and to cluster millions of identical short sequences from a complex microbial community.
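
To make the general idea concrete, here is a toy sketch of greedy contig extension from error-free short reads using exact overlaps. It is not SSAKE's actual algorithm or data structures; the function name `greedy_extend`, the `min_overlap` parameter, and the example reads are all assumptions made up for illustration, and reverse complements are ignored.

```python
def greedy_extend(seed, reads, min_overlap=3):
    """Extend `seed` rightwards using exact suffix/prefix overlaps.

    Toy illustration only: assumes error-free reads on one strand and
    always takes the read with the longest overlap.
    """
    contig = seed
    unused = [r for r in reads if r != seed]
    while True:
        best = None  # (overlap_length, read)
        for read in unused:
            # Find the longest suffix of the contig that is a proper
            # prefix of this read, so the read adds at least one base.
            max_k = min(len(contig), len(read) - 1)
            for k in range(max_k, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if best is None or k > best[0]:
                        best = (k, read)
                    break
        if best is None:
            return contig  # nothing overlaps any more
        k, read = best
        contig += read[k:]   # append only the non-overlapping tail
        unused.remove(read)


# Hypothetical 6-mers tiled across a made-up 14-base sequence.
reads = ["CGTGCA", "ATGGCG", "CAATCC", "GGCGTG", "TGCAAT"]
print(greedy_extend("ATGGCG", reads))
```

With high oversampling and no errors, every true overlap is present, which is why the greedy choice works on this toy input. A repeat longer than the read length would give several equally good candidates, which is exactly where the conventional wisdom says short reads struggle.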

Monday, December 18, 2006

Breast Cancer Genomics

This month's Cancer Cell has a pair of papers (from the same group), plus a minireview, on breast cancer genomics.

One paper focuses on comparing 51 breast cancer cell lines to 145 breast cancer samples, using a combination of array CGH and mRNA profiling. The general notion is to identify which cell lines resemble which subsets of the actual breast cancer world. Cell lines long propagated in vitro are likely (almost assured) to have undergone evolution in the lab; this means they are not the perfect proxies for studying the disease. Array CGH is a technique for examining DNA copy number changes, which are rampant in many cancers. Its use has exploded over the last few years, with a number of interesting discoveries. It is also a useful way to fingerprint cell lines; at least one cell line was described recently as an imposter (wrong tissue type), but I can't find the paper because of the huge flood of papers a query for 'array CGH' brings up.

The second paper looks at a set of clinical samples from early breast cancer, and again uses both transcriptional profiling and aCGH. I need to really dig into this paper, but the abstract has some interesting tidbits (CNAs=copy number aberrations) -- emphasis my own

It shows that the recurrent CNAs differ between tumor subtypes defined by expression pattern and that stratification of patients according to outcome can be improved by measuring both expression and copy number, especially high-level amplification. Sixty-six genes deregulated by the high-level amplifications are potential therapeutic targets.

The mini-review does highlight a key point: as impressive as this study is, no study can ever hope to be the final word. As new omics tools are developed, new studies will be desirable. Two obvious examples here: running intensive proteomics and looking in depth at alternative transcripts.

Friday, December 15, 2006

Weird memories from cleaning up

One week left. Time to get serious about the lack of time. One week.

I am a terrible pack rat. I periodically attempt to organize things into folders, but for the most part I use the geologic filing method -- that stratum is roughly October, below that November, below that November (earthquakes & uplifting occur frequently!).

Occasionally my supervisors would crack down (most notably prior to the FDA swinging through the labs one time), but in general there was a better trigger: moving. I was pretty good about lightening up prior to each move. One office lasted 5 years, so there was quite a lot of overburden to deal with that time, but the office one previous to the layoffs was only 2 years and we just moved in the spring. Even at my worst, that's not much time to lay down a mountain. The planners threw in one more twist by moving me after the layoffs -- but then again, I was on extended time and they hadn't planned on me being there at all.

However, there was still a lot to go through, with several major categories:

  1. Paper for recycling

  2. Confidential material to shred

  3. Items to throw out

  4. Items to forward within Millennium or return

  5. Items to bring home or to next position

We have these big shredder bins which collect stuff for an outside vendor to shred in big trucks -- this is a huge improvement over office shredders, as I always spent more time unjamming them than shredding. It's not efficient to run to the bin each time, so I had a paper grocery bag for batching things. This worked very well -- my four-foot tall unpaid consultant gleefully fed the bins one weekend while I went through papers.

Due to the shortcomings of my system, I came across all sorts of obsolete things. Will I ever again need a serial-to-USB converter? Vendor catalog CD-ROMs from 3 years ago?

On the other hand, some things are really valuable, such as address lists from recent meetings. Others I collect too much of, but they are useful: papers that I might want to comment on in this space, old papers I consider really interesting and might refer to.

And, of course, lots goes to recycling or shredding: papers relevant to projects, sequence alignments, snippets of code, etc.

One of the more interesting mixed bags is the business cards, and that's also where I got a strange trip down memory lane. I found some recent ones I thought I had lost, which would have been good contacts to have in my job search (aargh!). Others I couldn't remember at all -- I really should put some context on the back. And finally, I found one from early in my career that I remember vividly.

We had a group (MBio) trying to find the next Epogen and I was the main bioinformatics scientist attached to the group. They were constantly growing & constantly recruiting. I was going to the Hilton Head conference, and the MBio research chief wanted me to screen a candidate: simple enough.

We set up a meeting in one of the hotel bars. The conversation was pleasant, but neither of us seemed to have a strong reaction either way. He wasn't sure he wanted to leave his existing position or take this new one. I reported back to base the equivocal meeting, and moved on.

So it was stunning to see a news item several years later that the same person I had interviewed was the perpetrator of a murder-suicide. I think I saw it on GenomeWeb, but they don't seem to archive very well. I found (via Wikipedia) another item, which adds a truly surreal note about what happened to the pizzas used to lure the victim from her home (you have to read it to believe it).

You meet a lot of unusual people in science, but perhaps you never know -- and never want to -- who are truly outside the norm.

Thursday, December 14, 2006

Red Alert Mr. Pseudomonas!

I finally decided that four weeks of laryngitis was perhaps too long and got myself in to the nurse practitioner, who obliged me with an antibiotic script. Our bar for using antibiotics has historically been too low, but perhaps I overshot in the other direction.

Or maybe not. A recent paper in PNAS presents the provocative thesis that low doses of antibiotics can stimulate nasty traits in pathogenic bacteria. Using a microarray and low doses of three structurally unrelated antibiotics, they detected switching on of a number of unpleasant genetic programs.

All three antibiotics induce biofilm formation; tobramycin increases bacterial motility, and tetracycline triggers expression of P. aeruginosa type III secretion system and consequently bacterial cytotoxicity. Besides their relevance in the infection process, those determinants are relevant for the ecological behavior of this bacterial species in natural, nonclinical environments, either by favoring colonization of surfaces (biofilm, motility) or for fighting against eukaryotic predators (cytotoxicity)

The authors go on to suggest that antibiotics may be important signalling molecules in natural communities. This is in contrast to the older model of antibiotics as weapons in microbial battles for dominance. It is a provocative thesis worth watching for stronger evidence. In my mind, their data still fits the weapons model -- what they see is the same sort of signalling as my blood in the ocean signals a shark. Or, a bacterial Captain Kirk detecting an unseen ship raising its shields, prompting a defense posture.

Wednesday, December 13, 2006

Grading the graders

I picked up a copy of The Economist last week, as is my habit when flying, and it happened to have a quarterly review of technology. There is a quite accurate story on microarrays that does a good job of explaining the technology for non-scientists.

One of the bits of the microarray story I had forgotten is retold there: how both the Affymetrix and Stanford groups pioneering microarrays had grant proposals which received truly dismal priority scores.

For those readers not steeped in academic science, when you ask various funding sources for money, your proposal is put together with a bunch of related proposals. A group of volunteers, called a study section, review the proposals and rate them. The best are given numeric scores and these scores are used to decide which proposals will receive funding. Study sections also have some power to suggest changes to grants -- i.e. cuts -- and to make written critiques. Ideally, these are constructive in nature but such niceties are not always observed.

A commonly heard complaint is that daring grant proposals are not funded. Judah Folkman apparently has an entire office wallpapered with grant rejections for his proposal of soluble pro- and anti- angiogenic factors. Robert Langer apparently has a similar collection trashing his ideas for novel drug delivery methods, such as drug-releasing wafers to be embedded in brain tumors. Of course, both of these concepts have now been clinically validated so they can gleefully recount these tales (I heard them at a Millennium outside speaker series I will dearly miss).

I've participated once (this summer) in a grant review study section and would love to comment on it -- but by rule what happens in Gaithersburg stays in Gaithersburg. There are good reasons for such secrecy, but it is definitely a double-edged sword. It has the potential to encourage both candor and back-stabbing. It certainly prevents any sort of systematic review of how study sections function and dysfunction.

What I think is a serious issue is that such grant review processes have little or no mechanism for selecting good judges and avoiding poor ones. Reviewers who torpedo daring good proposals have no sanction and those who champion heterodoxy no bonus. It isn't obvious how you could do this, so I do not propose a solution, but I wish somehow it could work.

One wonders whether the persons who passed over microarrays regret their decisions or stand by them (and what got funded instead?). Do they even remember their role in retarding these technologies? If you could ask them now, would they say "Boy did I blow it!" or "Microarrays? Why ask about that passing fad?"

Wednesday, December 06, 2006

Image Enhancers!

That must have been the command that went out at the Joint Genome Institute, as they have a nice paper in Nature a few weeks back showing how sequence conservation can be used to find enhancer elements.

They started with non-coding sequence elements that are either ultraconserved between mammalian species or showing conservation in Fugu. These elements were placed in a vector with a naked promoter driving a reporter gene (lacZ) & microinjected into mouse eggs. Embryos were then stained at day 11.5.

Greater than 50% of the ultraconserved elements drove expression of the reporter gene, and further conservation in Fugu did not improve the finding of enhancers. But more than 1/4 of the Fugu-conserved sequences lacking mammalian ultra-conservation functioned as enhancers. I do wish they had put in the frequency that random mammalian fragments will score positive in this assay; surely that sort of negative control data is out there somewhere. It's probably a very small fraction.

The articles, alas, require a Nature subscription -- but you can also browse the data at

One of their figures shows a variety of staining patterns driven from elements pulled from near the SALL1 gene. Various elements show very different staining patterns.

They also use 4 enhancers driving forebrain expression to find motifs, and in turn use those motifs to search the Fugu-Human element set. 17% of the hits act as forebrain-specific enhancers, whereas only 5% of the tested elements are forebrain enhancers. So even with a very small training set they were able to significantly enrich for the target expression pattern.
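
As a quick back-of-the-envelope check on those two percentages, the enrichment is simply the ratio of the hit rate among motif matches to the background rate among all tested elements. The helper name `fold_enrichment` below is my own, not something from the paper.

```python
def fold_enrichment(hit_rate, background_rate):
    """Ratio of the rate among selected items to the overall background rate."""
    if background_rate <= 0:
        raise ValueError("background_rate must be positive")
    return hit_rate / background_rate


# 17% of motif hits vs. 5% of all tested elements (figures quoted above).
print(round(fold_enrichment(0.17, 0.05), 1))
```

That works out to a bit over threefold, which is respectable for motifs learned from only four examples.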

It will be interesting to see this approach continued, especially to annotate GUUFs (Genes of Utterly Unknown Function).

Tuesday, December 05, 2006

Cousin May's Least Favorite Bacteria

Ogden Nash was a witty poet, but skipped some key biology. Termites may have found wood yummy, but without some endosymbiotic bacteria, wood wouldn't be more than garnish to them -- and the parlor floor would still support Cousin May.

It shouldn't be surprising that such bacteria might be a challenge to cultivate in a non-termite setting. Conversely, university facilities departments are not keen on keeping the native culture system in numbers! :-) Last week's Science has another paper showing off the digital PCR microfluidic chip I mentioned previously. They are again performing single cell PCR, except this time it is going for one cell per reaction chamber rather than one cell per set of chambers. That's because the goal now is not to count mRNAs, but to count bacteria positive for molecular markers. By performing multiplex PCR, they can count categories such as 'A not B', 'A and B', and 'B not A'.

The particular A's and B's are degenerate primers targeting bacterial 16S ribosomal RNA and a key enzyme for some termite endosymbionts, FTHFS. The 16S rRNA primers have very broad specificity, whereas the FTHFS primers are specific to a subtype called 'clone H'. One more twist: reaction cells amplifying both primer pairs were retrieved, further amplified, and sequenced. This enabled specific identification of the bacteria present in the positive wells, and in most cases the same 16S and FTHFS sequences were retrieved from wells amplifying both. This is some nifty linkage analysis!
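
The counting step is easy to picture in code. The sketch below tallies reaction chambers by which of two markers amplified. The function name, the boolean-pair input format, and the example data are all invented for illustration and say nothing about how the authors actually processed their chip readouts.

```python
from collections import Counter


def classify_chambers(chambers):
    """Tally chambers by which of two markers (A, B) amplified.

    `chambers` is an iterable of (a_positive, b_positive) pairs,
    one pair per reaction chamber (a hypothetical input format).
    """
    labels = {
        (True, False): "A not B",
        (True, True): "A and B",
        (False, True): "B not A",
        (False, False): "neither",
    }
    return Counter(labels[(bool(a), bool(b))] for a, b in chambers)


# Made-up readout: A could stand for the 16S signal, B for FTHFS.
readout = [(True, False), (True, True), (False, False),
           (True, True), (False, True)]
counts = classify_chambers(readout)
print(counts["A and B"], counts["A not B"], counts["B not A"])
```

The chambers in the 'A and B' bin are the ones worth retrieving and sequencing, since with one cell per chamber a double positive implies both markers came from the same bacterium.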

In addition to all sorts of uses in microbiology, such chips might be interesting to apply to cancer samples. Tumors are complex evolving ecosystems, with both the tumors and some of their surrounding tissue undergoing a series of mutations. An interesting family of questions is what mutations happen in what order, and which mutations might be antagonistic. This device offers the opportunity to ask those sorts of questions, if you can design the appropriate PCR primer sets.

Monday, December 04, 2006

Computing Cancer

Last week's Cell has a paper using simulations to estimate the influence of the local microenvironment on the development of invasiveness in cancers. Their model includes both discrete elements (cells, which have a number of associated discrete states) and continuous variables, and is therefore referred to as a Hybrid Discrete-Continuum, or HDC, model. Properties associated with a cell include both internal activities, such as metabolism, and external ones, such as oxygen tension (which, the authors point out, could be any diffusible nutrient), secretion of extracellular matrix degrading enzymes, and the concentration of extracellular matrix.
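
For readers who haven't met this style of model, here is a deliberately tiny one-dimensional sketch of the hybrid idea: cells are discrete occupants of grid sites, while oxygen is a continuous field that diffuses and is consumed. It is not the paper's model. Every name, rule, and parameter value below is an assumption chosen for illustration, and the matrix-degrading enzymes and extracellular matrix are left out entirely.

```python
def step(cells, oxygen, diffusion=0.2, supply=1.0, uptake=0.3, divide_at=0.5):
    """Advance a toy 1D hybrid discrete-continuum model by one time step.

    cells  : list of 0/1, discrete occupancy of each grid site
    oxygen : list of floats, continuous nutrient level at each site
    All parameters are illustrative, not taken from the paper.
    """
    n = len(cells)
    new_oxygen = []
    for i in range(n):
        # Continuous part: explicit finite-difference diffusion, with the
        # two ends of the grid held at a fixed supply concentration.
        left = oxygen[i - 1] if i > 0 else supply
        right = oxygen[i + 1] if i < n - 1 else supply
        value = oxygen[i] + diffusion * (left + right - 2 * oxygen[i])
        # A resident cell consumes a fraction of the local oxygen.
        if cells[i]:
            value *= (1 - uptake)
        new_oxygen.append(value)

    # Discrete part: a well-fed cell places a daughter in an empty neighbor.
    new_cells = list(cells)
    for i in range(n):
        if cells[i] and new_oxygen[i] >= divide_at:
            for j in (i - 1, i + 1):
                if 0 <= j < n and not new_cells[j]:
                    new_cells[j] = 1
                    break
    return new_cells, new_oxygen


cells, oxygen = [0, 0, 1, 0, 0], [1.0] * 5
for _ in range(3):
    cells, oxygen = step(cells, oxygen)
print(cells)
```

Even in this cartoon the two halves interact: cells deplete the field, and the field decides whether cells can divide. The real model layers many more cell states and environmental variables on the same coupling.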

With any model, the fun part is the predictions it makes -- particularly the unorthodox ones. Predictions enable verification or invalidation. For the Cell paper, they do make quite an interesting prediction:

The HDC model predicts that invasive tumor properties are reversible under appropriate microenvironment conditions and suggests that differentiating therapy aimed at cancer-microenvironment interactions may be more useful than making the microenvironment harsher (e.g. by chemotherapy or antiangiogenic therapy).

Experimentally testing such predictions is decidedly non-trivial, but at least now the challenge has been posed. This prediction has clear implications for choosing therapeutic strategies, particularly in picking combinations of oncology drugs -- and few cancer patients are on only a single anti-tumor agent.

This paper is also part of Cell's experiment in electronic feedback -- readers can submit comments on the paper. The opportunity to be the first commenter still appears available -- is there anyone reading this brave enough to go for it?

Sunday, December 03, 2006

Perverse Milestone

Well, this blog hit a dubious milestone today -- my first comment spam! Unsurprisingly, given my topic choice, it was for an online "pharmacy".

On the other hand, it is wonderful to see helpful & friendly comments -- people actually are reading this! Thank you thank you thank you!

Friday, December 01, 2006

Dead Manuscripts #0

When Millennium originally cleaned house, I thought I would be idled almost immediately & this blog was one new initiative to maintain my sanity during the downtime. But then, thanks to some campaigning by friendly middle managers, I was given an extension until end-of-year. It's nice, since it gives me a little more time to hand off a couple of projects to people.

But, there's still a lot of time left over, and so like Derek Lowe I find myself trying to cobble together some manuscripts before I go.

Now the problem here is that I tend to think I have a lot of interesting stuff to publish -- until I actually get going. It's serious work preparing something for publication. Plus, sometimes when you start dotting the i's and crossing the t's your results start looking less and less attractive.

I am trying to tackle too many papers, especially since all but one are solo affairs. So it will be time to cull some of the ideas soon. There's also stuff that previously stalled somewhere along the way & I doubt I'll ever resurrect. One was even submitted -- and rejected; I found myself agreeing with half the reviewer's comments about the quality of the writing.

Normally, these just go back into memory as items to trot out if they answer questions in interviews ("oh, yes, I once did a multiple alignment of llama GPCRs..."). But now, I have a place to unleash them on the world! I'm the editor & review board! (& 1/10th the readership? :-) ) Perhaps some of the nuggets will be useful to someone, and perhaps some will even be worked up by someone else into a full paper.

A lot of these little items are interesting, but not quite a Minimum Publishable Unit, or MPU. In academia, there are often debates as to the minimum content of a paper, and some authors push to publish no more than an MPU. Others go in the opposite direction: you need to read every last footnote in one of George Church's papers to get all the stuff he tries to cram in. For example, Craig Venter was the first to succeed at whole genome shotgun sequencing in 1995, but George was trying it back in 1988: see the footnotes to his multiplex sequencing paper.

I almost killed one idea today, but alas I figured out one more question to ask of the data. I'll do that, but I really should kill this one. It's the one where I'm skating way outside my recognized expertise and the results are useful but not stunning. The clock is ticking away, and it would be better to wrap up one good story than have 4 manuscript fragments to add to the queue for this space.