Wednesday, October 14, 2009

Why I'm Not Crazy About The Term "Exome Sequencing"

I find myself worrying sometimes that I worry too much about the words I use -- and worry some of the rest of the time that I don't worry enough. What can seem like the right words at one time might seem wrong some other time. The term "killer app" is thrown around a lot in the tech space, but would you really want to hear it applied to genome sequencing if you were the patient whose DNA was under scrutiny?

One term that sees a lot of traction these days is "exome sequencing". I listened in on a free Science magazine webinar today on the topic, and the presentations were all worthwhile. The focus was on the Nimblegen capture technology (Roche/Nimblegen/454 sponsored the webinar), though other technologies were touched on.

By "exome sequencing" what is generally meant is capturing and sequencing the exons in the human genome in order to find variants of interest. Exons have the advantage of being much more interpretable than non-coding sequences; we have some degree of theory (though quite incomplete) which enables prioritizing these variants. The approach also has the advantage of being significantly cheaper at the moment than whole genome sequencing (one speaker estimated $20K per exome). So what's the problem?

My concern is that the term "exome sequencing" is taken a bit too literally. Now, it is true that these approaches catch a bit of surrounding DNA due to library construction, and that the targeting approaches cover splice junctions, but what about some of the other important sequences? According to my poll of practitioners of this art, their targets are entirely exons (confession: N=1 for the poll).

I don't have a general theory for analyzing non-coding variants, but there are nonetheless quite a few well annotated non-coding regions of functional significance. An obvious case is promoters. Annotation of human promoters and enhancers and other transcriptional doodads is an ongoing process, but some have been well characterized. In particular, the promoters for many drug metabolizing enzymes have been scrutinized because variants there may have significant effects on how much of the enzyme is synthesized, and therefore on drug metabolism.

Partly coloring my concern is the fact that exome sequencing kits are becoming standardized; at least two are on the market currently. Hence, the design shortcomings of today might influence a lot of studies. Clearly sequencing every last candidate promoter or enhancer would tend to defeat the advantages of exome sequencing, but I believe a reasonable shortlist of important elements could be rapidly identified.

My own professional interest area, cancer genomics, adds some additional twists. At least one major cancer genome effort (at the Broad) is using exome sequencing. On the one hand, it is true that there are relatively few recurrent, focused non-coding alterations documented in cancer. However, few is not none. For example, in lung cancer the c-Met oncogene has been documented to be activated by mutations within an intron; these mutations cause skipping of an exon encoding an inhibitory domain. Some of these alterations are about 50 nucleotides away from the nearest splice junction -- a distance that is likely to result in low or no coverage using the Broad's in-solution capture technology (confession #2: I haven't verified this with data from that system).
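To make the 50-nucleotide worry concrete, here is a minimal Python sketch of the check one might run against a capture design. Everything in it is an assumption for illustration: the exon coordinates are made up, the function names are hypothetical, and the flank values are placeholders rather than the specification of any real kit. Real coverage also depends on probe design and fragment length, which this ignores.

```python
def distance_to_nearest_exon(pos, exons):
    """Return 0 if pos lies inside an exon, else the distance in
    nucleotides to the nearest exon boundary.

    exons is a list of (start, end) tuples with inclusive coordinates.
    """
    if not exons:
        raise ValueError("need at least one exon")
    best = None
    for start, end in exons:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        if best is None or d < best:
            best = d
    return best


def likely_captured(pos, exons, flank):
    """Crude model: a position is covered if it falls within `flank`
    nucleotides of a targeted exon. `flank` is a hypothetical parameter."""
    return distance_to_nearest_exon(pos, exons) <= flank


# Made-up coordinates: a variant 50 nt past the end of the first exon.
exons = [(1000, 1200), (5000, 5150)]
intronic_variant = 1250

for flank in (20, 100):
    print(flank, likely_captured(intronic_variant, exons, flank))
```

Under this toy model the intronic variant is missed with a 20 nt flank and caught with a 100 nt flank, which is the whole question: how far past the splice junction does usable coverage really extend?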

The drug metabolizing enzyme (DME) promoters I mentioned before are a bit greyer for cancer genomics. On the one hand, one is generally primarily interested in what somatic mutations have occurred in the tumor. On the other hand, the norm in cancer genomics is tending towards applying the same approach to normal (cheek swab or lymphocyte) DNA from the patient, so why not get the DME promoters too? After all, these variants may have influenced the activity of therapeutic agents or even development of the disease. Just as some somatic mutations seem to cluster enigmatically with patient characteristics, perhaps some somatic mutations will correlate with germline variants which contributed to disease initiation.

Whatever my worries, they should be time-limited. Exome sequencing products will be under extreme pricing pressure from whole genome sequencing. The $20K cited (probably using 454 sequencing) is already potentially matched by one vendor (Complete Genomics). Now, in general the cost of capture will probably be a relatively small contributor compared to the cost of data generation, so exome sequencing will ride much of the same cost curve as the rest of the industry. But whole exome capture itself probably runs $1-3K, due to the multiple chips required and the labor investment (anyone have a better estimate?). If whole mammalian genome sequencing really can be pushed down into the $5K range, then mammalian exome sequencing will not offer a huge cost advantage, if any. I'd guess interest in mammalian exome sequencing will peak in a year or two, so maybe I should stop worrying and learn to love the hyb.
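The cost argument above can be sketched as back-of-envelope arithmetic in Python. The model is my simplification, not anyone's price list: exome cost is taken to be a fixed capture cost plus some fraction of the whole genome sequencing price. The $2K capture cost is the midpoint of my $1-3K guess, and the `sequencing_fraction` of 0.25 is a purely hypothetical placeholder; plug in your own numbers.

```python
def exome_cost(capture_cost, genome_cost, sequencing_fraction):
    """Toy model: fixed capture cost plus a fraction of the whole genome
    sequencing price. sequencing_fraction is a hypothetical parameter."""
    return capture_cost + sequencing_fraction * genome_cost


def exome_savings(capture_cost, genome_cost, sequencing_fraction):
    """Dollars saved by doing an exome instead of a whole genome."""
    return genome_cost - exome_cost(capture_cost, genome_cost,
                                    sequencing_fraction)


CAPTURE = 2000    # midpoint of the $1-3K guess above
FRACTION = 0.25   # assumed, not measured

for genome_price in (20000, 5000):
    saved = exome_savings(CAPTURE, genome_price, FRACTION)
    print(f"genome ${genome_price}: exome saves ${saved:.0f}")
```

Because the capture cost stays fixed while the sequencing cost falls, the savings shrink as the genome price drops, and they go negative once the genome price falls below capture cost divided by (1 - fraction).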


Daniel said...

Hey Keith,

At least a few of the more recent exome designs explicitly include promoters, UTRs and conserved non-coding regions; but you're right that the bulk of the interest is currently focused intently on coding regions, largely because variants in exons are a lot easier to interpret.

Most of the major genome centres are scaling up to do thousands (or tens of thousands) of exomes in 2010. I'm guessing pull-down will then be tailing off in favour of whole-genome sequencing by early to mid-2011, at least at the cutting edge.

Still, there's little doubt that exomes will be to 2010 what GWAS were to 2007.

obiwan said...


Is it the "term" or the concept of Exome sequencing that you disagree with? I couldn't agree more that focusing on only exons and avoiding (for example) regulatory regions is a limitation (albeit a reasonable one when balancing costs). In the ORegAnno project we have annotated over 100 regulatory regions that have mutations or polymorphisms with functional significance in disease or other conditions. And, there are many other examples (besides MYC) where alterations of regulatory sequence have implications in cancer. At a minimum, I think exome targeting designs should overlap with introns a little to look for evidence of alternative splicing (e.g., donor/acceptor mutations). But, it seems to me that 'exome sequencing' is an accurate term for what it attempts to do (sequence all exons of a genome). When exome sequencing designs start including non-coding regions (as Daniel alludes to) then I think the term is less accurate. Perhaps just 'targeted genome sequencing'? Hopefully in the not too distant future, reduced sequencing costs will allow us to avoid these compromises.

ps. Keep up the great work on this blog.

Anonymous said...

well, in the era of omics or omes, i am not surprised that exome has come in...soon we may be in for introme or promotome or even utrome (for 5' and 3' UTRs)...Nevertheless, given the importance of exons, i do not mind studying them in detail, esp non-synonymous SNPs (that is what i came across in JCV paper in PLOS GEN, Sarah et al, Nature, 2009.)

My doubt/confusion is why sequencing methods are being pushed forward when SNP chips are available at a cheaper price. Can't SNP chips also identify SNPs? Maybe they could start with exon SNP chips. I do agree that sequencing is superior to chip-based techniques in finding SNPs. But expense and availability also matter.

Keith Robison said...

SNP chips simply lack the content to do much beyond GWAS -- the recent papers using exome/genome sequencing for Miller's syndrome, Charcot-Marie-Tooth and other rare diseases simply could not have used SNP chips -- the variants weren't known and therefore could not be on a chip.

Deltas said...

Well! All these are valid points regarding what is the best approach to find real disease-causing mutations. Exome sequencing cannot be the gold standard, but it is certainly well ahead of what we used to do with manual gel electrophoresis and base-by-base reading with our own eyes. My question to anyone who knows off the top of their head: what is the cost of buying such an instrument, and what kind of expertise is needed to analyse the results? How long does it take? Is the $20K cost the running expense for one sample? Please answer me also at my email:

Eric said...

now offers human exome sequencing, the whole package from capture to sequencing to data analysis, for $2,999/sample. That is significantly cheaper than the $20K charged by Complete Genomics to do the whole human genome.
Until the whole genome sequencing price reaches $1K/person (there is still a long way to go), I think there is still plenty of room for exome sequencing to exist.