Nature Genetics on Friday released the paper from Jay Shendure, Deborah Nickerson and colleagues which used targeted sequencing to identify the damaged gene in a rare Mendelian disorder, Miller syndrome. The work had been presented at least in part at recent meetings, but now all of us can digest it in its entirety.
The impressive economy of this paper is that they targeted (using Agilent chips) less than 30Mb of the human genome, which is less than 1%. They also worked with very few samples; only about 30 cases of Miller syndrome have been reported in the literature. While I've expressed some reservations about "exome sequencing", this paper does illustrate why it can be very cost effective, and my objection (perhaps not made clear enough before) is more a worry about being too restricted to "exomes" and less about targeting.
Only four affected individuals (two siblings and two individuals unrelated to anyone else in the study) were sequenced, each at around 40X coverage of the targeted regions. Since Miller is so vanishingly rare, the causative mutations should be absent from samples of human diversity such as dbSNP or the HapMap, so these were used as a filter. Non-synonymous (protein-altering) mutations, splice site mutations & coding indels were considered as candidates. Both dominant and recessive models were considered. Combining the data from both siblings, 228 candidate dominant genes and 9 recessive ones fell out. Looking then to the unrelated individuals zeroed in on a single gene, DHODH, under the recessive model (but left 8 under the dominant model). Using a conservative statistical model, the odds of finding this by chance were estimated at 1.5x10^-5.
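The filtering logic described above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the authors' pipeline: the gene names, variant IDs and the `candidate_genes` / `shared_candidates` helpers are all hypothetical, and real calls would of course come from the sequencing data rather than hand-written lists. A gene counts as a dominant candidate if an individual carries at least one novel allele in it, and as a recessive candidate if the individual carries at least two (one homozygous variant or two different heterozygous ones).

```python
from collections import Counter

def candidate_genes(calls, known, model):
    """Genes passing the filter for ONE individual.

    calls: list of (gene, variant_id, allele_count) tuples, already limited
           to non-synonymous, splice-site and coding-indel variants.
    known: set of variant IDs seen in public databases (stand-in for
           dbSNP / HapMap); these are discarded.
    model: "dominant" (>= 1 novel allele) or "recessive" (>= 2).
    """
    novel_alleles = Counter()
    for gene, variant_id, allele_count in calls:
        if variant_id not in known:
            novel_alleles[gene] += allele_count
    needed = 1 if model == "dominant" else 2
    return {gene for gene, n in novel_alleles.items() if n >= needed}

def shared_candidates(individuals, known, model):
    """Genes that pass the filter in EVERY individual."""
    per_person = [candidate_genes(calls, known, model) for calls in individuals]
    return set.intersection(*per_person)

# Entirely made-up toy data: placeholder genes and variant IDs.
known = {"v_common"}
sib1 = [("GENE_A", "a1", 1), ("GENE_A", "a2", 1),   # compound heterozygote
        ("GENE_B", "b1", 2),                        # homozygous novel variant
        ("GENE_C", "v_common", 2),                  # filtered: already known
        ("GENE_D", "d1", 1)]
sib2 = list(sib1)                                   # siblings share these calls
unrelated = [("GENE_A", "a3", 1), ("GENE_A", "a4", 1), ("GENE_D", "d2", 1)]

print(sorted(shared_candidates([sib1, sib2], known, "recessive")))
print(sorted(shared_candidates([sib1, sib2, unrelated], known, "recessive")))
print(sorted(shared_candidates([sib1, sib2, unrelated], known, "dominant")))
```

In the toy data, the siblings alone leave two recessive candidates, and adding one unrelated case leaves a single gene, while the dominant model stays less specific. That mirrors the shape of the narrowing in the paper, though the counts here mean nothing.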
An interesting curve was thrown by nature. If predictions were made as to whether mutations would be damaging, then DHODH was excluded as a candidate gene under a recessive model. Both siblings carried one allele (G605A) predicted to be neutral but another allele predicted to be damaging.
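To see why a damage-prediction filter can throw out a true recessive gene, here is a small hedged sketch. The `recessive_hit` helper and the `second_allele` label are my own placeholders; only G605A and its "predicted neutral" status come from the paper as described above.

```python
def recessive_hit(alleles, require_damaging):
    """True if a gene still has >= 2 novel alleles after optional filtering.

    alleles: list of (variant_id, predicted_damaging) for one individual.
    require_damaging: if True, drop alleles predicted to be neutral.
    """
    kept = [vid for vid, damaging in alleles if damaging or not require_damaging]
    return len(kept) >= 2

# One sibling's two alleles in the gene: G605A was predicted neutral,
# the other allele (placeholder name) was predicted damaging.
sibling_alleles = [("G605A", False), ("second_allele", True)]

print(recessive_hit(sibling_alleles, require_damaging=False))  # gene is kept
print(recessive_hit(sibling_alleles, require_damaging=True))   # gene is lost
```

A single wrong "neutral" call is enough to drop a compound heterozygote below the two-allele threshold, so the stricter filter trades sensitivity for a shorter candidate list.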
Another interesting curve is a second gene, DNAH5, which was a candidate considering only the siblings' data but ruled out by the other two individuals' data. However, this gene is already known to be linked to a Mendelian disorder. The two siblings had a number of symptoms which do not fit with any other Miller case -- and well fit the symptoms of DNAH5 mutation. So these two individuals have two rare genetic diseases!
Getting back to DHODH, is it the culprit in Miller? Sequencing three further unrelated patients found them all to be compound heterozygotes for mutations predicted to be damaging. So it becomes reasonable to infer that a false prediction of non-damaging was made for G605A. Sequencing of DHODH in parents of the affected individuals confirmed that each was a carrier, ruling out DHODH as a causative gene under a dominant model.
DHODH is known to encode dihydroorotate dehydrogenase, which catalyzes a biochemical step in the de novo synthesis of pyrimidines. This is a pathway targeted in some cancer chemotherapies, with the unfortunate result that some individuals are exposed to these drugs in utero -- and these persons manifest symptoms similar to Miller syndrome. Furthermore, another genetic disease (Nager syndrome) has great overlap in symptoms with Miller -- but sequencing of DHODH in 12 unrelated Nager patients failed to find any coding mutations.
The authors point to the possible impact of this approach. They note that there are 7,000 diseases which affect fewer than 200K patients in the U.S. (a widely used definition of rare disease), but in aggregate this is more than 25M persons. Identifying the underlying mutations for a large fraction of these diseases would advance our understanding of human biology greatly, and with a bit of luck some of these mutations will suggest practical therapeutic or dietary approaches which can ameliorate the disease.
Despite the success here, they also underline opportunities for improvement. First, in some cases variant calling was difficult due to poor coverage in repeated regions. Conversely, some copy number variation manifested itself in false positive calls of variation. Second, the SNP databases for filtering will be most useful if they are derived from similar populations; if studying patients with a background poorly represented in dbSNP or HapMap then those databases won't do.
How economical a strategy would this be? Whole exome sequencing on this scale can be purchased for a bit under $20K/individual; to try to do this by Sanger would probably be at least 25X that. So whole exome sequencing of the 4 original individuals would be less than $100K for sequencing (but clearly a bunch more for interpretation, sample collection, etc). The follow-up sequencing would add a bit, but probably less than one exome's worth of sequencing. Even if a study turned up a lot of candidate variants, smaller scale targeted sequencing can be had for $5K or less per sample. Digging into the methods, the study actually used two passes of array capture -- the second to clean up what wasn't captured well by the first array design & to add newer gene predictions. This is a great opportunity to learn from these projects -- the array designs can keep being refined to provide even coverage across the targeted genes. And, of course, as the cost per base of the sequencing portion continues its downwards slide this will get even more attractive -- or possibly simply be displaced by really cheap whole genome sequencing. If the cost of the exome sequencing can be approximately halved, then perhaps a project similar to this could be run for around $100K.
So, if 700 diseases could each be examined at $100K/disease, that would come out to $70M -- hardly chump change. This underlines the huge utility of getting sequencing costs down another order of magnitude. At $1000/genome, the sequencing costs of the project would stop grossly overshadowing the other key areas - sample collection & data interpretation. If the total cost of such a project could be brought down closer to $20K, then we're looking at $14M to investigate all described rare genetic disorders. That's not to say it shouldn't be done at $70M or even several times that, but ideally some of the money saved by cheaper sequencing could go to elucidating the biology of the causative alleles such a campaign would unearth, because certainly many of them will be much more enigmatic than DHODH.
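For anyone who wants to play with these back-of-envelope numbers, here is a minimal sketch of the arithmetic. The function and variable names are my own, the inputs are just the rough figures quoted above (upper bounds like "a bit under $20K" are treated as exact), and the disease count is left as a parameter so other values can be tried.

```python
def campaign_cost(n_diseases, cost_per_disease):
    """Total cost of studying n_diseases at a flat per-disease price (USD)."""
    return n_diseases * cost_per_disease

# Rough figures from the text, treated as exact for illustration only.
exome_cost = 20_000          # "a bit under $20K/individual"
n_sequenced = 4              # the four original individuals
sanger_multiplier = 25       # "at least 25X" for Sanger

exome_sequencing_total = n_sequenced * exome_cost         # sequencing only
sanger_per_individual = exome_cost * sanger_multiplier    # lower bound

print(f"4 exomes:           ${exome_sequencing_total:,}")
print(f"Sanger, per person: ${sanger_per_individual:,}")
print(f"700 x $100K:        ${campaign_cost(700, 100_000):,}")
print(f"700 x $20K:         ${campaign_cost(700, 20_000):,}")
```

None of this includes sample collection or interpretation, which is exactly the part that starts to dominate once sequencing gets cheap.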
Sarah B. Ng, Kati J. Buckingham, Choli Lee, Abigail W. Bigham, Holly K. Tabor, Karin M. Dent, Chad D. Huff, Paul T. Shannon, Ethylin Wang Jabs, Deborah A. Nickerson, Jay Shendure, & Michael J. Bamshad (2009). Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics. doi:10.1038/ng.499