Saturday, May 19, 2007

Circling back to sequence

One of the open-access papers in this week's PNAS Early Edition describes a new approach for large-scale targeted resequencing. The authors' 'selector' technology, described previously in an open-access Nucleic Acids Research paper, enables the amplification of specific restriction fragments from a very large population. The target DNA is digested with the restriction enzyme of interest, denatured, and then hybridized to the set of selector probes. Successful binding to a selector probe is followed by ligation to form a circular DNA structure lacking free ends. Exonuclease treatment destroys all the non-circular DNA, and universal priming sites built into the selector probes allow amplification of all the selected circles with a single pair of PCR primers.

In the new paper, the selectors were built to target 10 genes known to be mutated in cancer and the PCR products were explored using 454 sequencing. Six tumor samples were used. In any given sample about 74% of the selectors yielded at least one sequencing read. Mutations detected for p53 (HUGO:TP53) were confirmed by conventional Sanger sequencing.

This is a first report, but it does point to some challenges -- and the promise -- of this sort of approach. The challenge for resequencing, whether for cancer work or for genotyping important disease loci in a patient (or newborn), is to make sure you find everything you want. 74% sequence coverage is clearly not enough -- what if the allele you are interested in is in that other 26%? Furthermore, particularly for cancer resequencing, the standard was rather generous -- a single read from the amplicon. Even if you fully trusted the sequence data on a single read (yeah, right!), you obviously need at least two reads to properly assess a germline genotype -- and of course this is really a sampling exercise, so you need a lot more to be confident you have sampled both chromosomes.
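To put rough numbers on that sampling exercise, here is a minimal sketch. It assumes a heterozygous germline site, that each read is an independent draw with a 50/50 chance of coming from either chromosome, and that reads are error-free; none of those assumptions comes from the paper, and the function names are made up for illustration.

```python
def prob_both_alleles_seen(n_reads: int) -> float:
    """Chance that n independent reads include both alleles of a heterozygous site.

    Assumes each read comes from either chromosome with probability 0.5
    and ignores sequencing error (both are simplifying assumptions).
    """
    if n_reads < 2:
        return 0.0
    # All reads from one chromosome happens with probability 0.5**n,
    # and there are two chromosomes that could be the only one seen.
    return 1.0 - 2.0 * 0.5 ** n_reads


def reads_needed(confidence: float) -> int:
    """Smallest read count giving at least `confidence` of seeing both alleles."""
    n = 2
    while prob_both_alleles_seen(n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, round(prob_both_alleles_seen(n), 4))
    print("reads for 95%:", reads_needed(0.95))
    print("reads for 99%:", reads_needed(0.99))
```

Under these idealized assumptions two reads give only a coin flip's chance of seeing both alleles; real data, with amplification bias and read errors, would need more.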

For cancer genomics this is a question of particular interest. A huge effort is rolling forward, with many critics, to sequence cancer-related genes in a large number of tumor samples. The definition of cancer-related is still murky and will depend on the technology used -- it might end up being every gene, but more likely will be a very large set of genes with possible relevance to cancer. For example, one published pilot project (in Science, not yet free) looked at 13K genes; as covered here previously, other studies have focused on pharmacologically tractable gene families such as kinases. There is yet another post on the subject, with links in need of repair. A key challenge is that every tumor is really a complex community of mutants, and the mutations of interest may be at a very low frequency -- the mutations which are in cancer stem cells, or perhaps a rare mutation which later therapy will select for. Treating this as a population genetics problem, your sensitivity to detect a mutation will largely be a function of the number of independent reads which can be generated for a single amplicon from each sample. One interesting, but probably difficult, area of exploration would be to NOT PCR-amplify the circles prior to capturing them on the 454 (or similar next-gen sequencing technology) so that each bead represents a single captured molecule.
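The dependence of sensitivity on read count can be sketched with a simple binomial model. This is an illustration under stated assumptions, not anything from the paper: reads are treated as independent draws from the tumor's DNA, the mutant allele is present at a fixed fraction, sequencing error is ignored, and the function name and the `min_mutant_reads` threshold are hypothetical.

```python
from math import comb


def detection_probability(n_reads: int, mutant_fraction: float,
                          min_mutant_reads: int = 1) -> float:
    """Chance of seeing at least `min_mutant_reads` mutant reads among n reads.

    Assumes independent reads and a fixed mutant allele fraction
    (a binomial model); sequencing error is ignored.
    """
    # Probability of seeing fewer mutant reads than the threshold.
    p_fewer = sum(
        comb(n_reads, k)
        * mutant_fraction ** k
        * (1.0 - mutant_fraction) ** (n_reads - k)
        for k in range(min_mutant_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # A mutation carried by 1% of the sampled molecules.
    for n in (10, 100, 500, 1000):
        print(n,
              round(detection_probability(n, 0.01), 3),
              round(detection_probability(n, 0.01, min_mutant_reads=3), 3))
```

Requiring several mutant reads rather than one is a crude guard against read errors, and it pushes the required depth up sharply. The independence assumption is also exactly what PCR amplification before sequencing undermines, since many reads may then descend from the same original molecule.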
