Thursday, February 25, 2010

Personalized Annoyance of Research Enthusiast (PARE)

Last night I finally got my paws on a paper which started out on a frustrating tack. Last week, a flurry of news items heralded a new approach from Vogelstein's group at Johns Hopkins that involved second generation sequencing of patient tumor samples. But, the early reports claimed it had been published in Science Translational Medicine, whereas it most certainly wasn't there except a suggestive teaser about the next week's issue. I thought perhaps someone had really blown it and ignored an embargo, but then it turned out the AAAS meeting is going on and the work was presented there. Few things more irritating than a paper being bandied about that I can't get my eyes on! Plus, I have a manuscript due next week that this might be relevant to, so the desire to get a copy was intense!

Yesterday, it really did come out. You'll need a subscription to read it -- though that is only $50 for online access if you already have a Science personal subscription. The gist of the paper showed up in the reports. Using SOLiD, they sequenced cancer genomes around 1X coverage using 1.5Kb mate-paired libraries using 25 long reads. For copy number analysis they also used single end reads. The key point is to identify rearrangements using the mate-paired fragments.

Now, many papers have looked at rearrangements in cancer using mate paired or paired end strategies. What sets this paper apart is doing something with it: turning these into patient specific tumor markers (an approach they call PARE for personalized analysis of rearranged ends). Because rearrangements are specific to the tumor and not at all like what is in the patient's normal DNA, they make great PCR amplicons for finding the tumor. Indeed, they were able to detect tumor DNA in blood with their assays.

This is an example of second generation sequencing getting very close to the clinic. But what will it take to get it there? Many of the news items claimed the cost might be soon down around $3K. Now, to do this properly you really need to either do the sequencing on both normal and tumor DNA or make a bunch of assays and expect some to be duds. Why? Because some of these structural changes will either be alignment noise or private germline structural variants. They do use copy-number analysis to filter the list -- many tumor rearrangments will be associated with local copy number amplification. But more importantly, the cost numbers sound suspiciously like reagent-only cost, not fully-loaded. Fully loaded costs include the ~$1.5M sequencing center (SOLiD + prep gear + compute farm), real estate & salaries. These could easily double or triple that cost, though someone who actually owns a green eyeshade should figure that out for sure.

The paper talks a little bit about the risk that as a tumor evolves one of these markers might be lost. This is particularly the case here because, unlike many papers, they really aren't worried if the rearrangement is driving the tumor. It's a handy landmark, though you would find driving rearrangements with it too. But, one particular worry is that a given rearrangement might not be in the dominant clone or a clone which treatment selects for survival. So having multiple markers will be a useful protection -- though that will up costs.

But back to irritating: a key value left out of this paper (and unfortunately most such papers) is the amount of input DNA for sequencing. Many of these sorts of protocols start with 5-10 micrograms of DNA, though some mate-pair schemes call for 5 to 10 times that. For some tumor types, that's a kings's ransom -- particularly for recurrent tumors or inoperable ones. Even beyond that, large scale application of this approach will require automating the library construction process end-to-end.

It's also worth noting that this is an application where absolute speed isn't critical . For generating a marker to be used for long-term following of the tumor, needing two weeks for SOLiD library prep & assembly and another few weeks to develop the PCR assays won't be a major roadblock. But, any sequencing-based approach used to determine treatment strategy needs to turn around results in not much more than 1-2 days. That's a high hurdle, and a wide open spot for fast sequencing technologies such as 454, PacBio, nanopores & Ion Torrent.

This is also an approach where someone with a long but noisy sequencing technology should take a hard look. Calling rearrangements with very long reads shouldn't require nearly the level of accuracy as calling point mutations.
Leary, R., Kinde, I., Diehl, F., Schmidt, K., Clouser, C., Duncan, C., Antipova, A., Lee, C., McKernan, K., De La Vega, F., Kinzler, K., Vogelstein, B., Diaz, L., & Velculescu, V. (2010). Development of Personalized Tumor Biomarkers Using Massively Parallel Sequencing Science Translational Medicine, 2 (20), 20-20 DOI: 10.1126/scitranslmed.3000702

1 comment:

Stephen Henderson said...

spot on analysis. Even gene expression diagnosis is rare as it is lengthy and expensive and has to be interpreted by bioinformatics people usually without medical degrees (which is bitterly resented by pathologists).

The point about publishing the input DNA (or RNA) is also a sore point of mine. How the hell do referees let them leave that out? I can contact them and ask or spend several thousand pounds and weeks of work trying it out for myself with precious samples.

If you publish something I should be able to repeat it from the paper!