Tuesday, September 22, 2009

CHI Next-Gen Conference, Day 2

I'll confess that in the morning I took notes on only one talk, but the afternoon got back into gear.

The morning talk was by John Quackenbush of the Dana-Farber Cancer Institute and covered a wide range of topics. Some was focused on various database approaches to tracking clinical samples, but a lot of the talk was on microarrays. He described a new database his group has curated from the cancer microarray literature called GeneSigDB. He also described some work on inferring networks from such data & how it is very difficult to do with no prior knowledge, but with a little bit of network information entered in, a lot of other interactions fall out which are known to be real. He also noted that if you look at the signatures collected in GeneSigDB, most human genes are in at least one -- suggesting either that cancer affects a lot of genes (probable) and/or that a lot of the microarray studies are noisy (certainly!). I did a similar curation at MLNM (whose results were donated to Science Commons when the group dissolved, though I think it never quite emerged from there) & saw the same pattern. I'd lean heavily on "bad microarray studies", as far too many studies on similar diseases come up with disjoint results, whereas there are a few patterns which show up in far too many results (suggesting, for example, that they are signatures of handling cells, not signatures of disease). He also described some cool work, initiated in another group but followed up by his, on looking at trajectories of gene expression during the forced differentiation of a cell line. Using two agents that cause the same final differentiated state (DMSO & all-trans retinoic acid), the trajectories are quite different even with the same final state. There was some talk at the end of attractors & such.

In the afternoon I slipped over to the "other conference" -- in theory there are two conferences with some joint sessions & a common vendor/poster area, but in reality there isn't much reason to hew to one or the other & good-sounding talks are split between them. I did, alas, accidentally stick myself with a lunch ticket for a talk on storage -- bleah! But, the afternoon was filled with talks on "next next" generation approaches, and despite (or perhaps because of, as the schedule had been cramped) two cancellations, it was a great session.

All but one of the talks at least mentioned nanopore approaches, which have been thought about for close to two decades now. Most of these had some flavor of science fiction to them in my mind, though I'll freely admit the possibility that this reflects more the limitations of my experience than wild claims by the speakers.

One point of (again, genteel) contention between the speakers was around readout technology, with one camp arguing that electrical methods are the way to go, because they are the most semiconductor-like (there is a bit of a cult worship of the semiconductor industry evident at the meeting). Another faction (well, one speaker) argued that optics is better because it can be more naturally multiplexed. Another speaker had no multiplexing in his talk, but that will be covered below.

Based on the cluster of questioners (including myself) afterwards, the NABSys talk by John Oliver had some of the strongest buzz. The speaker showed no data from actual reads and was circumspect about a lot of details, but some important ones emerged (at least for me; perhaps I'm the last to know). Their general scheme is to fragment DNA to ~150Kb (well, that's the plan -- so far they go only to 50Kb) and create 384 such pools of single-stranded DNA. Each pool is probed with a set of short (6-10 nt) oligonucleotide probes. Passing a DNA molecule through a machined pore creates a distinct electrical signal for an aligned probe vs. a single-stranded region. You can't tell which probe just rode through, but the claim is that by designing the pools carefully and comparing fingerprints you can infer a complete "map" and ultimately a sequence, with some classes of sequence which can't be resolved completely (such as long simple repeats). While no actual data was shown, in conversation the speaker indicated that they could do physical mapping right now, which I doubt is a big market but would be scientifically very valuable (and yes, I will get back to my series on physical maps & finish it up soon).
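As a toy illustration of the fingerprinting idea (the sequence, probe, and readout here are entirely my own invention -- the real system reads inter-probe spacings electrically as the strand transits the pore, and infers the map from many pools at once):

```python
# Toy version of a one-pool hybridization fingerprint: record where a
# single short probe sits along a fragment.  The fragment and probe
# below are made-up examples, purely for illustration.
def fingerprint(fragment: str, probe: str) -> list[int]:
    """Positions (0-based) where the probe matches the fragment."""
    k = len(probe)
    return [i for i in range(len(fragment) - k + 1)
            if fragment[i:i + k] == probe]

frag = "ACGTACGTTACGATACGT"
print(fingerprint(frag, "ACGT"))  # -> [0, 4, 14]
```

Combining such position lists across 384 probe pools, and overlapping fragments by their shared spacing patterns, is where the claimed "map" would come from.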

Oliver did have a neat trick for downplaying the existing players. It is his contention that any system that can't generate 10^20 bases per year isn't going to be a serious player in medical genomics. This huge figure is arrived at by multiplying the number of cancer patients in the developed world by 100 samples each and 20X coverage. The claim is that any existing player would need 10^8 sequencers to do this (Illumina is approaching 10^3 and SOLiD 10^2). I'm not sure I buy this argument -- there may be value in collecting so many samples per patient, but good luck doing it! It's also not clear that the marginal gain from the 11th sample is really very much (just to pick an arbitrary number). Shave a factor of 10 off there & increase the current platforms by a factor of 10 and, well, you're down to 10^6 sequencers. Hmm, that's still a lot. Anyway, only if the cost gets down to 10s of dollars could national health systems afford any such extravagance.
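For what it's worth, here is the back-of-envelope math behind that 10^20. The 100 samples and 20X coverage are from the talk; the patient count and per-instrument throughput are my own rough guesses, so treat the script as a sanity check rather than Oliver's actual figures:

```python
# Sanity-check of the 10^20 bases/year claim.  Patient count and
# instrument throughput are my assumptions, not figures from the talk.
GENOME_SIZE = 3e9          # haploid human genome, bases
patients = 1.5e7           # assumed cancer patients, developed world
samples_per_patient = 100  # per the talk
coverage = 20              # 20X, per the talk

bases_per_year = patients * samples_per_patient * coverage * GENOME_SIZE
print(f"{bases_per_year:.1e} bases/year")  # ~10^20, matching the claim

# Instruments needed at an assumed 2009-era throughput:
throughput_per_instrument = 1e12  # bases/year/instrument, my rough guess
print(f"{bases_per_year / throughput_per_instrument:.1e} instruments")
```

With those guesses the instrument count lands around 10^8, which is the order of magnitude Oliver quoted.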

Another speaker, Derek Stein of Brown University (whose campus I stumbled on today whilst trying to go from my distant hotel to the conference on foot), gave an interesting talk on trying to marry nanopores to mass spec. The general concept is to run the DNA through the pore, break off each nucleobase on the other side & slurp that into the mass spec for readout. It's pretty amazing -- on one side of the membrane a liquid and on the other a vacuum! It's just beginning, and a next step is to prove that each nucleotide gives a distinct signal. Of course, one possible benefit of this readout is that covalent epigenetic modifications will probably be directly readable -- unless, of course, the modified base has a mass too close to one of the other bases.

Another nanoporist, Amit Meller at Boston University, is back in the optical camp. The general idea here is for the nanopore to strip off probes from a specially modified template. The probes make a rapid fluorescent flash -- they are "molecular beacons" which are inactive when hybridized to template, become unquenched when they come off, but then immediately fold back on themselves and quench again. Meller was the only nanopore artist to actually show a read -- 10nt!!! One quirk of the system is that a cyclic Type IIS digestion & ligation process is used to substitute each base in the original template with 2 bases, to give more room for the beacon probes. He seemed to think read lengths of 900 will be very doable and much longer possible.
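The two-for-one template expansion is easier to see with a toy example. The particular base-to-dinucleotide code below is my own invention for illustration, not the actual chemistry of the cyclic conversion:

```python
# Toy model of Meller's template expansion: each original base becomes
# a fixed pair of bases, doubling the spacing so each beacon probe has
# room to bind.  This code table is hypothetical, purely illustrative;
# note each pair still uniquely identifies the original base.
EXPANSION_CODE = {"A": "AT", "C": "CT", "G": "GT", "T": "TT"}

def expand(template: str) -> str:
    """Return the expanded template (2 nt per original nt)."""
    return "".join(EXPANSION_CODE[b] for b in template.upper())

print(expand("GATC"))  # -> GTATTTCT
```

One obvious question this raises (and my closing worry below) is whether errors in the conversion cycles show up as errors in the final read.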

One other nanopore talk was from Peiming Zhang at Arizona State, who is tackling the readout problem by having some clever molecular probes to interrogate the DNA after it exits the nanopore. He also touched on sequencing-by-hybridization & using atomic microscopy to try to read DNA.

The one non-nanopore talk is one I'm still wrestling with my reaction to. Xiaohua Huang at UCSC described creating a system that marries some of the best features of 454 with some of the features of the other sequencing-by-synthesis systems. His talk helped crystallize in my mind why 454 has such long read lengths but is also a laggard in density space. He attributed the long reads to the fact that 454 uses natural nucleotides rather than the various reversible-terminator schemes. But, since pyrosequencing is real-time, you get fast reads but the camera must always watch every bead on the plate. In contrast, the other systems can scan the camera across their flowcells, enabling one camera to image many more targets -- but the terminators don't always reverse successfully. His solution is to use 90% natural nucleotides and 10% labeled nucleotides -- but no terminators. After reading one nucleotide, the labels are stripped (he mentioned photobleaching, photolabile tags and chemical removal as all options he is working with) and the next nucleotide flowed in. It will have the same trouble with long mononucleotide repeats as 454 -- but should also have very long read lengths. He puts 1B beads on his plates -- and has some clever magnetic and electric field approaches to jiggle the beads around so that nearly every well gets a bead. In theory I think you could run his system on the Polonator, but he actually built his own instrument.
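A quick simulation shows why the 90/10 mix inherits 454's homopolymer problem. The 10% label fraction is from the talk; the run lengths and trial count are my own choices:

```python
# With 10% of nucleotides labeled and no terminators, a homopolymer
# run is filled in one flow and the signal is the number of labeled
# incorporations.  Simulate the mean signal per run length.
import random

random.seed(0)
LABEL_FRACTION = 0.1  # 10% labeled nucleotides, per the talk

def mean_signal(run_length: int, trials: int = 100_000) -> float:
    """Mean labeled incorporations filling a homopolymer in one flow."""
    total = sum(
        sum(random.random() < LABEL_FRACTION for _ in range(run_length))
        for _ in range(trials)
    )
    return total / trials

for n in (1, 2, 5, 10):
    print(n, round(mean_signal(n), 2))
# Signal grows ~linearly (about 0.1 per base), so calling a run of 9
# vs. 10 means resolving a ~10% intensity difference, on top of the
# shot noise from the random labeling.
```

The linear-but-noisy signal is essentially the same intensity-calling problem 454 faces, just with a sparser label.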

If I had to rate the approaches by which is most likely to start generating real sequence data, I'd vote for Huang -- but is that simply because it seems more conservative? NABSys talks like they are close to being able to do physical maps -- but will that be a dangerous detour? Or simply too financially uninteresting to attract their attention? The optically probed nanopores actually showed read data -- but what will the errors look like? Will the template expansion system cause new errors?

One minor peeve: pretty much universally, simulations look too much like real data and need more of a scarlet S on them. On the other hand, I probably should have a scarlet B on my forehead, since I've only once warned someone that I blog. One movie today of DNA traversing a nanopore looked very real, but was mentioned later to be simulated. Various other plots were not explained to be simulations until near the end of the presentation of that slide.

1 comment:

Dave said...

Great Blog. Thanks for sharing your insights. One correction, Xiaohua Huang is from UCSD. GO TRITIONS!