First, another company has thrown its hat in the next generation ring: Intelligent Bio-Systems. As detailed in GenomeWeb, it's located somewhere here in the Boston area & is licensing technology from Columbia.
The Columbia group last week published a proof-of-concept paper in PNAS (open access option, so free for all!). The technology involves using reversible terminators -- the labeled terminator blocks further extension, but then can be converted into a non-terminator. Such a concept has been around a long time (I'm pretty sure I heard people floating it in the early-90's) & apparently is close to what Solexa is working on, though Solexa (soon to be Illumina) hasn't published their tech. One proposed advantage is that reversible terminators shouldn't have problems with homopolymers (e.g. CCCCCC) whereas methods such as pyrosequencing may -- and the paper contains a figure showing the contrast in traces from pyrosequencing and their method. The company is also claiming they can have a much faster cycle time than other methods. It will be interesting to see if this holds out.
Given the very short reads of many of these technologies, everyone knows they won't work on repeats, right? It's nice to see someone choosing to ignore the conventional wisdom. Granger Sutton, who spearheaded TIGR's & then Celera's assembly efforts, has a paper in Bioinformatics describing an assembler using suffix trees which attempts to assemble the repeats anyway while assuming no errors -- but with a high degree of oversampling that may not be a bad assumption. They report significant success:
We ran the algorithm on simulated
error-free 25-mers from the bacteriophage PhiX174 (Sanger, et al.,
1978), coronavirus SARS TOR2 (Marra, et al., 2003), bacteria
Haemophilus influenzae (Fleischmann, et al., 1995) genomes and
on 40 million 25-mers from the whole-genome shotgun (WGS)
sequence data from the Sargasso sea metagenomics project
(Venter, et al., 2004). Our results indicate that SSAKE could be
used for complete assembly of sequencing targets that are 30 kbp
in length (eg. viral targets) and to cluster millions of identical short
sequences from a complex microbial community.