Amongst last week's news was a bit of a surprise: the salmon genome project is choosing Sanger sequencing for the first phase of the project. Alas, reading the full article requires a premium subscription to In Sequence, which I lack. But the group has published (open access) a pilot study on some BACs, which concluded that 454 sequencing couldn't resolve a substantial fraction of the sequence, presumably ruling out the shorter-read technologies as well. A goal of the project is a high-quality reference sequence to serve as a benchmark for related fish, which sets the quality bar very high.
This announcement is a jolt for anyone who had concluded that Sanger has been largely put out to pasture, confined to niches such as verifying clones and low-throughput projects. Despite the gaudy throughput of the next-gen sequencers, read length remains a problem. That hasn't stopped de novo assembly projects such as the panda genome from proceeding, but salmon is apparently even nastier when it comes to repeats.
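To see why read length, not raw throughput, is the sticking point, here is a toy illustration (my own sketch, not from the pilot study): if a repeat is longer than a read, two genomes that differ only in how the unique blocks are shuffled between repeat copies generate exactly the same set of reads, so no assembler, however clever, can tell them apart.

```python
# Toy demo: a repeat longer than the read length makes two distinct
# genomes indistinguishable from their reads alone.

def reads(genome, k):
    """Multiset (as a sorted list) of all length-k substrings of a linear genome."""
    return sorted(genome[i:i + k] for i in range(len(genome) - k + 1))

R = "AT" * 10                      # a 20 bp repeat, three copies below
A, B, C, D = "CCCCCC", "GGGGGG", "TTTTTT", "CAGCAG"

g1 = A + R + B + R + C + R + D     # unique middle blocks in order B, C
g2 = A + R + C + R + B + R + D     # same pieces, middle blocks swapped

print(reads(g1, 10) == reads(g2, 10))  # True: 10 bp reads never span R
print(reads(g1, 24) == reads(g2, 24))  # False: 24 bp reads bridge the repeat
```

Longer Sanger reads buy you exactly that bridging power.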
Still playing the armchair next-gen sequencer (for the moment!), I find it an interesting Gedankenexperiment. Suppose you had a difficult genome for which you really, really wanted a high-quality reference sequence. On the one hand, Sanger sequencing is very well proven. On the other, it is more expensive per base than the newer technologies. Furthermore, Sanger is pretty much a mature technology, with little investment in further improvement. This is in contrast to the next-gen platforms, which are being pushed harder and harder both by the manufacturers and by the more adventurous users. That push includes novel sequencing protocols for difficult DNA, such as the recently published Long March technique (which I'm still wrapping my head around), which generates nested libraries for next-gen sequencing using a serial Type IIS digestion scheme. Complete Genomics has some trick for inserting multiple priming sites per circular DNA template. Plus, Pacific Biosciences has demonstrated really long reads on a next-gen platform -- but demonstrating is different from having it in production.
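As I currently understand the nested-library idea (a toy model of my own, not the published Long March protocol): a Type IIS enzyme cuts a fixed distance away from its recognition site, so re-ligating the adapter after each digestion round marches the sequencing start point a fixed step further into the fragment. The step size and read length below are made-up numbers purely for illustration.

```python
# Toy model of a "marching" nested library: each digestion round moves the
# read start a fixed STEP further into the template, so short reads from
# successive rounds tile deep into a fragment one read could never cover.

STEP = 20      # hypothetical bases trimmed per Type IIS digestion round
READ_LEN = 36  # hypothetical short-read length obtained after each round

def nested_reads(template, step=STEP, read_len=READ_LEN, rounds=5):
    """Reads whose start points march into the template in fixed steps."""
    out = []
    for r in range(rounds):
        start = r * step
        if start + read_len > len(template):
            break                        # marched off the end of the fragment
        out.append((start, template[start:start + read_len]))
    return out

template = "ACGT" * 40                   # a 160 bp stand-in for a cloned fragment
for start, read in nested_reads(template):
    print(f"round start {start:3d}: {read}")
```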
So it boils down to the key question: do you spend your resources on the tried-and-true but potentially pricey approach, or do you bet that emerging techniques and technologies can deliver the goods soon enough? Put another way, how critical is a high-quality reference sequence? Perhaps it would be better to generate rough, piecemeal drafts of multiple species now and then finish the genomes when the new technologies come on line. But which experiments dependent on that high-quality reference would be put off a few years? And if the new technologies don't deliver, you must fall back on Sanger and end up quite a bit behind schedule.
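One can frame that bet as a back-of-envelope expected-cost calculation. Every number below is invented purely for illustration; the point is the shape of the trade-off, not the values.

```python
# Back-of-envelope decision sketch (all numbers are hypothetical): pay for
# Sanger now, or bet on a next-gen approach with a Sanger fallback if it fails.

SANGER_COST   = 10.0   # hypothetical relative cost of a Sanger-finished genome
NEXTGEN_COST  = 2.0    # hypothetical cost if the new technology delivers
P_SUCCESS     = 0.6    # hypothetical chance the new approach pans out in time
DELAY_PENALTY = 3.0    # hypothetical cost of the science deferred by a failed bet

sanger_now = SANGER_COST
bet_on_new = (P_SUCCESS * NEXTGEN_COST
              + (1 - P_SUCCESS) * (NEXTGEN_COST + SANGER_COST + DELAY_PENALTY))

print(f"Sanger now:      {sanger_now:.1f}")   # 10.0
print(f"Bet on next-gen: {bet_on_new:.1f}")   #  7.2 with these made-up inputs
```

With these particular made-up inputs the bet wins, but the answer flips quickly as the failure probability or the delay penalty grows -- which is, I suspect, exactly the calculation the salmon consortium ran in its collective head.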
It's not an easy call. Will salmon be the last Sanger genome? It all depends on whether the new approaches and platforms can really deliver -- and whether someone is daring enough to try them on a really challenging genome.