Tuesday, March 08, 2011

What will be the last Sanger genome?

Even when I was finishing up as a graduate student, and only a few bacterial genomes had been published, one would periodically hear open speculation as to when the top journals would quit accepting genome sequencing papers. The thought was that once the novelty wore off, a genome would need to be increasingly difficult or have some very odd biology to keep getting into Science, Nature, or the like.

Happily, that still hasn't happened, and genome sequencing papers still show up in the whole range of journals. I don't claim I scan every one, but I do try to poke around in a lot of the eukaryotic papers (I long since gave up on bacterial ones; fortunately they have become essentially uncountable). Two recent genomes in major journals, Daphnia (water flea) in Science and Pongo (orangutan, not dalmatian!) in Nature, show that the limit has not yet been reached. These papers share another thread: both genomes were sequenced using fluorescent capillary Sanger sequencing.

Sanger, of course, was the backbone of genome projects until only very recently. Even in the last few years, only a few large genomes have been initially published using second generation technologies, though these methods have become the norm for resequencing projects. Indeed, this is reflected in the orangutan project, in which several additional individuals were sequenced using Illumina technology.

Sanger sequencing has two attractive qualities, especially when sequencing from cloned material: long reads and high quality. But not only is the sequencing itself comparatively expensive, one also needs a host of accessories to keep a high throughput Sanger operation going: liquid handling robots, incubators, clone pickers, etc. While there are some useful accessories for a second generation sequencer, these all take up a tiny fraction of the space needed for Sanger. Plus, all of the Sanger equipment requires skilled labor. As a result, many genome centers have wound down their investment in Sanger sequencing.

Still, there must be a pipeline of projects completed but still being analyzed and written up. Even after all of the pure Sanger projects are complete, some with a substantial contribution of Sanger will persist. But the writing is very much on the wall, particularly with a number of recent advances which suggest solutions to the assembly difficulties which have dogged second generation de novo sequencing. Thoughts on that are sketched in another entry, which I really need to finish.

Certainly, in turn, other sequencing technologies will peak and then fade for de novo genome sequencing. The 454 technology featured prominently in the recent ant genome papers, but must already be sliding as a favored approach due to its far higher cost compared with other second generation sequencing systems. Illumina will be king of the hill for quite a while, until some other platform significantly beats it on cost. PacBio is likely to be a favorite for providing long connecting reads to improve short read assemblies, but without a huge improvement in accuracy and in cost per base it is unlikely to find much favor as a sole approach.

It will probably not be obvious when the last Sanger genome is published, and so there may be no fanfare. It may only be realized long afterwards that some particular paper was the last of its generation. Dideoxy electrophoretic sequencing had a remarkable run and can be retired with great honors.


Anonymous said...

I think that 454 sequencing will still be favored for bacterial and archaeal genomes for some time. The long reads make for much easier assembly than Illumina or SOLiD reads, and for de novo short genomes, the analysis cost exceeds the sequencing cost by a lot.

Unknown said...

For complete genome sequencing, indeed, CE/Sanger sequencing is a poor choice. But for every genome sequenced there are many hundreds of hypotheses generated and they need to be followed up with accurate, long-read, single template sequencing. There is still no better technology out there for that. CE/Sanger sequencing will increase rather than decrease, at least for the next several years.

Anonymous said...

I have to agree. In the last month I have been meeting with sequencing centers and molecular diagnostic labs. They give me a weird look when we start talking about GC challenges, homopolymer read-through and error rates. Designing amplicon enrichment strategies over short strands is not always simple in highly mutated genomes such as viruses. I can see a lot of labs are going to realise that they are gonna have to clean things up with Sanger. Unless that super-duper PacBio dream machine junior with 0.1% error rate at 50K comes out...

Anonymous said...

Sanger will still be with us for quite a while - there will always be a need for real reference genomes that can't be bootstrapped with short reads. Will you know when the last one is sequenced? Probably not, as Sanger has taken on a kind of support role for complex genomes, where it provides the backbone which is filled in with 454 or Illumina reads.