Wednesday, April 15, 2009

Sequencing's getting so cheap...

Here's a decidedly odd gedanken experiment which illustrates what next-gen sequencing is doing to the cost.

A common way of deriving the complete sequence of a large clone is shotgun sequencing -- the clone is fragmented randomly into lots of little fragments. With conventional (Sanger) sequencing these fragments are cloned, clones are picked and each clone sequenced. By using a universal primer (or more likely primer pair; one read from each end), a lot of data can be generated cheaply.

If you search online for DNA sequencing, a common advertised cost is $3.50 per Sanger read. This probably doesn't include clone picking or library construction, but we'll ignore that. Read lengths vary, but to keep the math simple let's say we average 500 nucleotide reads, which in my experience is not unreasonable, though very good operations will routinely get longer reads.

So, at that price and read length it's $7.00 per kilobase of raw data. For shotgunning, collecting 10X-20X coverage is quite common and likely to give a reasonable final assembly, though higher is always better. At 10X coverage, that means for each 1Kb of original clone we'll spend $70.00.

Suppose we have an old cosmid -- which is about 50Kb of DNA including the vector. Shotgun sequencing it with Sanger reads, if building & picking the library were free, would cost around $5250 for 15X coverage (50Kb x $7.00 per Kb x 15). Pretty cheap, right?
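For anyone who wants to plug in their own numbers, here is a minimal sketch of the Sanger arithmetic above. The function name and its defaults are just illustrative choices; the $3.50 per read and 500 nucleotide read length are the figures assumed in this post, and library construction and clone picking are treated as free, as above.

```python
def sanger_shotgun_cost(clone_bp, coverage, read_bp=500, cost_per_read=3.50):
    """Estimate the sequencing-only cost (in dollars) of a Sanger shotgun project.

    Assumes every read yields read_bp usable bases and ignores library
    construction and clone picking, as the post does.
    """
    reads_needed = clone_bp * coverage / read_bp
    return reads_needed * cost_per_read


# 1Kb of clone at 10X coverage
print(sanger_shotgun_cost(1_000, 10))   # 70.0
# A 50Kb cosmid at 15X coverage
print(sanger_shotgun_cost(50_000, 15))  # 5250.0
```

Raising the read length or dropping the per-read price (paying by plate, say) changes the total proportionally, so it is easy to see how sensitive the comparison is to those two assumptions.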

Except, for a measly $4700 you can have next gen sequencing of it (and that actually includes library construction costs). That buys 680Mb of next gen sequencing -- or roughly 13,600X coverage of a 50Kb cosmid. Indeed, if you left the E. coli host DNA in you'd still have well in excess of 100X coverage of E. coli plus your cosmid. So if you had multiple cosmids, you could actually get them sequenced for the same price, assuming you can distinguish them at the end (or they just assemble together anyway)!

Sequencing so cheap you can theoretically afford 99% contamination! Yikes!
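A rough sketch of the contamination arithmetic, under some loudly stated assumptions: the E. coli genome is taken as about 4.6Mb (a figure not given in this post), the cosmid is treated as present at one copy per genome copy, and reads are assumed to sample everything uniformly. The constant and function names are hypothetical, chosen only for illustration.

```python
NEXT_GEN_YIELD_BP = 680_000_000  # 680Mb of next gen data, from the post
COSMID_BP = 50_000               # ~50Kb cosmid including vector, from the post
ECOLI_GENOME_BP = 4_600_000      # ASSUMPTION: approximate E. coli genome size


def coverage(yield_bp, target_bp):
    """Average fold coverage if yield_bp of reads land uniformly on target_bp."""
    return yield_bp / target_bp


def host_fraction(host_bp, insert_bp):
    """Fraction of the sequenced DNA that is host rather than insert.

    ASSUMPTION: one cosmid copy per host genome copy.
    """
    return host_bp / (host_bp + insert_bp)


combined = coverage(NEXT_GEN_YIELD_BP, ECOLI_GENOME_BP + COSMID_BP)
contamination = host_fraction(ECOLI_GENOME_BP, COSMID_BP)

print(f"Coverage of E. coli plus cosmid: {combined:.0f}X")
print(f"Fraction of reads from the host: {contamination:.1%}")
```

Under those assumptions the combined coverage lands comfortably above 100X while about 99% of the data is host sequence. Real cosmids are usually carried at more than one copy per cell, which would only tilt things further in the insert's favor.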

Of course, it's unlikely you'd really want to be so profligate. Rather than resequence E. coli, you could pack a lot of inserts in. But it does underline why Sanger sequencing is quickly being relegated to a few niches (for example, when you need to screen clones in synthetic biology projects) & the price of used capillary sequencers is reputed to be going south of $30K.


RPM said...

"If you search online for DNA sequencing, a common advertised cost is $3.50 per Sanger read."

That's if you pay by read. It's a lot cheaper if you pay by plate. I think one 96-well plate costs $100-$200. That said, if you're doing de novo transcriptomes, de novo small genomes, or resequencing, paired end Illumina G3 is the way to go (454 is so last year). But if you're doing de novo large genome sequencing (i.e., most eukaryotic genomes), you're going to need something else to complement the Illumina sequencing if you want to assemble the genome into good scaffolds.

Keith Robison said...

That's a good point, and it does moderate the costs. We were finding costs around $200/plate, but I didn't do the shopping and lower costs are probably doable with volume or relaxed scheduling.

I'll admit to being a bit too cavalier about the assembly problem, the consequence of not having been in that trench for a very long time. Going from a draft sequence to finished still remains a big hurdle.

RPM said...

Going from draft to finished is a whole other story. From what I can gather, draft genomes are still acceptable in eukaryotic sequencing, but not in prokaryotic sequencing. However, you can still have a well-assembled draft genome, with big scaffolds assigned to chromosomes. It'll be a draft assembly because of sequencing gaps.

Rick said...

The assembly problem cannot be overlooked. We did a run (SOLiD, but the same concern would apply to Illumina) of 32 eukaryotic cosmids. They only partially assembled. We did get one full-length cosmid and several close to full length. However, it seems like the repeat problems in eukaryotes can erase much of the gain of 2nd gen sequencing. Of course that was true with Sanger sequencing as well -- much finishing work was needed.