Monday, June 04, 2007

SOLiD-ifying the next generation of sequencing

ABI announced today that it has started delivering its SOLiD next generation sequencing instruments to early access customers and will take orders from other customers (anyone want to spot me $600K?). SOLiD uses the ligation sequencing scheme developed by George Church and colleagues.

Like most of the current crop of next generation sequencers (that is, those which might see action in the next couple of years), SOLiD utilizes the clonal amplification of DNA on beads.

One interesting twist of the SOLiD system is that every nucleotide is read twice, which should deliver very high accuracy. Every DNA molecule on a given bead should have exactly the same sequence, and this read redundancy allows the amount of DNA on each bead to be reduced, which means the beads can be very small.
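How does "read twice" work? ABI's scheme (the "colorspace" encoding mentioned in the comments) is usually described as two-base encoding: each color reports a pair of adjacent bases, so every base contributes to two colors. Below is a minimal sketch assuming the commonly cited mapping, in which A, C, G, T get the 2-bit codes 0 to 3 and a dinucleotide's color is the XOR of its two codes. The function names are hypothetical; this illustrates the encoding and is not ABI's software.

```python
from __future__ import annotations

# Assumed 2-bit codes; with these, a dinucleotide's color is CODE[a] XOR CODE[b].
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list[int]:
    """Return one color (0-3) per adjacent base pair, so len(colors) == len(seq) - 1."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild bases from a known first base: each step XORs the previous base's code with the next color."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


def changed_positions(a: list[int], b: list[int]) -> list[int]:
    """Indices where two equal-length color lists disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    ref = encode("ACGTT")                 # [1, 3, 1, 0]
    snp = encode("ACATT")                 # G->A substitution at base index 2
    print(changed_positions(ref, snp))    # [1, 2]: two adjacent colors change
    misread = [1, 0, 1, 0]                # one color (index 1) misread
    print(decode("A", misread))           # ACCAA: every base from index 2 on is wrong
```

The usual argument for colorspace accuracy rests on this asymmetry: a genuine SNP shows up as two adjacent color changes, while a single misread color is a telltale pattern that naive decoding would otherwise propagate down the rest of the read.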

Bio-IT World has a (free!) writeup on next generation sequencing that focuses on SOLiD. It covers the wet side a surprising amount for an IT-focused magazine, and even has photos of the development instrument. One interesting issue the article raises is that each SOLiD run is expected to generate one terabyte of image data. The SOLiD sequencer will come with a cluster of ten dual-core Linux nodes sporting 15 terabytes of storage. This is a major cost component of the instrument, though it is worth noting that the IT side will ride the same spectacular performance/cost curve as the rest of the computer industry. The article points out that five years ago such a cluster would have ranked among the 500 most powerful supercomputers in the world; in a handful of years I'll probably be requisitioning a laptop with similar power.

That is still a lot of data per run; by contrast, the top-line 454 FLX generates only 13 gigabytes of images per run, so there is still an opportunity to develop a 454 trace viewer that runs on a video iPod! A side-effect of this deluge of image data is that ABI expects users will not routinely archive their raw images, but will instead let ABI's software digest them into reads and save only the reads. That's an audacious plan: with the other sequencers, and with fluorescent sequencing before them, archiving was pretty standard. At Millennium we had huge amounts of space devoted to raw traces, and NCBI and the EBI also have enormous trace archives. The general reason for archiving traces is that better software might show up later and read them more accurately. SOLiD customers will be faced with either giving up that opportunity or paying through the nose for tape backup of each run.
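The storage figures quoted from the Bio-IT World article and for the 454 FLX are easy to put side by side. Here is a back-of-the-envelope sketch, assuming decimal units (1 TB = 1000 GB) and taking the per-run numbers at face value; the constant and function names are made up for illustration.

```python
# Per-run image volumes and bundled storage as quoted in the post (decimal GB assumed).
SOLID_IMAGES_PER_RUN_GB = 1000   # ~1 TB of images per SOLiD run
FLX_IMAGES_PER_RUN_GB = 13       # 454 FLX images per run
CLUSTER_STORAGE_GB = 15000       # 15 TB shipped with the SOLiD cluster


def runs_that_fit(storage_gb: int, per_run_gb: int) -> int:
    """Whole runs of raw images that fit before the disks are full."""
    return storage_gb // per_run_gb


def volume_ratio(a_gb: float, b_gb: float) -> float:
    """How many times more image data platform A produces per run than platform B."""
    return a_gb / b_gb


if __name__ == "__main__":
    print(runs_that_fit(CLUSTER_STORAGE_GB, SOLID_IMAGES_PER_RUN_GB))                # 15
    print(round(volume_ratio(SOLID_IMAGES_PER_RUN_GB, FLX_IMAGES_PER_RUN_GB), 1))  # 76.9
```

On these assumptions the bundled disks hold roughly 15 runs' worth of raw images (ignoring space for reads and working files), and each SOLiD run produces about 77 times the image data of a 454 FLX run, which is why keeping only the reads is tempting.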

Since many labs are early access customers for more than one of these instruments, one can hope that some head-to-head competitions will ensue to look at cost, accuracy and real throughput. ABI is claiming SOLiD will generate over 1 gigabase per run, and Illumina/Solexa named their sequencer for similar output (the '1G'), whereas Roche/454 is quoted more in the 0.4-0.5 Gb/run range. Further evolutionary advances of all the platforms are to be expected. For SOLiD, that will mean packing the beads tighter and minimizing non-productive beads (those carrying either zero or more than one DNA species). The Church paper introduced an interesting performance metric: bits of sequence read per bit of image generated. In the paper it was 1/10,000, and a goal of a 1:1 ratio was proposed.
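Two of the ideas in that paragraph lend themselves to quick arithmetic. Non-productive beads are commonly reasoned about with a Poisson loading model (an assumption here, not something ABI has stated): if templates land on beads at random with a mean of λ per bead, the fraction carrying exactly one template is λe^(-λ), which peaks at 1/e (about 37%) when λ = 1. The Church-style metric can likewise be estimated from the press figures, assuming 2 bits of information per base, 1 Gb of sequence and 1 TB (10^12 bytes) of images per run. The function names are hypothetical, and this is not how the Church paper computed its figure.

```python
from __future__ import annotations

import math


def bead_occupancy(lam: float) -> tuple[float, float, float]:
    """Poisson fractions of beads with zero, exactly one, and more than one
    template, given a mean of lam templates per bead (assumed loading model)."""
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return p0, p1, 1.0 - p0 - p1


def seq_bits_per_image_bit(bases: float, image_bytes: float, bits_per_base: float = 2.0) -> float:
    """Bits of sequence read per bit of image generated (2 bits/base ignores quality values)."""
    return bases * bits_per_base / (image_bytes * 8)


if __name__ == "__main__":
    zero, one, many = bead_occupancy(1.0)
    print(f"{zero:.3f} {one:.3f} {many:.3f}")    # 0.368 0.368 0.264
    loadings = [i / 10 for i in range(1, 31)]
    print(max(loadings, key=lambda lam: bead_occupancy(lam)[1]))  # 1.0
    print(seq_bits_per_image_bit(1e9, 1e12))     # 0.00025, i.e. 1/4000
```

If those assumptions hold, SOLiD would sit around 1/4000, still far from 1:1; and under pure Poisson loading about 63% of beads are non-productive even at the best mean loading, which shows how much headroom there is in minimizing them.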

In any case, the density of data achievable is spectacular: one of my favorite figures of all time is Figure 3B in the Church paper, which uses sequencing data to determine the helical pitch of DNA! The ABI press release mentions using SOLiD to identify nucleosome positioning motifs in C. elegans, and I recently saw an abstract which used 454 to hammer on HIV integration sites to work out their subtle biases. Ultra-deep, ultra-accurate sequencing will generate all sorts of novel biological assays. One can imagine simultaneously screening whole populations for SNPs or going very deep within a tumor genome for variants. Time to pull up a chair, grab a favorite beverage, and watch the fireworks!


Kevin McKernan said...

Just wanted to correct a point in the article that many people are confused about, due to AB's failure to adequately explain our data policies for SOLiD.
Currently AB 3730s do not archive the EPS or image files; what you saved at MLNM were the trace files extracted from the EPS files. With SOLiD we still deliver both the trace files and the image files, but we archive only the trace files, not the equivalent EPS files or CCD dumps. These amount to 6 TB of space, and that much disk space currently costs more than the reagents for a re-run.
The cheapest way to store DNA data is now on beads in a tube, or on a slide with a reader, as hard drives are still too expensive.

Keith Robison said...

Thank you for the clarification. (For those of you not aware, Dr. McKernan is the leader of the SOLiD development team.)

sm_is_bakc said...

Hi Keith !
Great blog... made for some excellent reading

Actually, I have a request as well: are there any good reviews of the current high-throughput sequencing technologies (pros and cons, ins and outs, data types, etc.)?

Thanks ...

Kstevens said...

We (Eureka Genomics) are already in the multi-terabyte range with the Illumina GAII and its software upgrade. We are currently dumping our data into a 20 TB storage array. What advantages does the SOLiD system offer? I am just the technician, so sorry for the question.

Keith Robison said...

SOLiD and Illumina generally seem to be in a neck-and-neck race. Claims I have heard in favor of SOLiD are (1) higher accuracy due to colorspace encoding and (2) lower overall cost per base pair. Someone with experience with both systems would need to comment on the validity of these claims.