Monday, June 07, 2010

What's Gnu in Sequencing?

The latest company to make a splashy debut is GnuBio, a startup out of Harvard which gave a presentation at the recent Personal Genomics conference here in Boston. Today's Globe Business section had a piece, and Bio-IT World & Technology Review covered it as well. Last week Mass High Tech & In Sequence (subscription required) each had a bit too.

GnuBio has some grand plans, which are really in two areas. For me the more interesting one is the instrument. The claim they are making, with an attestation of plausibility from George Church (who is on their SAB, as is the case with about half of the sequencing instrument companies), is that a 30X human genome will be $30 in reagents on a $50K machine (library construction costs omitted, as is unfortunately routine in this business). The key technology, from what I've heard, is the microfluidic manipulation of sequencing reactions in picoliter droplets. This is similar to RainDance, which has commercialized technology out of the same group. The description I heard from someone who attended the conference is that GnuBio is planning to perform cyclic sequencing by synthesis within the droplets; this will allow minuscule reagent consumption and therefore low costs.

It's audacious & if they really can change out reactions within the picoliter droplets, technically it is quite a feat. From my imagination springs a vision of droplets running a racetrack, alternately getting reagents and being optically scanned for which base came next & an optical barcode on each droplet. I haven't seen this description, but I think it fits within what I have heard.

Atop those claims comes another one: despite having not yet read a base with the system, by year end two partners will have beta systems. It will be amazing to get proof-of-concept sequencing, let alone have an instrument shippable to a beta customer (this also assumes serious funding, which apparently they haven't yet found). Furthermore, it would be stunning to get reads long enough to do any useful human genome sequencing even after the machine starts making reads, let alone enough for 30X coverage.

The Technology Review article, in a magazine I once read regularly and had significant respect for, is depressingly full of sloppy journalism & failure to understand the topic. One paragraph has two doozies:
Because the droplets are so small, they require much smaller volumes of the chemicals used in the sequencing reaction than do current technologies. These reagents comprise the major cost of sequencing, and most estimates of the cost to sequence a human genome with a particular technology are calculated using the cost of the chemicals. Based solely on reagents, Weitz estimates that they will be able to sequence a human genome 30 times for $30. (Because sequencing is prone to errors, scientist must sequence a number of times to generate an accurate read.)

The first problem here is that yes, the reagents are currently the dominant cost. But if library construction costs are somewhere in the $200-500 range, then once reagents drop well below that cost it's a bit dishonest to tout (and poor journalism to repeat) a $30/human genome figure. Now, perhaps they have a library prep trick up their sleeve or perhaps they can somehow go with a Helicos-style "look Ma no library construction" scheme. Since they have apparently not settled on a chemistry (which will also almost certainly mean technology licensing costs -- or developing a brand new chemistry -- or getting the Polonator chemistry, which is touted as license-free), anything is possible -- but I'd generally bet this will be a clonal sequencing scheme requiring in-droplet PCR. The second whopper there is the claim that the 30X coverage is needed for error detection. It certainly doesn't hurt, but even with perfect reads you still need to oversample just to have good odds of seeing both alleles in a diploid genome.
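To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. It is my own illustration, not anything from GnuBio or the article, and it leans on simplifying assumptions: reads are error-free, each read covering a heterozygous site comes from either allele with probability 0.5, and depth at a site is Poisson-distributed around the mean coverage. The function names and the `min_reads` threshold are hypothetical.

```python
from math import comb, exp


def p_both_alleles(depth: int, min_reads: int = 1) -> float:
    """Chance that each allele of a heterozygous site is seen at least
    `min_reads` times among `depth` error-free reads (assumed 50/50 sampling)."""
    if depth < 2 * min_reads:
        return 0.0
    # Reads from allele A ~ Binomial(depth, 0.5); allele B gets the rest.
    favorable = sum(comb(depth, k) for k in range(min_reads, depth - min_reads + 1))
    return favorable / 2 ** depth


def p_both_alleles_poisson(mean_coverage: float, min_reads: int = 1,
                           max_depth: int = 200) -> float:
    """Same chance, averaged over an assumed Poisson-distributed depth.
    `max_depth` truncates the sum; 200 is ample for coverage around 30X."""
    total = 0.0
    pmf = exp(-mean_coverage)  # P(depth = 0)
    for depth in range(max_depth + 1):
        if depth > 0:
            pmf *= mean_coverage / depth  # Poisson recurrence
        total += pmf * p_both_alleles(depth, min_reads)
    return total


if __name__ == "__main__":
    for coverage in (2, 5, 10, 30):
        print(coverage,
              round(p_both_alleles_poisson(coverage, min_reads=1), 4),
              round(p_both_alleles_poisson(coverage, min_reads=3), 4))
```

Even in this idealized setting, low coverage leaves a meaningful fraction of heterozygous sites where one allele is never sampled (or is sampled too few times to trust), and real data with errors and uneven coverage only pushes the required depth higher.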

Just a little later in the story is the claim "The current cost to sequence a human genome is just a few thousand dollars, though companies that perform the service charge $20,000 to $48,000", which confuses what one company (Complete Genomics) may have achieved with what all companies can achieve.

The other half of the business plan I find even less appealing. They are planning to offer anyone a deal: pay your own way or let us do it, but if we do it we get full use of the data after some time period. The thought is that by building a huge database of annotated sequence samples, a business around biomarker discovery can be built. This business plan has of course been tried multiple times (Incyte, GeneLogic, etc.) and has not really worked in the past.

Personally, I think whoever is buying into this plan is deluding themselves in a huge way. First, while some of the articles seem to be confident this scheme won't violate the consent agreements on samples, it's a huge step from letting one institution work with a sample to letting a huge consortium get full access to potentially re-identifying data. Second, without good annotation the sequence is utterly worthless for biomarker discovery; even with great annotation, randomly collected data is going to be challenging to convert into something useful. Plus, any large-scale distribution of such data will butt up against the widely accepted provision that subjects (or their heirs) can withdraw consent at any time.

The dream gets (in my opinion) just daffier beyond that -- subjects will be able to be in a social network which will notify them when their samples are used for studies. Yes, that might be something that will appeal to a few donors, but will it really push someone from not donating to donating? It's going to be expensive to set up & potentially a privacy leakage mechanism. In any case, it's very hard to see how that is going to bring in more cash.

My personal advice to the company is many-fold. First, ditch all those crazy plans around forming a biomarker discovery effort; focus on building good tech (and probably selling it to an established player). Second, focus on RNA-Seq as your initial application -- this is far less demanding in terms of read length & will allow you to start selling instruments (or at least generating data) much sooner, giving you credibility. Of course, without some huge drops, the cost of library construction will dwarf that $30 in reagents, perhaps by a factor of 10. A clever solution there using the same picodroplet technology will be needed to really get the cost of a genome to low levels -- and could be cross-sold to the other platforms (and again, perhaps be a source of revenue while you work out the bugs in the sequencing scheme).
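The arithmetic behind that "factor of 10" is simple enough to sketch. This is only an illustration using the figures already in this post (the $30 reagent claim and my rough $200-500 guess for library construction); the function name is made up and instrument amortization and labor are left out entirely.

```python
def per_sample_cost(reagents: float, library_prep: float) -> dict:
    """Combine reagent and library prep costs for one sample (illustrative only;
    ignores instrument amortization and labor)."""
    total = reagents + library_prep
    return {
        "total": total,
        "reagent_share": reagents / total,
        "library_to_reagent_ratio": library_prep / reagents,
    }


if __name__ == "__main__":
    # $30 reagents is the company's claim; $200-500 is my rough library prep range.
    for library_prep in (200.0, 300.0, 500.0):
        cost = per_sample_cost(30.0, library_prep)
        print(f"library ${library_prep:.0f}: total ${cost['total']:.0f}, "
              f"reagents {cost['reagent_share']:.0%} of total, "
              f"library/reagent ratio {cost['library_to_reagent_ratio']:.1f}")
```

Anywhere in that range, reagents end up a small slice of the real per-sample bill, which is why a cheap library prep matters as much as cheap sequencing.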

Finally, if you really can do an RNA-Seq run for $30 a run in total operating costs, could you drop an instrument by my shop?


Keith Grimaldi said...

Best review of GnuBIO I've seen yet, thanks. Interesting point about the RNA sequencing. The proposed machine price is pretty low and I also think that there are several applications outside of whole genome sequencing.

For the time being (and probably for quite a while) from a personalised medicine, nutrition, whatever, point of view small panels of SNPs are what we need to be able to genotype quickly and cheaply. The technology for small panels is still in the stone age (i.e. stuck in 4-5 yrs ago when development of WGS went full steam) meaning that a small panel of 20 or so SNPs is not much cheaper than a whole genome scan and takes just as long. Also the panels need to accommodate indels and repeats, which the SNP scanning services don't.

Anonymous said...

Current reagent costs are only so high because a huge markup is applied. That's required to recoup development costs and in some cases subsidized instruments....

Paul Morrison said...

Excellent review of the gnu technology. I wish all dollar calculations contained the library construction especially the FTE. Having used the "Helicos-style "look Ma no library construction" scheme" for a year now there is no way in god's green earth I am buying a sequencer that forces me back to the stone age of library amplification.

OK, so the Helicos is a tad slow, high error rate and the reads are short but that makes it the cat's meow for ChIP-Seq. For everything else I am still waiting. Oxford Nanopore, Visigen? Someone will get single molecule right. (I agree with your post on SeqAnswers that PacBio might have the same problems as Helicos at least in the error rate.)

Jagganatha said...

how true....Current reagent costs are only so high because a huge markup is applied

The technology for small panels is still in the stone age (i.e. stuck in 4-5 yrs ago when development of WGS went full steam) meaning that a small panel of 20 or so SNPs is not much cheaper than a whole genome scan and takes just as long. Also the panels need to accommodate indels and repeats, which the SNP scanning services don't.

Tuesday, June 08, 2010 5:37:00 AM

Anonymous said...

Just wanted to add, they have a press release out 12.14.2011 claiming they are now ready to build beta systems.

Still really no official word on how everything technically works, but oh well, we will see......

However, they seem to have integrated library production and everything. They claim to have an "insert genomic DNA and get results in 2 hours" system......

If this is true, it is THE system for the typical clinical chemistry lab in patient diagnostics.....