A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery
Thursday, March 31, 2016
Fellow blogger, colleague and science history buff Ash pointed out to me recently that Randy Shilts's And The Band Played On for Kindle was on sale. I hadn't read the book, nor seen the miniseries, so I snapped up a copy. It's a good read -- though at times a hard one -- I don't believe I've ever read another work of non-fiction where such a high fraction of the named individuals are dead by the end of the book.
Wednesday, March 30, 2016
Who Wants To Write A Review Article?
Yes, this is a solicitation. I'm on the Editorial Board of the journal Briefings in Bioinformatics, and I'm looking for authors who would like to write high-quality, compact reviews. If you are interested, or you want a little back-story, then keep reading.
Tuesday, March 29, 2016
At the Edge of The Cloud
I've used cloud computing at Amazon Web Services (AWS) off and on now for over five years. The cloud has all sorts of handy advantages -- flexible access to large amounts of compute, inexpensive access to any flavor of Linux you wish, the ability to guiltlessly kill a huge server you just fatally cratered with the wrong command. And until now, I've always been able to find machines that fit my needs -- perhaps sometimes just fitting, or with a bit of compromise. But now I've hit the wall: nobody at this time offers a really serious cloud machine with 500 GB of RAM.
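For the curious, here's a rough sketch of how one might scan the EC2 catalog for big-memory machines. It leans on boto3's describe_instance_types call, which is newer than this post, so treat the API usage and the region choice as illustrative assumptions rather than a recipe:

    # Sketch only: list EC2 instance types with >= 500 GiB of RAM.
    # Assumes the modern boto3 describe_instance_types API and us-east-1.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    big_memory = []
    for page in ec2.get_paginator("describe_instance_types").paginate():
        for itype in page["InstanceTypes"]:
            mem_gib = itype["MemoryInfo"]["SizeInMiB"] / 1024.0
            if mem_gib >= 500:
                big_memory.append((itype["InstanceType"], mem_gib))

    for name, mem in sorted(big_memory, key=lambda x: -x[1]):
        print(f"{name}: {mem:.0f} GiB")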
Friday, March 25, 2016
Selective sequencing: A Programming Opportunity!
I ask a bit of indulgence from my regular readership for this piece, as I am going to explain in depth a number of things that will probably be very familiar to them. My hope, perhaps far-fetched, is that this piece will reach some who are not so familiar with such topics, as I think the problem at hand might fascinate them.
Friday, March 18, 2016
PacBio's big splash
[18 March 2016 -- my original inclusion of the PacBio marketing image 6 years ago was claimed to be a DMCA violation -- I've simply removed it, though I do think this would fall under fair use]
The Pacific Biosciences instrument is officially unveiled now, with those lucky/smart (or SMRT?) enough to go to Marco Island filling in all of us not in that position. Sounds like a great lot of hoopla, though they didn't drag the Hornet for the splashdown.
First of all, it's a beast. "In this corner, weighing in at nearly an imperial ton...". Too bad their marketing picture has nothing good for judging the scale -- it's apparently 6.5 feet wide.
Kevin Davies at Bio-IT World has a wonderfully detailed article and there are a lot of nuggets in the Twitter feed. Anthony Fejes has two different sets of notes out -- one from a workshop and one from another speaker; Dan Koboldt has some good notes too (and if I haven't shouted out your notes, it's probably because I'm oblivious -- leave me a comment pointing to them). There was also a little bit of PacBio science in Elaine Mardis' talk (she's on their SAB) -- see Anthony's notes & the Twitter feed.
Okay, besides worrying about the capacity of floors & freight elevators, what's new? Well, not much on error rates from PacBio (apparently in the Q&A their presenter executed a jig, tango, waltz & rumba when asked) -- though the Mardis talk described resequencing on PacBio of samples that had previously been run on Illumina, and the results are quite good. Another important note is that their system doesn't seem to have much compositional bias -- bias against high/low %GC has been noted in all of the amplification-based systems and can be a serious problem.
There's also a lot of talk about being able to distinguish various modified bases by their effects on polymerase kinetics. PacBio has also demonstrated direct RNA sequencing (substituting a reverse transcriptase for DNA polymerase) and is talking about watching proteins being made. I haven't quite figured out why you'd want to do that last one, but presumably it's for more than a cool Nature cover.
Read lengths decay exponentially -- but with lots around 1 kb and quite a few around 5 kb. The big problem is apparently oxidative damage to the polymerase triggered by the laser -- so they are working both on getting the oxygen out of the system and on engineering hardier polymerases (the sort of biz I used to be in). Their strobe sequencing mode -- in which the laser is turned off to enable elongation in safe darkness -- enables multiple reads separated by long gaps.
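To make "decay exponentially" concrete: if read lengths roughly follow an exponential distribution, the fraction of reads at least x long is exp(-x/mean). The mean below is my guess for illustration, not a PacBio figure:

    # Illustrative only: exponential read-length model with an assumed mean.
    import math

    mean_len = 1000  # assumed mean read length, bases (a guess, not a spec)
    for cutoff in (1000, 2000, 5000):
        frac = math.exp(-cutoff / mean_len)
        print(f"reads >= {cutoff} bp: {frac:.1%}")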
The instrument definitely raises the bar on sample prep -- it's apparently entirely automated within the monster. YEAH! A machine I can delude myself into thinking I could run! One drawer takes the SMRT cells and another the DNA samples -- 500 ng of each. That doesn't sound like much (it's at least better than the 5-10 µg most library prep protocols call for -- except the ones looking for 20-30 µg), but it seems you don't get a lot from each sample.
The number of reads per cell isn't huge -- but you're still getting about 2 E. coli genome equivalents by my calculation. This is a bit undersized for a lot of applications -- but grand for many others. Mardis' talk discussed using PacBio for sequencing PCR-amplified resequencing samples -- this would appear to be right in the PacBio sweet spot. Perhaps a few hundred long PCR products could be packed into one SMRT run and still get many hundreds of reads per sample -- well, maybe pack fewer amplicons.
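That back-of-the-envelope calculation goes roughly as below; the reads-per-cell and mean read length are my guesses (chosen to land near the ~2x figure), not official specs:

    # Rough per-SMRT-cell yield arithmetic; the inputs are guesses, not specs.
    reads_per_cell = 10000   # assumed usable reads per SMRT cell
    mean_read_len = 1000     # assumed mean read length, bases
    ecoli_genome = 4.6e6     # E. coli genome size, bases

    yield_bases = reads_per_cell * mean_read_len
    print(f"yield per cell: ~{yield_bases / 1e6:.0f} Mb")
    print(f"E. coli equivalents: ~{yield_bases / ecoli_genome:.1f}")

    # Packing amplicons into one cell: reads per amplicon falls off quickly,
    # which is why fewer amplicons per cell may be the way to go.
    for n_amplicons in (50, 100, 300):
        print(f"{n_amplicons} amplicons -> ~{reads_per_cell // n_amplicons} reads each")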
What might be other good uses? Clearly metagenomics and the like. I just saw a posting on a professional board from someone pondering multiplexing hundreds of samples for an Illumina run (the current barcode schemes handle a few orders of magnitude fewer samples). Blitzing each sample through the PacBio instrument would seem an obvious alternative -- if the error rates are acceptable. Folks doing whole genome sequencing of small genomes will love having PacBio to generate scaffolds. For bigger genomes, it may still just be too expensive to get much coverage ($100 a SMRT cell sounds cheap, until you start multiplying it out for the numbers you need) -- a very rough version of that multiplication is sketched below.
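Assuming the same guessed per-cell yield as above and the quoted $100 per cell (both numbers are assumptions for illustration, not quotes from PacBio):

    # Back-of-the-envelope cost of coverage for a larger genome.
    # Per-cell yield is the same guess as above, not a PacBio spec.
    yield_per_cell = 10e6    # assumed bases per SMRT cell
    cost_per_cell = 100      # dollars, per the quoted figure
    genome_size = 3.1e9      # e.g. a human-sized genome, bases
    for coverage in (8, 30):
        cells = coverage * genome_size / yield_per_cell
        print(f"{coverage}x: ~{cells:,.0f} cells, ~${cells * cost_per_cell:,.0f}")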
RNA-Seq might be a bit trickier. If you need 500 ng of input material, that's an awful lot of ribosome-depleted or poly-A RNA. Plus, you'd only get tens of thousands of reads, making it hard to see lowly expressed messages -- though very long reads could be priceless. But if you can get tons of RNA, then 100 SMRT cells would run about $10K and offer depth similar to what you can get today with Illumina, but with those super long reads.
Now, who is this going to crimp the most? The instrument is clearly a ways from really threatening Illumina & SOLiD in the large genome market. 454 is a likely candidate to see its growth pressured -- though between the new lower-priced "junior" on the 454 side, PacBio's $700K price tag, and PacBio's inability to flood the market with instruments, that pressure will be ameliorated.
PacBio might have almost as much effect on the surrounding sequencing ecosystem. Making library prep reagents for this system is not going to make you lots of money! But there will be a serious niche for targeted sequencing -- though given the scale it will probably require some rethinking. Stuffing the whole exome into this doesn't really make sense -- if there are ~250K segments of the genome to read & you want 40X coverage of each, that's a lot of SMRT cells. But intelligently chosen gene sets totaling about 500 regions (or around 20-50 genes) with pre-validated reagents -- now that might be a market (though one which might have only 1-2 years of life -- better get cracking!). Simpler library prep will also go nicely with some of the enrichment systems -- a bugaboo of hybridization systems can be "daisy-chaining" of fragments via the amplification adapters -- but, on the other hand, you don't get 500 ng off an array or in-solution system without amplification. As with many disruptive technologies, it won't fit a lot of bills but will nibble off various parts of the business that are individually small but significant in aggregate. As noted above, RNA-Seq might be an initial success story for PacBio -- when RNA is abundant.
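To put rough numbers on the exome-versus-panel point above (reads per cell is again just my guess, not a spec):

    # Why a whole exome doesn't fit but a modest panel might; reads/cell assumed.
    reads_per_cell = 10000
    exome_reads_needed = 250000 * 40   # ~250K segments at 40x each
    panel_reads_needed = 500 * 40      # ~500 regions at 40x each
    print(f"whole exome: ~{exome_reads_needed / reads_per_cell:,.0f} SMRT cells per sample")
    print(f"500-region panel: ~{panel_reads_needed / reads_per_cell:.0f} SMRT cells per sample")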
IMHO, PacBio does need to get some papers out on applications (Mardis' group apparently is close to having one) and make sure that the next tranche of installations includes not only the Sanger & BGI but also some core labs or commercial providers. They also need to start pumping data into the public domain -- while they have signed up a bunch of commercial software providers, it is definitely out of academia that the most radical advances come. Plus, there are a lot of now well-entrenched open source tools that need to be tested with the new kid. Even simple things like the semi-standard SAM/BAM format are going to need tweaking -- SAM/BAM stores all sorts of information on read pairs, and strobe sequencing can generate many more than 2 tags per DNA fragment.
Of course, we have to wait another half day plus to find out what Ion Torrent is really delivering. That could really shake up the landscape -- at least the mental one.
A huge thanks to all the bloggers & twitterers for pouring out so much information. I'm still getting used to scanning past the retweets (is there a way to condense them?) and there is the occasional shock to the system (how could anyone in the field not have heard of Roger Staden?!?), but that's a tiny price to pay for such fascinating stuff.
Monday, March 14, 2016
A Mosquito ExAC?
Okay, a scheme for a crazy big genomics project has bitten me, infecting my brain. It's definitely not something I'm in any position to execute on, but I throw it out as an idea in case anyone finds it useful. And admittedly, it is pretty much stealing straight from the ExAC human exome aggregation project, which contains huge numbers of human exomes; behind all those is a lot of phenotype data. Now, inspired by recently re-reading Laurie Garrett's The Coming Plague and also faced with daily news items on the Zika virus epidemic, I've had this question: what if the same approach were applied to key disease vectors?
Wednesday, March 09, 2016
Oxford's Riposte To Illumina Trade Action
Along with the "No thanks, I've already got one" online session, the other big Oxford Nanopore news is the public release of Oxford's response to the trade complaint filed by Illumina, which seeks to exclude all Oxford Nanopore devices from the U.S. market. Nature News' Erika Check Hayden has posted the document on Dropbox, which was a big help. While no documents from Oxford's side concerning the simultaneous patent lawsuit have yet surfaced, it is reasonable to expect that Oxford will use many of the same arguments there.
Tuesday, March 08, 2016
Oxford's "No thanks, I've already got one"
Oxford Nanopore today hosted a Google hangout titled "No thanks, I've already got one". Only this morning did it occur to me that I could have re-watched Monty Python and the Holy Grail and scored it as blogging-related time! Oxford CTO Clive Brown went through a number of interesting (and in many cases long-awaited) announcements on the release of multiple key upgrades to the platform (note: unless otherwise specified, images swiped from ONT).