Omics! Omics!

Sunday, May 29, 2016

London Calling: Notes on Brownian Commotion

I'm behind on writing up London Calling. I can partly blame a failing computer -- though rebooting it seems to have righted it for the moment. A bigger challenge is that I had the luxury of staying in London through the weekend, and have been trying to pack in as much of England as I can. To really do justice to everything, I need to scan all the tweets -- and that will take some time.

But I have dug into everything around Clive Brown's talk (kudos to NextGenSeek for Storifying that portion of the meeting's tweets!) about the current and future state of the Oxford Nanopore platform, so I will focus on that, with a few side-trips on closely related topics. A few gaps on topics that I previewed but that didn't show up in the presentation were filled in by chats with Clive. Plus, the indescribably huge advantage of actually going to a conference is the tidbits gleaned from late night chats over drinks (and no, I didn't ply anyone to get them to spill -- all was coughed up out of pure free will). I'm going to roughly divide these by the announced timeframe: now, imminently, later this year, perhaps next year and unspecified.

London Calling Preview

On Thursday and Friday this week Oxford Nanopore will be holding their second annual London Calling meeting. I successfully defended my schedule this year, so I'll be on the ground there. If you follow me on Twitter and don't want to be buried in nanopore tweets, mute the hashtag #nanoporeconf (a rather large hashtag for talking about nano stuff!). LC is OxNano's premier event, so what might we see from the company?

Inconstant lines

If you order chemicals, then the supplier provides a certificate of analysis, which shows the amounts of impurities or their limit of detection. For physics experiments, one can purchase components which have been carefully cast or machined to precise dimensions. Barring errors by the manufacturers, these reagents and components can be relied upon, as their consistency is known. Alas, for biological systems, such constancy is often a mirage.

Sickle Cell Anemia: An underprioritized disease?

The Sunday Boston Globe today had a front page piece by STAT's Sharon Begley that asks some challenging questions about prioritization of disease research. Poking around the STAT site, I found that the original article was even longer and better, but between the important issues it raises, some interesting peripheral stuff and at least one gaping hole, there's plenty to discuss.

Kendall Square Tech/Biotech/Biopharma Needs to Get Vocal About Transit!

Earlier this week, the current big Boston-area mass transit construction project, known as GLX, went through a near-death experience. The project, having been mismanaged to be over budget and behind schedule in the early going, was approved to survive in a stripped-down form. Numerous political types were quoted supporting the project, albeit complaining about contributions their towns were making to keep the project alive. What wasn't heard was any sort of support from the tech, biotech and biopharma companies which crowd Kendall Square.

Exploring Critiques of Siddhartha Mukherjee's The Gene: An Intimate History

My finely tuned skills in the art of procrastination KOed my plans to see Siddhartha Mukherjee's talk tonight at a local bookstore (apparently with Henry Louis Gates) to promote Mukherjee's new book The Gene: An Intimate History -- the event sold out. Perhaps I could have found a scalper, but I decided I'd just head home. Mukherjee's first book, the cancer history The Emperor of All Maladies, was very well received (even spawning a PBS series), and I was impressed that Mukherjee took the time to contact me after I wrote a review in this space. The new book has been taking quite a bit of criticism, and even more so his New Yorker piece that preceded it (and which I assume is derived from a portion of the book).

Around the World in Amino Acids

This post is pure whimsy, growing out of killing time on a train ride. The not-so-serious question: what is the geography of amino acids? If I search for them by name in Google Maps, what will I find? With just Google Maps, plus some Google Translate thrown in, I found a few surprises.

Genia Publishes Platform Progress

Nanopore sequencing developer Genia published in PNAS last week a study demonstrating the basics of their current approach to sequencing. I say current, because Genia has gone through a number of iterations and on at least two occasions promised to be going into beta in a 6-9 month timeframe. The paper demonstrates the basic concepts of a sequencing system and generates some short reads, but also suggests that Genia won't be hitting beta sites in the near future either.

Protein Homeostasis: Has it Hit The Classrooms Yet?

I wrote a piece earlier this year suggesting that introductory Biology textbooks should emphasize protein complexes more. My basis for assuming that they generally don't isn't very good: a single textbook in use in TNG's high school class, which sports a copyright date from a decade ago. I also remember what I was taught in high school and college courses, which I would rate as not bad and truly excellent (respectively), plus I was a teaching fellow for one semester of intro bio at Harvard. I now have another suggestion to cram into every biology course: an overview of the ubiquitin-proteasome system.

Mosquito Genomes: Chance for Long-Range Companies to Shine

Friday's New York Times carried a front-page illustration of the current status of the Aedes aegypti genome, accompanying an Amy Harmon story on efforts to improve the currently highly fragmented state of this genome.

Hey @DrKatHolt @rrwick, there's a Bandage plot on the front page of the @nytimes today! #NoFooling @MarkKunitomi pic.twitter.com/a1n6QyQWr3
— Adam Phillippy (@aphillippy) April 1, 2016

The piece has seen a lot of opinion on Twitter with regard to its value and other issues (such as calling an assembly a map -- which to me is correct, as the perfect genome sequence is the ultimate physical map!).

This whole thread. My science peeps keep me here. ❤️ https://t.co/96mifMD0da
— a muse (@_a_muse) April 3, 2016

Reflections on And The Band Played On

Fellow blogger, colleague and science history buff Ash pointed out to me recently that Randy Shilts' And The Band Played On for Kindle was on sale. I hadn't read the book, nor seen the miniseries, so I snapped up a copy. It's a good read -- though at times a hard one -- I don't believe I've ever read another work of non-fiction where such a high fraction of the named individuals are dead by the end of the book.

Who Wants To Write A Review Article?

Yes, this is a solicitation. I'm on the Editorial Board of the journal Briefings in Bioinformatics. I'm looking for authors who would like to write high-quality, compact reviews. If you are interested, or you want a little back-story, then keep reading.

At the Edge of The Cloud

I've used cloud computing at Amazon Web Services (AWS) off-and-on now for over five years. The cloud has all sorts of handy advantages -- flexible access to large amounts of compute, inexpensive access to any flavor of Linux you wish, the ability to guiltlessly kill a huge server you just fatally cratered with the wrong command. And until now, I've always been able to find machines that fit my needs -- perhaps sometimes just fitting or with a bit of compromise. But now I've hit the wall: nobody at this time offers a really serious cloud machine with 500GB of RAM.

Selective sequencing: A Programming Opportunity!

I ask a bit of indulgence from my regular readership for this piece, as I am going to explain a number of things in depth that probably will be very familiar to them. My hope, perhaps fantastic, is that this piece will get out to some who are not so familiar with such topics, as I think the problem at hand might be very fascinating.

PacBio's big splash

[18 March 2016 -- my original inclusion of the Pac Bio marketing image 6 years ago was claimed to be a DMCA violation -- I've simply removed it, though I do think this would fall under fair use]

The Pacific Biosciences instrument is officially unveiled now, with those lucky/smart (or SMRT?) enough to go to Marco Island filling in all of us not in that position. Sounds like a great lot of hoopla, though they didn't drag the Hornet for the splashdown.

First of all, it's a beast. "In this corner, weighing in at nearly an imperial ton...". Too bad their marketing picture has nothing good for judging the scale -- it's apparently 6.5 feet wide.

Kevin Davies at Bio-IT World has a wonderfully detailed article and there are a lot of nuggets in the Twitter feed. Anthony Fejes has two different sets of notes out -- one from a workshop and one from another speaker; Dan Koboldt has some good notes too (and if I haven't shouted out your notes, it's probably because I'm oblivious -- leave me a comment pointing to them). There was also a little bit of PacBio science in Elaine Mardis' talk (she's on their SAB) -- see Anthony's notes & the Twitter feed.

Okay, besides worrying about the capacity of floors & freight elevators, what's new? Well, not much on error rates from PacBio (apparently in the Q&A their presenter executed a jig, tango, waltz & rumba when asked) -- though the Mardis talk described resequencing on PacBio samples that had been done before on Illumina -- and the results are quite good. Another important note is that their system doesn't seem to have much bias in terms of composition -- bias against hi/lo %GC has been noted in all of the amplification-based systems and can be a serious problem.

There's also a lot of talk about being able to distinguish various modified bases by their effects on polymerase kinetics. PacBio has also demonstrated direct RNA sequencing (substituting a reverse transcriptase for DNA polymerase) and is talking about watching proteins being made. I haven't quite figured out why you'd want to do that last one, but presumably it's for more than a cool Nature cover.

Read lengths decay exponentially -- but with lots around 1Kb and quite a few around 5Kb. The big problem is apparently oxidative damage to the polymerase triggered by the laser -- so they are working on both getting the oxygen out of the system and engineering hardier polymerases (the sort of biz I used to be in). Their strobe sequencing mode -- in which the laser is turned off to enable elongation in safe darkness -- enables multiple reads separated by long gaps.

The instrument definitely raises the bar on sample prep -- it's apparently entirely automated within the monster. YEAH! A machine I can delude myself into thinking I could run! One drawer takes the SMRT cells and another the DNA samples -- 500 ng of each. That doesn't sound like much (it's at least better than the 5-10ug most library prep protocols call for -- except the ones looking for 20-30ug), but it seems you don't get a lot from each sample.

The number of reads per cell isn't huge -- but you're still getting about 2 E. coli genome equivalents by my calculation. This is a bit undersized for a lot of applications -- but grand for many others. Mardis' talk discussed using PacBio for sequencing PCR-amplified resequencing samples -- this would appear to be right in the PacBio sweet spot. Perhaps a few hundred long PCR products could be packed into one SMRT run and still get many hundreds of reads per sample -- well, maybe pack fewer amplicons.

What might be other good uses? Clearly metagenomics and similar. I just saw a posting on a professional board of someone pondering multiplexing hundreds of samples for an Illumina run (the current barcode schemes are for a few orders of magnitude fewer samples). Blitzing each sample through the PacBio instrument would seem to be obvious -- if the error rates are acceptable. Folks doing whole genome sequencing of small genomes will love having PacBio to generate scaffolds. For bigger genomes, it may just still be too expensive to get much coverage ($100 a SMRT cell sounds cheap, until you start multiplying that out for the numbers you need) -- but perhaps not (much too fried to do that calculation at the moment).
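For anyone less fried, the calculation can be roughed out from two figures in this post: $100 per SMRT cell and about 2 E. coli genome equivalents of sequence per cell. The E. coli genome size (roughly 4.6 Mb), the example genome sizes and the coverage targets below are assumptions added for illustration, so treat the output as an order-of-magnitude sketch rather than a quote.

```python
import math

# Figures taken from the post
COST_PER_CELL_USD = 100          # "$100 a SMRT cell"
ECOLI_EQUIVALENTS_PER_CELL = 2   # "about 2 E.coli genome equivalents"

# Assumption (not from the post): approximate E. coli genome size in bases
ECOLI_GENOME_BP = 4_600_000

# Raw bases delivered by one SMRT cell under the assumptions above
BASES_PER_CELL = ECOLI_EQUIVALENTS_PER_CELL * ECOLI_GENOME_BP


def cells_needed(genome_bp: int, coverage: int) -> int:
    """SMRT cells needed to reach the given average coverage of a genome."""
    return math.ceil(genome_bp * coverage / BASES_PER_CELL)


def run_cost_usd(genome_bp: int, coverage: int) -> int:
    """Consumables cost in dollars, counting only the SMRT cells."""
    return cells_needed(genome_bp, coverage) * COST_PER_CELL_USD


if __name__ == "__main__":
    # Hypothetical examples: a 5 Mb bacterium at 20X, a 3 Gb genome at 10X
    for label, size, cov in [("5 Mb genome, 20X", 5_000_000, 20),
                             ("3 Gb genome, 10X", 3_000_000_000, 10)]:
        print(f"{label}: {cells_needed(size, cov)} cells, "
              f"${run_cost_usd(size, cov):,}")
```

Under these assumptions a small genome costs on the order of a thousand dollars in cells, while even modest coverage of a human-sized genome runs to thousands of cells, which is the multiplication problem hinted at above.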

RNA-Seq might be a bit trickier. If you need 500ng of input material, that's an awful lot of ribosome-depleted or poly-A RNA. Plus, you're getting only tens of thousands of reads, making it hard to see lowly-expressed messages -- but for very long ones, the reads are perhaps priceless. But, if you can get tons of RNA, then 100 SMRT cells would be about $10K and offer similar depth to what you can get today with Illumina but with those super long reads.

Now, who is this going to crimp the most? The instrument is clearly a ways from really threatening Illumina & SOLiD for the large genome market. 454 is a likely candidate to see growth pressured -- though between the new lower-priced "junior" and both PacBio's $700K price tag and their inability to flood the market with instruments, this will be ameliorated.

PacBio might have almost as much effect on the surrounding sequencing ecosystem. Making library prep reagents for this system is not going to make you lots of money! But, there will be a serious niche for targeted sequencing -- though with the scale it will probably require some rethought. Stuffing the whole exome into this doesn't really make sense -- if there are ~250K segments of the genome to read & you want 40X coverage of each, that's a lot of SMRT cells. But, intelligently chosen gene sets totaling about 500 regions (or around 20-50 genes) with pre-validated reagents -- now that might be a market (though one which might have 1-2 years of life -- better get cracking!). Simpler library prep will also go nicely with some of the enrichment systems -- a bugaboo of hybridization systems can be "daisy-chaining" of fragments via the amplification adapters -- but, on the other hand you don't get 500ng off an array or in-solution system without amplification. As with many disruptive technologies, it won't fit a lot of bills but will nibble off various parts of the business that are individually small but significant in aggregate. As noted above, RNA-Seq might be an initial success story for PacBio -- when RNA is abundant.
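The exome-versus-panel contrast can be put in rough numbers. The sketch below uses the ~250K segments, ~500 regions, 40X target and $100 per cell from this post; the 20,000 reads per cell is an assumed stand-in for "tens of thousands of reads", and it further assumes one read covers one target region with no off-target or failed reads, which is optimistic.

```python
import math

# Figures taken from the post
COST_PER_CELL_USD = 100   # "$100 a SMRT cell"
TARGET_COVERAGE = 40      # "40X coverage of each"

# Assumption (not from the post): a stand-in for "tens of thousands of reads"
READS_PER_CELL = 20_000


def cells_for_targets(n_regions: int,
                      coverage: int = TARGET_COVERAGE,
                      reads_per_cell: int = READS_PER_CELL) -> int:
    """SMRT cells needed if every read lands on exactly one target region."""
    reads_needed = n_regions * coverage
    return math.ceil(reads_needed / reads_per_cell)


if __name__ == "__main__":
    for label, n_regions in [("whole exome (~250K segments)", 250_000),
                             ("gene panel (~500 regions)", 500)]:
        cells = cells_for_targets(n_regions)
        print(f"{label}: {cells} cells, ${cells * COST_PER_CELL_USD:,}")
```

With these assumed numbers the exome needs hundreds of cells while the 500-region panel fits in a single one, which is why the small, pre-validated gene set looks like the better match for the instrument.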

IMHO, PacBio does need to get some papers out on applications (Mardis' group apparently is close to having one) and make sure that the next tranche of installations not only includes the Sanger & BGI, but that there are also some core labs or commercial providers. Also, they need to start pumping data into the public domain -- while they signed a bunch of commercial software providers up, it is definitely out of academia that you find the most radical advances. Plus, there are a lot of now well-entrenched open source tools that need to be tested with the new kid. Even simple things like the semi-standard SAM/BAM format are going to need tweaking -- SAM/BAM stores all sorts of information on read pairs, and the strobe sequencing can generate many more than 2 tags per DNA fragment.

Of course, we have to wait another half day plus to find out what Ion Torrent is really delivering. That could really shake up the landscape -- at least the mental one.

A huge thanks to all the bloggers & twitterers for pouring out so much information. I'm still getting used to scanning past the retweets (is there a way to condense them?) and there is the occasional shock-to-the-system (how could anyone in the field not have heard of Rodger Staden?!?), but that's a tiny price to pay for such fascinating stuff.

Monday, March 14, 2016

A Mosquito ExAC?

Okay, a scheme for a crazy big genomics project has bitten me, infecting my brain. It's definitely not something I'm in a position at all to execute on, but I throw it out as an idea in case anyone finds it useful. And admittedly, it is pretty much stealing straight from the ExAC human exome aggregation project, which contains huge numbers of human exomes. Behind all those is a lot of phenotype data. Now, inspired by recently re-reading Laurie Garrett's The Coming Plague and also faced with daily news items on the Zika virus epidemic, I've had this question: what if the same approach were applied to key disease vectors?

Oxford's Riposte To Illumina Trade Action

Along with the "No thanks, I've already got one" online session, the other big Oxford Nanopore news is the public release of Oxford's response to the trade complaint filed by Illumina which was attempting to exclude all Oxford Nanopore devices from the U.S. markets. Nature News' Erika Check Hayden has posted the document on Dropbox, which was a big help. While no documents from Oxford's side concerning the simultaneous patent lawsuit have yet surfaced, it is reasonable to expect that it will use many of the same arguments.

Oxford's "No thanks, I've already got one"

Oxford Nanopore today hosted a Google hangout titled "No thanks, I've already got one". Only this morning did it occur to me I could have re-watched Monty Python and the Holy Grail and scored it as blogging-related time! Oxford CTO Clive Brown went through a number of interesting (and in many cases, long-awaited) announcements on the release of multiple key upgrades to the platform (note: unless otherwise specified, images swiped from ONT).

Digging into the Illumina Lawsuit vs. Oxford Nanopore

Illumina's and University of Washington's filing of a patent lawsuit and related trade complaint against Oxford Nanopore made big news yesterday, with nice coverage from Mick Watson, GenomeWeb, Nature's Erika Check Hayden, Technology Review's Antonio Regalado, BioIT World's Aaron Krol, and venture capitalist Vishal Gulati. Each of these covers the onetime partnership between the two companies and their acrimonious parting of ways. Oxford Nanopore released a short and pithy response. I failed to get an early jump on things, so the ground is already well plowed. So my sloth and inertia have forced me to take an unpleasant route I usually spend great effort avoiding: actually reading the complaints and the two key patents they cite, licensed from Jens Gundlach's group at University of Washington (US8673550 and US9170230).

Amplification-free, library-free sequencing? NanoString wants to be It

Perhaps the most unusual new technology to be unveiled at AGBT16 is NanoString's new approach to sequencing, which is in very early stages of development. Called Hyb And Seq, the process is remarkable in being a purely hybridization-based single molecule method -- absolutely no enzymes are harmed during the operation of the system. That's remarkable -- the only enzyme-free (or nearly so) sequencing approaches to deliver serious amounts of data into Genbank are Maxam and Gilbert approaches (including Church's genomic sequencing and multiplex sequencing), and even those typically required restriction digestion of the target.

Sunday, May 29, 2016

Wednesday, May 25, 2016

Tuesday, May 24, 2016

Sunday, May 22, 2016

Friday, May 20, 2016

Wednesday, May 18, 2016

Friday, May 06, 2016

Thursday, April 28, 2016

Tuesday, April 05, 2016

Sunday, April 03, 2016

Thursday, March 31, 2016

Wednesday, March 30, 2016

Tuesday, March 29, 2016

Friday, March 25, 2016

Friday, March 18, 2016

Monday, March 14, 2016

Wednesday, March 09, 2016

Tuesday, March 08, 2016

Thursday, February 25, 2016

Wednesday, February 24, 2016
