Omics! Omics!

Wednesday, May 25, 2016

London Calling Preview

On Thursday and Friday this week Oxford Nanopore will be holding their second annual London Calling meeting. I successfully defended my schedule this year, so I'll be on the ground there. If you follow me on Twitter and don't want to be buried in nanopore tweets, mute the hashtag #nanoporeconf (a rather large hashtag for talking about nano stuff!). LC is OxNano's premier event, so what might we see from the company?

Inconstant lines

If you order chemicals, then the supplier provides a certificate of analysis, which shows the amounts of impurities or their limit of detection. For physics experiments, one can purchase components which have been carefully cast or machined to precise dimensions. Barring errors by the manufacturers, these reagents and components can be relied upon, as their consistency is known. Alas, for biological systems, such constancy is often a mirage.

Sickle Cell Anemia: An underprioritized disease?

The Sunday Boston Globe today had a front page piece by STAT's Sharon Begley that asks some challenging questions about prioritization of disease research. Poking around the STAT site, I found that the original article was even longer and better, but between the important issues it raises, some interesting peripheral stuff and at least one gaping hole, there's plenty to discuss.

Kendall Square Tech/Biotech/Biopharma Needs to Get Vocal About Transit!

Earlier this week, the current big Boston-area mass transit construction project, known as GLX, went through a near-death experience. The project, having been mismanaged to be over budget and behind schedule in the early going, was approved to survive in a stripped down form. Numerous political types were quoted supporting the project, albeit complaining about contributions their towns were making to keep the project alive. What wasn't heard was any sort of support from the tech, biotech and biopharma companies which crowd Kendall Square.

Exploring Critiques of Siddhartha Mukherjee's The Gene, An Intimate History

My finely tuned skills in the art of procrastination KOed my plans to see Siddhartha Mukherjee's talk tonight at a local bookstore (apparently with Henry Louis Gates) to promote Mukherjee's new book The Gene: An Intimate History -- the event sold out. Perhaps I could have found a scalper, but I decided I'd just head home. Mukherjee's first book, the cancer history The Emperor of All Maladies, was very well received (even spawning a PBS series), and I was impressed that Mukherjee took the time to contact me after I wrote a review in this space. The new book has been taking quite a bit of criticism, and even more so his New Yorker piece that preceded it (and which I assume is derived from a portion of the book).

Around the World in Amino Acids

This post is pure whimsy, growing out of killing time on a train ride. The not-so-serious question: what is the geography of amino acids? If I search for them by name in Google Maps, what will I find? With just Google Maps, plus some Google Translate thrown in, I found a few surprises.

Genia Publishes Platform Progress

Nanopore sequencing developer Genia published in PNAS last week a study demonstrating the basics of their current approach to sequencing. I say current, because Genia has gone through a number of iterations and on at least two occasions promised to be going into beta in a 6-9 month timeframe. The paper demonstrates the basic concepts of a sequencing system and generates some short reads, but also suggests that Genia won't be hitting beta sites in the near future either.

Protein Homeostasis: Has it Hit The Classrooms Yet?

I wrote a piece earlier this year suggesting that introductory Biology textbooks should emphasize protein complexes more. My basis for assuming that they generally don't isn't very good: a single textbook in use in TNG's high school class, which sports a copyright date from a decade ago. I also remember what I was taught in high school and college courses, which I would rate as not bad and truly excellent (respectively), plus I was a teaching fellow for one semester of intro bio at Harvard. I now have another suggestion to cram into every biology course: an overview of the ubiquitin-proteasome system.

Mosquito Genomes: Chance for Long-Range Companies to Shine

Friday's New York Times carried a front-page illustration of the current status of the Aedes aegypti genome, accompanying an Amy Harmon story on efforts to improve the currently highly fragmented state of this genome.

Hey @DrKatHolt @rrwick, there's a Bandage plot on the front page of the @nytimes today! #NoFooling @MarkKunitomi pic.twitter.com/a1n6QyQWr3
— Adam Phillippy (@aphillippy) April 1, 2016

The piece has seen a lot of opinion on Twitter with regard to its value and other issues (such as calling an assembly a map -- which to me is correct as the perfect genome sequence is the ultimate physical map!).

This whole thread. My science peeps keep me here. ❤️ https://t.co/96mifMD0da
— a muse (@_a_muse) April 3, 2016

Reflections on And The Band Played On

Fellow blogger, colleague and science history buff Ash pointed out to me recently that Randy Shilts's And The Band Played On for Kindle was on sale. I hadn't read the book, nor seen the miniseries, so I snapped up a copy. It's a good read -- though at times a hard one -- I don't believe I've ever read another work of non-fiction where such a high fraction of the named individuals are dead by the end of the book.

Who Wants To Write A Review Article?

Yes, this is a solicitation. I'm on the Editorial Board of the journal Briefings in Bioinformatics. I'm looking for authors who would like to write high-quality, compact reviews. If you are interested, or you want a little back-story, then keep reading.

At the Edge of The Cloud

I've used cloud computing at Amazon Web Services (AWS) off-and-on now for over five years. The cloud has all sorts of handy advantages -- flexible access to large amounts of compute, inexpensive access to any flavor of Linux you wish, the ability to guiltlessly kill a huge server you just fatally cratered with the wrong command. And until now, I've always been able to find machines that fit my needs -- perhaps sometimes just fitting or with a bit of compromise. But now I've hit the wall: nobody at this time offers a really serious cloud machine with 500GB of RAM.

Selective sequencing: A Programming Opportunity!

I ask a bit of indulgence from my regular readership for this piece, as I am going to explain a number of things in depth that probably will be very familiar to them. My hope, perhaps fantastic, is that this piece will get out to some who are not so familiar with such topics, as I think the problem at hand might be very fascinating.

PacBio's big splash

[18 March 2016 -- my original inclusion of the Pac Bio marketing image 6 years ago was claimed to be a DMCA violation -- I've simply removed it, though I do think this would fall under fair use]

The Pacific Biosciences instrument is officially unveiled now, with those lucky/smart (or SMRT?) enough to go to Marco Island filling in all of us not in that position. Sounds like a great lot of hoopla, though they didn't drag the Hornet for the splashdown.

First of all, it's a beast. "In this corner, weighing in at nearly an imperial ton...". Too bad their marketing picture has nothing good for judging the scale -- it's apparently 6.5 feet wide.

Kevin Davies at Bio-IT World has a wonderfully detailed article and there are a lot of nuggets in the Twitter feed. Anthony Fejes has two different sets of notes out -- one from a workshop and one from another speaker; Dan Koboldt has some good notes too (and if I haven't shouted out your notes, it's probably because I'm oblivious -- leave me a comment pointing to them). There was also a little bit of PacBio science in Elaine Mardis' talk (she's on their SAB) -- see Anthony's notes & the Twitter feed.

Okay, besides worrying about the capacity of floors & freight elevators, what's new? Well, not much on error rates from PacBio (apparently in the Q&A their presenter executed a jig, tango, waltz & rumba when asked) -- though the Mardis talk described resequencing samples of PacBio that had been done before by Illumina -- and the results are quite good. Another important note is that their system doesn't seem to have much bias in terms of composition -- bias against hi/lo %GC has been noted in all of the amplification-based systems and can be a serious problem.

There's also a lot of talk about being able to distinguish various modified bases by their effects on polymerase kinetics. PacBio has also demonstrated direct RNA sequencing (substituting a reverse transcriptase for DNA polymerase) and is talking about watching proteins being made. I haven't quite figured out why you'd want to do that last one, but presumably it's for more than a cool Nature cover.

Read lengths decay exponentially -- but with lots around 1Kb and quite a few around 5K. The big problem is apparently oxidative damage to the polymerase triggered by the laser -- so they are working on both getting the oxygen out of the system and engineering hardier polymerases (the sort of biz I used to be in). Their strobe sequencing mode -- in which the laser is turned off to enable elongation in safe darkness -- enables multiple reads separated by long gaps.

The instrument definitely raises the bar on sample prep -- it's apparently entirely automated within the monster. YEAH! A machine I can delude myself into thinking I could run! One drawer takes the SMRT cells and another the DNA samples -- 500 ng of each. That doesn't sound like much (it's at least better than the 5-10ug most library prep protocols call for -- except the ones looking for 20-30ug), but it seems you don't get a lot from each sample.

The number of reads per cell isn't huge -- but you're still getting about 2 E. coli genome equivalents by my calculation. This is a bit undersized for a lot of applications -- but grand for many others. Mardis' talk discussed using PacBio for sequencing PCR amplified resequencing samples -- this would appear to be right in the PacBio sweet spot. Perhaps a few hundred long PCR products could be packed into one SMRT run and still get many hundreds of reads per sample -- well, maybe pack fewer amplicons.

What might be other good uses? Clearly metagenomics and similar. I just saw a posting on a professional board of someone pondering multiplexing hundreds of samples for an Illumina run (the current barcode schemes are for a few orders of magnitude fewer samples). Blitzing each sample through the PacBio instrument would seem to be obvious -- if the error rates are acceptable. Folks doing whole genome sequencing of small genomes will love having PacBio to generate scaffolds. For bigger genomes, it may just still be too expensive to get much coverage ($100 a SMRT cell sounds cheap, until you start multiplying that out for the numbers you need) -- but perhaps not (much too fried to do that calculation at the moment).

RNA-Seq might be a bit trickier. If you need 500ng of input material, that's an awful lot of ribosome-depleted or poly-A RNA. Plus, you're getting only tens of thousands of reads, making it hard to see lowly-expressed messages -- but very long ones, perhaps priceless. But, if you can get tons of RNA, then 100 SMRT cells would be about $10K and offer similar depth to what you can get today with Illumina but with those super long reads.
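The arithmetic behind that $10K figure is easy to sketch. The $100-per-cell price is the one quoted above; the reads-per-cell value is only my placeholder for "tens of thousands" and is an assumption, not a PacBio specification.

```python
COST_PER_CELL_USD = 100   # price per SMRT cell quoted in this post
READS_PER_CELL = 30_000   # ASSUMPTION: stand-in for "tens of thousands" of reads

def run_cost(cells, cost_per_cell=COST_PER_CELL_USD):
    """Total consumable cost in dollars for a given number of SMRT cells."""
    return cells * cost_per_cell

def total_reads(cells, reads_per_cell=READS_PER_CELL):
    """Rough read count across all cells, given the assumed per-cell yield."""
    return cells * reads_per_cell

if __name__ == "__main__":
    cells = 100
    print(f"{cells} cells: ${run_cost(cells):,} for ~{total_reads(cells):,} reads")
```

Swap in whatever per-cell yield turns out to be real; the cost side doesn't change, only how much depth that $10K buys.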

Now, who is this going to crimp the most? The instrument is clearly a ways from really threatening Illumina & SOLiD for the large genome market. 454 is a likely candidate to see growth pressured -- though between the new lower-priced "junior" and both PacBio's $700K price tag and their inability to flood the market with instruments, this will be ameliorated.

PacBio might have almost as much effect on the surrounding sequencing ecosystem. Making library prep reagents for this system is not going to make you lots of money! But, there will be a serious niche for targeted sequencing -- though with the scale it will probably require some rethought. Stuffing the whole exome into this doesn't really make sense -- if there are ~250K segments of the genome to read & you want 40X coverage of each, that's a lot of SMRT cells. But, intelligently chosen gene sets totaling about 500 regions (or around 20-50 genes) with pre-validated reagents -- now that might be a market (though one which might have 1-2 years of life -- better get cracking!). Simpler library prep will also go nicely with some of the enrichment systems -- a bugaboo of hybridization systems can be "daisy-chaining" of fragments via the amplification adapters -- but, on the other hand you don't get 500ng off an array or in-solution system without amplification. As with many disruptive technologies, it won't fit a lot of bills but will nibble off various parts of the business that are individually small but significant in aggregate. As noted above, RNA-Seq might be an initial success story for PacBio -- when RNA is abundant.
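To put rough numbers on the exome-versus-small-panel comparison above, here is a back-of-the-envelope sketch. It assumes each read covers exactly one target segment, and the 30,000 reads per cell is my placeholder for "tens of thousands", not a measured yield.

```python
import math

def cells_needed(segments, coverage, reads_per_cell):
    """SMRT cells required if every read covers exactly one target segment.

    Simplifying assumption: no off-target reads and perfectly even coverage.
    """
    reads_required = segments * coverage
    return math.ceil(reads_required / reads_per_cell)

if __name__ == "__main__":
    READS_PER_CELL = 30_000  # ASSUMPTION: placeholder per-cell yield
    for label, segments in [("whole exome", 250_000), ("500-region panel", 500)]:
        cells = cells_needed(segments, coverage=40, reads_per_cell=READS_PER_CELL)
        print(f"{label}: {cells} cell(s), about ${cells * 100:,} at $100 per cell")
```

Under those assumptions the exome needs hundreds of cells while the small panel fits in one, which is the whole argument for intelligently chosen gene sets.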

IMHO, PacBio does need to get some papers out on applications (Mardis' group apparently is close to having one) and make sure that the next tranche of installations not only includes the Sanger & BGI, but that there are also some core labs or commercial providers. Also, they need to start pumping data into the public domain -- while they signed a bunch of commercial software providers up, it is definitely out of academia that you find the most radical advances. Plus, there are a lot of now well-entrenched open source tools that need to be tested with the new kid. Even simple things like the semi-standard SAM/BAM format are going to need tweaking -- SAM/BAM stores all sorts of information on read pairs, and the strobe sequencing can generate many more than 2 tags per DNA fragment.

Of course, we have to wait another half day plus to find out what Ion Torrent is really delivering. That could really shake up the landscape -- at least the mental one.

A huge thanks to all the bloggers & twitterers for pouring out so much information. I'm still getting used to scanning past the retweets (is there a way to condense them?) and there is the occasional shock-to-the-system (how could anyone in the field not have heard of Rodger Staden?!?), but that's a tiny price to pay for such fascinating stuff.

Monday, March 14, 2016

A Mosquito ExAC?

Okay, a scheme for a crazy big genomics project has bitten me, infecting my brain. It's definitely not something I'm in a position at all to execute on, but I throw it out as an idea in case anyone finds it useful. And admittedly, it is pretty much stealing straight from the ExAC human exome aggregation project, which contains huge numbers of human exomes. Behind all those is a lot of phenotype data. Now, inspired by recently re-reading Laurie Garrett's The Coming Plague and also faced with daily news items on the Zika virus epidemic, I've had this question: what if the same approach were applied to key disease vectors?

Oxford's Riposte To Illumina Trade Action

Along with the "No thanks, I've already got one" online session, the other big Oxford Nanopore news is the public release of Oxford's response to the trade complaint filed by Illumina which was attempting to exclude all Oxford Nanopore devices from the U.S. markets. Nature News' Erika Check Hayden has posted the document on Dropbox, which was a big help. While no documents from Oxford's side concerning the simultaneous patent lawsuit have yet surfaced, it is reasonable to expect that it will use many of the same arguments.

Oxford's "No thanks, I've already got one"

Oxford Nanopore today hosted a Google hangout titled "No thanks, I've already got one". Only this morning did it occur to me I could have re-watched Monty Python and the Holy Grail and scored it as blogging-related time! Oxford CTO Clive Brown went through a number of interesting (and in many cases, long-awaited) announcements on the release of multiple key upgrades to the platform (note: unless otherwise specified, images swiped from ONT).

Digging into the Illumina Lawsuit vs. Oxford Nanopore

Illumina's and University of Washington's filing of a patent lawsuit and related trade complaint against Oxford Nanopore made big news yesterday, with nice coverage from Mick Watson, GenomeWeb, Nature's Erika Check Hayden, Technology Review's Antonio Regalado, BioIT World's Aaron Krol, and venture capitalist Vishal Gulati. Each of these covers the onetime partnership between the two companies and their acrimonious parting of ways. Oxford Nanopore released a short and pithy response. Having failed to get an early jump on things, I find the ground is already well plowed. So my sloth and inertia have forced me to take an unpleasant route I usually spend great effort avoiding: actually reading the complaints and the two key patents they cite, licensed from Jens Gundlach's group at University of Washington (US8673550 and US9170230).

Amplification-free, library-free sequencing? NanoString wants to be It

Perhaps the most unusual new technology to be unveiled at AGBT16 is NanoString's new approach to sequencing, which is in very early stages of development. Called Hyb And Seq, the process is remarkable in being a purely hybridization-based single molecule method -- absolutely no enzymes are harmed during the operation of the system. That's remarkable -- the only enzyme-free (or nearly so) sequencing approaches to deliver serious amounts of data into Genbank are Maxam and Gilbert approaches (including Church's genomic sequencing and multiplex sequencing), and even those typically required restriction digestion of the target.

AGBT16 Storify Completion & Rate Limits

AGBT16 ended a week ago, but for various reasons I'm just now catching up on my Storify project. A vacation was in there but also some tool building. As I was griping about the pains of organizing the tweets manually, Brian Krueger suggested what was already dawning on me (but it helps to be poked -- professional embarrassment is often a stronger motivator than pure annoyance) -- I needed to stop doing this purely manually. So, off to deal with pulling in Tweets automatically and at least doing some organization programmatically.

10X Launches Chromium (#agbt16)

10X Genomics launched their approach to obtaining long-range genomic information last year with a big financing and some exciting preliminary data at AGBT15. Now they are back at AGBT16 with an upgraded instrument, improved biochemistry, new software and new applications, along with a trio of major co-marketing agreements and a splashy hire and a raft of both published and unpublished data from academic collaborators.

#AGBT16 Day 2: How is AGBT On Twitter Like Sequence Assembly?

I spent a bunch of time yesterday going through the Tweets from AGBT. For me personally it is a useful exercise, plus I'll have it as a resource to go back to for future posts. But the time and pain involved definitely had me sometimes questioning the wisdom of attempting this.

AGBT Begins (with bonus Storify Jeremiad)

Just finished my last Storify for tonight from AGBT16, and boy am I wondering how sustainable this will be. The "problem", which is wonderful to have, is that the number of tweeters has grown substantially, and so there is a wealth of material to attempt to distill down. There's also the desire to make sure I don't further propagate the spam which sneakily re-tweets the occasional item. The other problem has been my tools.

AGBT16 Preview (aka The Non-Attendee's Lament)

AGBT16 starts today but I'm again not there. The usual complex set of personal constraints (or imagined ones) kept my hat out of the ring this year, and now I'm again torn between wanting to be there and why it would have been hard. Easy would be leaving our most recent snow and ice storm and the general cold weather. A bit harder is that it is early in the school term, and back-to-school night is Thursday -- plus I spent last night chatting with a candidate for a local office (School Committee) at a low-key campaign event. The big, and unforeseeable, challenge is that the other half of Starfleet's bioinformatics group is out on paternity leave, and while I'm proud of how much quotidian work I get done during conferences, it still isn't the same as being on full duty.

Why do we purify DNA the way we do?

An interesting conversation on Twitter on means for purifying DNA for PacBio and the risks of phenol-chloroform extractions restarted some pondering on the historical contingency of experimental techniques. Or, as the title says, why do we (today) purify genomic DNA the way we do?

First in an occasional series on high school biology: Complexes

TNG has biology this term, so I will be (at erratic intervals, of course) sometimes venturing my thoughts on the teaching of that specific subject. I think I never quite got around to venting the last time he had a biology unit, back in middle school, which among its sins was still teaching the seven kingdoms system of classification, which for the love of Woese is absurd.

Does any analytical program really care about the order of paired end files?

I was recently experimenting with C. Titus Brown and company's khmer package and hit an interesting little snag. First, I had my usual problems with installing a Python-based program, which were solved by the totally counter-intuitive absurdity of actually following the installation directions precisely. Armageddon is certainly near if random shortcuts and assumptions can't be relied on to get the job done. But once I had it working for a simple test case, I crazily tried to build something -- and that's when a new, maddening bug cropped up.

Illumina's MiniSeq Giveaway

This morning, Illumina announced a Scientific Challenge program as part of the launch of the MiniSeq sequencing instrument. Three prizes will be given away, with 3 sequencing runs on MiniSeq as third prize, a MiniSeq plus reagents for 3 runs as second prize, and a MiniSeq plus reagents for 3 runs plus a Mini Cooper automobile as the grand prize. Entrants in the contest will submit a proposal for how they plan to use the instrument (but not the car; if you support future reagent purchases by being an Uber driver, that's your business). There will also be a set of iPad mini giveaways based on recommending colleagues to enter the contest on social media; if your recommendation results in an entry, then you are entered in the iPad giveaway.

Illumina Unveils Firefly

Illumina's third big announcement around JPM is to unveil Project Firefly, a semiconductor sequencer which will use existing SBS library preparation and a derivative of SBS chemistry. Slotted with a price point ($30K), physical size (small pizza box ish?) and data yield (4M reads, 1Gbp data) below the just announced MiniSeq, Firefly would be two small boxes which could stack: one for library preparation and one to run single channel sequencing. The flowcell would use ordered arrays, layered atop the semiconductor sensors. Launch is proposed for the second half of 2017.

MiniSeq!

Okay, third (and last) post for tonight. Time for the victory lap -- and a reminder of the limits of educated guessing. On Sunday I threw out a prediction of a hypothetical Illumina MiniSeq based on ordered arrays, NextSeq chemistry and optics, but with only one optical unit and a price point of about $50K -- and today Illumina announced precisely that.

Grail will change oncology & society, but how?

The second big genomics news this weekend around the JP Morgan conference is that Illumina is spinning out a new company called Grail, backed also by Jeff Bezos, Bill Gates and some VCs, to pursue mass cancer screening via liquid biopsies. Given that cancer is most treatable when caught early, it's an exciting idea. But, the devil as always is in the details, and they could be quite diabolical.

Affymetrix Assimilated

With the JP Morgan Conference starting, there's lots of big news in the genomics world, and for 2016 I'm throwing out my previous internal policy of one post per day; if the news warrants multiple, then write multiply! Plus, too often in the past I've jammed topics together, and if nothing else it makes it hard for me to find my own posts on a topic! The first big news this weekend is the announcement that Affymetrix has been snared in a $1.6B tractor beam from Thermo Fisher.

Illumina JPM/AGBT Predictions

The JP Morgan conference is next week, which for several years now has been Illumina's venue for making major platform announcements. So naturally, based on not even anything as substantial as scuttlebutt or rumor, I'll venture some predictions.

When it comes to Nanopore, am I too GAGA?

In the comments to yesterday's piece on the reboot of Nabsys, a commenter used a truly colorful epithet in inquiring why I am so bullish on Oxford Nanopore, and in particular whether I am paid to do so. Whether I've lost objectivity on this (or any) subject is something I take seriously, and I think it is worth a look. To deal with the more serious allegation up front: I have never been paid by Oxford Nanopore, and if I ever do write on a company which has paid me in money or substantive gifts or in which I held stock I would disclose that. My company has been a member of the MinION Access Program (MAP), and one could argue that Oxford has provided materials worth far in excess of the $1K entry fee. On the other hand, I and my co-workers have sunk quite a bit of time trying to use bad flowcells and unstable kits, so we've also sunk a lot into that project, so there's hardly a windfall there to cloud my judgement.

Nabsys Reboots

It's the beginning of the year, so new beginnings are a natural subject (though to be honest, spring works for that too). The holidays brought word of an effort to reboot a company that seemed to have expired: Nabsys.

Closing the books on 2015

The last line of Perl has been written, the last SQL select executed. As 2015 draws to a close, I want to extend a thank you to everyone who has read this blog, commented on it here or over on Twitter, followed me on Twitter, engaged me on Twitter, or any of the other myriad of ways that suggest that what I write and tweet is of interest to others. It is very rewarding to know that others find this space engaging, and I hope to continue to earn your attention and time.

So long 2015. I mustered a bit more resolve this year than previous years, hitting the 5th most number of posts for a full year (which should make it the median). 2016 promises to be an exciting year in the sequencing and genomics arena and I will try to both up the frequency and reduce the variance in the frequency of these posts -- ideally while improving the quality. Will I succeed -- only you, the reader, can score the quality aspect.

Loose 2015 Threads #2: Thanks for the help with bootstrap values

I owe a belated thank you to everyone who responded to my post on my muddled thinking around phylogenetic tree bootstrap values. I think I'm straightened out now and even dare to think I can explain this to someone else.

Loose 2015 threads #1: MiSeq 2x300 Issues

Before 2015 ends, I'd like to tie up two loose threads. In doing so, I'll deviate slightly from my usual pattern and publish two posts in a day; I could have lumped them together but instead I'll split. First up, a belated explanation, prompted by a comment, of my mention of issues with the MiSeq 2x300 reagents and a bit more on my confusion with regard to bootstrap values.

Thoughts on the Synthetic Biology of Seveneves

Neal Stephenson's Seveneves is a sprawling space novel of truly epic ambition and scope, which I enjoyed thoroughly. I'm not going to review it or give a detailed plot summary, but there are aspects related to the biology angles which interest me enough to scribble -- which means I must reveal some key plot points. I've grown increasingly sensitive to spoilers and (yo Charles Schulz's ghost: thanks for wrecking Citizen Kane for me at a young age!) for myself prefer to go into a major book or movie as cold as possible. So, if you haven't read the book and were planning to do so, please don't jump beyond the jump break. If you do, don't blame me for any reveals!

MinION and Time-to-Result

Peripatetic blogger Dale Yuzuki posed a question on my last piece which I'll answer with a separate post because it crystallizes for me what makes the Oxford Nanopore platform so different for a large number of counting-type assays. Dale's question was on Zev William's talk on pre-implantation screening and the number of reads required.

MinION Community Meeting 2015: Reflections & Wrap-Up

I spent the end of last week at the New York Genome Center for Oxford Nanopore's MinION Community Meeting 2015. Since the family joined me for the weekend, I let my thoughts simmer for a wrap-up. Plus I've been spending time scrutinizing the complimentary "pen" for a USB connection and sample port, with no luck. Wisely, I've given up on that -- so I can start the same process with the "notepad". I've also finally stopped looking over my shoulder for a pitchfork-and-torch crowd after my numerous Twitter miscues, ranging from omitting speakers' names and affiliations to various mutations of the official hashtag (when I remembered any hashtag). For a slightly better synthesis of the Twitter stream, see my Storify.

Admitting to Ignorance on Interpreting Bootstrap Values

Okay, one of the points of this space has always been to crowdsource the project of educating myself, which also means one of the underlying principles is that I sometimes need to admit ignorance in a very public manner. After staring at a lot of phylogenetic trees, I've sufficiently unglued my confidence in my deep understanding of the principles (beyond coming just plain unglued) of confidence / bootstrap values. Despite trying a number of sites and reviews and threads on the Net, I can't quite find a detangling of the particular mental knot I've tied, so I'm throwing out the problem for group help.

Well, that was brief!

BGI news today is that they are jettisoning the Revolocity large sequencing system, announced all the way back in June. Along with the product abandonment, ~~40%~~ of the ex-Complete Genomics group in the Bay Area is being laid off, with remaining staff focusing on the desktop BGISEQ-500 sequencer.

Do Demons Dream of Phylogeny Packages?

Miserable day today -- spent my entire day wrestling with bad formats and flaky tools and trying to bull my way past them, leading to many a mad expostulation. The whole day down in the pit, with the pendulum of multiple deadlines swinging just over my head. The MBTA released new schedules that muck with my routines. And then to top it off, Mick Watson writes a piece titled "The Five Habits of Bad Bioinformaticians" that cuts far too close to home. So I arrived home in a foul mood, my senses unpleasantly heightened to every sound.

Comments on "The use and misuse of supplementary material in science publications"

Mihai Pop and Steven Salzberg have an opinion piece in BMC Bioinformatics titled "Use and mis-use of supplementary material in science publications", examining issues arising from the ever growing data supplements accompanying papers, particularly in high-profile journals with strict article length limits. Pop & Salzberg make a number of important points, but there are some topics they didn't cover that I think are also worth treatment.

HelicosTech Back on the Dance Floor?

It's Halloween, and as is my habit I fired up Saint-Saëns. As death tunes up his violin in a graveyard, the dead residents live again and dance with abandon, until the rooster (oboe) crows in the dawn. In a similar vein, a pre-print on bioRxiv has demonstrated new life in the Helicos single molecule sequencing platform, though while the platform stopped being commercially distributed (and Helicos went bankrupt just under 3 years ago), a scrappy little company called SeqLL has kept up a service business. A new Chinese company called Direct Genomics, with two Helicos founders onboard, plans to commercialize the new version of the technology.

BGI Launches the BGISEQ-500

This weekend brought the formal launch of the BGISEQ-500 desktop sequencing instrument from BGI (though deliveries won't begin until early next year). Utilizing the ex-Complete Genomics ligation technology also used in the Revolocity system, the instrument appears to sport a price similar to the Illumina NextSeq but offers throughput running from that of a NextSeq up to the low end of a HiSeq. Two flowcells can be run at a time (apparently in sync with each other, unlike QIAGEN's long-delayed machine), with small and large versions of the flowcells. There's some ambiguity on questions such as the precise read length, though it is very short compared to the typical Illumina offerings. Dale Yuzuki has a nice write-up (complete with a picture next to the box) based on attending the International Congress of Genomics 100 where it was unveiled. One of these days I should wangle my way to that conference -- not only would the genomics be fascinating, but China holds a special allure -- or more specifically, Ailuropoda, for our household.

Concepts for Better Sequencer Calibration

Last week's release of the MARC data for the Oxford Nanopore MinION rebooted a train of thought I've had around DNA samples used as standards. Ideally, standards would meet a number of criteria, though some of these may be inherently in conflict and there are issues of practicality. But as a whole, the standards used for molecular technologies are often short of ideals in ways which could be addressed, as I will attempt to argue here. While many of my comments will be placed directly around MinION, many apply to other platforms -- as would solutions I have been contemplating.

Dovetail Takes Flight

Back in March I covered the unveiling of Dovetail Genomics' approach to scaffolding genomes via deriving long distance constraints from reconstituted chromatin. This morning the company announced full access to their genome sequencing and scaffolding service. Founder Ed Green and CEO Todd Dickinson chatted with me by phone last night about this launch.

Dovetail's offering is a complete service for sequencing or scaffolding large animal or plant genomes. Users can choose from a menu of service components, which can range from scaffolding an existing short read assembly for around $10K to a complete genome sequencing and scaffolding for around $40K, with a turnaround in either case of 6-8 weeks and scaffold sizes on the order of chromosome arms.

Since the beta program opened in the spring, Dovetail has worked to streamline both their wet lab and informatics protocols as they completed over 45 different customer projects. Of particular note is that the input DNA requirements are down from 5-10ug to 1-2ug. However, the Dovetail team agreed with my comment from before that with their current markets, de novo sequencing and structural variant calling on known genomes, input DNA has not been a serious constraint. They do believe they can substantially reduce the requirements further, perhaps to a few hundred nanograms.

While the service offerings can include users supplying their own high molecular weight DNA, Dovetail prefers to perform the extractions in house. The logic here is simple: results are critically dependent on the size of the input DNA. As a result, Dovetail has spent great effort becoming expert in extracting DNA from a wide variety of different species and sample types, as well as using pulsed-field gel electrophoresis for DNA quality control.

Dovetail is currently offering their Chicago technology only as a service, which has obvious advantages. Anyone who has attempted technology transfer will know how difficult it can be to make a process consistently repeatable at multiple sites, not to mention the variances that can easily creep in due to the vagaries of shipping. Aspects of this can be seen in the recently released MARC data for Oxford Nanopore. For users, a pure service offering means no learning curve and no equipment purchases; just turn over some biomass to Dovetail and wait for a high quality genome to be returned.

That doesn't mean kits aren't on the horizon; Dovetail does plan to offer them at some future point. Also in future plans is expanding the service offerings to include metagenomes and haplotype calling. In the nearer term, a publication describing the scaffolding of a human sample (NA12878) and the American Alligator genome, which Dovetail has discussed in the past, is "well along" the publication pipeline. While the current offering is based on Illumina sequencing technology, Dovetail emphasizes that the technology itself is platform-agnostic. In a similar vein, when asked about how Dovetail differentiates themselves from the growing swarm of long range technologies, including Oxford Nanopore, PacBio Sequel, BioNano Genomics, and the now-launched 10X GemCode, their team praised the field as full of exciting technologies, but emphasized that they offer the ability to scaffold complex genomes very fast with no specialized equipment and no new techniques to learn.

Personally, a pure service offering is very attractive, since that means not having to find internal resources to learn the new technology and then execute on it. I checked with Dovetail, and while I don't have $40K burning a hole in my pocket, if I did, I could grab something out of the garden or from the local seafood market and really have a complex genome scaffold of my very own in about two months. That's an exciting vision, and perhaps will be a major force in the sunsetting of science's tolerance for highly fragmented draft genomes.

Monday, October 19, 2015

MARC spots the Ox(ford)

Last week's end brought the initial report from MARC, the MinION Analysis and Reference Consortium, detailing a body of experiments intended to benchmark the performance and consistency of the Oxford Nanopore MinION sequencing device. The MARC paper is also the inaugural research article in F1000's new channel for nanopore papers.

PacBio Sequel: Smaller Box, Bigger Bang

Boy, am I regretting taking a vacation from online due to being engrossed in A Canticle for Leibowitz. Between last night and this morning, my neglect of my Twitter feed meant a colleague tipped me to the new PacBio machine with "what's this Sequel I keep hearing about from PacBio". So a lot of folks had a huge jump and covered it pretty well, including Keith Bradnam, Mick Watson, and James Hadfield. Long rumored, the new instrument costs about half as much (but that's still $350K), takes up much less floor space (and doesn't need any reinforced floors) yet the new flowcells deliver about 6 7 times as many reads as the older ones. WOW!

Farewell Nabsys

A bit over a week ago brought news that mapping instrument hopeful Nabsys had ceased operations. As a veteran of one failed biotech, I have a lot of sympathy for the team there. Plus, I knew a bunch of folks at the Providence RI firm. Nabsys's single molecule mapping technology was a wonder -- what single molecule technology isn't? Already stories are emerging of a disgruntled founder who wants to buy up the intellectual property and give it another go. It is easy to admire that stick-to-it spirit; it's a lot harder to find a rational reason to believe that such a revival will be any more successful.

How Do You Differentiate Archaea and Bacteria in the First Week of High School Biology???

I have a long standing interest in biology education -- I seriously considered it as at least a career to explore -- but now I really have skin in the game. TNG just executed a schedule move that will defer his biology this year to the second half of the term, but I also have a niece who is taking AP Biology at her STEM high school. Even in his short time in biology class, TNG has succeeded in asking for homework help that has me scratching my head.

Freely & Unrepentantly Confessing to Heresy

Keith Bradnam reported a huge influx of traffic for a recent post -- not surprising, since he labeled it NSFW (Not Safe For Work). And yes, despite my skepticism that it would be truly offensive, I'll confess I checked it with my phone, not my work laptop.

Ion's S5

The Ion Torrent team rolled out a new sequencer line this morning, the S5. The S5, whose impending release had been tipped on the internet by the leak of a manual, arrives in two models, the standard and the XL, which differ only by on-board computing power and not sequencing metrics. As has been the trend, Ion's focus is entirely on focused sequencing, and the new lineup emphasizes making targeted sequencing with AmpliSeq and other approaches fast and simple.

The Road to Hell is Paved with Bioinformatics Formats

If you really want to raise a bioinformaticist's blood pressure, loudly declare your new tool generates output in brand new data formats. This leads to the frequent observation that a large fraction of bioinformatics work is simply converting formats. It is probably consensus that the field is awash in too many formats, though it is equally clear that we can't agree on which should survive. Between some recent news and a Twitter thread on the subject that erupted last night, there was a bunch of fodder for me to collect in a Storify -- and to lay out my own idiosyncratic views.

Do Helix's Numbers Work?

A number of efforts in the consumer genomics space have been attempted in the past, with 23andMe appearing to make limited headway and Knome not much at all. I haven't been able to get any investment interest in my own concept, though perhaps that's because it was tongue-in-cheek (or tongue held out while panting). Last week brought a big splash, with a new company Helix launching with $100M and three major players as backers: Illumina, LabCorp and the Mayo Clinic.

Clinical Metagenomics Pipelines: Revisiting & Reflecting

When I set out to start this blog over eight years ago, I set myself a number of goals. One goal was to take some risks -- not crazy risks but to not just play it safe. But counterbalancing that goal was one to be open, accurate and honest. My piece last week on clinical metagenomics pipelines had a fair amount of attention, and resulted in an ongoing electronic conversation with one of the key parties. In the course of this, there are now parts of that piece I wish I had handled differently. Some other important topics have been raised, which I would like to cover here.

Leaky clinical metagenomics pipelines are a very serious issue

Update: Some significant issues with the tone of this post are discussed in a follow-up.

I am a firm believer that the practice of science is the result of contingency; we do not necessarily have the best scientific culture possible but rather one which has evolved over time driven by chance, necessity and human nature. We should never hesitate to re-examine the way science is actually practiced, and that particularly holds true for how we analyze data and publish results. A re-analysis of a prominent Lancet paper has just come out in F1000, and this work by Steven Salzberg and colleagues illustrates a number of significant issues that slipped past the conventional peer review publishing practice.

June 2015: Busting Out All Over with Genomics Technology

This month I again entered the prime of my life, though my programming brother points out that next year I (and the first Apollo manned missions) hit the big 30. Beyond my personal milestone, it's been a busy last couple of weeks on the genomics technology front. Despite a lack of conferences or other traditional venues, big news has poured out from Pacific Biosciences, BioNano Genomics, Genapsys, BGI (which had another announcement earlier in the month), 10X Genomics and a pair from Oxford Nanopore.

BGI Unveils a Sequencing Factory to Go

When I was in George Church's lab, he submitted a grant proposal (which, alas, was not funded) for a sequencing factory to generate one megabase of data per day. In those days that was an ambitious goal, and the plan would have truly been on a factory scale, with a large workforce and an assembly line of stages to yield the final product of data.

Is Illumina Serious About an Alternate Chemistry for the Rapid Amplicon Market?

Back in January, at the end of my post on Illumina's new machine lineup I speculated whether Illumina might see a niche for a lower cost, lower throughput sequencing system that would slot below the MiSeq in their lineup. Such an instrument, I posited, might go after applications in biosurveillance and diagnostics where relatively small amounts of data are needed quickly. I speculated that perhaps a smaller instrument with less expensive optics could compete in this arena, which is heating up due to Oxford Nanopore and the growing acceptance of DNA-based diagnostics. As luck would have it, a few days later Molly He, Mostafa Ronaghi and colleagues at Illumina actually published a proof-of-concept paper for just such an instrument. Unlike many sequencing technology PoC papers, this one demonstrates feasibility of reading actual templates (phiX rides again!).

London Calling Wrap-Up

The second, and final, day of Oxford Nanopore's London Calling conference concluded last Friday -- and I'm behind on writing it up. Some of that was due to travel (and the wrong power supply going on the trip) and post-trip exhaustion, but failing to finish this last night was pure slacking. That route was shut down when one reader asked when I'd get things done. Anyway, I again organized the activity into a storify story as I did for the first day of the conference. I'm going to go into less detail on individual presentations below and instead engage in the vice of far-ranging speculation.

London Calling Day 1: Highlights

Oxford Nanopore's London Calling conference kicked off today; I've Storified a large collection of Tweets from it, covering today up through about dinner. I'll summarize some highlights below.

Oxford Nanopore's London Calling: Pre-meeting speculation

Oxford Nanopore's London Calling confab starts up in a matter of hours. Alas, several issues scotched my plans to attend (not only does it promise to be an exciting conference, but I simply love exploring London on foot). It is worth emphasizing that the MinION devices and consumables have been out in the wild for not quite 11 months at this time. In that time, Oxford has dealt with a wide variety of technical and logistical headaches. While performance is still variable, many MAP participants are forging forward and the available tools for nanopore data continue to grow. London Calling will likely bring a burst of new announcements; Oxford's Clive Brown has been giving talks recently but has promised that exciting stuff has been reserved for the confab. Below is a set of semi-informed speculations calling out likely happenings, mostly based on Clive's recent presentations and tweets.

PacBio's New Sample Prep Plan: Too Late to the Dance?

Pacific Biosciences had a string of announcements around its earnings release last week. Of particular interest is a collaboration with RainDance to develop a new sample preparation system for generating long synthetic reads from minuscule inputs. If some of that sounds familiar, the loose outline in the press release suggests an approach similar to that of 10X. But is this proposed system arriving too late to the party?

Revisiting the RNA Tie Club

As mentioned previously, by wonderful luck I now have regular contact with Ash from the Curious Wavefunction, and he has stimulated a new burst of scientific history interest in me. I've ripped through a bunch of scientific memoirs -- by Crick, Djerassi and Dyson -- and have learned how to summon the biographies of Wilkins and Chargaff, as well as trying to dive again into The Eighth Day of Creation. One topic I keep stumbling across is an interesting little bit of genetic history called the RNA Tie Club, which is a story worth re-telling and re-examining.

Interested in the History of Biotech Companies? Don't start with Wikipedia.

I'm generally a big fan of Wikipedia and use it often for background research. I've gotten more active this year in editing it, particularly around biographies of scientists. For example, this year I've made major additions or edits to the entries for Walter Gilbert and Arthur Pardee and the , created entries for Martinas Ycas, Benno Müller-Hill, Monica Riley and Helen Donis-Keller. I also stumbled my way into a campaign of major revisions to the entry on Marie Antoinette, getting sometimes into a revision war with one other editor (which we resolved with a truce). Along the way I've gotten almost adept at writing Wikipedia references and discovered a bizarre recurrent vandalism of Wally's page in which the vandal changes his name and personal details. Recently, I've discovered a whole category of flawed entries: those on companies in the biotechnology industry.

To Properly Assess Cancer Genomics, One Cannot Dismiss It

Through a happy series of professional events, I now get to have lunch very regularly with the author of the excellent blog The Curious Wavefunction. If you haven't visited there, Ash delves not only into chemistry but also into the history of science. In a most friendly way, he dropped a challenge on my Twitter-step that represents a long procrastinated blogging project, so I really couldn't turn it down. And that challenge is: what has been the value of cancer genomics? Is it, as he asked, a very expensive exercise in looking for keys under the lamppost, or something far more valuable?

A Dovetail Route to Scaffolded Genomes

10X Genomics had a lot of buzz at AGBT over their approach to acquiring long range information for complex genomes via a microfluidic-assisted library preparation scheme. Another young company, Dovetail Genomics, is starting to unveil a very different technology with similar aims.

An Impending Shakeout In Library Prep?

My AGBT teleconference-based pieces all had a theme of library preparation. Library prep has never been as flashy as instrument performance, but is clearly critical. A library-free sequencing technology remains a distant dream, so DNA (or RNA) must go through a series of preparative steps prior to being loaded on the sequencer. The dominant library prep molecular biology for clonal sequencing systems consists of shearing the DNA mechanically, making flush ends with a repair mix, adding 3' runs of A and then ligating primers and finally using PCR to amplify the material.

Mechanical shearing can be replaced with enzymatic shearing (or perhaps even chemical, though I'm unaware of chemical shearing being used in production). For RNA of different sorts, some upstream steps are added to convert the RNA to DNA, perhaps with a depletion at some stage of hyperabundant species such as rRNA. This conversion may, with different levels of success, mark which strand was sense and which antisense. The transposase-based Nextera protocol represents the most drastic departure from this paradigm, enzymatically eliminating all the steps prior to PCR.

There's Gold in Them Thar Programs

Last night was the season five finale of the Gold Rush, which I confess is one of the few television programs that I have been watching routinely near their airing schedule (the other is The Simpsons, which is a father-son bonding experience). Now, writing in a blog mostly about science that you watch something on the Discovery Channel is a bit of a bold act, given its many panderings. The network annually features Shark Week, which has been roundly criticized for its sensationalized portrayal of these magnificent creatures. It also features shows which purport to show individuals routinely engaged in felonies and in one case claiming to document a violent subculture in a pacifist religious community, the Amish. I grew up near the Amish Country of Pennsylvania; if anything like that ever existed the Philadelphia papers would have had a field day. Gold Rush itself, and a second gold show which I've developed a fondness for, Bering Sea Gold, have shortcomings that are obvious and painful. So why am I hooked?

Can BGI Really Stir Up the Sequencing Instrument Market?

I've been asked several times recently about rumors coming out from BGI. They've started claiming they have a super sequencer which will radically beat Illumina's offerings on both cost and accuracy. The recent 10K Genomes meeting apparently had a quick talk from BGI which led to some limited Twittering, and judging from this Mendel's Pod interview at least one person believes the buzz (though the same individual quotes a price per PacBio human genome that is high by at least a factor of 25). The claim is that this summer at ESHG BGI will release two boxes, one a benchtop model which I haven't seen any details on, and the other claimed to offer throughput superior to a HiSeq with better accuracy. What might be backing up these claims?

What's Been Cooking for Ion At AGBT15

Rounding out my remote coverage of platform news from AGBT, the Ion Torrent team also lent me some of their time (and at risk of sounding obsequious, I do greatly appreciate this -- vendors have almost no down time at these events) to touch on some of the topics I wrote about in my Ion history and speculation piece.

10X Reveals Its Facets

Perhaps the most heavily anticipated launch at AGBT this year is the library prep instrument for 10X Genomics. This Bay Area startup made a huge splash at the beginning of the year by announcing a monster ($55.5M) financing. A member of my professional network had been part of the early team and had given me very minimal hints at last year's AGBT, so I've been eagerly awaiting details for a long time. Several members of 10X's team were kind enough to chat with me by phone yesterday with the proviso that I hold off on launching this piece until after their talk today at the conference (interestingly, I had crossed paths with all of them in some previous setting). Also, they sent me some promotional materials and permitted me to post some clips from them. Now, the GemCode system is officially launched, with orders being taken now and devices planned to be delivered in early Q2.

Illumina Launches NeoPrep (#agbt15)

The 2015 AGBT conference started out today. A few hardware makers have let me chat by phone with members of their team, since they're there and I'm not. Tonight's dispatch is from a chat with Illumina focused on their now launched NeoPrep library preparation instrument.

Can Ion Torrent Buzz Again?

My AGBT 2015 Preview / Speculation at one point had a tightly packed (and overly long) paragraph on Ion Torrent, but I realized that this was a symptom of trying to cram too much in too little a space -- plus I really had a lot more thoughts worth unpacking. So here's a long form look at Ion Torrent -- with plenty of references to past AGBTs to make writing this now apropos. One advance bit of excuse making: the historical background that follows is not intended to be a comprehensive history of Ion Torrent technology, but more of an impressionistic sketch (but as always, my worst excesses and omissions are fair game for comments!).

#AGBT2015 Preview

The annual genomics party on Gulf of Mexico beaches named AGBT runs next week, and already there have been some speculations flying. I'd better dash something off before I'm any later to the preshow -- or more importantly before I get contaminated with embargoed information.

The MBTA Must Embrace Data!

As you may have heard, we’ve had a bit of snow in the Boston area recently. Two storms, one the beginning of last week and one which just ended yesterday, each dumped close to a meter of snow in the area. The two storms each had different profiles: last week’s storm featured rapid snowfall and furious winds, with the snow falling over a 24-36 hour period. The more recent storm started on Friday afternoon, ended on Tuesday morning, with a steady fall of lazy snowflakes. Last week a hare, this week a tortoise. But both weeks, a paralyzed Boston from a transportation standpoint, with the MBTA mass transit system performing dismally.

Unfortunately, the main response to that failure has been a lot of political theater. GM Beverly Scott gave a press conference yesterday that featured the usual refrain: the system features antiquated equipment, our crews are working hard, nobody could deal with this. In other words, a string of unquantifiable and unactionable clichés. There's already an unhelpful murmur in the press that Scott might be fired, which would seem little fix but mostly fodder for more column inches of newspaper opinion (such as this and this).

How not to write a sequence assembly comparison paper

Lex Nederbragt flagged, via Twitter, a preprint on the F1000 site with a questionable table comparing sequencing systems. Alas, once I looked at the paper I got myself into a state where only writing up its numerous deficiencies will free my mind of it. I've even volunteered to F1000 to review the paper, but I haven't heard anything and so I will use this space. I'm afraid this paper falls into the small category of manuscripts for which I would recommend rejection.

The preprint is titled "Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly". Alas, the paper doesn't attempt to deliver anything of the scope promised by that, and the abstract isn't much better. Most papers have a certain amount of preamble and then deliver some new finding; the preamble to the paper is overlong and badly executed, and the work in the paper is far too minimal and also badly executed.

Cargo Cult Networking & Other LinkedIn Laments

LinkedIn is a social media tool I find greatly flawed, but useful. Part of the devil's pact one makes with LinkedIn is to receive great numbers of requests from individuals who wish to grow their networks. I have a personal guideline which helps me weed through the requests, but last year I got a request that had me laughing -- and slightly revising that guideline.

JPM Wrap-Up:

In this final installment of a series of reactions to news coming from the J.P. Morgan Conference, I'll cover an interesting complementary technology that was announced. But first, it might appear my prediction of no radical sequencer announcements may have been invalidated, with an announcement from BGI of plans to launch two sequencers based on Complete Genomics' technology. Unfortunately, the only outlet that seems to have covered this is GenomeWeb, and it is in their premium (paywalled) section, so I know nothing beyond that. It appears this was only announced around JPM and not at JPM, so I have a Clintonesque out as well.

Illumina's Expanded Lineup

In my JP Morgan predictions for sequencing platforms, I didn't do badly. The only major player to make a platform announcement was Illumina, and they did indeed announce instruments that are not radical departures from the prior platforms. I am kicking myself for not making more specific predictions, as the nature of the new boxes was really unsurprising and it would have been nice to nail that.

2015: Another Year of Sequencing Evolution (not Revolution)?

The J.P. Morgan Conference is firing up, and for the past few years that has meant big sequencing platform announcements -- HiSeq or Ion Proton or such. This has stolen some of the thunder from AGBT in terms of major announcements (sadly, I won't be attending this year -- and will try not to land myself in surgery the way I did the last time I didn't attend AGBT). I figured I'd better write this tonight before any more JPM-related sequencing instrument announcements show up, or more to my prediction, before the conference ends without any.

Druggability: An Underappreciated Issue in Translating the Human Genome Into Therapeutics

I'm sorely guilty of neglecting this space, but a recent (and now storified) Twitter conversation from Jonathan Eisen (@phylogenomics) has improbably fired me up enough to scribble something.

Reanalysis Lays Bare MinION Review's Spectacular Flaws

I will confess that when our first MinION burn-in data for lambda came in & I threw a few aligners at it (after first getting my data extractor in Julia shaken out), I was disappointed at the results. Very few 2D reads, very few aligned reads and the alignments all short. At this point, I sat back to wait to see what others had experienced and to think of additional bioinformatics approaches. It never occurred to me to dash off a glorified blog post and submit it to a journal.

Oxford Takes Some Flak, Fires Back

A huge event in the genomics community this summer has been the Oxford Nanopore MinION Access Program (MAP), which has enabled a sizable but select group of researchers to try out ONT's novel nanopore-based sequencing technology. While results and rumors have periodically drifted out over the summer, this week saw three disclosures, one of which resulted in fireworks and action.

The good, bad & missing from Bio* libraries?

As I mentioned recently, I've been exploring how I might use the emerging Julia language to solve problems. While that requires a large amount of mental work, I see some potential gains, both in having more readable code than Perl as well as to potentially leverage a lot of high-level concepts for parallel execution that are built into the language. But beyond the challenge of elderly canine pedagogy that I present, there is the issue that the BioJulia library is quite embryonic, with serious consideration of treating much of the existing code base as a first draft (or, that is the impression I get from skimming the Google group). So I'm going to try to pitch in, despite my multiple handicaps.

After the New Yorker piece, what of disruptive innovation?

I don't read a lot of books aimed at the MBA crowd, but one set I have liked, and sometimes cite here, are Clayton Christensen's on innovation and disruption. As you may have heard, a recent article in the New Yorker by Jill Lepore took a gimlet-eye view to the whole concept and raised serious questions about Christensen's methods. This was then summarized by another author in Slate and since then Christensen has responded in part via a Business Week interview. He's also scheduled to be interviewed on PBS this weekend, so likely there will be further developments. Indeed, after sketching this out on the commute home I discovered a Financial Times article whose tone is very similar to what I have written below.

Dabbling with Julia

As I've remarked before, I've done significant coding in a large number of languages over the last 35-or-so years. I don't consider myself a computer language savant; I've known folks who can pick up new languages quickly and switch between them facilely, but for me it is more difficult. I haven't tried learning a new language in perhaps 5 years, but this week I backed into one.

NGS Saves A Young Life

One of the most electrifying talks at AGBT this year was given by Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (as it can be autoimmune, viral, protozoal, bacterial and probably a few other causes) and then ran down a gripping case history which seemed straight out of House.

A Sunset for Draft Genomes?

The sun set during AGBT 2014 for a final time over a week ago. The posters have long been down, and perhaps the liver enzyme levels of the attendees are now down to normal as well. This year’s conference underscored a possibility that was suggested last year: that the era of the poorly connected, low quality draft genome is headed for the sunset as well.

How will you deal with GRCh38?

I was foolishly attempting to catch up with Twitter during Valerie Schneider's AGBT talk last night on the new human reference, GRCh38. After all, my personal answer to my title is nothing, because this isn't a field I work in. But Dr. Schneider is a very good speaker and I could not help but have my attention pulled in. While clearly not the final word on a human reference, this new edition fixes many gaps, expands the coverage of highly polymorphic regions, and even models the difficult to assemble centromeres. Better assembly, combined with emerging tools to better handle those complex regions via graph representations, means better mapping and better variant calls.

So, a significant advance, but a somewhat unpleasant one if you are in the space. You now have several ugly options before you with regard to your prior data mapped to an earlier reference.

The do-nothing option must appeal to some. Forgo the advantages of the new reference and just stick to the old. Perhaps start new projects on the new one, leading to a cacophony of internal tools dealing with different versions, with an ongoing risk of mismatched results. Also, cross your fingers that none of your results would be revised if analyzed against the new reference. Perhaps this route will be rationalized as healthy procrastination until a well-vetted set of graph-aware mappers exists, but once you start putting things off it is hard to stop.

The other pole would be to embrace the new reference whole-heartedly and realign all the old data against the new reference. After burning a lot of compute cycles and storage space running in place, spend a lot of time reconciling old and new results. Then decide whether to ditch all your old alignments, or suffer an even larger storage burden.

A tempting shortcut would be to just remap alignments and variants by the known relationships between the two references. After all, the vast majority of the results will simply shift coordinates a bit, but with no other effects. In theory, one could estimate all the map regions that are now suspect and simply realign the reads which map to those regions, plus attempt to place reads that previously failed to map. Again reconciliation of results, but on a much reduced scale.
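To make that shortcut concrete, here is a minimal sketch of coordinate remapping in Python. Everything in it is an assumption for illustration: the `Block` table is made-up toy data for a single chromosome, and `lift_position` and `partition_positions` are hypothetical helper names, not any real tool's API. Real liftover tools work from chain files and must also handle strand flips and multiple sequences, all of which this ignores.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """An ungapped stretch that is identical in the old and new reference."""
    old_start: int  # 0-based start on the old reference
    new_start: int  # 0-based start on the new reference
    length: int     # number of bases in the block


# Toy data (an assumption, not real GRCh37/GRCh38 relationships).
# Blocks must be sorted by old_start and must not overlap.
BLOCKS = [
    Block(0, 0, 1000),       # unchanged region
    Block(1000, 1050, 500),  # 50 bases were inserted in the new reference
    Block(1700, 1550, 300),  # old 1500-1699 has no counterpart in the new one
]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map an old coordinate to the new reference, or None if it falls in a gap."""
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos >= block.old_start + block.length:
        return None  # past the end of the block: a suspect region
    return block.new_start + (pos - block.old_start)


def partition_positions(
    positions: List[int], blocks: List[Block]
) -> Tuple[Dict[int, int], List[int]]:
    """Split positions into those that simply shift and those needing realignment."""
    lifted: Dict[int, int] = {}
    suspect: List[int] = []
    for pos in positions:
        new_pos = lift_position(pos, blocks)
        if new_pos is None:
            suspect.append(pos)
        else:
            lifted[pos] = new_pos
    return lifted, suspect


if __name__ == "__main__":
    lifted, suspect = partition_positions([10, 1200, 1600, 1800], BLOCKS)
    print("shifted:", lifted)
    print("realign:", suspect)
```

The positions that come back in the `suspect` list are the ones whose reads would go back through a real aligner; everything else only needs its coordinate rewritten.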

None of these would seem particularly appealing options. Perhaps the latter route will be a growth industry of new tools acting on BAM, CRAM or VCF, which themselves will provide a morass of competing claims of accuracy, efficiency and speed. None of this puts me in any hurry to leave a cozy world of haploid genomes that are often finished by a simple pipeline!

Thursday, January 16, 2014

Illumina's New Lineup

Illumina made a brace of big hardware announcements at this week's J.P. Morgan conference, and Mick Watson has done a nice job of covering them. I'll try to cover some different points that have occurred to me after letting the news ferment -- plus Illumina made yet another announcement tonight that scotched a portion of an earlier draft of this piece.

Relearning Chemistry

An evening ritual is to inquire what homework requires assistance, and at the beginning of the year it was a science worksheet as part of an introduction to chemistry. That, and a later project, have exposed how much rust my knowledge of chemistry has accumulated, but also have led me down the path of repairing forgotten bits and certainly learning some new stuff

Envisioning The Perfect Scaffolder

Rather than make any New Year's resolutions of my own, which I would then feel guilty about not keeping, I've decided to make one for someone else: they will write the perfect open source scaffolder. There are a lot of scaffolders out there, both stand-alone and integrated into various assemblers, but none are quite right.

If you are sequencing an isolated bacterium or archaeon and are looking for a scaffolder, except in a few rare cases, you're doing something wrong: given enough long reads from PacBio it should be possible to solve nearly every bacterial genome. But, if you're sequencing eukaryotic genomes or any metagenome (or you're unlucky or data-short on a simple microbial genome), you're probably in the market for one. I'm going to supply a list of attributes I cooked up during a long drive up the Eastern Seaboard today, without much regard for feasibility or even whether some conflict with each other.

Peering Through the Flowcell Glass, Darkly

As 2013 draws to a close, I've decided to stick my neck out and make some predictions for 2014. Perhaps I'll get lucky and a few will even come true! After several mental experimentations on the structure, I'll settle for stepping roughly past each major player.

Assembly Could Benefit From More Circular Reasoning

It was very gratifying to get comments on my recent piece on a de novo assembly review from both a referee of the manuscript (the amazing Heng Li) as well as one of the authors of the piece (though I am truly feeling guilty I forgot to reach out to the authors). Of course I was having my usual post-post regrets of things not written, such as the whole interesting topic of dealing with (and leveraging) uneven coverage in metagenomes and when assembling from amplified samples. But one other thing I was reminded of is one of the minor complaints I have with assembly programs: a lack of proper handling of circular genomes.

Assembling a Review of a Review of Assembling

A review on short-read de novo genome assembly appeared recently in PLoS Computational Biology, titled "Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges". I think the review has a number of merits, but I also find a number of frustrating flaws. I'm going to write this entry much as I would have written a referee report on it. Unfortunately, that will mean I'll dwell a bit more on the flaws than the assets, but if you are interested in the field

Did The Biochemists of Yore Know Morse Code?

So, this piece is going to be mostly asking questions. In one of the corners of my dream world I have a scientific historian on retainer, but in the real world my substitute is to throw some questions out and hope some knowledgeable people leave comments. If I spark someone’s term paper or thesis topic, I ask only that I get an electronic draft!
