Monday, June 30, 2014

The good, bad & missing from Bio* libraries?

As I mentioned recently, I've been exploring how I might use the emerging Julia language to solve problems.  While that requires a large amount of mental work, I see some potential gains, both in having more readable code than Perl and in potentially leveraging a lot of high-level concepts for parallel execution that are built into the language.  But beyond the challenge of elderly canine pedagogy that I present, there is the issue that the BioJulia library is quite embryonic, with serious consideration being given to treating much of the existing code base as a first draft (or, that is the impression I get from skimming the Google group).  So I'm going to try to pitch in, despite my multiple handicaps.

Tuesday, June 24, 2014

After the New Yorker piece, what of disruptive innovation?

I don't read a lot of books aimed at the MBA crowd, but one set I have liked, and sometimes cite here, is Clayton Christensen's books on innovation and disruption.  As you may have heard, a recent article in the New Yorker by Jill Lepore took a gimlet-eyed view of the whole concept and raised serious questions about Christensen's methods.  This was then summarized by another author in Slate, and since then Christensen has responded in part via a Business Week interview.  He's also scheduled to be interviewed on PBS this weekend, so likely there will be further developments.  Indeed, after sketching this out on the commute home I discovered a Financial Times article whose tone is very similar to what I have written below.

Tuesday, June 03, 2014

Dabbling with Julia

As I've remarked before, I've done significant coding in a large number of languages over the last 35-or-so years.  I don't consider myself a computer language savant; I've known folks who can pick up new languages quickly and switch between them facilely, but for me it is more difficult.  I haven't tried learning a new language in perhaps 5 years, but this week I backed into one

Wednesday, February 26, 2014

NGS Saves A Young Life


One of the most electrifying talks at AGBT this year was given by Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (as it can be autoimmune, viral, protozoal, bacterial and probably a few other causes) and then ran down a gripping case history which seemed straight out of House.

Monday, February 24, 2014

A Sunset for Draft Genomes?


The sun set during AGBT 2014 for a final time over a week ago.  The posters have long been down, and perhaps the liver enzyme levels of the attendees are now down to normal as well.  This year’s conference underscored a possibility that was suggested last year: that the era of the poorly connected, low quality draft genome is headed for the sunset as well

Thursday, February 13, 2014

How will you deal with GRCh38?

I was foolishly attempting to catch up with Twitter last night during Valerie Schneider's AGBT talk on the new human reference, GRCh38. After all, my personal answer to my title is nothing, because this isn't a field I work in.  But Dr. Schneider is a very good speaker and I could not help but have my attention pulled in.  While clearly not the final word on a human reference, this new edition fixes many gaps, expands the coverage of highly polymorphic regions, and even models the difficult-to-assemble centromeres.  Better assembly, combined with emerging tools to better handle those complex regions via graph representations, means better mapping and better variant calls.

So, a significant advance, but a somewhat unpleasant one if you are in the space.  You now have several ugly options before you with regard to your prior data mapped to an earlier reference.

The do-nothing option must appeal to some. Forgo the advantages of the new reference and just stick to the old. Perhaps start new projects on the new one, leading to a cacophony of internal tools dealing with different versions, with an ongoing risk of mismatched results. Also, cross your fingers that none of your results would be revised if analyzed against the new reference.  Perhaps this route will be rationalized as healthy procrastination until a well-vetted set of graph-aware mappers exists, but once you start putting it off it is hard to stop doing so.

The other pole would be to embrace the new reference whole-heartedly and realign all the old data against the new reference. After burning a lot of compute cycles and storage space running in place, spend a lot of time reconciling old and new results. Then decide whether to ditch all your old alignments, or suffer an even larger storage burden.

A tempting shortcut would be to just remap alignments and variants by the known relationships between the two references. After all, the vast majority of the results will simply shift coordinates a bit, with no other effect.  In theory, one could estimate all the mapped regions that are now suspect and simply realign the reads which map to those regions, plus attempt to place reads that previously failed to map. Again, results would need reconciling, but on a much reduced scale.
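To make the shortcut concrete, here is a minimal Python sketch of the idea, not a real tool. It assumes the relationship between the two references can be boiled down to a sorted list of ungapped blocks on a single chromosome, all on the forward strand. The `Block` class, the function names and the example coordinates are all hypothetical and invented for illustration; real inter-assembly mappings must also cope with strand flips, multiple chromosomes and interval (not just point) features.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped block shared by the old and new references."""
    old_start: int  # 0-based start on the old reference
    new_start: int  # 0-based start on the new reference
    length: int     # number of bases covered by the block


def lift_position(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Shift an old-reference position onto the new reference.

    Assumes `blocks` is sorted by old_start and non-overlapping.
    Returns None when the position falls outside every block, which is
    the signal that the read should be realigned rather than shifted.
    """
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos >= block.old_start + block.length:
        return None  # in a gap between blocks: suspect region
    return block.new_start + (pos - block.old_start)


def split_reads(
    read_positions: Dict[str, int], blocks: Sequence[Block]
) -> Tuple[Dict[str, int], List[str]]:
    """Separate reads that simply shift from reads that need realignment."""
    lifted: Dict[str, int] = {}
    to_realign: List[str] = []
    for name, pos in read_positions.items():
        new_pos = lift_position(pos, blocks)
        if new_pos is None:
            to_realign.append(name)
        else:
            lifted[name] = new_pos
    return lifted, to_realign


# Made-up example: the first 1000 bases are unchanged, then a region
# was revised, after which everything is shifted by 300 bases.
BLOCKS = [Block(0, 0, 1000), Block(1200, 1500, 800)]

if __name__ == "__main__":
    reads = {"r1": 10, "r2": 1100, "r3": 1250}
    lifted, to_realign = split_reads(reads, BLOCKS)
    print("shifted:", lifted)
    print("realign:", to_realign)
```

The design choice worth noting is that anything not covered by a block is handed back for realignment rather than guessed at, which is exactly where the reduced-scale reconciliation would come in.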

None of these would seem particularly appealing options. Perhaps that last route will be a growth industry of new tools acting on BAM, CRAM or VCF, which themselves will provide a morass of competing claims of accuracy, efficiency and speed. None of this puts me in any hurry to leave a cozy world of haploid genomes that are often finished by a simple pipeline!

Thursday, January 16, 2014

Illumina's New Lineup


Illumina made a brace of big hardware announcements at this week's J.P. Morgan conference, and Mick Watson has done a nice job of covering them.  I'll try to cover some different points that have occurred to me after letting the news ferment -- plus Illumina made yet another announcement tonight that scotched a portion of an earlier draft of this piece.