A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery
Monday, June 30, 2014
As I mentioned recently, I've been exploring how I might use the emerging Julia language to solve problems. While that requires a large amount of mental work, I see some potential gains, both in having more readable code than Perl and in potentially leveraging a lot of high-level concepts for parallel execution that are built into the language. But beyond the challenge of elderly canine pedagogy that I present, there is the issue that the BioJulia library is quite embryonic, with serious consideration of treating much of the existing code base as a first draft (or, that is the impression I get from skimming the Google group). So I'm going to try to pitch in, despite my multiple handicaps.
Tuesday, June 24, 2014
After the New Yorker piece, what of disruptive innovation?
I don't read a lot of books aimed at the MBA crowd, but one set I have liked, and sometimes cite here, are Clayton Christensen's on innovation and disruption. As you may have heard, a recent article in the New Yorker by Jill Lepore took a gimlet-eyed view of the whole concept and raised serious questions about Christensen's methods. This was then summarized by another author in Slate and since then Christensen has responded in part via a Business Week interview. He's also scheduled to be interviewed on PBS this weekend, so likely there will be further developments. Indeed, after sketching this out on the commute home I discovered a Financial Times article whose tone is very similar to what I have written below.
Tuesday, June 03, 2014
Dabbling with Julia
As I've remarked before, I've done significant coding in a large number of languages over the last 35-or-so years. I don't consider myself a computer language savant; I've known folks who can pick up new languages quickly and switch between them facilely, but for me it is more difficult. I haven't tried learning a new language in perhaps 5 years, but this week I backed into one.
Wednesday, February 26, 2014
NGS Saves A Young Life
One of the most electrifying talks at AGBT this year was given by Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (as it can be autoimmune, viral, protozoal, bacterial and probably a few other causes) and then ran down a gripping case history which seemed straight out of House.
Monday, February 24, 2014
A Sunset for Draft Genomes?
The sun set during AGBT 2014 for a final time over a week ago. The posters have long been down, and perhaps the liver enzyme levels of the attendees are now down to normal as well. This year’s conference underscored a possibility that was suggested last year: that the era of the poorly connected, low-quality draft genome is headed for the sunset as well.
Thursday, February 13, 2014
How will you deal with GRCh38?
I was foolishly attempting to catch up with Twitter last night during Valerie Schneider's AGBT talk on the new human reference, GRCh38. After all, my personal answer to my title is nothing, because this isn't a field I work in. But Dr. Schneider is a very good speaker and I could not help but have my attention pulled in. While clearly not the final word on a human reference, this new edition fixes many gaps, expands the coverage of highly polymorphic regions, and even models the difficult-to-assemble centromeres. Better assembly, combined with emerging tools to better handle those complex regions via graph representations, means better mapping and better variant calls.
So, a significant advance, but a bit of an unpleasant one if you are in the space. You now have several ugly options before you with regard to your prior data mapped to an earlier reference.
The do-nothing option must appeal to some. Forgo the advantages of the new reference and just stick to the old. Perhaps start new projects on the new one, leading to a cacophony of internal tools dealing with different versions, with an ongoing risk of mismatched results. Also, cross your fingers that none of your results would change if reanalyzed against the new reference. Perhaps this route will be rationalized as healthy procrastination until a well-vetted set of graph-aware mappers exists, but once you start putting things off it is hard to stop.
The other pole would be to embrace the new reference whole-heartedly and realign all the old data against the new reference. After burning a lot of compute cycles and storage space running in place, spend a lot of time reconciling old and new results. Then decide whether to ditch all your old alignments, or suffer an even larger storage burden.
A tempting shortcut would be to just remap alignments and variants using the known relationships between the two references. After all, the vast majority of the results will simply shift coordinates a bit, with no other effects. In theory, one could identify all the mapped regions that are now suspect and realign only the reads which map to those regions, plus attempt to place reads that previously failed to map. Results would again need reconciling, but on a much reduced scale.
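To make the coordinate-shifting shortcut concrete, here is a minimal Python sketch. It assumes a hypothetical, greatly simplified mapping between the two references: a sorted list of ungapped aligned blocks on a single chromosome, each written as `(old_start, new_start, length)` with 0-based positions. Real conversion files between assemblies carry more information than this, and the function names here are illustrative only, not any existing tool's API. Positions that fall outside every block are flagged as suspect, standing in for the reads that would need realigning.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical block format: (old_start, new_start, length), meaning old
# positions [old_start, old_start + length) map one-to-one onto new
# positions [new_start, new_start + length). 0-based, one chromosome,
# blocks sorted by old_start and non-overlapping.
Block = Tuple[int, int, int]


def _find_block(blocks: List[Block], pos: int) -> Optional[int]:
    """Return the index of the block containing an old position, or None."""
    starts = [b[0] for b in blocks]  # rebuilt each call; fine for a sketch
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < blocks[i][0] + blocks[i][2]:
        return i
    return None


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Shift one old-reference position to the new reference, if unchanged."""
    i = _find_block(blocks, pos)
    if i is None:
        return None  # falls in a changed region: treat as suspect
    old_start, new_start, _ = blocks[i]
    return new_start + (pos - old_start)


def lift_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Lift a half-open interval [start, end) only if it sits in one block."""
    i = _find_block(blocks, start)
    j = _find_block(blocks, end - 1)
    if i is None or i != j:
        return None  # spans a changed region, so it needs realignment
    old_start, new_start, _ = blocks[i]
    return (new_start + (start - old_start), new_start + (end - old_start))


def split_for_realignment(blocks: List[Block], positions: List[int]):
    """Partition positions into (lifted pairs, suspect positions)."""
    lifted, suspect = [], []
    for p in positions:
        q = lift_position(blocks, p)
        if q is None:
            suspect.append(p)
        else:
            lifted.append((p, q))
    return lifted, suspect
```

With `blocks = [(0, 0, 1000), (1500, 1200, 2000)]`, old position 1600 lifts to 1300, while 1200 lands in the changed stretch and is returned as suspect. Requiring both ends of an interval to land in the same block is the conservative choice: an alignment straddling a changed region is sent back for realignment rather than silently shifted.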
None of these seems a particularly appealing option. Perhaps that latter route will become a growth industry of new tools acting on BAM, CRAM or VCF, which will themselves provide a morass of competing claims of accuracy, efficiency and speed. It certainly doesn't make me in any hurry to leave a cozy world of haploid genomes that are often finished by a simple pipeline!
Thursday, January 16, 2014
Illumina's New Lineup
Illumina made a brace of big hardware announcements at this week's J.P. Morgan conference, and Mick Watson has done a nice job of covering them. I'll try to cover some different points that have occurred to me after letting the news ferment -- plus Illumina made yet another announcement tonight that scotched a portion of an earlier draft of this piece.
Monday, January 13, 2014
Relearning Chemistry
An evening ritual is to inquire what homework requires assistance, and at the beginning of the year it was a science worksheet as part of an introduction to chemistry. That, and a later project, have exposed how much rust my knowledge of chemistry has accumulated, but also have led me down the path of repairing forgotten bits and certainly learning some new stuff.
Wednesday, January 01, 2014
Envisioning The Perfect Scaffolder
Rather than make any New Year's resolutions of my own, which I would then feel guilty about not keeping, I've decided to make one for someone else: they will write the perfect open source scaffolder. There are a lot of scaffolders out there, both stand-alone and integrated into various assemblers, but none are quite right.
If you are sequencing an isolated bacterium or archaeon and are looking for a scaffolder, then except in a few rare cases you're doing something wrong: given enough long reads from PacBio, it should be possible to solve nearly every bacterial genome. But if you're sequencing eukaryotic genomes or any metagenome (or you're unlucky or short on data for a simple microbial genome), you're probably in the market for one. I'm going to supply a list of attributes I cooked up during a long drive up the Eastern Seaboard today, without much regard for feasibility or even whether some conflict with each other.
Tuesday, December 31, 2013
Peering Through the Flowcell Glass, Darkly
As 2013 draws to a close, I've decided to stick my neck out and make some predictions for 2014. Perhaps I'll get lucky and a few will even come true! After mentally experimenting with several structures, I've settled on stepping roughly through each major player.
Tuesday, December 17, 2013
Assembly Could Benefit From More Circular Reasoning
It was very gratifying to get comments on my recent piece on a de novo assembly review from both a referee of the manuscript (the amazing Heng Li) as well as one of the authors of the piece (though I am truly feeling guilty I forgot to reach out to the authors). Of course I was having my usual post-post regrets of things not written, such as the whole interesting topic of dealing with (and leveraging) uneven coverage in metagenomes and when assembling from amplified samples. But one other thing I was reminded of is one of the minor complaints I have with assembly programs: a lack of proper handling of circular genomes.
Sunday, December 15, 2013
Assembling a Review of a Review of Assembling
A review on short-read de novo genome assembly appeared recently in PLoS Computational Biology, titled "Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges". I think the review has a number of merits, but I also find a number of frustrating flaws. I'm going to write this entry much as I would have written a referee report on it. Unfortunately, that will mean I'll dwell a bit more on the flaws than the assets, but if you are interested in the field
Friday, November 15, 2013
Did The Biochemists of Yore Know Morse Code?
So, this piece is going to be mostly asking questions. In one of the corners of my dream world I have a scientific historian on retainer, but in the real world my substitute is to throw some questions out and hope some knowledgeable people leave comments. If I spark someone’s term paper or thesis topic, I ask only that I get an electronic draft!
Friday, October 25, 2013
Spanish Prisoner, ZX-81 or Turbo Pascal?
In the movie The Spanish Prisoner, a brilliant inventor possesses a paranoia that "The Process" he has invented will be stolen by deceitful competitors, and everyone speaks with a highly distinctive cadence. The entire movie is suffused with deceit, starting with the title, which is a notorious con scheme akin to the modern Nigerian scam. I spent last evening in some of the space in which the movie was filmed, listening to a scientist in that mold (& distinctive speech) describe a process his group has invented (indeed, by lucky chance I helped him find the venue). But many remain unconvinced that Clive Brown and Oxford Nanopore are not themselves the pullers of ocular wool.
Tuesday, October 22, 2013
Ion Previews More Accurate Polymerase, Faster Template Prep
I haven't talked about Ion Torrent for a while, because it was largely off my radar screen. In early 2012 the PGM had been an important contributor to my early de novo genome assemblies, as it was the only fast-turnaround, low-cost system I could access. But the data quality was always frustrating, with many indels, and the 200 basepair read lengths were not great for assembly. Once I could access a MiSeq, that became our dominant instrument for individual genome assembly. We tried Ion once more with the 300 basepair chemistry, but were not particularly impressed.
Saturday, October 19, 2013
Ripples from 454's Shutdown Announcement
Roche's announcement this week that they planned to shut down the 454 sequencing business in mid-2016 was not completely unexpected, as a number of rumors of shutdown had shown up on Twitter. Most tweets on the subject fell into two categories: either just-the-facts-ma'am or jokes about the dominant error profile (which I guess you could call just the facts maaa'aaam). But I certainly wouldn't have thought Roche was on the verge of this decision when I went to AGBT 2013 in February, as 454 had a huge suite in a prime location (just by the main conference hall entrance) and many expensive events. Now, Roche's presence in the genomics space looks to be reduced to just the recently announced deal with PacBio to market human diagnostics on that platform.
Thursday, September 26, 2013
Roche Taps PacBio for Human Diagnostics
One of the two big buzzes in the genomics business world was the announcement that Roche Diagnostics has signed a major deal with Pacific Biosciences in the field of human diagnostics, which comes with a $35M upfront payment and a possible $45M in milestones, plus future sales of reagents. PacBio stock rocketed over 70% on this news. This on the same day that cancer diagnostics company Foundation Medicine went public with a similar potent climb from their offering price; a good day for those lucky enough to have the shares (which, by the way, does not include me in any way, though Foundation shares a common venture backer with Warp Drive Bio in Third Rock Ventures).
Monday, September 23, 2013
Potential Sources of Drag on PacBio's Long Read Performance Trajectory
Saturday, August 24, 2013
SGE Isn't For Dummies (I sort of wish it were)
Kendall Square used to have the ultimate geek book store, Quantum Books. No fiction or graphic novels there; it was all technical books. One could browse every O'Reilly book and many, many others.
Sunday, June 30, 2013
My biggest contribution to the field of biochemistry
LinkedIn has a feature by which one can endorse other people for different fields. Periodically the system prompts me to vote yea-or-nay on a bunch of endorsements, and conversely I get regular updates as to what others have endorsed me for. It's always nice to get a vote of confidence, but sometimes I find myself wondering what it really means.