A pretty common question over on Quora is something along the lines of "how do I learn bioinformatics". Great question! Tonight I'm going to outline a project which I think would make a good first bioinformatics project. It is rich in content and keys off an interesting new non-computational result. And since I've left graffiti on multiple Quora threads that I would write something like this in the immediate future, here it is!
Wednesday, April 26, 2017
Saturday, April 22, 2017
In my recent piece on long read assembly, I laid out part of the case against the N50 statistic. Historically, the issues with the statistic have been around the fact it can be gamed at the expense of assembly correctness or assembly coverage. These are concerns for the typical sort of short read assemblies we've grown used to: lots of contigs and the temptation (perhaps justified) to try to go for higher N50s by more aggressive merging or by filtering out the short contigs. Elin Videvall over at The Molecular Ecologist has a nice ongoing series of posts illustrating the statistic and these commonplace issues:
I'm going to come at the problem from the other end, as a new preprint from 10x Genomics illustrates the problem of using an N50 statistic (or any related Nxx statistic) with good long-read / linked read assemblies -- but doesn't demonstrate this point quite as strongly as I thought when I first started drafting this.
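To make the gaming concern concrete, here is a minimal sketch of how an Nxx statistic is usually computed and how dropping short contigs can raise N50 while shrinking the assembly. The function name `nxx` and the contig lengths are made up for illustration; they are not taken from the 10x Genomics preprint or from any real assembly.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx value for a list of contig lengths.

    Contigs are sorted longest first; the result is the length of the
    contig at which the running total first reaches `fraction` of the
    summed length of all contigs (fraction=0.5 gives N50).
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Hypothetical toy assembly: a few long contigs plus a tail of short ones.
contigs = [50, 40, 30, 20, 10, 10, 10, 10, 10, 10]

# "Gaming" step: discard every contig shorter than 20.
filtered = [c for c in contigs if c >= 20]

print("all contigs:     N50 =", nxx(contigs), "total =", sum(contigs))
print("short ones cut:  N50 =", nxx(filtered), "total =", sum(filtered))
```

In this toy case the N50 goes up after filtering even though nothing about the remaining contigs changed and the assembly now covers less sequence, which is the sort of behavior the historical objections to N50 are about.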
Thursday, April 20, 2017
A TV movie dramatizing Rebecca Skloot's The Immortal Life of Henrietta Lacks, produced by and starring American culture mogul Oprah Winfrey, is about to hit screens. If you haven't read this remarkable book, you really should. It should certainly be required reading for anyone entering biomedical fields. That's not to claim it is perfect; one of Lacks' sons has objected to the way his family is portrayed. But it is a searing human story of how the most famous cell line in the world came to be. Even if you excuse some of the injustices done as compatible with then-contemporary ethical standards, it is a thought-provoking piece on the topic of what our biomedical ethics should be.
Thursday, April 13, 2017
A restaurant I frequented during my grad school days had a map on the wall showing Boston area transit routes from roughly the 1940s. Remarkably, most of those streetcar routes are found largely unchanged in the MBTA's current bus routes. Yes, routes have been altered to account for expansion of the Red Line and shifting of the Orange Line, but most of the routes are little changed and very, very few new ones have been added. Some of that reflects the canalization of routes by the street patterns; there are only so many large streets suitable for buses and Somerville's hills and the various rivers impose further constraints. Much of it lies in the always tight purses at the T and the political difficulty of ever closing an old route to enable moving resources to a new one. Unfortunately, the commuting patterns in Boston are not conserved from the 1940s, with far more workers commuting from distant suburbs and dense developments springing up.
Monday, April 10, 2017
Adaptive immunity is an endlessly fascinating topic which I have not explored very deeply, which is particularly unfortunate given the many parallels to computing. Combinatorial logic is used to construct a vast array of possible antigen readers, expression logic ensures that only one such reader is expressed in a given cell, and hypermutation and evolution are used to optimize these readers to match specific antigens. All this not only creates weapons to deploy against foreign invaders, but also a memory which effectively records an individual's history of environmental exposures. Just before I started writing this, two tweets highlighted using adaptive immunity profiling to reveal exposure to tuberculosis and cytomegalovirus. Adaptive immunity is responsible for transplant rejection, with new companies looking to more selectively modulate immunity to enable transplants without shutting the immune system down. Adaptive immunity also ties into the white-hot field of immunotherapy for oncology, exploring whether differences in antigen response underlie variation in immunotherapy success. To enable profiling adaptive immunity on a mass scale, 10x Genomics has now introduced a single-cell kit for targeted profiling of T-cell receptor variable regions.
Tuesday, April 04, 2017
Advances in optical mapping, linked reads, PacBio and nanopore sequencing are making it routine and inexpensive to generate highly contiguous large genome sequences. However, this in turn is creating intense demand for efficiently and reliably preparing ultra-high molecular weight (uHMW) DNA. By this term, I mean DNA approaching or exceeding a megabase in size. Methods for preparing HMW and uHMW DNA tend to be very old-school, reaching back at least to the 1970s, 80s and 90s for approaches used in the early days. Phenol-chloroform preps with the DNA spooled out onto a glass hook or rod are one popular approach; another is to embed cells in agarose blocks, extract the DNA within the block and then degrade the agarose to retrieve the DNA. Nuclei preps are yet another approach. Any liquid handling must be performed gently and with wide-bore pipettes. These techniques tend to be tedious and slow affairs, requiring many manual steps. As an alternative, Sage Science has launched an instrument which automates a process with no hazardous chemicals, the SageHLS.