Sunday, December 10, 2017

2017 Nanopore Community Meeting: An Incomplete Summary

The 2017 Nanopore Community Meeting was over a week ago back in New York City, so I'm grossly overdue in cobbling together some observations and opinions based on the tweet stream (I had a critical day job meeting at the same time and wasn't in New York).  I did dash off the bit about SmidgION being potentially like the early Macs (though I got the nomenclature wrong: the original was the Mac 128K, and the Mac Classic was a later model that resembled it).  Oxford also deviated this autumn from the pattern of public information they had seemingly established, with major news at London Calling and smaller updates at the community meeting but also a pair of Clive Brown webcasts each falling roughly halfway between the two meetings.  This fall, no webcast.

Oxford Nanopore has its own Day 1 and Day 2 writeups, and there is an independent write-up from Arwyn Edwards.


Per the usual pattern, Oxford showed off previously announced hardware but made no solid announcements.  I've put together a Storify of relevant tweets which may hold further information.


SmidgION pumping out data with an attached Android phone calling the bases was a heavily tweeted and retweeted photo.  Alas, Oxford apparently put release of the SmidgION/Flongle components into the second half of next year, so no SmidgIONs adorning Christmas trees this year while happy recipients sing Flongle Bells ("Oh what fun, it is to sequence, in a one horse open sleigh, hey!").

Seriously, as suggested by the previous post I think these smaller flowcells are going to be hugely popular and influential.  For training and educational purposes, small is better.  The targeted application of field operations will be huge.  

But I think in the end the biggest use will be for many applications in which there are large numbers of samples from which small amounts of data will answer the scientific question and where multiplexing isn't a good solution.

To give one example, there is one of the burning questions of DNA sample prep: what contaminants damage flow cell performance?  Obviously that isn't a question suitable for multiplexing!

But there will be many others, particularly for counting applications, and especially if "no library" approaches are developed along the lines suggested previously by ONT for their Cas9-based schemes.  If creating a sequencer-ready sample consists of just pipetting a small amount of inexpensive reagent, then a lot of new applications will open up.


No real news specifically about GridION X5, other than that many people have tweeted out pictures of their new GridION instruments and there have been very few reports of problems (I know of at least one example of one being dead-on-arrival, but that seems to be rare).  

But the big news tied to GridION is the launch of the first two contract research nanopore sequencing services, with the Garvan Institute in Australia and BaseClear / Future Genomics Technologies in the Netherlands.  Since Oxford won't license MinION users for service sequencing, only the availability of GridION made this possible.  Presumably nailing down a U.S.-based operation is a priority for ONT; I've shipped samples overseas for sequencing, but it is never a calm process and it creates additional scheduling headaches (never, never let your samples sit around at a shipping firm over the weekend!).


I wrote a very critical piece on PromethION last year.  The instrument isn't out of the woods yet, but Twitter traffic does suggest that Oxford is sending out small quantities of good flowcells.  Clive Brown tweeted that his yield from a PromethION flowcell is pushing what would be needed for 30X coverage of a human genome; of course Clive's yields are historically about 2X the best field yields and 3-4X better than what most users achieve.  So perhaps PromethION will be a real star of data production for London Calling 2018 presentations, but I certainly don't see that as a sure thing.
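To put rough numbers on that 30X remark, here is a back-of-envelope sketch.  The genome size of about 3.1 Gb is my assumed round figure, not something from Clive's tweet, and the function names are made up for illustration.  The divisors simply apply the "2X" and "3-4X" rules of thumb from the paragraph above.

```python
# Back-of-envelope coverage arithmetic; every constant here is an assumption.
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def required_yield_gb(coverage, genome_bp=HUMAN_GENOME_BP):
    """Gigabases of sequence needed to reach a mean depth of `coverage`."""
    return coverage * genome_bp / 1e9


def achieved_coverage(yield_gb, genome_bp=HUMAN_GENOME_BP):
    """Mean depth of coverage obtained from `yield_gb` gigabases."""
    return yield_gb * 1e9 / genome_bp


if __name__ == "__main__":
    target_gb = required_yield_gb(30)
    print(f"30X of a human genome needs roughly {target_gb:.0f} Gb")

    # Scale a hypothetical 30X-capable yield down by the rules of thumb above.
    for label, divisor in [
        ("best field yield (1/2)", 2),
        ("most users, optimistic (1/3)", 3),
        ("most users, pessimistic (1/4)", 4),
    ]:
        depth = achieved_coverage(target_gb / divisor)
        print(f"{label}: about {depth:.1f}X")
```

Under those assumptions, a flowcell that gives Clive 30X would give most users something closer to 7.5X to 10X, which is why I'm not counting on a single flowcell per human genome just yet.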

Basecaller Widget

ONT started showing off their prototype of the FPGA-powered stand-alone basecalling widget, also announcing a contest to name the device.  


VolTRAX is still in the "VIP" beta test phase, which I am not part of.  I believe the only available kit is still the rapid 1D DNA kit, which hasn't attracted a fan base as the conventional protocol is so simple.  ONT promised version 2 flowcells which will have capabilities such as thermocycling.


On the software side, Oxford touted their improved Scrappie basecaller and a new Tombo package for modified base analysis. You can find tweets on this and others related to base modification in a Storify.

I really can't do justice to Ryan Wick's talk -- if you want to get the latest on basecalling performance, check out his publication-ready README file, which compares just about every known basecaller -- including the not-yet-public Guppy GPU caller -- on a variety of metrics.  Here's one example, showing raw basecalling accuracy.

Cold Chain

ONT has been making progress in reducing the cold chain requirements for select kits.  Flowcells are now being shipped wrapped in wool and they are beta-testing lyophilized versions of library prep reagents.  That would of course be huge for field use, but not inconsequential would be reducing the shipping costs for all users.  If you're going to be a low cost platform for hobbyists and educators, those shipping charges add up.


Not ONT, but a company called Circulomics announced plans for a sample preparation technology called NanoBind.  It is described as "a thermoplastic disk that contains a high density of micro- and nanostructured silica. This unique structure enables vast amounts of DNA to bind and release without being damaged. Processing occurs through a rapid bind, wash, and elute process that parallels magnetic beads and is easily automated."  Prep time is promised at 45 minutes, and the method is claimed to deliver up to milligrams of high quality, high purity HMW DNA from 1.5mL of input material.


Probably the biggest splash of the meeting was the release of a large consortium RNA dataset for human cell line NA12878, with both 13 million direct RNA reads (from 30 flowcells) and 24 million cDNA reads (from 12 flowcells), all released on GitHub.

With both the RNA and DNA, even this set of highly experienced labs obtained greatly varying yields.

Still, getting hundreds of thousands of RNA reads is nothing to sneeze at (particularly since that would spread RNase around the lab!).
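For a sense of scale, the consortium totals quoted above can be turned into per-flowcell averages.  This is only a sketch of that division: the helper name is mine, and a mean necessarily hides the large flowcell-to-flowcell variation just mentioned.

```python
# Per-flowcell averages from the totals quoted in the post.
# (total reads, number of flowcells); the dict layout is just for illustration.
CONSORTIUM_TOTALS = {
    "direct RNA": (13_000_000, 30),
    "cDNA": (24_000_000, 12),
}


def mean_reads_per_flowcell(total_reads, flowcells):
    """Average read count per flowcell; says nothing about the spread."""
    if flowcells <= 0:
        raise ValueError("need at least one flowcell")
    return total_reads / flowcells


if __name__ == "__main__":
    for library, (reads, cells) in CONSORTIUM_TOTALS.items():
        mean = mean_reads_per_flowcell(reads, cells)
        print(f"{library}: about {mean:,.0f} reads per flowcell on average")
```

That works out to a bit over 400 thousand direct RNA reads and about 2 million cDNA reads per flowcell on average.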

More importantly, a large number of the direct RNA reads -- and far more than the cDNA reads -- appear to represent full length transcripts.  Furthermore, the poly-A tail lengths can be accurately estimated with the direct RNA, even when they are hundreds of As long.

Basecalling accuracy is in the same neighborhood as DNA, with RNA performing slightly better.

There's a lot more in that README file -- identifying base modifications in RNA, capturing multiple splice forms, etc.  I'll try to dig more into that soon.

A number of users also presented exciting RNA results, particularly for direct sequencing of RNA viral pathogens such as flu and rabies.  I've put all the RNA-related tweets into a single Storify.

At least one talk debuted single-cell RNA sequencing on nanopore.  Another talk referenced Deb Peattie's pioneering work on chemical sequencing of RNA back in the 1970s.

Other User News

MinIONs continue to go to previously unimaginable locations -- perhaps the strangest one presented here was deep in a mine.  Nick Loman reviewed again his group's work (particularly featuring Josh Quick) sequencing Ebola and Zika in the field.  More tweets and talks are in a Storify focused on field uses.

Rachel Rubinstein of Ginkgo Bioworks described how a fast nanopore run saved hundreds of thousands of dollars by identifying the contaminating organism in a bioreactor. 

There were multiple talks on antibiotic resistance and pathogen detection (disclosure: my day job is looking for new antibiotics and I am doing light consulting for a company in the sequencing-based diagnostics space).   I've collected tweets on those topics in a Storify -- except a few I missed in preparing that from Claire Jenkins on getting pathogen sequence databases filled out.

Other worthy talks I'm going to reduce to tiny summaries: Steven Salzberg on assembling wheat, Svetlana Madjunkova on pre-implantation genetic screening, Chia-Lin Wei on structural variants.  And so many more.  Watch my Twitter for announcements of a few more Storify pages from the 600 or so tweets which haven't been incorporated in the ones mentioned above.

Thursday, December 07, 2017

On the Problem of Sequence Leakage

I've been spending some time lately in an unfamiliar world: the eukaryotic section of NCBI's NR protein database.  I've been almost exclusively a bacterial guy for six years, but the other side of starbase had an interest in finding homologs of a particular protein so I went diving for some.  That experience has reminded me of two serious issues with public sequence databases.  Tonight I'll dash off a bit about one; expect the other complaint to show up in the not-so-distant future. And tonight's lament is the increasing dispersion of sequence repositories.

Sunday, December 03, 2017

SmidgION: Mac Classic for the 21st Century?

Apple launched the Macintosh computer with a famous television ad playing on the launch year, 1984. What emerged was what we now know as the Mac Classic.  What may be less known is why the Mac Classic had that distinctive shape: it was intended to be backpack-portable, as Apple had a deal with a consortium of top U.S. universities to sell Macintoshes to their students.  Perhaps even more forgotten is that one of those schools, Drexel University in Philadelphia, made owning a Macintosh a requirement for students.

Monday, November 06, 2017

A Nucleotide Mixture-Based Error Correcting Short Read Chemistry

Sometimes polony-style short read sequencing seems like old news.  The underlying technology has been commercially available for over a decade.  I focus much of my attention on gains in long read technologies, though incremental improvements to read lengths or polony densities still appear.  Now in Nature Biotechnology a group from Peking University has published a new twist on sequencing-by-synthesis that is claimed to offer significant improvements in read accuracy.

Wednesday, November 01, 2017

AlphaGo & Biology

A comment was left on an earlier piece suggesting I comment on the recent AlphaGo paper and the possible applicability of this approach to biomedical sciences.  I'm not sure I have anything terribly original to say, but who can refuse a request?

Tuesday, October 17, 2017

Mission Bio Launches Tapestri Single Cell Platform

The fact that tumors and their immediate environment are genetically heterogeneous has long been known, but tools for high-throughput assessment of this heterogeneity have only recently become available.  The whole field of single cell RNA-Seq has seen spectacular growth, as new methods enable greater and greater numbers of cells to be profiled from a sample.  Profiling the DNA content on an individual cell basis has not been quite as much in the spotlight, but now a start-up called Mission Bio is launching a microfluidic library prep workflow, Tapestri, to enable amplicon panels to be run in single cell mode.

Friday, October 13, 2017

iGenomX Riptide Kits Promise a Sea of Data

A theme for me in my six years on Starbase has been addressing the challenge of cost-effectively sequencing many small genomes.  While sequence generation bulk prices have plummeted, all-in library construction cost has tended to stubbornly resist dramatic change.  Large genome projects don't face quite such a pinch, but if you want to sequence thousands of bacteria, viruses or molecular biology constructs, paying many-fold more for getting a sequence into the box than you're paying to move it through the box ends up being a roadblock. Illumina's Nextera approach dropped prices a bit, but not really a sea change.  Various published protocols drop  costs further via reagent dilution, but these can suffer from variable library yield and an increased dependence on precise input DNA quantitation and balancing.  Even then, the supplied barcoding reagents for Nextera handle at most 384 samples, and that is only a relatively recent expansion from 96. I previously profiled seqWell's plexWell kits, which like Nextera use a transposase scheme but with modifications to enhance tolerance to input sample concentration variation.  plexWell also enables very high numbers of libraries, which better mates projects with large numbers of small genomes to sequencers with enormous data generation capabilities.  Now comes another entrant in the mass Illumina library generation space: iGenomX, which has reformatted their chemistry from a microdroplet mode intended for linked read generation to a 96-well plate format requiring no unusual hardware.

Wednesday, October 04, 2017

PacBio's Frankenpatent on Error Correction

Well, here we go again.  Pacific Biosciences launched yet another patent lawsuit towards Oxford Nanopore at the end of September, and already the hounds are baying for me to look at the patents -- which I've foolishly established a reputation of doing. I will remind readers that, to use a construction that exasperates my son, I have no memory of these topics being covered during the time I was in law school. (said construction also works for divinity school, seminary, yeshiva, dental school, military academy, etc). 

Sunday, October 01, 2017

Dispatches from CDC AMD Day 2017

I had the singular honor and pleasure of speaking this past Monday at the Centers for Disease Control and Prevention's Advanced Molecular Detection (AMD) program's annual confab in Atlanta.  Just visiting the CDC campus was already a bit magical -- along with the Kennedy Space Center and Cold Spring Harbor it's one of the mythical places of human exploration to me.  But to actually stand at the podium? Wow!

I've collected below a bunch of separate mental threads, many of which probably should be expanded out to a full post in the future.

Sunday, September 24, 2017

Why Is LISP So Rare in Bioinformatics?

LISP is one of the oldest computer languages and perhaps one of the most influential of the early ones.  Some of the other well-known Eisenhower era languages -- Fortran, COBOL and ALGOL -- have certainly left their mark, but LISP and derivatives such as Scheme or Common LISP certainly carry more cachet among "serious" programmers.  COBOL has always been a bit of an easy joke and Fortran tends to mark you as old-school; use of APL (once a language of mine) would mark you as dangerously reactionary.  ALGOL begat Pascal and Modula-2 and clearly had an impact on the C syntax family of languages (including bioinformatics mainstays Python, Perl and Java).  As I'll detail below, learning LISP has embarrassingly ended up stuck seemingly permanently on my future plans queue.  But that's also because life never forced the issue:  while LISP has certainly been used in bioinformatics (as covered in a review from 2016), its mindshare in the community would seem to be very minimal.

Monday, September 18, 2017

Teaching Biology Evidence: Old or New?

I've been toying for over a week with writing something based on an interesting Twitter discussion started by Dr. Laura Williams (@MicroWavesSci) of Providence College pondering the best way to approach teaching molecular genetics (really, science in general) at the undergraduate level.  In particular, Professor Williams wondered about the dangers of branding various key experiments with the names of the experimenters, such as Hershey-Chase or Meselson-Stahl.  The risk she points out is that this can devolve into an exercise in memorizing names and dates without assimilating concepts, or conversely that some students will find the names more of a hindrance than a help.  I'm going to play a bit with this, but I do emphasize that for her this is reality and for me it is a hobby (or perhaps a retirement fantasy, if I should ever actually retire).  Or in other words, for the academic this is her industry but for this industrial scientist it is academic.

Tuesday, August 29, 2017

The Curse of Spammotation Lives!

High throughput sequencing of genomes is over twenty years old, which demanded the development of automated pipelines for annotating this data.  I've worked on such pipelines since the early 1990s, implementing them as a student and at two different corporate stops.  Indeed, we were reviewing results from my pipeline versus some of the other ones out there to see what can be done better.  And unfortunately, I've found infuriating problems with RefSeq entries annotated with NCBI's bacterial genome annotation pipeline.  Now I'm usually one to sing the praises of NCBI -- they are a key resource for biological research and they make available multiple spectacular public services freely to the entire world.  But I'm afraid this time I need to vent.

Tuesday, August 15, 2017

DNA vs. the Machine

Last week's news contained a story sure to raise eyebrows.  A group of computer security researchers from the University of Washington claimed to have demonstrated that they could hijack a computer via sequencing a carefully-constructed DNA fragment.  Visions of NextSeqs rampaging through the streets immediately sprang to mind.  The paper is interesting and has some useful warnings for the bioinformatics community, but certainly the news coverage has been strong on hype and alarmism.

Saturday, August 05, 2017

Computational Biology & Math: Am I Just Faking It?

Over on Quora a common type of question is "Can I be a computational biologist if I am now an X".  Personally I take a very broad view and think just about anyone with intellectual curiosity can become any kind of scientist.  A related type of question is "how skilled do I need to be in Y to succeed in computational biology", where Y is most often programming, biology or math.  I got thinking about this and started wondering whether I am actually at all skilled in math.  Here are the results of that analysis.

Friday, July 21, 2017

A Third GridION X5 Pricing Plan

When Oxford Nanopore announced their GridION X5 instrument in March, I and others attempted to parse the difference between the two pricing plans  -- and I made a bit of a hash of it.  The X5 runs 5 MinION flowcells independently in parallel from a single desktop instrument, which also includes FPGA-based acceleration of basecalling plus a license to perform sequencing-for-hire.  Indeed, Matt Loose tweeted out an image of an "X6" and then mention of an "X7"; the X6 had a MinION plugged into the USB port and apparently the FPGA unit can keep up with seven flowcells all running simultaneously.  Now Oxford has launched an interesting third "Starter Pack" plan that offers an even lower price point for the system.

Wednesday, June 28, 2017

STAT Proves Not Resistant To Antibiotic Tropes

Tuesday's Boston Globe carried a piece originating from STAT news on an interesting natural product antibiotic, pleuromutilin.  A research group recently published a new total synthesis of this fungal terpene, an advance which promises to enable greater medicinal chemistry around the molecule.  That part is cool.  Unfortunately, when it gets to the biology of pleuromutilin the piece by Eric Boodman completely spits the bit, trotting out some horribly inaccurate tropes.

Wednesday, June 14, 2017

New Life in the Sanger Market

In my bit on "I'm not dead yet" technologies recently, I included large scale Sanger sequencing. That reflects to a large degree my personal experiences and biases.  Targeted Sanger is great for spot checking the occasional junction or misbehaving clone or strain, but I forget that many clinicians still see it as a gold standard.  Apparently there are others who disagree with me, as Thermo Fisher recently launched a new Sanger instrument targeted at small labs, and according to GenomeWeb Promega plans an instrument offering in the same space as well.

Tuesday, June 06, 2017

Ice Ghosts: A Shortage of Maps

I'm going to step outside the usual topic space here and cover an interesting but frustrating book I read partly on the flight to London Calling (which is about the only connection it has to genomics).  Ice Ghosts, by Paul Watson, covers the searches for the lost Franklin Expedition, a mid-1800s British Navy attempt to find the Northwest Passage.  It's a pretty good book; after all, it did win a Pulitzer Prize.  The topic is thrilling: explorers under difficult conditions and a mystery that lasted over a century.  There are lessons for science in general, such as the value in carefully evaluating oral histories that some would discard as unreliable. But what is maddening for me is that in a book for which a central theme is poorly understood geographies and their interpretations, the set of supplied maps fails miserably at assisting in the telling of the story.

Monday, May 22, 2017

What Is (and Is Not) Sequence Assembly?

In the closing talk of the pre-London Calling workshop, Hans Jansen had closed his presentation with a question whether at some future date sequence assembly would become obsolete.  This was meant to be an aspirational vision for a distant timepoint, but one correspondent on Twitter saw it as hype.  I got in a bit of a discussion, constrained by the dreaded 140 character limit, which ended up largely illustrating that I have a somewhat more restricted definition of assembly than some people.  I'm going to explore this and you can judge for yourself.

Thursday, May 18, 2017

London Calling 2017: Plant & Animal de novo Genomes

Okay, I'm desperately behind on writing up the external science from London Calling.  Not helpful that I claimed I would not only do so, but in multiple installments.  A number of the plenaries focused on large genome assembly, so that's what I'll tackle now -- plus a few other bits.   See also my Storify summaries, which include other reports on the conference.  Also check out my storifies on the SMRT Leiden conference, which ran at the beginning of the same week and discusses many similar topics.

Sunday, May 14, 2017

SFAF & I'm Not Dead Yet Technologies

Jonathan Jacobs posted his annual reminder that the Sequencing, Finishing and Analysis in the Future Meeting (SFAF) will be this week.  Alas, that meeting hasn't had many more tweeters in the past than Jonathan, but perhaps this year there will be more.  There's a glut of genomics conferences to track, compile tweets and opine on -- besides London Calling, there's been SMRT Leiden and Biology of Genomes, all in the span of two weeks!  This post is going to be a bit short on actual writing and more to just flag some talks at SFAF that grabbed my attention.  What I realized is that the talks at SFAF illustrate that a number of technologies I consider effectively dead retain significant attention.

Tuesday, May 09, 2017

London Calling 2017: A Theme of Consolidation

London Calling 2017 came to a close last Friday.  Any excuses of jet lag or nights running up ONT's bar tab won't hold up much longer, so time to finish this post (I really did start the night after Clive's talk!).  I'm going to largely divide coverage on the dividing line of who presented: today's piece on Oxford Nanopore presentations, particularly Clive Brown's, and in the near future at least one focusing on the science users presented.  For other summaries of the action, I've created a storify of just blog posts and similar summaries of the meeting, as there were a great number (and I am on the hunt for additional ones I've missed).

Thursday, May 04, 2017

Nanopore Workshop Notes

I attended on Wednesday the London Calling pre-conference workshop, an add-on for those wishing for help getting started with MinION sequencing.  Judging from who I spoke to, many participants were utterly new to nanopore sequencing and more than a few were like me in that they had tried the platform and wanted to do better.  My colleague has gotten some very good results recently, which has re-fired my determination to get good at that myself.  Below are some limited notes I took that may be of general interest. Large portions of the workshop will go largely uncovered, as I focused on what was surprising or new.

Tuesday, May 02, 2017

London Calling 2017: A Preview

Oxford Nanopore's London Calling confab runs Thursday and Friday, with a training workshop on Wednesday.  I'll be there -- who can resist a conference nearly at the Tower of London? -- and will also be testing whether my personal "field of nanopore sequencing suppression" can defeat ONT's best trainers.  Here's some preview of what I'll be particularly looking for, though being surprised will be lots of fun too. Much more fun than reading (the wrong) patents!

Monday, May 01, 2017

Oxford Nanopore's Enigmatic Patent Litigation

Oxford Nanopore has launched lawsuits in the UK and Germany against Pacific Biosciences, alleging infringement of a European patent licensed from Daniel Branton's lab at Harvard, EP1192453, which is apparently exclusively licensed to Oxford.  When I wrote about Pacific Biosciences' first lawsuit against Oxford Nanopore late last year I titled it "PacBio's Quixotic Patent Litigation", as it appeared that Oxford could easily dodge the lawsuit by abandoning the 2D sequencing technology, which Oxford is in the process of doing.  I've swapped in "enigmatic" for this title, as I'm not even sure what aspect of PacBio is allegedly infringing the patent.

Wednesday, April 26, 2017

Exercise: A Sequence Signature for Transcription-Translation Coupling in Bacteria?

A pretty common question over on Quora is something along the lines of "how do I learn bioinformatics".  Great question!  Tonight I'm going to outline a project which I think would make a good first bioinformatics project.  It is rich in content and keys off an interesting new non-computational result.  And since I've left graffiti on multiple Quora threads that I would write something like this in the immediate future, here it is!

Saturday, April 22, 2017

Pinniped Karyotypes & N50 Statistics

In my recent piece on long read assembly, I laid out part of the case against the N50 statistic.  Historically, the issues with the statistic have been around the fact it can be gamed at the expense of assembly correctness or assembly coverage. These are concerns for the typical sort of short read assemblies we've grown used to: lots of contigs and the temptation (perhaps justified) to try to go for higher N50s by more aggressive merging or by filtering out the short contigs.  Elin Videvall over at The Molecular Ecologist has a nice ongoing series of posts illustrating the statistic and these commonplace issues.
I'm going to come at the problem from the other end, as a new preprint from 10x Genomics illustrates the problem of using an N50 statistic (or any related Nxx statistic) with good long-read / linked read assemblies -- but doesn't demonstrate this point quite as strongly as I thought when I first started drafting this.
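For readers who haven't computed one, here is a minimal sketch of the usual N50 definition: the contig length at which the running total, counting from the longest contig down, first reaches half of the assembly.  The contig lengths are made up purely for illustration and come from no real assembly; the second call shows the filtering trick mentioned above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together hold at
    least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical assembly: two large contigs plus a tail of tiny ones.
    contigs = [100, 50] + [10] * 25
    print("N50, all contigs:", n50(contigs))

    # "Gaming" the statistic: drop the short contigs and report N50 again.
    filtered = [c for c in contigs if c >= 50]
    print("N50, short contigs removed:", n50(filtered))
```

With these toy numbers the N50 jumps from 10 to 100 just by discarding the short contigs, even though nothing about the underlying assembly got better and 250 of the 400 assembled bases were thrown away.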

Thursday, April 20, 2017

Time to Retire HeLa?

A TV movie produced by and starring American culture mogul Oprah Winfrey is about to hit screens which dramatizes Rebecca Skloot's The Immortal Life of Henrietta Lacks.  If you haven't read this remarkable book, you really should.  It should certainly be required reading for anyone entering biomedical fields.  That's not to claim it is perfect; one of Lacks' sons has objected to the way his family is portrayed.  But it is a searing human story of how the most famous cell line in the world came to be.  Even if you excuse some of the injustices done as compatible with then contemporary ethical standards, it is a thought-provoking piece on the topic of what our biomedical ethics should be.

Thursday, April 13, 2017

Alexandria Jumps Into Shuttle Business

A restaurant I frequented during my grad school days had a map on the wall showing Boston area transit routes from roughly the 1940s.  Remarkably, most of those streetcar routes are found largely unchanged in the MBTA's current bus routes.  Yes, routes have been altered to account for expansion of the Red Line and shifting of the Orange Line, but most of the routes are little changed and very, very few new ones have been added.  Some of that reflects the canalization of routes by the street patterns; there are only so many large streets suitable for buses and Somerville's hills and the various rivers impose further constraints.  Much of it lies in the always tight purses at the T and the political difficulty of ever closing an old route to enable moving resources to a new one.  Unfortunately, the commuting patterns in Boston are not conserved from the 1940s, with far more workers commuting from distant suburbs and dense developments springing up.

Monday, April 10, 2017

10x Launches Mass T-Cell Receptor Decoding

Adaptive immunity is an endlessly fascinating topic which I have not explored very deeply, which is particularly unfortunate given the many parallels to computing.  Combinatorial logic is used to construct a vast array of possible antigen readers, expression logic ensures that only one such reader is expressed in a given cell and hypermutation and evolution are used to optimize these readers to match specific antigens.  All this not only creates weapons to deploy against foreign invaders, but also a memory which effectively records an individual's history of environmental exposures.  Just before I started writing this two tweets highlighted using adaptive immunity profiling to reveal exposure to tuberculosis and cytomegalovirus.  Adaptive immunity is responsible for transplant rejection, with new companies looking to more selectively modulate immunity to enable transplants without shutting the immune system down.  Adaptive immunity also ties into the white hot field of immunotherapy for oncology, exploring whether differences in antigen response underlie variation in immunotherapy success.  To enable profiling adaptive immunity on a mass scale, 10x Genomics has now introduced a single-cell kit for targeted profiling of T-cell receptor variable regions.

Tuesday, April 04, 2017

SageHLS: Automated uHMW DNA Preparation

Advances in optical mapping, linked reads, PacBio and nanopore sequencing are enabling generating highly contiguous large genome sequences routinely and inexpensively.  However, this in turn is creating intense demand for efficiently and reliably preparing ultra-high molecular weight (uHMW) DNA.  By this term, I mean DNA approaching or exceeding a megabase in size.  Methods for preparing HMW and uHMW DNA tend to be very old-school, reaching back at least to the 1970s, 80s and 90s for approaches used in the early days.  Phenol-chloroform preps with the DNA spooled out onto a glass hook or rod are one popular approach; another is to embed cells in agarose blocks, extract the DNA within the block and then degrade the agarose to retrieve the DNA.  Nuclei preps are yet another approach. Any liquid handling must be performed gently and with wide bore pipettes.  These techniques tend to be tedious and slow affairs, requiring many manual steps.  As an alternative, Sage Science has launched an instrument which automates a process with no hazardous chemicals, the SageHLS.

Thursday, March 30, 2017

Chromosome-Scale Scaffolds And The State of Genome Assembly

A new paper on using Hi-C sequencing appeared in Science recently, demonstrating the generation of chromosome-length scaffolds for human as well as several insect genomes.  The authors even provide a cost model, proposing that by processing multiple genomes in parallel the sequencing reagent cost (but not labor) of this approach should be about $10K per human genome. In the case of the insect genomes, the paper enables a look at chromosome evolution which is simply impossible with lower resolution.  These findings resonate with a number of pieces I've written over the years, but particularly with my recent criticism of the proposed Earth BioGenome project and a spirited defense of that concept made in the comments of my piece by a member of the steering committee.

Monday, March 27, 2017

Differential Mammalian Toxicity: Why Do Some Human Foods Kill Dogs?

I've been contemplating this post for a while, but it can be seen as another angle on my recent post on the challenges of drug discovery, so it finally left the mental queue.  We often use other mammalian species in drug development to predict human toxicity.  We know animals aren't the same as people, but lacking a better alternative that's what we do.  Now, as regular readers know I keep company with a dog, and that sometimes has me wondering: how well do we understand the cases of things we can eat but which are dangerous for our canines?

Saturday, March 25, 2017

Targets: Drugability Revisited

My correspondent @datarade shot a tweet my way on his quest to understand drug discovery. He does this despite the fact I've promised posts on previous tweets that are submerged in my mental queue.  But the best part of teaching is forcing yourself to rethink what you think you know, so I'm going to actually take this one on in the space of "what is a target, how do we pick them and how do we drug them".  Which I've found to be enlightening and frustrating.  It's a messy space because so much is empirical, and I keep devising and then discarding taxonomies and explanatory approaches because they all seem unsatisfactory.

Tuesday, March 21, 2017

Obviousness: Rarely Obvious

Pacific Biosciences has made new thrusts in their ongoing intellectual property action against Oxford Nanopore, adding two recently issued patents to the fray.  Oxford has publicly brushed these off as "another pore excuse for a lawsuit", but certainly the battle is not over.  One of these patents, 9,542,527 "Compositions and methods for nucleic acid sequencing", appears to concern using hairpin linkages to read both strands, much like the 9,404,146 "Compositions and methods for nucleic acid sequencing" patent that PacBio led with.  Since Oxford has announced they will abandon their "2D" methods that use such hairpins, this angle would seem to be soon irrelevant (as I predicted back when PacBio originally attacked).  But the other, US 9,546,400 "Nanopore sequencing using n-mers", covers basecalling methods, which is a new twist.  One route to challenging any patent is to identify "prior art": information which was publicly available at the time of the patent filing and which impinges on the claims in the patent application.  Not only can exact matches to prior art be an issue, but so can anything which would be "obvious" to a skilled practitioner.  And that can certainly be a can of worms.

Monday, March 20, 2017

plexWell: Illumina Libraries by the Plateload

The advent of so-called next generation sequencers, particularly those from Illumina, has brought the price of sequence data down dramatically.  However, there is a catch: the cost of preparing DNA to go into the sequencer, the process known as library preparation, has glided downwards on a much shallower trajectory.  This means that for projects wishing to sequence very large numbers of small genomes or large constructs, the cost of library preparation can be similar to or even exceed the cost of data generation.  A small company north of Boston called seqWell Inc has a new approach to Illumina library generation which they are on the cusp of making widely available.  Not only does it bring the cost per well down, but it is designed to yield normalized libraries from relatively unnormalized samples.

Tuesday, March 14, 2017

ONT Updates: GridION X5, PromethION, 1D^2, Scrappie, FPGAs and More

Clive Brown gave a webcast today with updates on a number of Oxford Nanopore topics, but clearly the flagship announcement was a new instrument, GridION X5.  Due to the raging snowstorm in the Boston area I was home with my teammate and we've been doggedly going through the tweets (now storified) and my notes (plus David Eccles' nice set) to retrieve the juiciest bones therein.

Wednesday, March 08, 2017

MinION Leviathan Reads: An Update

Last week I posted a piece on some amazing new nanopore data, only to be red-faced to discover the next morning that I had misread the axes.  So I re-posted the piece with the offending data and subsequent analysis in strike-thru font.  After I did that, I was informed that the same dataset actually did have leviathan reads, bigger than my misinterpretation.

Thursday, March 02, 2017

Catching Up On Oxford Nanopore News: More, Better, Meth & Huge

Oxford Nanopore and its collaborators have shown at least three interesting advances in the last few months which I haven't yet covered, the most astounding of which was announced this week.  I'll take these three in an order which works logically for me, though it isn't strictly chronological.  I'll also touch on some parts of their platform which have not made the advances that were perhaps expected.

(Morning after: Ugh, ugh, ugh -- I misread an axis, inserting an extra 0 -- so major crossouts in one section; why I shouldn't post late at night during pauses in day job stuff)

Tuesday, February 28, 2017

Earth BioGenome Project: Ill-Conceived Megaproject Du Jour

There's been a bit of buzz recently about an unfunded proposal to ultimately sequence every living species on Earth, warming up by sequencing every eukaryotic species, with a targeted cost of $4.8B.  It pains me a bit to write this, but I'm with those who think this is not a wise way to spend money and certainly not likely to work for anywhere near that budget.

Friday, February 17, 2017

#AGBT17 Tweet Archive is Up!

I've used my scheme for collecting and organizing tweets to capture most of the feed from this week's AGBT17 conference.  I still need to pore over these in detail, so I won't try to distill out many thoughts (other than that single-cell sequencing is clearly in exponential growth phase!).

Monday, February 13, 2017

Bagging Novel Enzymes Via Mass Spec Metabolomics

Obtaining a complete genome sequence for a bacterium or archaeon is essentially a solved problem, if you can culture the bug.  Grow up biomass, purify the DNA and then use PacBio alone or a combination of long reads (PacBio or Oxford Nanopore) and short reads.  These should yield a closed genome with a very low error rate.  A few bugs spit at you by repeatedly failing PacBio sequencing or by having some monster prophage or other repeat that is longer than the read lengths, but these are very rare.  With advances in metagenomics techniques, the solving of uncultured genomes is becoming increasingly easy, and many of these remarks also apply to fungi and other eukaryotic microorganisms.  Once you have the sequence, the lack of introns in bacteria and archaea makes gene prediction almost trivial, and you now have a parts list for the organism.  But is that a useful parts list?  A new paper in Nature Methods makes some progress in improving the utility of those parts lists, though we are still far from actually fully understanding an organism given its genome.

Thursday, February 02, 2017

Could Hermione Tackle MinION Yield Variability?

A bit of a foray into Oxford Nanopore land again.  By replacing a bench bumbler with someone competent, we've seen some success with our MinION at Starbase.  Highly variable yields though.  I've done some looking and discovered this isn't a unique experience.  And now Oxford is suggesting that software upgrades alone will give MinION about another 50% boost in yield; it will be interesting to see what this does for variability.  Finally, I have a notion of some of the sources of variability and an idea for a troubleshooting tool.

Wednesday, February 01, 2017

Illumina Drops NeoPrep

At the 2015 AGBT meeting, Illumina launched the NeoPrep, a ~$40K instrument to automate the preparation of up to 16 sequencing libraries at a time, using a technology called electrowetting microfluidics.  Now news comes that Illumina is dropping the NeoPrep, halting sales immediately and allowing existing users about a year of reagents.  What happened and how does it impact genomics?

Tuesday, January 31, 2017

On The International Nature of American Biotech

I'll spend two hours in project meetings tomorrow. Around the table will be a group of scientists who are all at the top of their game and among the best in the world at what they do. We will be trying to push forward new antibiotics to save lives. Yes, we are also trying to be rewarded monetarily for it, but we all share a mission to improve humanity by finding new drugs for important medical needs.

Friday, January 27, 2017

Perl: The Bad Habit I Can't Quite Kick

TULIP is a new assembler for long, error-rich reads such as those from nanopore. I was a bit stunned to see that TULIP is written in Perl; I was starting to wonder how many holdouts like me there were. Which led to this exchange on Twitter.

Tuesday, January 24, 2017

Notes on a Conversation with 10X

I've been remiss in writing up a piece on 10X Genomics based on a phone discussion last week with Michael Schnall-Levin (VP Computational Biology and Applications) and Anup Parikh (Director, Product Marketing).  I always appreciate companies reaching out to me and spending time to educate me on their products and plans, and this was a very interesting and enjoyable conversation.

Saturday, January 21, 2017

Gen9 Vanishes

Earlier this week one of my colleagues got a somewhat ominous email from the CEO of Gen9 titled "Special Gen9 Announcement", which led off by saying that their holiday shutdown would be followed by a "corporate restructuring period" during which "Gen9 will not be accepting orders". The next day came an article from Scott Kirsner detailing the effective shutdown of Gen9 and sale of its assets to Ginkgo Bioworks for an undisclosed amount of cash and stock.  Interestingly, Kirsner reports that only 10 Gen9 employees will make the transition and that most of the Gen9 staff were laid off in mid-December.  It is surprising that no gossip of the cutbacks seemed to reach my radar, given a number of personal connections to the company (CEO Kevin Munnelly was a colleague at Millennium; several members of the Gen9 business group were ex-Codon or ex-Infinity and we had done limited business with Gen9).

Tuesday, January 17, 2017

Bio-Rad Sips Up RainDance

Monday evening brought news that Bio-Rad has further consolidated its grip on the droplet microfluidics space by acquiring RainDance Technologies for an undisclosed price.  Bio-Rad had previously acquired droplet digital PCR company QuantaLife back in October of 2011 and targeted sequencing company GnuBio in April of 2014.  While the droplet digital PCR has been marketed for many years now, the GnuBio effort had gone relatively quiet since the acquisition.  However, Bio-Rad announced at the JP Morgan conference that this technology will be launched as OncoDrop late this year.

Monday, January 09, 2017

Illumina Unveils HiSeq Successor NovaSeq

At today's J.P. Morgan Healthcare Conference Illumina made a number of small announcements -- some new partnerships, Firefly on track for launch later this year, launch of the single cell workflow partnered with Bio-Rad.  Then CEO Francis deSouza dropped the big news: a new high-end sequencer architecture to ultimately replace all of the HiSeq instruments.  It sounds like an interesting evolution of the Illumina product line, but unfortunately too many headlines and tweets have focused on a distant goal of $100 human genomes.  Worse, not only did some commentators misconstrue the announcement as delivering on $100 genomes, but some also touted a sequencing speed of one hour for a genome which isn't remotely true.

Sunday, January 08, 2017

Pondering What Is Lost In Teaching Translation

I'm good at acquiring distractions, and a relatively new one is Quora.  This site allows users to ask questions which are then answered by members of the community.  I lurk in a number of fields, but have answered a few questions related to genomics and related fields of biology.  Tackling a question last night required re-learning some details I was disappointed I had forgotten.  In researching to regain that knowledge, I skimmed a number of study guides online, which leads to this post.

Saturday, January 07, 2017

#JPM17 Genomics and Synthetic Biology Companies

With the 2017 J.P. Morgan Conference in Healthcare (#JPM17) starting Monday, I and others have engaged in early reporting or speculation.  I've tried to compile a list of presenting companies in the genomics, informatics and synthetic biology tool spaces, but these were filtered quickly from a long list of presenting companies so I may have missed some -- please leave comments and I can add them.  Also, some of the big conglomerates could speak on these topics but might ignore them, so no promises.  For example, Roche has their pharmaceutical CEO speaking, so we may not hear anything about the PacBio breakup or Genia lawsuit.  All times are Pacific Standard Time and are from J.P. Morgan, though I've converted to 24-hour time (hopefully successfully!).  You may need to register with J.P. Morgan to follow the links I've provided and access the webcasts when they are available.

Thursday, January 05, 2017

Two Pore Guys Previews Handheld Nanopore Analyte Sensor Ahead of J.P. Morgan Conference

2017 is certainly shaping up to be a big year for nanopore news.  I touched on Oxford Nanopore's very full plate in my speculation about sequencing platforms, and we already know of two different legal actions which will be progressing: PacBio vs. Oxford Nanopore and University of California vs. Genia.  James Hadfield's take on possible Illumina announcements at the J.P. Morgan Conference includes an Illumina nanopore device.  That's speculation; today we had a pair of tweets from Two Pore Guys previewing their sensing device and saying that they will be talking more at J.P. Morgan (all videos from 2PG).

Tuesday, January 03, 2017

University of California Cries "Thief!" on Genia Patents

As I noted in my last post, the University of California has filed suit against Genia claiming that Genia co-founder Roger Chen misappropriated intellectual property from UC Santa Cruz and the laboratory of Mark Akeson (filings include a bunch of  other well-known nanopore scientists, including David Deamer and Dan Branton).  While the filings are mostly dry, they are enlivened occasionally by such colorful language as "evasive tactics", "aided and abetted" and "stonewalled".  Goaded by Mick Watson, I've dug into the court filings and some of the patents (and obtaining those filings apparently cost me some real money, perhaps approaching $1.0e01 dollars).

Monday, January 02, 2017

Sequencing Technology Outlook, January 2017

Another year of blogging is upon us!  The J.P. Morgan Conference starts a week from today, and then before long it's time for AGBT.  So if one is going to prognosticate, then there's no time to lose, as announcements could start flying at any time.