Sunday, December 10, 2017

2017 Nanopore Community Meeting: An Incomplete Summary

The 2017 Nanopore Community Meeting took place over a week ago in New York City, so I'm grossly overdue in cobbling together some observations and opinions based on the tweet stream (I had a critical day job meeting at the same time and wasn't in New York).  I did dash off the bit about SmidgION being potentially like the early Macs (though I got the nomenclature wrong: the original was the Mac 128K, and the Mac Classic was a later model that resembled it).  Oxford also deviated this autumn from the pattern of public information they had seemingly established: major news at London Calling, smaller updates at the community meeting, and a pair of Clive Brown webcasts, each falling roughly halfway between the two meetings.  This fall, no webcast.

Oxford Nanopore has its own Day 1 and Day 2 writeups, and there is an independent write-up from Arwyn Edwards.

Platform

Per the usual pattern, Oxford showed off previously announced hardware but made no solid announcements.  I've put together a Storify of relevant tweets which may hold further information.


Flongle/SmidgION

SmidgION pumping out data with an attached Android phone calling the bases was a heavily tweeted and retweeted photo.  Alas, Oxford apparently put release of the SmidgION/Flongle components into the second half of next year, so no SmidgIONs adorning Christmas trees this year while happy recipients sing Flongle Bells ("Oh what fun, it is to sequence, in a one horse open sleigh, hey!").



Seriously, as suggested by the previous post I think these smaller flowcells are going to be hugely popular and influential.  For training and educational purposes, small is better.  The targeted application of field operations will be huge.  

But I think in the end the biggest use will be for many applications in which there are large numbers of samples from which small amounts of data will answer the scientific question and where multiplexing isn't a good solution.

To give one example, there is one of the burning questions of DNA sample prep: what contaminants damage flow cell performance?  Obviously that isn't a question suitable for multiplexing!

But there will be many others, particularly for counting applications, especially if "no library" approaches are developed along the lines suggested previously by ONT for their Cas9-based schemes.  If creating a sequencer-ready sample consists of just pipetting a small amount of inexpensive reagent, then a lot of new applications will open up.

GridION

No real news specifically about GridION X5, other than that many people have tweeted out pictures of their new GridION instruments and there have been very few reports of problems (I know of at least one example of one being dead-on-arrival, but that seems to be rare).  

But the big news tied to GridION is the launch of the first two contract research nanopore sequencing services, with the Garvan Institute in Australia and BaseClear / Future Genomics Technologies in the Netherlands.  Since Oxford won't license MinION users for service sequencing, only the availability of GridION made this possible.  Presumably nailing down a U.S.-based operation is a priority for ONT; I've shipped samples overseas for sequencing but it is never a calm process plus it creates additional scheduling headaches (never, never let your samples sit around at a shipping firm over the weekend!).

PromethION

I wrote a very critical piece on PromethION last year.  The instrument isn't out of the woods yet, but Twitter traffic does suggest that Oxford is sending out small quantities of good flowcells.  Clive Brown tweeted that his yield from a PromethION flowcell is pushing what would be needed for 30X coverage of a human genome; of course Clive's yields are historically about 2X the best field yields and 3-4X better than what most users achieve.  So perhaps PromethION will be a real star of data production for London Calling 2018 presentations, but I certainly don't see that as a sure thing.
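
To put rough numbers on that caveat, here is a back-of-envelope sketch.  It assumes a human genome of about 3.1 gigabases (my assumption, not a figure from the tweets) and, for simplicity, treats Clive's yield as exactly enough for 30X, which is the upper bound of "pushing" that level.  The 2X and 3-4X ratios are the historical ones mentioned above; the resulting numbers are illustrative and not measured yields.

```python
# Back-of-envelope coverage arithmetic; all numbers are illustrative.
HUMAN_GENOME_GB = 3.1   # assumption: approximate human genome size in gigabases
TARGET_COVERAGE = 30    # the 30X figure from the post


def yield_for_coverage(coverage, genome_gb=HUMAN_GENOME_GB):
    """Gigabases of sequence needed for a given mean coverage."""
    return coverage * genome_gb


def coverage_from_yield(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Mean coverage delivered by a given yield in gigabases."""
    return yield_gb / genome_gb


# Simplification: treat the in-house yield as exactly enough for 30X.
clive_yield_gb = yield_for_coverage(TARGET_COVERAGE)

# Fractions of the in-house yield, using the historical ratios from the post.
scenarios = {
    "in-house (Clive Brown)": 1.0,
    "best field yield (1/2)": 1 / 2,
    "typical user (1/3)": 1 / 3,
    "typical user (1/4)": 1 / 4,
}

implied = {
    name: coverage_from_yield(clive_yield_gb * fraction)
    for name, fraction in scenarios.items()
}

for name, cov in implied.items():
    gb = clive_yield_gb * scenarios[name]
    print(f"{name}: about {gb:.0f} Gb per flowcell, about {cov:.1f}X human coverage")
```

Under those assumptions a typical user would need three or four flowcells per 30X human genome, which is why I don't treat the in-house figure as a prediction of field performance.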

Basecaller Widget

ONT started showing off their prototype of the FPGA-powered stand-alone basecalling widget, also announcing a contest to name the device.  


VolTRAX

VolTRAX is still in the "VIP" beta test phase, which I am not part of.  I believe the only available kit is still the rapid 1D DNA kit, which hasn't attracted a fan base as the conventional protocol is so simple.  ONT promised version 2 flowcells which will have capabilities such as thermocycling.


Software


On the software side, Oxford touted their improved Scrappie basecaller and a new Tombo package for modified base analysis. You can find tweets on this and others related to base modification in a Storify.

I really can't do justice to Ryan Wick's talk -- if you want to get the latest on basecalling performance, check out his publication-ready README file, which compares just about every known basecaller -- including the not-yet-public Guppy GPU caller -- on a variety of metrics.  Here's one example, showing raw basecalling accuracy.

Cold Chain

ONT has been making progress in reducing the cold chain requirements for select kits.  Flowcells are now being shipped wrapped in wool and they are beta-testing lyophilized versions of library prep reagents.  That would of course be huge for field use, but not inconsequential would be reducing the shipping costs for all users.  If you're going to be a low cost platform for hobbyists and educators, those shipping charges add up.

NanoBind

Not ONT, but a company called Circulomics announced plans for a sample preparation technology called NanoBind.  It is described as
"a thermoplastic disk that contains a high density of micro- and nanostructured silica. This unique structure enables vast amounts of DNA to bind and release without being damaged. Processing occurs through a rapid bind, wash, and elute process that parallels magnetic beads and is easily automated."
Prep time is promised at 45 minutes, and the method is claimed to deliver up to milligrams of high quality, high purity HMW DNA from 1.5 mL of input material.

RNA

Probably the biggest splash of the meeting was the release of a large consortium RNA dataset for human cell line NA12878, with both 13 million direct RNA reads (from 30 flowcells) and 24 million cDNA reads (from 12 flowcells), all released on GitHub.


With both the RNA and DNA, even this set of highly experienced labs obtained greatly varying yields.

Still, getting hundreds of thousands of RNA reads is nothing to sneeze at (particularly since that would spread RNase around the lab!).
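
For a rough sense of scale, the totals quoted above can be turned into per-flowcell averages.  This is only a sketch using the rounded figures from this post; since yields varied greatly between labs, the means say nothing about any individual flowcell.

```python
# Mean reads per flowcell from the rounded consortium totals quoted in the post.
datasets = {
    "direct RNA": {"reads": 13_000_000, "flowcells": 30},
    "cDNA": {"reads": 24_000_000, "flowcells": 12},
}


def mean_reads_per_flowcell(reads, flowcells):
    """Simple average; hides the large flowcell-to-flowcell variation."""
    return reads / flowcells


means = {
    name: mean_reads_per_flowcell(d["reads"], d["flowcells"])
    for name, d in datasets.items()
}

for name, mean in means.items():
    print(f"{name}: about {mean:,.0f} reads per flowcell on average")
```

So the direct RNA runs averaged a bit over 400 thousand reads per flowcell, and the cDNA runs roughly two million.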

More importantly, a large number of the direct RNA reads -- and far more than the cDNA reads -- appear to represent full length transcripts.  Furthermore, the poly-A tail lengths can be accurately estimated with the direct RNA, even when they are hundreds of As long.

Basecalling accuracy is in the same neighborhood as DNA, with RNA performing slightly better.

There's a lot more in that README file -- identifying base modifications in RNA, capturing multiple splice forms, etc.  I'll try to dig more into that soon.

A number of users also presented exciting RNA results, particularly for direct sequencing of RNA viral pathogens such as flu and rabies.  I've put all the RNA-related tweets into a single Storify.

At least one talk debuted single-cell RNA sequencing on nanopore.  Another talk referenced Deb Peattie's pioneering work on chemical sequencing of RNA back in the 1970s.


Other User News

MinIONs continue to go to previously unimaginable locations -- perhaps the strangest one presented here was deep in a mine.  Nick Loman again reviewed his group's work (particularly featuring Josh Quick) sequencing Ebola and Zika in the field.  More tweets and talks are in a Storify focused on field uses.

Rachel Rubinstein of Ginkgo Bioworks described how a fast nanopore run saved hundreds of thousands of dollars by identifying the contaminating organism in a bioreactor. 

There were multiple talks on antibiotic resistance and pathogen detection (disclosure: my day job is looking for new antibiotics and I am doing light consulting for a company in the sequencing-based diagnostics space).  I've collected tweets on those topics in a Storify -- except for a few from Claire Jenkins, on getting pathogen sequence databases filled out, which I missed when preparing it.

Other worthy talks I'm going to reduce to tiny summaries: Steven Salzberg on assembling wheat, Svetlana Madjunkova on pre-implantation genetic screening, Chia-Lin Wei on structural variants.  And so many more.  Watch my Twitter for announcements of a few more Storify pages from the 600 or so tweets which haven't been incorporated in the ones mentioned above.

Thursday, December 07, 2017

On the Problem of Sequence Leakage

I've been spending some time lately in an unfamiliar world: the eukaryotic section of NCBI's NR protein database.  I've been almost exclusively a bacterial guy for six years, but the other side of starbase had an interest in finding homologs of a particular protein so I went diving for some.  That experience has reminded me of two serious issues with public sequence databases.  Tonight I'll dash off a bit about one; expect the other complaint to show up in the not-so-distant future. And tonight's lament is the increasing dispersion of sequence repositories.

Sunday, December 03, 2017

SmidgION: Mac Classic for the 21st Century?

Apple launched the Macintosh computer with a famous television ad playing on the launch year, 1984. What emerged was what we now know as the Mac Classic.  What may be less known is why the Mac Classic had that distinctive shape: it was intended to be backpack-portable, as Apple had a deal with a consortium of top U.S. universities to sell Macintoshes to their students.  Perhaps even more forgotten is that one of those schools, Drexel University in Philadelphia, made owning a Macintosh a requirement for students.

Monday, November 06, 2017

A Nucleotide Mixture-Based Error Correcting Short Read Chemistry

Sometimes polony-style short read sequencing seems like old news.  The underlying technology has been commercially available for over a decade.  I focus much of my attention on gains in long read technologies, though incremental improvements to read lengths or polony densities still appear.  Now in Nature Biotechnology a group from Peking University has published a new twist on sequencing-by-synthesis that is claimed to offer significant improvements in read accuracy.

Wednesday, November 01, 2017

AlphaGo & Biology

A comment was left on an earlier piece suggesting I comment on the recent AlphaGo paper and the possible applicability of this approach to biomedical sciences.  I'm not sure I have anything terribly original to say, but who can refuse a request?

Tuesday, October 17, 2017

Mission Bio Launches Tapestri Single Cell Platform

The fact that tumors and their immediate environment are genetically heterogeneous has long been known, but tools for high-throughput assessment of this heterogeneity have only recently become available.  The whole field of single cell RNA-Seq has seen spectacular growth, as new methods enable greater and greater numbers of cells to be profiled from a sample.  Profiling the DNA content on an individual cell basis has not been quite as much in the spotlight, but now a start-up called Mission Bio is launching a microfluidic library prep workflow, Tapestri, to enable amplicon panels to be run in single cell mode.

Friday, October 13, 2017

iGenomX Riptide Kits Promise a Sea of Data

A theme for me in my six years on Starbase has been addressing the challenge of cost-effectively sequencing many small genomes.  While sequence generation bulk prices have plummeted, all-in library construction cost has tended to stubbornly resist dramatic change.  Large genome projects don't face quite such a pinch, but if you want to sequence thousands of bacteria, viruses or molecular biology constructs, paying many-fold more for getting a sequence into the box than you're paying to move it through the box ends up being a roadblock. Illumina's Nextera approach dropped prices a bit, but not really a sea change.  Various published protocols drop  costs further via reagent dilution, but these can suffer from variable library yield and an increased dependence on precise input DNA quantitation and balancing.  Even then, the supplied barcoding reagents for Nextera handle at most 384 samples, and that is only a relatively recent expansion from 96. I previously profiled seqWell's plexWell kits, which like Nextera use a transposase scheme but with modifications to enhance tolerance to input sample concentration variation.  plexWell also enables very high numbers of libraries, which better mates projects with large numbers of small genomes to sequencers with enormous data generation capabilities.  Now comes another entrant in the mass Illumina library generation space: iGenomX, which has reformatted their chemistry from a microdroplet mode intended for linked read generation to a 96-well plate format requiring no unusual hardware.

Wednesday, October 04, 2017

PacBio's Frankenpatent on Error Correction

Well, here we go again.  Pacific Biosciences launched yet another patent lawsuit towards Oxford Nanopore at the end of September, and already the hounds are baying for me to look at the patents -- which I've foolishly established a reputation of doing. I will remind readers that, to use a construction that exasperates my son, I have no memory of these topics being covered during the time I was in law school. (said construction also works for divinity school, seminary, yeshiva, dental school, military academy, etc). 

Sunday, October 01, 2017

Dispatches from CDC AMD Day 2017

I had the singular honor and pleasure of speaking this past Monday at the Centers for Disease Control and Prevention's Advanced Molecular Detection (AMD) program's annual confab in Atlanta.  Just visiting the CDC campus was already a bit magical -- along with the Kennedy Space Center and Cold Spring Harbor it's one of the mythical places of human exploration to me.  But to actually stand at the podium? Wow!

I've collected below a bunch of separate mental threads, many of which probably should be expanded out to a full post in the future.

Sunday, September 24, 2017

Why Is LISP So Rare in Bioinformatics?

LISP is one of the oldest computer languages and perhaps one of the most influential of the early ones.  Some of the other well-known Eisenhower era languages -- Fortran, COBOL and ALGOL -- have certainly left their mark, but LISP and derivatives such as Scheme or Common LISP certainly carry more cachet among "serious" programmers.  COBOL has always been a bit of an easy joke and Fortran tends to mark you as old-school; use of APL (once a language of mine) would mark you as dangerously reactionary.  ALGOL begat Pascal and Modula-2 and clearly had impact on the C syntax family of languages (including bioinformatics mainstays Python, Perl and Java).  As I'll detail below, learning LISP has embarrassingly ended up stuck seemingly permanently on my future plans queue.  But that's also because life never forced the issue: while LISP has certainly been used in bioinformatics (as covered in a review from 2016), its mindshare in the community would seem to be very minimal.

Monday, September 18, 2017

Teaching Biology Evidence: Old or New?

I've been toying for over a week with writing something based on an interesting Twitter discussion started by Dr. Laura Williams (@MicroWavesSci) of Providence College pondering the best way to approach teaching molecular genetics (really, science in general) at the undergraduate level.  In particular, Professor Williams wondered about the dangers of branding various key experiments with the names of the experimenters, such as Hershey-Chase or Meselson-Stahl.  The risk she points out is that this can devolve into an exercise in memorizing names and dates without assimilating concepts, or conversely that some students will find the names more of a hindrance than a help.  I'm going to play a bit with this, but I do emphasize that for her this is reality and for me it is a hobby (or perhaps a retirement fantasy, if I should ever actually retire).  Or in other words, for the academic this is her industry but for this industrial scientist it is academic.

Tuesday, August 29, 2017

The Curse of Spammotation Lives!

High throughput sequencing of genomes is over twenty years old, and from the start it demanded the development of automated pipelines for annotating the data.  I've worked on such pipelines since the early 1990s, implementing them as a student and at two different corporate stops.  Indeed, we were reviewing results from my pipeline versus some of the other ones out there to see what can be done better.  And unfortunately, I've found infuriating problems with RefSeq entries annotated with NCBI's bacterial genome annotation pipeline.  Now I'm usually one to sing the praises of NCBI -- they are a key resource for biological research and they make available multiple spectacular public services freely to the entire world.  But I'm afraid this time I need to vent.

Tuesday, August 15, 2017

DNA vs. the Machine

Last week's news contained a story sure to raise eyebrows.  A group of computer security researchers from the University of Washington claimed to have demonstrated that they could hijack a computer via sequencing a carefully-constructed DNA fragment.  Visions of NextSeqs rampaging through the streets immediately sprang to mind.  The paper is interesting and has some useful warnings for the bioinformatics community, but certainly the news coverage has been strong on hype and alarmism.

Saturday, August 05, 2017

Computational Biology & Math: Am I Just Faking It?

Over on Quora a common type of question is "Can I be a computational biologist if I am now an X".  Personally I take a very broad view and think just about anyone with intellectual curiosity can become any kind of scientist.  A related type of question is "how skilled do I need to be in Y to succeed in computational biology", where Y is most often programming, biology or math.  I got to thinking about this and started wondering whether I am actually at all skilled in math.  Here are the results of that analysis.

Friday, July 21, 2017

A Third GridION X5 Pricing Plan

When Oxford Nanopore announced their GridION X5 instrument in March, I and others attempted to parse the difference between the two pricing plans -- and I made a bit of a hash of it.  The X5 runs 5 MinION flowcells independently in parallel from a single desktop instrument, which also includes FPGA-based acceleration of basecalling plus a license to perform sequencing-for-hire.  Indeed, Matt Loose tweeted out an image of an "X6" and then mentioned an "X7"; the X6 had a MinION plugged into the USB port and apparently the FPGA unit can keep up with seven flowcells all running simultaneously.  Now Oxford has launched an interesting third "Starter Pack" plan that offers an even lower price point for the system.

Wednesday, June 28, 2017

STAT Proves Not Resistant To Antibiotic Tropes

Tuesday's Boston Globe carried a piece originating from STAT news on an interesting natural product antibiotic, pleuromutilin.  A research group recently published a new total synthesis of this fungal terpene, an advance which promises to enable greater medicinal chemistry around the molecule.  That part is cool.  Unfortunately, when it gets to the biology of pleuromutilin the piece by Eric Boodman completely spits the bit, trotting out some horribly inaccurate tropes.

Wednesday, June 14, 2017

New Life in the Sanger Market

In my bit on "I'm not dead yet" technologies recently, I included large scale Sanger sequencing. That reflects to a large degree my personal experiences and biases.  Targeted Sanger is great for spot checking the occasional junction or misbehaving clone or strain, but I forget that many clinicians still see it as a gold standard.  Apparently there are others who disagree with me, as Thermo Fisher recently launched a new Sanger instrument targeted at small labs, and according to GenomeWeb Promega plans an instrument offering in the same space as well.

Tuesday, June 06, 2017

Ice Ghosts: A Shortage of Maps

I'm going to step outside the usual topic space here and cover an interesting but frustrating book I read partly on the flight to London Calling (which is about the only connection it has to genomics).  Ice Ghosts, by Paul Watson, covers the searches for the lost Franklin Expedition, a mid-1800s British Navy attempt to find the Northwest Passage.  It's a pretty good book; after all, it did win a Pulitzer Prize.  The topic is thrilling: explorers under difficult conditions and a mystery that lasted over a century.  There are lessons for science in general, such as the value in carefully evaluating oral histories that some would discard as unreliable. But what is maddening for me is that in a book for which a central theme is poorly understood geographies and their interpretations, the set of supplied maps fails miserably at assisting in the telling of the story.

Monday, May 22, 2017

What Is (and Is Not) Sequence Assembly?

In the closing talk of the pre-London Calling workshop, Hans Jansen closed his presentation with the question of whether at some future date sequence assembly would become obsolete.  This was meant to be an aspirational vision for a distant timepoint, but one correspondent on Twitter saw it as hype.  I got in a bit of a discussion, constrained by the dreaded 140 character limit, which ended up largely illustrating that I have a somewhat more restricted definition of assembly than some people.  I'm going to explore this and you can judge for yourself.

Thursday, May 18, 2017

London Calling 2017: Plant & Animal de novo Genomes

Okay, I'm desperately behind on writing up the external science from London Calling.  Not helpful that I claimed I would not only do so, but in multiple installments.  A number of the plenaries focused on large genome assembly, so that's what I'll tackle now -- plus a few other bits.   See also my Storify summaries, which include other reports on the conference.  Also check out my storifies on the SMRT Leiden conference, which ran at the beginning of the same week and discusses many similar topics.