Sunday, December 10, 2017

2017 Nanopore Community Meeting: An Incomplete Summary

The 2017 Nanopore Community Meeting was over a week ago back in New York City, so I'm grossly overdue in cobbling together some observations and opinion based on the tweet stream (I had a critical day job meeting at the same time and wasn't in New York).  I did dash off the bit about SmidgION being potentially like the early Macs (though I got the nomenclature wrong: the original was the Mac 128K -- Mac Classic was a later model that resembled it).  Oxford also deviated this autumn from the pattern of public information they had seemingly established, with major news at London Calling and smaller updates at the community meeting but also a pair of Clive Brown webcasts each falling roughly halfway between the two meetings.  This fall, no webcast.

Oxford Nanopore has its own Day 1 and Day 2 writeups, and there is an independent write-up from Arwyn Edwards.


Per the usual pattern, Oxford showed off previously announced hardware but made no solid announcements.  I've put together a Storify of relevant tweets which may hold further information.


SmidgION pumping out data with an attached Android phone calling the bases was a heavily tweeted and retweeted photo.  Alas, Oxford apparently put release of the SmidgION/Flongle components into the second half of next year, so no SmidgIONs adorning Christmas trees this year while happy recipients sing Flongle Bells ("Oh what fun, it is to sequence, in a one horse open sleigh, hey!").

Seriously, as suggested by the previous post I think these smaller flowcells are going to be hugely popular and influential.  For training and educational purposes, small is better.  The targeted application of field operations will be huge.  

But I think in the end the biggest use will be for many applications in which there are large numbers of samples from which small amounts of data will answer the scientific question and where multiplexing isn't a good solution.

To give one example, take one of the burning questions of DNA sample prep: what contaminants damage flow cell performance?  Obviously that isn't a question suitable for multiplexing!

But there will be many others, particularly for counting applications, especially if "no library" approaches are developed along the lines suggested previously by ONT for their Cas9-based schemes.  If creating a sequencer-ready sample consists of just pipetting a small amount of inexpensive reagent, then a lot of new applications will open up.


No real news specifically about GridION X5, other than that many people have tweeted out pictures of their new GridION instruments and there have been very few reports of problems (I know of at least one example of one being dead-on-arrival, but that seems to be rare).  

But the big news tied to GridION is the launch of the first two contract research nanopore sequencing services, with the Garvan Institute in Australia and BaseClear / Future Genomics Technologies in the Netherlands.  Since Oxford won't license MinION users for service sequencing, only the availability of GridION made this possible.  Presumably nailing down a U.S.-based operation is a priority for ONT; I've shipped samples overseas for sequencing but it is never a calm process plus it creates additional scheduling headaches (never, never let your samples sit around at a shipping firm over the weekend!).


I wrote a very critical piece on PromethION last year.  The instrument isn't out of the woods yet, but Twitter traffic does suggest that Oxford is sending out small quantities of good flowcells.  Clive Brown tweeted that his yield from a PromethION flowcell is pushing what would be needed for 30X coverage of a human genome; of course Clive's yields are historically about 2X the best field yields and 3-4X better than what most users achieve.  So perhaps PromethION will be a real star of data production for London Calling 2018 presentations, but I certainly don't see that as a sure thing.
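To make that discounting concrete, here is a rough back-of-the-envelope sketch.  The genome size of about 3.1 gigabases is my assumption (it is not a number from the meeting), the function names are purely illustrative, and the 2X and 3-4X factors are just the rules of thumb above.

```python
import math

# Assumption: approximate haploid human genome size, in gigabases.
HUMAN_GENOME_GB = 3.1


def bases_needed_gb(coverage, genome_gb=HUMAN_GENOME_GB):
    """Gigabases of sequence required to reach a given fold coverage."""
    return coverage * genome_gb


def discounted_coverage(reference_coverage, fold_lower):
    """Coverage per flowcell if yields are fold_lower times below the reference."""
    return reference_coverage / fold_lower


def flowcells_needed(target_coverage, per_flowcell_coverage):
    """Whole flowcells required to reach the target coverage."""
    return math.ceil(target_coverage / per_flowcell_coverage)


if __name__ == "__main__":
    target = 30  # fold coverage of a human genome
    print(f"{target}X needs about {bases_needed_gb(target):.0f} Gb")
    # Reference yield (1X discount), best field yields (2X), most users (3-4X).
    for label, fold in [("reference", 1), ("best field", 2),
                        ("typical, 3X", 3), ("typical, 4X", 4)]:
        per_cell = discounted_coverage(target, fold)
        cells = flowcells_needed(target, per_cell)
        print(f"{label}: ~{per_cell:.1f}X per flowcell, {cells} flowcell(s) for {target}X")
```

Under those assumptions, a yield that gives 30X in one flowcell in Clive's hands would translate to roughly two flowcells for the best field users and three or four for most users.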

Basecaller Widget

ONT started showing off their prototype of the FPGA-powered stand-alone basecalling widget, also announcing a contest to name the device.  


VolTRAX is still in the "VIP" beta test phase, which I am not part of.  I believe the only available kit is still the rapid 1D DNA kit, which hasn't attracted a fan base as the conventional protocol is so simple.  ONT promised version 2 flowcells which will have capabilities such as thermocycling.


On the software side, Oxford touted their improved Scrappie basecaller and a new Tombo package for modified base analysis. You can find tweets on this and others related to base modification in a Storify.

I really can't do justice to Ryan Wick's talk -- if you want to get the latest on basecalling performance, check out his publication-ready README file, which compares just about every known basecaller -- including the not-yet-public Guppy GPU caller -- on a variety of metrics.  Here's one example, showing raw basecalling accuracy.

Cold Chain

ONT has been making progress in reducing the cold chain requirements for select kits.  Flowcells are now being shipped wrapped in wool and they are beta-testing lyophilized versions of library prep reagents.  That would of course be huge for field use, but not inconsequential would be reducing the shipping costs for all users.  If you're going to be a low cost platform for hobbyists and educators, those shipping charges add up.


Not ONT, but a company called Circulomics announced plans for a sample preparation technology called NanoBind.  This is described as
a thermoplastic disk that contains a high density of micro- and nanostructured silica. This unique structure enables vast amounts of DNA to bind and release without being damaged. Processing occurs through a rapid bind, wash, and elute process that parallels magnetic beads and is easily automated.
Prep time is promised at 45 minutes, and the technology is claimed to deliver up to milligrams of high quality, high purity HMW DNA from 1.5mL of input material.


Probably the biggest splash of the meeting was the release of a large consortium RNA dataset for human cell line NA12878, with both 13 million direct RNA reads (from 30 flowcells) and 24 million cDNA reads (from 12 flowcells), all released on GitHub.

With both the direct RNA and the cDNA, even this set of highly experienced labs obtained greatly varying yields.

Still, getting hundreds of thousands of RNA reads is nothing to sneeze at (particularly since that would spread RNase around the lab!).
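For a rough sense of scale, the consortium totals quoted above work out to the per-flowcell averages below.  This is only a sketch using the read totals and flowcell counts already given; since yields varied greatly, these means say nothing about any individual flowcell.

```python
def mean_reads_per_flowcell(total_reads, flowcells):
    """Average number of reads per flowcell across a whole dataset."""
    return total_reads / flowcells


# Totals as reported for the NA12878 consortium release.
direct_rna_mean = mean_reads_per_flowcell(13_000_000, 30)
cdna_mean = mean_reads_per_flowcell(24_000_000, 12)

print(f"direct RNA: ~{direct_rna_mean:,.0f} reads per flowcell")
print(f"cDNA: ~{cdna_mean:,.0f} reads per flowcell")
```

So the direct RNA runs averaged a bit over four hundred thousand reads per flowcell, against about two million for cDNA.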

More importantly, a large number of the direct RNA reads -- and far more than the cDNA reads -- appear to represent full length transcripts.  Furthermore, the poly-A tail lengths can be accurately estimated with the direct RNA, even when they are hundreds of As long.

Basecalling accuracy is in the same neighborhood as DNA, with RNA performing slightly better.

There's a lot more in that README file -- identifying base modifications in RNA, capturing multiple splice forms, etc.  I'll try to dig more into that soon.

A number of users also presented exciting RNA results, particularly for direct sequencing of RNA viral pathogens such as flu and rabies.  I've put all the RNA-related tweets into a single Storify.

At least one talk debuted single-cell RNA sequencing on nanopore.  Another talk referenced Deb Peattie's pioneering work on chemical sequencing of RNA back in the 1970s.

Other User News

MinIONs continue to go to previously unimaginable locations -- perhaps the strangest one presented here was deep in a mine.  Nick Loman again reviewed his group's work (particularly featuring Josh Quick) sequencing Ebola and Zika in the field.  More tweets and talks are in a Storify focused on field uses.

Rachel Rubinstein of Ginkgo Bioworks described how a fast nanopore run saved hundreds of thousands of dollars by identifying the contaminating organism in a bioreactor. 

There were multiple talks on antibiotic resistance and pathogen detection (disclosure: my day job is looking for new antibiotics and I am doing light consulting for a company in the sequencing-based diagnostics space).  I've collected tweets on those topics in a Storify -- except for a few from Claire Jenkins, on getting pathogen sequence databases filled out, that I missed in preparing it.

Other worthy talks I'm going to reduce to tiny summaries: Steven Salzberg on assembling wheat, Svetlana Madjunkova on pre-implantation genetic screening, Chia-Lin Wei on structural variants.  And so many more.  Watch my Twitter for announcements of a few more Storify pages from the 600 or so tweets which haven't been incorporated in the ones mentioned above.

Thursday, December 07, 2017

On the Problem of Sequence Leakage

I've been spending some time lately in an unfamiliar world: the eukaryotic section of NCBI's NR protein database.  I've been almost exclusively a bacterial guy for six years, but the other side of starbase had an interest in finding homologs of a particular protein so I went diving for some.  That experience has reminded me of two serious issues with public sequence databases.  Tonight I'll dash off a bit about one; expect the other complaint to show up in the not-so-distant future. And tonight's lament is the increasing dispersion of sequence repositories.

Sunday, December 03, 2017

SmidgION: Mac Classic for the 21st Century?

Apple launched the Macintosh computer with a famous television ad playing on the launch year, 1984. What emerged was what we now know as the Mac Classic.  What may be less known is why the Mac Classic had that distinctive shape: it was intended to be backpack-portable, as Apple had a deal with a consortium of top U.S. universities to sell Macintoshes to their students.  Perhaps even more forgotten is that one of those schools, Drexel University in Philadelphia, made owning a Macintosh a requirement for students.

Monday, November 06, 2017

A Nucleotide Mixture-Based Error Correcting Short Read Chemistry

Sometimes polony-style short read sequencing seems like old news.  The underlying technology has been commercially available for over a decade.  I focus much of my attention on gains in long read technologies, though incremental improvements to read lengths or polony densities still appear.  Now in Nature Biotechnology a group from Peking University has published a new twist on sequencing-by-synthesis that is claimed to offer significant improvements in read accuracy.

Wednesday, November 01, 2017

AlphaGo & Biology

A comment was left on an earlier piece suggesting I comment on the recent AlphaGo paper and the possible applicability of this approach to biomedical sciences.  I'm not sure I have anything terribly original to say, but who can refuse a request?

Tuesday, October 17, 2017

Mission Bio Launches Tapestri Single Cell Platform

The fact that tumors and their immediate environment are genetically heterogeneous has long been known, but tools for high-throughput assessment of this heterogeneity have only recently become available.  The whole field of single cell RNA-Seq has seen spectacular growth, as new methods enable greater and greater numbers of cells to be profiled from a sample.  Profiling the DNA content on an individual cell basis has not been quite as much in the spotlight, but now a start-up called Mission Bio is launching a microfluidic library prep workflow, Tapestri, to enable amplicon panels to be run in single cell mode.

Friday, October 13, 2017

iGenomX Riptide Kits Promise a Sea of Data

A theme for me in my six years on Starbase has been addressing the challenge of cost-effectively sequencing many small genomes.  While sequence generation bulk prices have plummeted, all-in library construction cost has tended to stubbornly resist dramatic change.  Large genome projects don't face quite such a pinch, but if you want to sequence thousands of bacteria, viruses or molecular biology constructs, paying many-fold more for getting a sequence into the box than you're paying to move it through the box ends up being a roadblock. Illumina's Nextera approach dropped prices a bit, but that was not really a sea change.  Various published protocols drop costs further via reagent dilution, but these can suffer from variable library yield and an increased dependence on precise input DNA quantitation and balancing.  Even then, the supplied barcoding reagents for Nextera handle at most 384 samples, and that is only a relatively recent expansion from 96. I previously profiled seqWell's plexWell kits, which like Nextera use a transposase scheme but with modifications to enhance tolerance to input sample concentration variation.  plexWell also enables very high numbers of libraries, which better mates projects with large numbers of small genomes to sequencers with enormous data generation capabilities.  Now comes another entrant in the mass Illumina library generation space: iGenomX, which has reformatted their chemistry from a microdroplet mode intended for linked read generation to a 96-well plate format requiring no unusual hardware.