Sunday, May 29, 2016

London Calling: Notes on Brownian Commotion

I'm behind on writing up London Calling.  I can partly blame a failing computer -- though rebooting it seems to have righted it for the moment.  A bigger challenge is that I had the luxury of staying in London through the weekend, and have been trying to pack in as much of England as I can.  To really do justice to everything, I need to scan all the tweets -- and that will take some time.

But I have dug into everything around Clive Brown's talk (kudos to NextGenSeek for Storifying that portion of the meeting's tweets!) about the current and future state of the Oxford Nanopore platform, so I will focus on that, with a few side-trips on closely related topics. A few gaps on topics I previewed that didn't show up in the presentation were filled in with chats with Clive.  Plus, the indescribably huge advantage of actually going to a conference is the tidbits gleaned from late night chats over drinks (and no, I didn't ply anyone to get them to spill -- all was coughed up out of pure free will).  I'm going to roughly divide these by the announced timeframe: now, imminently, later this year, perhaps next year and unspecified.


The key "now" announcement is that the R7 pore has been retired, "gone to the dustbin of history", and the R9 pore is now the commercial product.  R9 was known last year only as "fast mode" and revealed as CsgA in a Google hangout in March.  The current protocols will run R9 at 250 bases per second, though Brown reiterated the idea that higher speeds are possible in the future.  R9 flowcells also use a newer membrane, M10, and an upgraded helicase motor, E7.  If you do have R7 flowcells to ditch, Karen James is interested in them for her citizen science outreach efforts.

The downside to any chemistry change is that all the previous statistics need to be updated as well as any models of the data in software. R9 also comes with a change in the FAST5 (HDF5) file format, as the new Recurrent Neural Network (RNN) basecaller doesn't generate some of the upstream information the old Hidden Markov Model (HMM) basecaller did.  Many software developers were in on early access to the R9, but the required changes may not all be in place.

R9 is superior to R7 for both data quality and yield (see the graph I captured below), in part because a very strong signal is seen from the central base.  Nick Loman has released an E. coli dataset (FAST5 or FASTA) for those who wish to measure the improvements themselves.  Of course, as I tend to complain about, this is E. coli, which is 50% G+C overall and not rich in significant deviations from that mark.  In a chat after the conference, someone who has run many different sample types confirmed my worry that high G+C and low G+C sequences did not perform as well on R7 as E. coli-ish ones - more sequences ended up in the fail bin that had usable data.  Interestingly, Sara Goodwin at CSHL reported that for R9 the pass/fail calls on reads are reliable, whereas she (and multiple other presenters) found that with R7 there were many "fail" reads that contained good data regardless of the sample.  R9 also is capable of phenomenal read lengths -- Oxford reports a 2D read of 255Kb, which means they streamed 510 kilobases of nucleotides through their pore successfully.  And yes, it did align to the target, a criterion which Miten Jain of UCSC has dubbed "the Robison rule".

Homopolymers remain a problem - for now.  With R7, the HMM-based basecaller was actually incapable of calling runs longer than 6, and after polishing de novo assemblies it appears that such homopolymers dominated the errors.  Brown spoke of a new "transducer-like" basecaller that should make inroads on the homopolymer problem, and also spoke of a "secret R9 chemistry" which has more predictable ratcheting during DNA translocation.  Brown also mentioned that ONT continues to look for "less noisy" pores, including further mutants of the R9 pore. ONT's goals for accuracy by end-of-year are 95% for 1D and 99% or better for 2D.  Again, I urge that any such stats be generated using a wide range of G+C contents.  Streptomyces rapamycinicus is a great high G+C organism, has an okay public reference genome and some truly evil repeats to test assembly algorithms.  I wouldn't want to grow Plasmodium, but that is one possibility for low G+C.
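To make the "wide range of G+C contents" plea concrete, here is a minimal sketch of stratifying accuracy numbers by G+C bin. Everything in it is my own illustration: the function names, the bin edges of 0.35 and 0.65, and the toy input are assumptions, not anything ONT or anyone at the meeting presented.

```python
from collections import defaultdict
from statistics import mean


def gc_fraction(seq):
    """Fraction of G+C among the A/C/G/T bases in seq (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def accuracy_by_gc(reads, low=0.35, high=0.65):
    """Mean accuracy per G+C bin.

    reads: iterable of (reference_sequence, accuracy) pairs.
    low/high: illustrative bin edges, not any published standard.
    """
    bins = defaultdict(list)
    for seq, accuracy in reads:
        frac = gc_fraction(seq)
        label = "low" if frac < low else "high" if frac > high else "mid"
        bins[label].append(accuracy)
    return {label: mean(values) for label, values in bins.items()}


# Made-up toy input purely to show the call shape; these are not real results.
toy = [("ATATTAAT", 0.80), ("GCGGCCGC", 0.82), ("ATGCATGC", 0.90)]
print(accuracy_by_gc(toy))
```

In practice the sequence in each pair would presumably be the reference segment a read aligned to, so that the bin reflects the true composition rather than the basecaller's opinion of it.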

One tweak to the running protocol: the "fuel" now comes pre-mixed at final concentration, whereas previously the user had to perform a simple mixing with running buffer. Apparently the MARC consortium data contains tell-tale signs  (i.e. altered DNA translocation speeds) of the fuel concentration, suggesting that this was a source of site-to-site variability.

The physical devices have been tweaked in small ways to yield the MinION MkI-B.  Around the bar, I found out from some users that the MkI had a fragile USB3 port -- similar to port problems that have bedeviled some of my laptops and smartphones.  When you're plugging in, don't go overboard!   Clive Brown mentioned that electromagnetic noise from the cooling fan was injecting noise into the system, so shielding has been added.  Brown says the new devices should be able to handle pore speeds up to 1300 bases per second -- and that ONT has generated data at that pace in their R&D lab!  However, that speed requires shelving event detection, which doesn't work at such speeds, and basecalling directly from the raw signal -- which the "transducer-like" scheme is capable of.  Clive mentioned in passing that hardware acceleration of basecalling was being actively explored, though it wasn't clear which of the basecalling schemes (all of them?) this comment applied to.

Also in the now category are several changes to the MinKNOW software used to run the device.  A Mac port is out-and-running.  A new look has been released as well.  A local basecaller has been released (and put to use by Joe Parker and Alex Papadopulos, who reported MinION sequencing in Wales's Snowdonia National Park in a valley with no cell phone coverage).  Integration with MinKNOW was said to have been accomplished the week of the conference, with release "imminent".


Clive likes the word imminent.  Perhaps the "imminent" item which will get the biggest spotlight is customer data from PromethION.  While one unit was delivered last month to an unnamed customer (bar-side speculation centered on a flagship genomics center about an hour north of London), only the electronics test ("configuration test") has been run -- no flowcells have yet been delivered.  PromethION flowcells have a very different form factor than MinION flowcells, so they can't be interchanged.  PromethION will launch with 3000 channels, 6X that on a MinION device, divided into four sectors by a gasket.  Each sector has its own sample port, with a geometry compatible with multichannel pipets. A library prep from 100ng of DNA is rated for 8 loadings.

For those of us all thumbs in the lab, as well as those of us having trouble rounding up talented experimentalists to run projects, a really exciting tool is the  rapid1D transposase library prep kit.  This is the kit that Nick Loman and Matt Loose used to make libraries in their AGBT hotel room, and they and others have praised it on Twitter and in person.  Unfortunately, as ONT expanded shipping of the kit they discovered that it was unstable in transit.  A new shipping formulation has been devised and is undergoing testing.

The data plot below is a little challenging to read (a rant for another time: trellising stomps on color coding), but suggests that this 1D prep gives plenty of alignable reads with accuracies of 80-90%.  As I said repeatedly to people at the meeting, this kit is a game changer: not only will it allow bunglers like me (at least, everyone who's used it says it is just simple pipetting and incubating) to make libraries, but it is a radically different proposition to ask for an overloaded tech to take 15 minutes to run a simple protocol vs. 1.5 hours to run a far more complex one.  My personal plan is to have every molecular biology Ph.D. and RA at our shop "learn" the 1D prep -- if reading a protocol that simple can really be called learning.  2D rapid preps (slated at 30 minutes), barcoded rapid 1D, and rapid amplicon kits are also in the works, with delivery sometime later this year.

Late this Year

Two eagerly awaited platform extensions were given launch windows of Q4 of this year: direct RNA sequencing and the library preparation automation solution VolTRAX.

Oxford's direct RNA kit will work off of polyadenylated RNA.  Interestingly, it will read the RNA 3' to 5', which means it will be critical for some step of the tool chain to reverse but not complement the data (and equally important that this not be done twice!).  Data quality is not as good as for DNA, though the ability to pick out modified bases is present -- and Brown thinks many new modified RNA bases will be found using this.  Early access is hoped to begin by Q3.  Direct RNA sequencing eliminates the cost and extra steps of reverse transcription, as well as the errors (such as strand switching) injected by that enzyme.  Oxford has tested this scheme with both viral RNA and Saccharomyces mRNA.
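The reverse-but-not-complement step is simple enough to sketch. This is only an illustration under my own assumptions, not Oxford's tool chain: the `RnaRead` type, the `orientation` tag and the `to_conventional` name are all hypothetical, and I'm assuming the read arrives as a plain string in the order it went through the pore. Carrying an explicit orientation tag is one way to make sure the flip can't happen twice.

```python
from typing import NamedTuple


class RnaRead(NamedTuple):
    sequence: str      # bases in the order stored
    orientation: str   # "3to5" (as read by the pore) or "5to3" (conventional)


def to_conventional(read: RnaRead) -> RnaRead:
    """Return the read in 5'->3' order, reversing at most once."""
    if read.orientation == "5to3":
        return read  # already conventional; reversing again would undo the fix
    if read.orientation != "3to5":
        raise ValueError(f"unknown orientation: {read.orientation!r}")
    # Reverse only: the pore read the same strand backwards, so no complement.
    return RnaRead(read.sequence[::-1], "5to3")


raw = RnaRead("AAAAGCUU", "3to5")       # poly(A) tail (3' end) comes first
fixed = to_conventional(raw)
print(fixed.sequence)                   # UUCGAAAA
print(to_conventional(fixed) == fixed)  # True
```

The second call returning the read unchanged is the whole point: whichever step of the pipeline does the flip, any later step can call the same function safely.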

VolTRAX has undergone a few more design changes.  An early access program, called V.I.P., will launch later this summer, with registration now open via the Nanopore Community.  VolTRAX will consist of a small controller instrument and a disposable flowcell.  The initial version will probably require pipetting to transfer the completed libraries from VolTRAX to MinION/PromethION.  Brown sees VolTRAX as a step in flattening both user-to-user variability and the level of user training required to make libraries.  Initially, it should have bead cleanups and some ability to heat/cool, though not full-fledged PCR on board.  Pricing was not announced.

Okay, I'll go out on the limb and say the obvious: Oxford ain't exactly Ted Williams when it comes to hitting timelines. Indeed, their predictive accuracy is better modeled as "well below the Mendoza line" (apologies to non-Yanks; after going a bit native in London I need to demonstrate I still know the U.S.).  Hence the proviso on the title -- I'd love for Oxford to nail these, as they are exciting, but right now I see these as exciting concepts that Oxford is committed to developing, but no more than that.

[Ugh -- in my note I somehow slipped Zumbador to next year; Q3 2016 was the announced target.  Another embarrassing error; on the other hand, if Oxford delivers and people remember my date, then ONT will look really good :-) ]
Zumbador (Hummingbird) is the codename for a project from Clive's SkunkWorkx R&D effort to solve the problem of extracting DNA from samples, a challenge also taken on by Claire Lonsdale from the British defense site at Porton Down.  Zumbador, for which proof-of-concept has been executed, is a handheld device (that's me holding the prototype) which would sip sample liquids with its beak, extract DNA and perform library preparation and cleanup, ultimately excreting (Clive used a word 5 letters shorter) the beads into a new top-loading flowcell, which is actually in the imminent category.  Direct loading of beads onto the flowcell is a concept Clive has been talking about for quite a while, and should enable libraries from smaller input amounts by eliminating a final elution step (instead, the sequencing chemistry will yank the strands off the beads).  Samples might need homogenization or bead bashing upstream of Zumbador -- that beak isn't going to sip chunks of tissue and difficult cell walls might require some abuse first.  Zumbador will be made a platform open to other developers, so that non-Oxford protocols can utilize this device.  Early access targeted for next year, but no pricing suggested.

With sample-to-library in one handheld device, and library-to-raw data in another, it would seem an obvious next step to merge these. While you're at it, make it run off an iPhone.  This is the concept of SmidgION.  The attached smartphone is projected to provide 4 hours of runtime as well as the compute for basecalling and perhaps many applications (a concept discussed back at the December meeting -- tools such as MinHash look very promising for this).  SmidgION would use the denser PromethION pore layout, but with only 256 channels, about half the number on a MinION, which will probably be enough for de novo sequencing of small bacterial genomes and certainly enough for many amplicon projects.  No pricing was discussed.

Next Year (Perhaps)

While Zumbador and SmidgION are both far in the future, the concepts definitely caught the imagination of both attendees and many who saw the announcement on Twitter.  For some that will be an example of hype or a reality distortion field, for others it is fun to dream.

Oh, and another thing which Clive announced but didn't put any date on: Oxford has ideas for DNA synthesis.  In other words, taking their experience with long reads and creating a system for long writes.  The chemistry will not use phosphoramidites, but beyond that Clive said he couldn't really say anything (other than the name of the spinout, The Genome Foundry Ltd.).


There were also a bunch of odds and ends that don't quite fit my scheme; some were suggested to be just in R&D, others perhaps imminent.  Crumpet chips and pay-as-you-go pricing weren't mentioned in the talk, but I caught up to Clive later.  Crumpet chips are still in development, but he felt constrained by the talk time limits (I suspect all of us would gladly hear another hour from Clive, but one must eventually eat).  

The problem of blocking, in which flowcell performance degrades over time, was mentioned.  ONT thinks the problem is in the stalling chemistry, which is how they slow down the motor protein.  A fix is being actively worked on, but no dates given.  The topic of possible pore damage came up in a number of questions or discussions.  For example, Matt Loose presented on read-until, which relies on reverse translocating a partially sequenced DNA if it is decided to not be worth sequencing.  Matt was asked if this damages the pore; he hasn't looked at it.  Matt did comment he has great datasets to do so, as he often runs half the channels on a flowcell in standard mode and half in read-until; if retro-ejection damages the pores, then the pores in read-until channels should degrade faster.  Of course, if simply loading a DNA runs a risk of damaging a pore, then read-until might have a higher failure rate since these pores see more molecules loaded.  A post-conference bar discussion of the structure of the adapters and hairpin had some present wondering if any of the components of the hairpin might be troublesome.  Hopefully ONT will nail this down soon, as it would increase the yield-per-flowcell by up to 100% for long runs.

Another interesting concept without a delivery date is an idea for altering the adapters so that molecules daisy-chain.  To review, each pore has a duty cycle consisting of being idle, grabbing a leader, translocating the DNA attached to that leader, and then returning to the idle state.  Minimizing the idle time is a route to enhancing throughput.  Brown envisions setting up the DNA molecules to interact so that as one DNA is translocating a pore, it is setting up another DNA for loading as soon as the first molecule is complete.  And so on.  In the ideal case, this would result in zero idle time.  However, it could also result in a highly viscous DNA mixture, since the strands would appear much longer than they really are.  Perhaps direct loading will solve this problem, or the final link in the coupling will be in the running buffer.  An interesting related question that came up is whether long DNAs translocate any slower than short ones; nobody outside of ONT seems to have looked at this in detail (I don't know if ONT has; I'd be surprised if they haven't, but then again one can't do everything).
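A back-of-the-envelope sketch shows why idle time matters. The model is deliberately crude and entirely my own: a pore alternates between a fixed idle period and translocating one read at constant speed, with no blocking or pore loss. The 250 bases per second is the current R9 speed quoted above; the 10 kb read length and 10 second idle time are made-up placeholders, not measurements.

```python
def pore_occupancy(read_length_bases, speed_bases_per_s, idle_s):
    """Fraction of time a pore spends translocating DNA in this toy model."""
    translocation_s = read_length_bases / speed_bases_per_s
    return translocation_s / (translocation_s + idle_s)


def bases_per_pore_hour(read_length_bases, speed_bases_per_s, idle_s):
    """Bases through one pore per hour, given its occupancy."""
    occupancy = pore_occupancy(read_length_bases, speed_bases_per_s, idle_s)
    return speed_bases_per_s * occupancy * 3600


# Hypothetical inputs: 10 kb reads, 250 bases/s, 10 s idle between reads.
with_idle = bases_per_pore_hour(10_000, 250, idle_s=10)
# The daisy-chain ideal: the next molecule loads immediately.
no_idle = bases_per_pore_hour(10_000, 250, idle_s=0)
print(f"{with_idle:,.0f} vs {no_idle:,.0f} bases per pore-hour")
```

Under these toy numbers a pore translocates for 40 seconds per read, so a 10 second gap costs a fifth of the possible output. The shorter the reads, the larger the share that idle time eats, which is where daisy-chaining would pay off most.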

Well, that's it for tonight.  I'll try to write up the non-ONT side of the meeting while impressions are semi-fresh (and might throw any further ONT thoughts in there).  Another shout-out to NextGenSeek for the Storify of Clive's talk, and for everyone who tweeted during it or who was chatty at a bar.  And to all my Twitter followers for tolerating my selfie-indulgence as I've been touring London & Cambridge (you'll be appalled to know I had a few more ideas that just never got shot).

(two early commenters pointed out annoying and embarrassing errors on my part, which I've repaired but with the original in strike-through.  Miten is indeed at Santa Cruz, not San Diego, and I knew that.  I do need to be much more careful in the terminology of pores and channels -- as pointed out, there are N channels per device, each of which has a battery of 4 independent pores to listen to). 


Anonymous said...

Since the R9 is advertised as "clear IP" are they implicitly acknowledging that R7 was infringing, their previous statements to the contrary notwithstanding?

wdecoster said...

Great post and nice to read it again after hearing Clive. It's still exciting, even days later. A few times during your post I noticed a potential confusion between 'pores' and 'sensors', i.e. a MinION flowcell has 512 sensors and 2048 pores. Each sensor has 4 pores, out of which the best are chosen in the Mux scan. Additionally, when talking about the PromethION, this machine has 3000 sensors per flow cell, and 48 flow cells on the entire PromethION, making it far more than MinION. I'm not sure if I understood it correctly, but from what I heard in the live lounge the PromethION flow cell has 6000 pores and 3000 sensors, i.e. a 2:1 ratio.

Anonymous said...

Miten Jain is at UCSC, not UCSD.

Keith Robison said...

I'd say that in regard to R7 IP, Oxford has not made a statement. That could indicate they are infringing. There are also many ways they could be in a bit of a grey zone. For example, R7 might be in one of the more extreme claims of sequence identity, which Oxford thinks won't survive in court. Or that Oxford thinks they can prove prior art which could undermine the Gundlach patents but isn't a slam dunk. It could also be at this point Oxford is playing poker, in which you never reveal information without your opponent paying for it.

I'm no patent lawyer and never plan to be one, but if Oxford was found to have been willfully infringing with R7 they'd be liable for treble damages on the infringement -- which is 3X whatever revenue they earned on R7, as well as halting the infringement. If that were the case, then the infringement has now halted, and dragging the case out would put off when damages would need to be paid. That's the bear scenario here.

Thank you for the two corrections; the C->D substitution was really stupid; I do know the difference between San Diego and Santa Cruz and which Miten is at. The pores vs. channels distinction is quite important and I need to tighten up my thinking and language there. Both corrected now, with strike-through showing the original.

Stephen Osborne said...

Hi Keith, enjoyed the post but was hoping for something more on the protein sequencing work being investigated at ONT. On 26/05 you posted a tweet. Just wondering whether you have any additional details that were either related during Clive's musings or in any apres-talk stories. Thanks.

Brian Naughton said...

I thought Clive said Q3 2016 for Zumbador. I agree that it seems way too soon for something so complex (especially since I thought he also said the Zumbador team was just a couple of guys), but I could have sworn that was the date. Either way, it's going to be a very interesting year for nanopores.

Keith Robison said...

Hi! I'm happy you like the writing, but I made a decision at the start to restrict this to my personal thoughts.

It's easy to create your own blog & I'm happy to help promote quality content via here & Twitter