Saturday, February 10, 2018

Brown Webcast Note: Corrections and Expansions

After I post something, there's almost always something I realize I left out.  In my piece on Clive Brown's webcast on ONT improvements, not only did I forget a few key details, but my wording led to some unfortunate confusion, judging by a comment.  Someone took me up on my idea of how detecting large fragments during a run might work -- and showed it doesn't pan out (which Clive Brown confirmed).  And to top things off, a BioRxiv preprint showed up that exactly covered something I alluded to.

Multiplexing Confusion

Okay, first let's clear the water I muddied by being careless about the term multiplexing.  It is fitting, but confusing, that this term is itself multiplexed.  In the nanopore world, there is multiplexing of the sensors and multiplexing of samples.

Sensor multiplexing, which ONT calls "mux", is the pairing of a given electrical sensor on the flowcell with multiple pores.  Muxing delivers multiple benefits.  First, it enables higher sensor utilization, as a single dead pore doesn't mean a sensor goes unused.  Second, it allows higher overall yield by switching pores periodically to rotate dead pores out in favor of live ones.

MinION flowcells use muxing, with four pores available to each sensor.  At the beginning of each run, the MinKNOW software steps through the pores to rank their quality and determine when each will be used.  This step is called the "mux scan" or often (to enhance the potential for confusion) just "muxing".  Quite a few of the elite nanopore groups maximize their MinION productivity by "re-muxing": stopping runs and re-running the mux scan.  It's surprising that this isn't a built-in option in MinKNOW, particularly since it has some additional complications -- the voltage applied to the flowcell must be adjusted based on how long the cell has run, and even accessing this option requires editing the MinKNOW running script.

Flongle flowcells do not use sensor multiplexing, a.k.a. muxing -- each sensor is paired with only a single pore.  So if that pore is unusable, the sensor sits idle for the whole run.  There's also no opportunity for re-muxing -- nor, I think, any need for a mux scan.  But Flongle seems to overcome these issues by having a very high initial pore availability -- 90% -- and then a very slow drop-off in pore numbers.  So Flongle flowcells are much simpler in design yet are specified to deliver a lot of data (obviously, field-testing data isn't available yet).  MinION has four times as many sensors as Flongle, and each has access to four pores rather than only one -- so 16 times the possible capacity, though of course some sensors may be unlucky and receive multiple bad pores.  Yet Flongle is touted to deliver a gigabase or more, quite respectable versus the 10Gb+ yields being seen on MinION with R9.4.1 flowcells.
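To make that capacity arithmetic and the "unlucky sensor" effect concrete, here's a minimal back-of-the-envelope sketch in Python.  The channel counts (512 for MinION, 126 for Flongle) are my understanding of the published specs, and the per-pore availability figure is purely an illustrative assumption.

```python
# Back-of-the-envelope comparison of MinION vs Flongle sensor utilization.
# Channel counts are my understanding of the specs; p_good is an arbitrary assumption.

def usable_sensor_fraction(pores_per_sensor: int, p_good: float) -> float:
    """Chance a sensor has at least one usable pore, assuming each of its
    pores is independently usable with probability p_good."""
    return 1.0 - (1.0 - p_good) ** pores_per_sensor

minion_sensors, minion_pores = 512, 4    # muxed: 4 pores per sensor
flongle_sensors, flongle_pores = 126, 1  # no mux: 1 pore per sensor
p_good = 0.8                             # illustrative per-pore availability

capacity_ratio = (minion_sensors * minion_pores) / (flongle_sensors * flongle_pores)
print(f"Raw pore capacity ratio: {capacity_ratio:.0f}x")   # ~16x

for name, sensors, pores in [("MinION", minion_sensors, minion_pores),
                             ("Flongle", flongle_sensors, flongle_pores)]:
    expected = sensors * usable_sensor_fraction(pores, p_good)
    print(f"{name}: ~{expected:.0f} of {sensors} sensors expected to start with a live pore")
```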

Sample multiplexing is via DNA barcodes, which Flongle will fully support.  Alas, the term "barcode" is also multiplexed -- the labware often carries conventional optical barcodes, unique sequence patterns used to identify species are often called barcodes, and then there is the plethora of different uses for oligo-based barcodes.  Such barcodes can be stacked -- first add one and then another, or perhaps even add two at once via PCR (as in Nextera) -- and may track to a plate, well, row, column, single cell, individual molecule, droplet, stage in a process -- or anything else your molecular creativity dreams up.
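As a toy illustration of stacking, the sketch below combines a hypothetical set of eight row-level barcodes with twelve column-level barcodes to index a 96-well plate; the labels are placeholders, not real ONT or Nextera sequences.

```python
# Hypothetical placeholder barcode labels -- stand-ins for real oligo sequences.
row_barcodes = [f"rowBC{i:02d}" for i in range(1, 9)]    # e.g. added in a first ligation step
col_barcodes = [f"colBC{j:02d}" for j in range(1, 13)]   # e.g. added later via PCR

# Stacking the two sets combinatorially indexes every well of a 96-well plate
# with only 8 + 12 = 20 oligos instead of 96 distinct ones.
well_of = {(r, c): f"{chr(ord('A') + i)}{j + 1}"
           for i, r in enumerate(row_barcodes)
           for j, c in enumerate(col_barcodes)}

print(len(well_of))                     # 96
print(well_of[("rowBC01", "colBC05")])  # A5
```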

Many ONT kits, such as the 1D ligation and Rapid 1D kits, support only a paltry 12 barcodes.  Kits using PCR have an expansion pack enabling 96 barcodes.  In general I'm a proponent of the idea that "you can never have too many barcodes".  Not only are there so many useful things to barcode, but it is also desirable to rotate through your barcode sets to help detect contamination.  So I'd love to see ONT launch additional barcode sets for the non-PCR kits.
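To illustrate why rotation helps: if consecutive runs deliberately draw from disjoint barcode subsets, any read that demultiplexes to a barcode outside the current run's set is a candidate carryover or cross-contamination event.  A minimal sketch, with made-up barcode names and read counts:

```python
from collections import Counter

# Hypothetical rotation scheme: this run uses BC01-BC12, the previous run used BC13-BC24.
all_barcodes = [f"BC{i:02d}" for i in range(1, 25)]
this_run = set(all_barcodes[:12])

# Pretend demultiplexing tallies (made-up numbers), e.g. parsed from a demux report.
demux_counts = Counter({"BC03": 51_000, "BC07": 48_000, "BC15": 120, "unclassified": 9_000})

for barcode, n_reads in demux_counts.items():
    if barcode != "unclassified" and barcode not in this_run:
        # Reads carrying a barcode that was never loaded in this run point to
        # carryover on a washed flowcell or contamination somewhere upstream.
        print(f"Suspect contamination: {barcode} ({n_reads} reads) was not used in this run")
```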

Conversely, a successful launch of Flongle may reduce some of the demand for barcoding.  If you have 96 samples, my preference is to give each its own barcode and run them together.  But with Flongle, if the numbers work right, one will have the option of running eight batches of 12.  That entails some additional tracking -- keeping straight which library set corresponds to which row of the plate -- but it would be an option.  Washing flowcells has also always been an option, but that means additional work plus tracking the used life of each flowcell.

New Ligation Kit

I realized I left out an interesting detail on the newest ligation kit -- that the new protocol omits an explicit fragmentation step.  Fragmentation was typically performed with Covaris' g-Tubes or by forcing the sample through a needle.  This was necessary to supply sufficient ends to ensure that plenty of DNA molecules were adapted for sequencing.

But the new kit dispenses with deliberate shearing, relying on the inherent shearing that occurs during typical DNA extraction protocols and when pipetting samples.  This is accomplished by supplying the sequencing adapters at higher concentration and changing the ligation buffer.  The new protocol also incorporates DNA repair, which had been an optional add-on in the past.  In the presentation, the read length distribution for lambda DNA was essentially a few adjacent spikes, with the highest at 48 kb.

Wrong Guess on Read Until Sizing

At multiple points in his presentation Clive Brown mentioned that it is possible to enrich for long fragments via Read Until, but declined to discuss how long molecules are detected.  I had suggested what seemed obvious -- that the speed of translocation through the pore is affected by the drag of the DNA molecule, with long molecules creating more drag.  Well, that's a dead idea now.

I'm stuck for now.  Any suggestions from the floor?

Yet Another Golden Flongle Opportunity

I've mentioned before that a great use case for Flongle will be working out DNA preparation conditions.  As a personal aside, I sometimes think this way because of my mother.  She graduated with a major in chemistry and a minor in math and got through one year of chemistry grad school.  After finding high school teaching not to her taste, she tried to get jobs in the chemical industry and encountered only slightly disguised gender discrimination.  But one job she was offered would have been to supervise a lab that took consumer products such as cake mixes and determined each mix's tolerance to deviations in the recipe.  Too much milk, not enough eggs, the wrong kind of oil -- would it still work?

So I've envisioned a similar analysis of DNA extraction protocols.  Some of this can be multiplexed, but if you're testing which contaminants kill productivity, you can't easily mix those conditions together.  So small, inexpensive flowcells could really help in determining things such as how much of common detergents or solvents can be tolerated.
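A sketch of what such a tolerance-testing design might look like -- one small flowcell per condition, with the contaminants and spike-in levels below being purely hypothetical examples:

```python
from itertools import product

# Hypothetical contaminants and spike-in levels, chosen only for illustration.
contaminants = ["SDS", "Triton X-100", "ethanol", "phenol"]
levels_pct = [0.0, 0.01, 0.05, 0.1]   # 0.0 serves as the untreated control

# One Flongle-style flowcell per condition; flowcell IDs are placeholders.
run_plan = [
    {"flowcell": f"FLG{idx:03d}", "contaminant": c, "spike_pct": lvl}
    for idx, (c, lvl) in enumerate(product(contaminants, levels_pct), start=1)
]

for condition in run_plan:
    print(condition)
# Readout per flowcell: total yield and active-pore count vs. time,
# compared against the matching 0.0% control.
```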

Another problematic issue with nanopore has been short fragments and adapters, which have a reputation for killing pores.  So imagine feeding PCR products of different lengths, each into its own flowcell, and seeing how the productivity dropoff varies.
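And a minimal sketch of how that comparison might be quantified, assuming you can pull active-channel counts over time for each fragment-length library (the numbers below are invented, and the exponential-decay model is just a convenient simplification):

```python
import numpy as np

# Invented hourly active-channel counts for two libraries of different insert size;
# a real analysis would pull these from MinKNOW run reports.
hours = np.arange(12)
active_channels = {
    "500 bp PCR product": np.array([420, 350, 290, 240, 200, 165, 137, 113, 94, 78, 64, 53]),
    "5 kb PCR product":   np.array([430, 400, 372, 346, 322, 300, 279, 260, 242, 225, 209, 195]),
}

for library, counts in active_channels.items():
    # Fit a simple exponential decay N(t) = N0 * exp(-k*t) by log-linear regression.
    k = -np.polyfit(hours, np.log(counts), 1)[0]
    print(f"{library}: decay constant {k:.3f}/h, half-life {np.log(2) / k:.1f} h")
```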

Long Read Cappable-seq

At the end of my piece, I made a comment about a future post on a "great missing protocol for feeding into Direct RNA".  What I had in mind is Cappable-seq, a clever protocol from NEB which pulls out the 5' ends of bacterial and archaeal RNAs.

The challenge in the prokaryotic world is that active mRNAs are not polyadenylated.  Poly-A tails have been amazing handles for molecular biology and underlie many protocols, particularly the standard ONT Direct RNA kit.  A variant protocol for 16S rRNA uses a conserved region of the RNA to replace the poly-A tail as a handle for driving the RNA into the pore (and backwards! Direct RNA runs 3'->5'!).  Because ribosomal RNA grossly dominates bacterial RNA pools, any bacterial RNA-Seq protocol includes some sort of rRNA reduction step, which adds complexity and cost.

Cappable-seq solves this problem by chemically attaching a handle to the 5' triphosphate on transcribed RNA.  Due to processing, mature ribosomal RNAs lack this triphosphate, and so are not captured.  In the short read world, this allows generating reads which capture the 5' end of RNA and hence the transcriptional start site.  Since bacterial promoter prediction can be difficult -- particularly in organisms with complex life cycles and many sigma factors -- Cappable-seq has the potential to greatly extend our understanding of prokaryotic transcription.

Now a collaboration between NEB and PacBio has extended Cappable-seq to long reads, nailing down not only transcriptional start sites but also 3' ends.  After capture, the RNAs are polyadenylated in vitro, enabling them to go into the PacBio cDNA protocol.  Any sequence that carries the added poly-A should represent a full-length transcript, and the paper reveals that in E. coli there is great diversity in the extents of transcripts from the same promoter.
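As a toy version of that "poly-A implies full length" logic -- simply asking whether a read's 3' end carries a long A run from the in vitro tailing -- here's a sketch; the tail-length threshold is an arbitrary assumption, and a real pipeline would be far more careful about read orientation and adapter trimming.

```python
import re

def looks_full_length(read_seq: str, min_tail: int = 15) -> bool:
    """Crude check: does the read end in a poly-A run of at least min_tail bases?
    Assumes reads are already oriented in mRNA sense; the threshold is arbitrary."""
    match = re.search(r"A+$", read_seq.upper())
    return match is not None and len(match.group()) >= min_tail

reads = [
    "ATGACCATGATTACGGATTCACTG" + "A" * 30,  # tailed -- treated as full length
    "ATGACCATGATTACGGATTCACTG",             # no tail -- likely truncated or degraded
]
print([looks_full_length(r) for r in reads])  # [True, False]
```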

It should be obvious that this same approach can feed into any long read platform -- such as ONT's cDNA or Direct RNA protocols.  So if there is any epigenetic modification of these RNAs -- which is well known for bacterial rRNAs -- Direct RNA could capture it.  The scale of throughput that PromethION offers could enable very complex bacterial metatranscriptomes to be read out via Cappable-seq.

That should be enough nanopore for a while.  On Monday I return to AGBT after being absent too many years, so that should be the focus of multiple posts over the coming days. Of course, there are nanopore talks scheduled for Orlando, so don't expect this channel to be completely ONT-free during that period.



6 comments:

Duarte said...

Why do small fragments kill pores? Any idea? I was thinking of testing an application where I wanted to run a library prep consisting of 180 bp fragments (8 of those bp being custom barcodes).

Clive indicated nanopore can read small fragments (he suggested it could read even smaller than that -- 50 to 60 bp).

If small fragments are really that bad, then combined with the much lower number of pores on the Flongle flowcell, I'm not sure my application will work.

Keith Robison said...

Duarte:

I don't believe anyone knows for sure -- or if ONT knows, they haven't been effusive in describing the phenomenon. One possibility is that every time a DNA molecule exits (or docks?) it has some potential to damage the pore -- so libraries full of many short fragments represent many such events.

My belief that short fragments cause faster activity drops is based on experience with several ligation libraries of E.coli preps which had very sharp dropoffs & similar libraries that did not -- and the difference was the fraction of the library composed of short fragments. Now, that difference was due to another variable -- the E.coli genotype -- and the libraries were prepared at different times.

So that's why I see this pressing need -- which Flongle helps make more financially practical -- for ONT to have a test kitchen a la Mom's job offer to explore the landscape of potential problems. So take some DNA that behaves well, such as lambda, and then dope in defined amounts of potential problems such as short fragments and measure the effect on productivity vs. time. Perhaps short fragments are a red herring and there is something else going on. And, as you ask, if short fragments are a problem, then how short?

Knowing this is important for anyone designing counting experiments or trying to detect targets by PCR or working with degraded DNA or so many other scenarios in which the input DNA will be short. And for some of those, it won't be a deal breaker -- but it is important for users to understand how much less data they can expect if working with such materials.


David Eccles said...

Clive previously mentioned that the flow cells worked a bit like a battery in that they discharge over time. Pores discharge quicker when they're not actively sequencing, and I expect this has something to do with less electric current being required to zap through a DNA strand compared to an open pore.

There is a loading time between one strand finishing sequencing and the next one starting, so anything that leads to an increased loading time in proportion to the run length (with the associated open pore state) will lead to quicker degradation of the pores. Assuming this loading time is a fixed period of time, shorter sequences will have a higher proportion of time loading vs sequencing when compared to long sequences, so will degrade the pore quicker.

Leeloo said...

I am planning an experiment using MinION with ~400 bp fragments as well. The Sales Rep from ONT did mention that shorter fragments get lower performance. I was imagining that I could overload the flowcell (increase my loading DNA library concentration) to compensate for the lag time between strands at the pore. Not sure if this will help yet as I still haven't gotten my MinION!

Unknown said...

I have no idea about anything of this so please don't hurt my inner child if you disagree, but might a separation stage using electrophoresis be helpful in separating large and small DNA segments for the nanopore?

Keith Robison said...

J Ir:
I'm just as in the dark as you are. Electrophoresis would be one approach, but I don't believe you could do it without physically modifying the device -- which I assumed isn't something Clive is talking about but that is pure assumption.