Okay, first let's clear the water I muddied by being careless about the term multiplexing. It is fitting, but confusing, that this term is itself multiplexed. In the nanopore world, there is multiplexing of the sensors and multiplexing of samples.
Sensor multiplexing, which ONT calls "mux", is the pairing of a given electrical sensor on the flowcell with multiple pores. Muxing delivers multiple benefits. First, it enables higher sensor utilization, as a single dead pore doesn't mean a sensor goes unused. Second, it allows higher overall yield by switching pores periodically to rotate dead pores out in favor of live ones.
MinION flowcells use muxing, with four pores available to each sensor. At the beginning of each run, the MinKNOW software steps through the pores to rank their quality, determining the order in which they will be used. This step is called the "mux scan" or often (to enhance the potential for confusion) just "muxing". Quite a few of the elite nanopore groups maximize their MinION productivity by "re-muxing": stopping runs and re-running the mux scan. It's surprising that this isn't a built-in option in MinKNOW, particularly since re-muxing has some additional complications -- the voltage applied to the flowcell must be adjusted based on how long the cell has run, and even accessing the option requires editing the MinKNOW running script.
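The pore-ranking bookkeeping can be pictured with a toy sketch (purely illustrative -- MinKNOW's actual scoring and scheduling logic is not public, and the scores and pore labels here are made up):

```python
# Illustrative sketch only: model each sensor (channel) as having up to four
# candidate pores, each with a hypothetical quality score from the scan;
# rank them to decide the order in which they will be rotated into use.

def rank_pores(channels):
    """For each channel, sort its pores best-first; dead pores (score 0) are dropped."""
    schedule = {}
    for channel, pore_scores in channels.items():
        usable = [(score, pore) for pore, score in pore_scores.items() if score > 0]
        schedule[channel] = [pore for score, pore in sorted(usable, reverse=True)]
    return schedule

# Hypothetical scan results: channel 1 has one dead pore (score 0).
channels = {1: {"A": 0.9, "B": 0.0, "C": 0.7, "D": 0.4},
            2: {"A": 0.2, "B": 0.8, "C": 0.6, "D": 0.5}}
print(rank_pores(channels))  # channel 1 -> ["A", "C", "D"]
```

A dead pore simply drops out of the rotation, which is exactly the benefit muxing buys over a one-pore-per-sensor design.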
Flongle flowcells do not use sensor multiplexing aka muxing -- each sensor is paired with only a single pore. So if that pore is unusable, the sensor sits idle for the entire run. There's also no opportunity for re-muxing -- nor any need for a mux scan (I think). But Flongle seems to overcome these issues by having a very high initial pore availability -- 90% -- and then a very slow drop-off in pore numbers. So Flongle flowcells are much simpler in design yet are specified to deliver a lot of data (obviously, field testing data isn't available yet). MinION has four times as many sensors as Flongle, and each has access to four pores rather than only one -- so 16 times the possible capacity, though of course some sensors may be unlucky and be dealt multiple bad pores. Yet Flongle is touted to deliver a gigabase or more, quite respectable versus the 10Gb+ yields being seen on MinION with R9.4.1 flowcells.
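The capacity arithmetic, as a quick back-of-envelope (assuming the published figures of 512 MinION channels and 126 Flongle channels; pore counts per the discussion above):

```python
# Back-of-envelope pore-capacity comparison; channel counts are the
# published figures, not measured yields.
minion_sensors, minion_pores_per_sensor = 512, 4
flongle_sensors, flongle_pores_per_sensor = 126, 1

minion_capacity = minion_sensors * minion_pores_per_sensor    # 2048 pores
flongle_capacity = flongle_sensors * flongle_pores_per_sensor  # 126 pores
print(minion_capacity / flongle_capacity)  # roughly 16x
```

Of course raw pore capacity isn't yield; the 90% initial availability and slow drop-off do a lot of work closing that gap.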
Sample multiplexing is via DNA barcodes, which Flongle will fully support. Alas, the term "barcode" is also multiplexed -- the labware often carries conventional barcodes, unique sequence signatures used to identify species are often called barcodes, and then there is the plethora of different uses for oligo-based barcodes. Such barcodes can be stacked -- first add one and then another, or perhaps even add two at once via PCR (as in Nextera) -- and may track to a plate, well, row, column, single cell, individual molecule, droplet, stage in a process -- or anything else that your molecular creativity dreams up.
Many ONT kits, such as the 1D ligation and Rapid 1D kits, support only a paltry 12 barcodes. Kits using PCR have an expansion pack enabling 96 barcodes. In general I'm a proponent of the idea that "you can never have too many barcodes". Not only are there so many useful things to barcode, but it is also desirable to rotate through your barcode sets to enable detecting contamination. So I'd love to see ONT launch additional barcode sets for the non-PCR kits.
But conversely, a successful launch of Flongle may reduce some of the demand for barcoding. If you have 96 samples, my preference is to give each its own barcode and run them together. But with Flongle, if the numbers work right, one will have the option of running 8 batches of 12. That entails some additional tracking -- keeping straight which library set corresponds to which row of the plate -- but would be an option. Washing flowcells has also always been an option, but that means additional work plus tracking the used life of each flowcell.
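The batch bookkeeping could be sketched like this (a hypothetical scheme, assuming one plate row per Flongle run and a generic BC01-BC12 naming for a 12-barcode set -- the actual kit naming may differ):

```python
# Hypothetical mapping of a 96-well plate onto 8 Flongle runs of 12 samples,
# one plate row per run, reusing the same 12-barcode set each time.
import string

barcodes = [f"BC{i:02d}" for i in range(1, 13)]    # the 12-barcode set
batches = {row: {f"{row}{col}": bc for col, bc in zip(range(1, 13), barcodes)}
           for row in string.ascii_uppercase[:8]}   # rows A..H -> 8 runs

print(batches["A"]["A5"])  # BC05
```

Note that each barcode recurs in every batch, which is exactly why the row-to-flowcell correspondence must be tracked separately -- the extra bookkeeping mentioned above.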
New Ligation Kit
I realized I left out an interesting detail on the newest ligation kit -- that the new protocol omits an explicit fragmentation step. Fragmentation was typically performed with Covaris' g-Tubes or by forcing the sample through a needle. This was necessary to supply sufficient ends to ensure that plenty of DNA molecules were adapted for sequencing.
But the new kit dispenses with shearing, relying on the inherent shearing that occurs during typical DNA extraction protocols and with pipetting samples. This is accomplished by supplying the sequencing adapters at higher concentration and changing the ligation buffer. The new protocol also incorporates DNA repair, which has been an optional add-on in the past. In the presentation, the read length distribution for lambda DNA was essentially a few adjacent spikes, with the highest at 48 kb.
Wrong Guess on Read Until Sizing
At multiple points in his presentation, Clive Brown mentioned the concept that it is possible to enrich for long fragments via Read Until, but declined to discuss how long molecules are detected. I had suggested what seemed obvious -- that the speed of translocation through the pore is affected by the drag of the DNA molecule, with long molecules creating more drag. Well, that's a dead idea now.
.@OmicsOmicsBlog's good simple idea: Long molecule @nanopore Read Until - "long DNAs will have more drag." Here's length vs. speed in a subsample of 200K reads. Blue line is 450bp/s, orange is running median. Min time is 1s (hence sharp left limit). No obvious trend... pic.twitter.com/nihkjxxGte
— olin silander (@osilander) February 10, 2018
I'm stuck for now. Any suggestions from the floor?

Keep trying.
— Clive G. Brown (@Clive_G_Brown) February 10, 2018
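The check in the tweet above -- per-read translocation speed versus read length, summarized with a running median -- can be sketched with synthetic data (a null model here, assuming speed is independent of length and centered on the nominal 450 bp/s; a real analysis would pull lengths and durations from the run's sequencing summary):

```python
# Synthetic null-model version of the length-vs-speed check: if drag slowed
# long molecules, the running median of speed would fall with read length.
import random
import statistics

random.seed(0)
reads = []
for _ in range(2000):
    length = random.randint(500, 50_000)   # read length in bases
    speed = random.gauss(450, 30)          # bp/s, independent of length (null)
    reads.append((length, length / speed))  # (length, duration in seconds)

reads.sort()                                # order reads by length
window = 200
medians = [statistics.median(l / d for l, d in reads[i:i + window])
           for i in range(0, len(reads) - window, window)]
# Under the null, every windowed median hovers near 450 bp/s -- the flat
# trend seen in the real 200K-read subsample.
```

The real data behaving like this null model is what killed the drag hypothesis.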
Yet Another Golden Flongle Opportunity

I've mentioned before that a great use case for Flongle will be to work out DNA preparation conditions. As a personal aside, I sometimes think this way because of my mother. She graduated with a major in chemistry and a minor in math and got through one year of chemistry grad school. After finding high school teaching not to her taste, she tried to get jobs in the chemical industry and encountered only slightly disguised gender discrimination. But one job she was offered would have been to supervise a lab which took consumer products such as cake mixes and determined each mix's tolerance to deviations in recipe. Too much milk, not enough eggs, the wrong kind of oil -- would it still work?
So I've envisioned a similar analysis of DNA extraction protocols. Now some of this can be multiplexed, but if you're testing which contaminants kill productivity, you can't easily mix those samples together. So small, inexpensive flowcells could really help in determining things such as how much of common detergents or solvents can be tolerated.
Another problematic issue on the nanopore platform has been short fragments and adapters, which have a reputation for killing pores. So imagine feeding PCR products of different lengths each into their own flowcell and seeing how the productivity drop-off varies.
Long Read Cappable-seq

At the end of my piece, I made a comment about a future post on a "great missing protocol for feeding into Direct RNA". What I had in mind is Cappable-seq, a clever protocol from NEB which pulls out the 5' ends of bacterial and archaeal RNAs.
The challenge in the prokaryotic world is that active mRNAs are not polyadenylated. Poly-A tails have been amazing handles for molecular biology and underlie many protocols, particularly the standard ONT Direct RNA. A variant protocol for 16S RNA uses a conserved region of the RNA to replace the poly-A tail as a handle for driving the RNA into the pore (and backwards! Direct RNA runs 3'->5'!). Because ribosomal RNA grossly dominates bacterial RNA pools, any bacterial RNA-Seq protocol includes some sort of rRNA reduction step, which adds complexity and cost.
Cappable-seq solves this problem by chemically attaching a handle to the 5' triphosphate on transcribed RNA. Due to processing, mature ribosomal RNAs lack this triphosphate, and so are not captured. In the short read world, this allows generating reads which capture the 5' end of RNA and hence the transcriptional start site. Since bacterial promoter prediction can be difficult -- particularly in organisms with complex life cycles and many sigma factors -- Cappable-seq has the potential to greatly extend our understanding of prokaryotic transcription.
Now a collaboration between NEB and PacBio has extended Cappable-seq to long reads, enabling not only nailing down transcriptional start sites but also 3' ends. After capture, the RNAs are polyadenylated in vitro, enabling them to go into the PacBio cDNA protocol. Any read containing a poly-A tail should therefore represent a full-length transcript, and the paper reveals that in E. coli there is great diversity in the extents of transcripts from the same promoter.
It should be obvious that this same protocol can feed into any long read protocol -- such as ONT's cDNA and Direct RNA. So if there is any epigenetic modification of RNAs -- which is well-known for bacterial rRNAs -- Direct RNA could capture it. The scale of throughput that PromethION offers could enable very complex bacterial metatranscriptomes to be read out via Cappable-seq.
That should be enough nanopore for a while. On Monday I return to AGBT after being absent too many years, so that should be the focus of multiple posts over the coming days. Of course, there are nanopore talks scheduled for Orlando, so don't expect this channel to be completely ONT-free during that period.