Comments on Omics! Omics!: Selective sequencing: A Programming Opportunity!

2016-04-03T18:59:06.588-04:00

Great post.

Just thought I'd chip in here on the comment that you have "some overall throughput wasted on rejected reads". First, the read lengths you are quoting are not long enough to illustrate the benefit of read until; you need to think about much longer reads. In the manuscript, our example focuses on 2 kb amplicons. Sticking with the 1D example you provide, a 2 kb read at 70 bases per second will take just over 28 seconds to sequence. If you are not interested in a specific sequence (say you have already seen enough copies of it, or it just isn't of interest to you), then without read until you will have to wait 28 seconds before you can sample another read. With read until as we have implemented it, you collect 3.5 seconds' worth of data (approximately 250 bases), analyse it, and choose whether or not to sequence that read. That round trip might take a second, with a further 1 second of 'downtime' as the pore reloads another molecule. Thus you would be free to move on to the next read after 5.5 seconds rather than 28 seconds, saving 22.5 seconds that you can use for another read. Of course, as you move to longer reads the savings become substantial: if you can match a 20 kb read in 5.5 s, rejecting it (if it is of no interest) saves over 4 minutes of sequencing time.
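A minimal Python sketch of the arithmetic above, assuming the figures quoted in the comment (70 bases per second, 3.5 s of sampling, about 1 s to decide, about 1 s for the pore to reload). The function name and parameters are illustrative only, not part of any read until software.

```python
def read_until_saving(read_length, speed_bps=70.0, sample_s=3.5,
                      decision_s=1.0, reload_s=1.0):
    """Seconds saved by rejecting one unwanted read instead of sequencing it fully.

    Mirrors the comment's arithmetic: a full read costs read_length / speed_bps,
    while a rejected read costs sampling + decision + pore reload. Like the
    comment, it does not charge a reload after a fully sequenced read.
    Returns (full_s, reject_s, saved_s).
    """
    full_s = read_length / speed_bps
    reject_s = sample_s + decision_s + reload_s
    return full_s, reject_s, full_s - reject_s


for length in (2_000, 20_000):
    full_s, reject_s, saved_s = read_until_saving(length)
    print(f"{length:>6} bases: full {full_s:6.1f} s, "
          f"rejected {reject_s:.1f} s, saved {saved_s:6.1f} s")
```

Computed exactly, a 2 kb read takes about 28.6 s, so the saving is about 23 s (the 22.5 s above comes from rounding to 28 s); for a 20 kb read the saving is about 280 s, consistent with "over 4 minutes".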

Thus, with read until, I would argue that you aren't wasting time on rejected reads; rather, you are saving the time you would otherwise spend sequencing unwanted reads.

Now the figures I am quoting above are based on a perfectly tuned system, but ultimately read until should allow you to achieve a specific sequencing goal faster than you would otherwise. Exploring these dynamics is extremely interesting and worth thinking about in some detail.

There are many other methods that can be applied to squiggle matching, and they all have accuracy/speed trade-offs. I think the new potential opened up by this type of approach requires careful thought to optimise and exploit fully. As sequencing speed and throughput increase, the matching challenge becomes harder, and in some respects the benefits may lessen (imagine the terabytes of throughput suggested for the PromethION, and consider the trade-off between compute power and simply sequencing some more). But you might be able to exclude 'contaminating' reads from your data set. Or exclude specific regions of DNA from analysis. Or enable the analysis of subsets of a patient's genome. Or choose to sequence potentially variant regions at much greater depth. Or dynamically consider haplotypes. My personal favourite is the concept of an interactive assembly: grow your assembly by choosing to sequence reads that map to contigs and scaffolds as they 'grow'.
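To illustrate why squiggle matching carries a speed cost, here is a sketch of subsequence dynamic time warping (DTW), one widely used way of aligning a short current trace against a longer expected trace. It is not necessarily the exact method used in the preprint. It assumes both traces are already on a common scale (for example, normalised), and the match threshold a real system would apply is left out. Its cost grows with query length times reference length, which is the compute pressure the comment describes as speed and throughput rise.

```python
import math
import random


def subsequence_dtw(query, reference):
    """Best DTW match of `query` anywhere inside `reference`.

    Both are sequences of current levels assumed to be on a common scale.
    Returns (cost, end_index), where end_index is the 0-based position in
    `reference` at which the best match ends.
    Time is O(len(query) * len(reference)); memory is O(len(reference)).
    """
    if not query or not reference:
        raise ValueError("query and reference must be non-empty")
    m = len(reference)
    prev = [0.0] * (m + 1)  # row 0 is all zeros: the match may start anywhere
    for q in query:
        curr = [math.inf] * (m + 1)  # column 0 stays inf: every query sample must align
        for j in range(1, m + 1):
            step = abs(q - reference[j - 1])
            curr[j] = step + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    best_j = min(range(1, m + 1), key=prev.__getitem__)
    return prev[best_j], best_j - 1


# Hypothetical data: a synthetic reference trace and a read start cut from it.
random.seed(1)
ref = [random.gauss(90, 5) for _ in range(500)]
read_start = ref[120:160]
cost, end = subsequence_dtw(read_start, ref)  # exact slice: cost 0.0, end 159
```

A read until loop would compare `cost` against a tuned cut-off to decide whether to keep or eject the strand; choosing that cut-off is where the accuracy/speed trade-off shows up.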

2016-03-31T00:09:30.795-04:00

Yes, I think you have it. You are right that there would be a small tax in overall performance, since one is forcing pores to go into the "waiting for new DNA" state more often.

Interesting that two open-source basecallers have appeared this week.

NanoCall is fully local and can apparently keep up with the current MinION (~70 bases per second) using only 8 cores. With the R9 pores running at close to 300 bases per second, that may require correspondingly more horsepower, though the model may be simpler since most of the signal comes from 3 bases.
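As a rough worked example of "correspondingly more horsepower", here is a sketch that assumes basecalling cost scales linearly with bases per second, anchored on the 8 cores at ~70 bases per second quoted above. That linear scaling is an assumption, and it ignores any saving from the simpler ~3-base model, so treat the result as an order-of-magnitude guide only.

```python
import math


def cores_needed(speed_bps, ref_speed_bps=70.0, ref_cores=8):
    """Naive estimate: basecalling cores scale linearly with bases per second.

    Anchored on the reported 8 cores keeping up at ~70 bases per second.
    Ignores any saving from a simpler model, so it is only a rough guide.
    """
    return ref_cores * speed_bps / ref_speed_bps


print(math.ceil(cores_needed(300)))  # 35 cores under this naive scaling
```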

DeepNano relies on Metrichor for some critical parameters, but is very fast after that. It looks like NanoCall could generate those parameters, so perhaps the two can feed off each other. DeepNano claims higher accuracy on 1D reads than Metrichor achieves (again, using R7 chemistry).

2016-03-30T23:27:55.335-04:00

Thanks Keith for your clarification.

So the main advantage is getting the reads of interest as soon as possible.

As for the sequencer's overall output under selective sequencing, I think we can expect a higher proportion of the reads we are interested in, at the cost of some overall throughput wasted on rejected reads.

Does that sound right?

2016-03-29T09:26:36.467-04:00

The MinION, as far as anyone can determine, has no limit on read length. What selective sequencing (read-until) is attempting to maximize is the amount of useful sequence read.

It all gets back to this concept of duty cycle. A given pore grabs a DNA molecule, processes it, and then is available to process another DNA molecule. The overall throughput of the sequencer in a given time is the number of bases read -- but the useful throughput is the number of those bases read which matter for the experimenter. So if we can determine reliably that a molecule is not of interest, we can try to cut down the number of bases read which are not of interest. Selective sequencing is attempting to increase the yield (and yield per unit time) of interesting information by actively avoiding sequencing uninteresting DNA.
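The duty-cycle argument can be checked with a toy single-pore simulation. This is a sketch under stated assumptions: the timings are those quoted elsewhere in the thread (70 bases per second, 3.5 s sampling, 1 s decision, 1 s reload), the workload of 2 kb reads with one in four of interest is hypothetical, and the `Read` and `simulate_pore` names are illustrative only. It compares total bases read with useful bases read, with and without ejecting uninteresting strands.

```python
from dataclasses import dataclass


@dataclass
class Read:
    length: int         # bases in the molecule
    interesting: bool   # whether the experimenter wants it


def simulate_pore(reads, budget_s, selective, speed_bps=70.0,
                  sample_s=3.5, decision_s=1.0, reload_s=1.0):
    """Run one pore through `reads` in order for `budget_s` seconds.

    With selective=True, uninteresting reads are ejected after sample_s + decision_s
    of sequencing (or when they end, if shorter). Interesting reads always run to
    completion. Every read is followed by reload_s of idle time.
    Returns (total_bases, useful_bases).
    """
    t = total = useful = 0.0
    for read in reads:
        if t >= budget_s:
            break
        seq_s = read.length / speed_bps
        if selective and not read.interesting:
            seq_s = min(seq_s, sample_s + decision_s)
        seq_s = min(seq_s, budget_s - t)  # the run may end mid-read
        bases = seq_s * speed_bps
        total += bases
        if read.interesting:
            useful += bases
        t += seq_s + reload_s
    return total, useful


# Hypothetical workload: 2 kb reads, one in four of interest, one hour of run time.
reads = [Read(2_000, i % 4 == 0) for i in range(1_000)]
plain = simulate_pore(reads, 3_600, selective=False)
selective = simulate_pore(reads, 3_600, selective=True)
print("without selection: total %.0f, useful %.0f" % plain)
print("with selection:    total %.0f, useful %.0f" % selective)
```

In this toy run, selection raises useful bases from about 62,000 to about 156,000, while total bases fall slightly (from about 244,000 to about 230,000) because the pore spends more time reloading. That drop is the small tax on overall performance mentioned in the thread.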

In selective sequencing, the goal is to determine quickly which molecules are uninteresting (either uninteresting overall, or simply redundant given what has already been read). Ejecting a strand early rather than continuing to sequence it frees the pore to start reading another molecule sooner: we've sped up the duty cycle at the cost of throwing away (not reading) data. But since we have determined that the data we are throwing away is of little or no interest, that is a net gain.
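The "redundant given what has already been read" case can be sketched as a simple decision policy: keep a read only if it maps to a target region that has not yet reached a chosen depth. This is a hypothetical sketch. The region label is assumed to come from whatever matcher classifies the start of the read, the region names are made up, and depth is counted in reads rather than per-base coverage for simplicity.

```python
from collections import defaultdict


class DepthCapPolicy:
    """Accept reads only while their target region is below a read-count cap.

    Hypothetical sketch: `region` would come from a matcher that classifies the
    first part of each read (None if it matched nothing).
    """

    def __init__(self, targets, max_reads):
        self.targets = set(targets)
        self.cap = max_reads
        self.counts = defaultdict(int)

    def decide(self, region):
        """Return True to keep sequencing, False to eject the strand."""
        if region not in self.targets:
            return False  # never of interest (or unmatched)
        if self.counts[region] >= self.cap:
            return False  # redundant: enough copies already seen
        self.counts[region] += 1
        return True


policy = DepthCapPolicy(targets={"amplicon_A", "amplicon_B"}, max_reads=2)
stream = ["amplicon_A", None, "amplicon_A", "amplicon_A", "amplicon_B"]
decisions = [policy.decide(region) for region in stream]
# decisions == [True, False, True, False, True]
```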

2016-03-29T02:22:26.950-04:00

Thanks for another blog on this subject.

I have read the Loose preprint, but I am still not quite sure how selective sequencing works.

Is the throughput of a MinION sequencer limited by a certain number of bases it can sequence? Suppose we have a MinION that can sequence at most 1000 bases and has only one pore. We also consider only 1D reads. Seven reads of lengths 50, 101, 145, 222, 345, 137 and 310 go through the single pore in this order. Then, without selective sequencing, we should get all of them except the last read.

Suppose we now read the first 50 bp to see whether it contains a 30 bp sequence we are interested in. If the first 50 bp doesn't contain it, we reject the read. Assuming only the 3rd, 5th and 7th reads pass this test, we would have sequenced 50, 50, 145, 50, 345, 50 and 310 bases, and the selective sequencing approach allows us to sequence the 7th read. Is this how it works?
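The toy model in this question can be reproduced in a few lines. This sketch follows the commenter's simplifying assumptions exactly: one pore, a hard cap of 1000 bases, reads in arrival order, and rejected reads costing their first 50 bases. The function name is hypothetical.

```python
def bases_read(read_lengths, budget, keep=None, prefix=50):
    """Bases read from each molecule in the commenter's toy model.

    One pore, a hard cap of `budget` bases in total, reads in arrival order.
    keep: 1-based indices of reads that pass the screen (None = no screening,
    every read is sequenced in full). Rejected reads cost `prefix` bases.
    Reads that arrive after the budget is spent are not read at all.
    """
    used = []
    remaining = budget
    for i, length in enumerate(read_lengths, start=1):
        if remaining <= 0:
            break
        wanted = length if keep is None or i in keep else min(prefix, length)
        got = min(wanted, remaining)
        used.append(got)
        remaining -= got
    return used


lengths = [50, 101, 145, 222, 345, 137, 310]
print(bases_read(lengths, 1000))                  # [50, 101, 145, 222, 345, 137]
print(bases_read(lengths, 1000, keep={3, 5, 7}))  # [50, 50, 145, 50, 345, 50, 310]
```

Both runs use exactly 1000 bases, but only the screened run reaches the 7th read, as the question suggests. In a real run the limit is time rather than a fixed base count, and each rejection also costs pore-reload time, as the replies explain.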

Thanks a lot in advance.

2016-03-26T13:13:45.996-04:00

On the subject of poretools, we do provide an API that might be helpful for people writing code in Python. A unified interface across all tools is a nice idea, but may be difficult to achieve in practice.