Comments on Omics! Omics!: Homopolymers and Other Recurring Topics in Pore Taste

"Regarding base analogs - the compute load in...

2016-12-05T07:05:07.549-05:00

"Regarding base analogs - the compute load increases exponentially with the size of the nucleotide dataset. So the 50:50 mix proposed above would proba.."

None of what you say here applies to their latest base caller, which isn't HMM-based and in which the concept of a k-mer is somewhat abstract. The notional k-mers overlap by 1 base in any case, so you can afford to miss a few. As their users have shown in public, analogues can be detected very well in the signal, both single-molecule and consensus.

Given how ambitious what they are doing is, some timing difficulties can be expected. Their customers don't seem to mind. What they have done successfully is get customers used to ongoing updates, both technical and time-wise, and to a rapid rate of innovation, more like an agile open software project. That is very different from the traditional 'ABI' school.

I don't recall seeing any bragging about wiping arses with Illumina stock. Where was that?

Regarding base analogs - the compute load increase...

2016-11-08T00:18:08.369-05:00

Regarding base analogs - the compute load increases exponentially with the size of the nucleotide dataset. So the 50:50 mix proposed above would probably make their current base caller difficult to implement locally (they still need to run 2D basecalling in the cloud due to computation load). This does not mean that a few years from now better methods would not be available.

On the other hand, I am not sure that the readout speed of the electronics has as much headroom as indicated here. Reading out 10-20 pA at 10 kHz rates is difficult enough. Doing so with 0.5 pA resolution increases the challenge, and the difference between some k-mers is likely on the order of 0.1 pA (I believe ONT has claimed either 4-mers or 5-mers as their base set). At 500 bases per second each base dwells for 2 ms, so the transient signal between two neighboring nucleotides lasts ~0.2 ms (assuming 10% of the time is a transition, likely a VERY generous assumption). In order to capture that adequately you would need 10x oversampling, which gives you AT BEST a 20 µs integration time to work with. There is no way that I know of to measure with 0.1 pA accuracy at 50 kHz.
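
The sampling arithmetic in the paragraph above can be checked with a short sketch. The function name and parameters are hypothetical and exist only for illustration; the inputs are the figures quoted in the comment (500 bases per second, 10% of each dwell spent in transition, 10x oversampling).

```python
def required_sampling(bases_per_second, transition_fraction, oversampling):
    """Return (transition_time_s, integration_time_s, sample_rate_hz).

    Hypothetical helper: it only reproduces the back-of-envelope
    arithmetic from the comment, not any real instrument model.
    """
    dwell_s = 1.0 / bases_per_second               # time spent on each base
    transition_s = dwell_s * transition_fraction   # part of the dwell spent between levels
    integration_s = transition_s / oversampling    # time available per sample
    sample_rate_hz = 1.0 / integration_s           # samples per second needed
    return transition_s, integration_s, sample_rate_hz


transition_s, integration_s, rate_hz = required_sampling(500, 0.10, 10)
print(f"transition time:  {transition_s * 1e3:.2f} ms")
print(f"integration time: {integration_s * 1e6:.0f} us")
print(f"sample rate:      {rate_hz / 1e3:.0f} kHz")
```

With these assumed inputs the numbers come out at 0.2 ms, 20 µs and 50 kHz, matching the figures in the comment. A less generous transition fraction shortens the integration time in proportion.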

My biggest problem with ONT is that they grossly exaggerate their ability to deliver on the schedules they advertise. They either know that they are making misleading promises or need to get better at planning. This has been true about EVERY announcement they have made since 2012, so it is not unreasonable to assume that they are intentionally misleading. Some may not want to call this lying, but at its core it essentially is. The latest example is their million pore thing that they discussed in the latest update. The statement was "this is at least two years away so don't wait for it". How about "at least 5 years if we stop work on anything else and likely 10 if we do it while improving and further developing our current product line and it will take another $200M+ to turn into a product". Sure, they need to raise money, so a certain degree of hype is unavoidable. But I can't think of any other company that I pay attention to which is as liberal with the facts.

They have what looks to be good technology and a decent shot at being successful in the marketplace. The CTO would do well to just shut up and focus on building a great product. This is what PacBio did under Hunkapiller, and it is the only reason they are still around. If they had kept bragging how they would wipe their arses with Illumina stock certificates, as they used to and as ONT's CTO claims today, they would have been out of business circa 2013.

"1.5-2 years to see the end of the story?&quo...

2016-11-07T06:11:59.272-05:00

"1.5-2 years to see the end of the story?"

People have been saying that for 9 years already.

PacBio is losing $15-20M per quarter, Illumina i...

2016-11-05T13:09:34.794-04:00

PacBio is losing $15-20M per quarter, Illumina is still alive, and there are so many competitors. Reducing the whole story to these two tiny companies may be a huge mistake. On the other hand, the "agent of change" seems to be really sure about the future of his company (and so arrogant). 1.5-2 years to see the end of the story?

Mohan, My pleasure. Looking at Dovetail's ...

2016-11-04T08:40:47.400-04:00

Mohan,

My pleasure.

Looking at Dovetail's website, they are using PacBio in their genome sequencing service, but I read it as they first build a genome using conventional technology and then upgrade it with Chicago sequencing. It doesn't specify, but I'd be stunned if they ran this anywhere other than Illumina -- the technique wants lots of events (you are building a statistical model of what is near what using counts; big counts for smaller error and higher resolution) & getting long reads has no obvious benefit here.

WRT base calling on Oxford, you've touched on the fundamental challenge, even in the simpler space of assuming the sequence has only 4 bases (as opposed to more due to biologically modified bases and DNA damage). Even if nature/engineering gave you the perfect pore with 64 distinct signal levels for the 64 different trinucleotides (which is beyond unlikely), noise in the system will make those signal levels overlap in real life. So looking at the raw signal will give an ambiguous result. But you can use adjacent information to help resolve this - trinucleotide N+1 must be one that has 2-base overlaps with both trinucleotide N and trinucleotide N+2. So the HMM and RNN models (and whatever else anyone cooks up) fit the series of events to a model of the sequence. The sampling rate of the electronics exceeds the pace of the motor by several fold, so a model could also attempt to extract information from the transitions. And while most of the signal is from a trinucleotide, more distant sequences will shift it. In other words, the DNA passing through the pore is subject to many influences which are convolved into the output electrical signal.
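
The overlap constraint described above can be sketched as a small Viterbi-style search. This is a toy under loud assumptions, not ONT's base caller: the level table (`LEVEL`) is an invented, evenly spaced mapping from trinucleotide to current, the cost is plain squared error, and the names `successors`, `nearest_kmer` and `decode` are hypothetical.

```python
from itertools import product

BASES = "ACGT"
KMERS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 trinucleotides

# ASSUMPTION: a made-up level table, one distinct level per trinucleotide.
VALUE = {b: i for i, b in enumerate(BASES)}
LEVEL = {k: 16 * VALUE[k[0]] + 4 * VALUE[k[1]] + VALUE[k[2]] for k in KMERS}


def successors(kmer):
    """Trinucleotides that can follow `kmer` when the strand moves one base."""
    return [kmer[1:] + b for b in BASES]


def nearest_kmer(event):
    """Call one event in isolation: the k-mer whose level is closest."""
    return min(KMERS, key=lambda k: abs(event - LEVEL[k]))


def decode(events):
    """Find the k-mer path with the smallest total squared error,
    allowing only transitions between k-mers with a 2-base overlap."""
    cost = {k: (events[0] - LEVEL[k]) ** 2 for k in KMERS}
    back = []
    for x in events[1:]:
        new_cost, pointers = {}, {}
        for k in KMERS:
            # Valid predecessors of k: any base followed by k's first two bases.
            prev = min((b + k[:2] for b in BASES), key=cost.__getitem__)
            new_cost[k] = cost[prev] + (x - LEVEL[k]) ** 2
            pointers[k] = prev
        cost = new_cost
        back.append(pointers)
    k = min(cost, key=cost.__getitem__)
    path = [k]
    for pointers in reversed(back):  # trace the best path backwards
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path[0] + "".join(k[-1] for k in path[1:])


# Events for GATTACA (GAT, ATT, TTA, TAC, ACA); the second one is noisy.
events = [35.0, 15.6, 60.0, 49.0, 4.0]
print(nearest_kmer(events[1]))  # called alone, the noisy event looks like CAA
print(decode(events))           # with the overlap constraint: GATTACA
```

In this toy the noisy second event sits closer to the level for CAA than to the level for ATT, so calling it in isolation gets it wrong, while the path search recovers it because CAA cannot follow GAT or precede TTA. Real base callers work with far messier signal and richer models.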

One of the advantages to any "2D" technique (the one PacBio is claiming or the other options I outlined or even post-alignment polishing perhaps) that reads both strands is that the particular oddities will probably not be symmetrical, so by combining information one has a much richer dataset which can be resolved into a single model.

New players are always welcome; Genia is a more near-term threat to both companies, but others such as TwoPoreGuys might make inroads -- but we can't really assess that until we start seeing some actual data, can we? And better doesn't always win -- just ask the Betamax team at Sony. I'm not saying the others won't win, but the market has its fair share of random noise as well.

Hi Keith, Thanks a lot for that ! It's amazi...

2016-11-04T06:55:13.964-04:00

Hi Keith,

Thanks a lot for that! It's amazing that you took the time to explain stuff and share some of your knowledge. I definitely learnt a few things there.

On the systemic homopolymer error, I now realize I was confused when I first asked about it. I understand the homopolymer error better now: essentially, if AA and AAA generate the same amplitude, it's difficult to know how many As there are, and this results in indel errors. But some of this can be corrected by reading multiple times and through your potential solutions/tradeoffs.

That made me realize I also had a question on simple systemic error.

Nanopore tech relies on measuring the amplitude of the current; I think in ONT's case, it's measured for 3 nucleotides(?) at a time. If we have {YYY} as a triplet, then there are 64 (4^3) possibilities. In order to get it right, you need to have 64 different amplitudes. However, if there are fewer than 64 detectable levels, this will generate a systemic error which can't be recovered no matter how many times this is read.
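
The counting argument above is a pigeonhole bound, which can be sketched as follows. The function name and the level counts passed to it are hypothetical illustrations; only the 4^3 = 64 figure comes from the comment.

```python
import math


def min_kmers_sharing_a_level(k, n_levels, alphabet_size=4):
    """Return (number of k-mers, minimum number of k-mers that must share
    the most crowded level) when only `n_levels` amplitudes can be told apart.

    Hypothetical helper: a pure counting bound, not a model of any real pore.
    """
    n_kmers = alphabet_size ** k
    return n_kmers, math.ceil(n_kmers / n_levels)


print(min_kmers_sharing_a_level(3, 64))  # (64, 1): every triplet can have its own level
print(min_kmers_sharing_a_level(3, 20))  # (64, 4): some level must hide at least 4 triplets
```

The 20-level case is an assumed figure chosen only to show what happens once there are fewer detectable levels than triplets.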

https://www.ncbi.nlm.nih.gov/pubmed/23744714 looks interesting in that regard - you can skip to section 3.1 for an interesting explanation. I guess how good ONT's ASIC is at differentiating the amplitudes would tell us how low the systemic threshold can be. As a high systemic threshold can't be resolved by reading multiple times, the electronics have to close this gap. I guess they are always improving. Would you know where they stand on this, e.g. a map of nucleotides to amplitudes that shows improvement?

I fully agree with you that portability, speed, selective sequencing etc. are valued, and therefore one can substitute some of the current NGS methods with ONT if one can sacrifice some other functionality. I also agree that it was probably harsh to generalize ONT's behaviour; I guess it's more like promising too much too early, as you mentioned.

In any case, the area is changing rapidly; companies like TwoPoreGuys would probably take some of ONT's cake as ONT takes PacBio's and PacBio eats some of Illumina's :)

On your other points, for what it's worth, Dovetail offers PacBio now and PacBio reports their total number of papers as > 1800.