Tuesday, March 21, 2017

Obviousness: Rarely Obvious

Pacific Biosciences has made new thrusts in their ongoing intellectual property action against Oxford Nanopore, adding two recently issued patents to the fray.  Oxford has publicly brushed these off as "another pore excuse for a lawsuit", but certainly the battle is not over.  One of these patents, 9,542,527 "Compositions and methods for nucleic acid sequencing", appears to concern using hairpin linkages to read both strands, much like the 9,404,146 "Compositions and methods for nucleic acid sequencing" patent that PacBio led with.  Since Oxford has announced they will abandon their "2D" methods that use such hairpins, this angle would seem to be soon irrelevant (as I predicted back when PacBio originally attacked).  But the other, US 9,546,400 "Nanopore sequencing using n-mers", covers basecalling methods, which is a new twist.  A route to challenge any patent is to identify "prior art", information which was publicly available at the time of the patent filing and which impinges on the claims in the patent application.  Not only can exact matches to prior art be an issue, but also anything which would be "obvious" to a skilled practitioner.  And that can certainly be a can of worms.



Cas9: Illustrating the Difficulty Of Calling Out Obvious

The recent patent battle over Cas9/CRISPR technology is, in my opinion, illustrative of how messy determining obviousness can be.  Now, I haven't reviewed the decision in favor of the Broad's patent nor have I dug into the various filing dates and such which I believe proved pivotal, so I'm not really looking at the battle as it actually played out.  But there have been frequent complaints on Twitter and elsewhere that the Broad/Zhang claims to human genome editing were obvious extensions of the U.C./Doudna work on bacterial genome editing.  I'm now going to attempt to convince you that it was neither obvious nor non-obvious.

The argument for obviousness is, well, obvious, right?  Cells are cells, right??

Well, let's take the example of phage integrases.  If you take the integrase from PhiC31, a Streptomyces phage, and express it in human cells along with a DNA construct with the appropriate donor site, then you can integrate that construct into the human genome.  While PhiC31 doesn't infect human cells, the integration site has enough slop that in a genome as large as human there can be multiple sites which function.  Indeed, this could be seen as confirmation of the theory that DNA binding sites evolve to be just constrained enough to be found specifically in their home genome. In fact, PhiC31 integrase has proven so useful in this role that in PubMed I think there are more abstracts for PhiC31 use in mammals than in Streptomyces (though it is a workhorse there).  

Now, try the same thing with lambda phage integrase.  Won't work.  Why not?  Well, it turns out that lambda's integrase forms a complex with host proteins, and only with that complex formed can the integrase reaction proceed.  In contrast, PhiC31 integrase can function not only in a mammalian cell, but also as a purified protein in vitro.  It actually turns out that integrases with an active site serine, such as PhiC31 integrase, generally function autonomously, whereas those relying on an active site tyrosine, such as lambda, often depend on host factors.

So how does this apply to Cas9?  Well, there are two general schemes for arming Cas9 with the guide RNA.  One is to arm it in vitro and transfect the complex into cells, the other is to express the guide RNA in vivo.  Only if that RNA is correctly processed in vivo will the in vivo approach work, and that processing relies on specific RNases.  Was it obvious that this processing would correctly occur in mammalian cells?

I noted that PhiC31 integrase will function as a purified protein in vitro.  In the electrifying Jinek et al paper, Doudna's group showed Cas9 cleavage in vitro.  So that is what really made mammalian editing obvious, right?

Well, maybe.  But we now know that so called anti-CRISPR proteins exist.  What if somehow mammalian cells had ubiquitous anti-CRISPR activity?  Or what if trying to express guide RNAs or introducing guide RNA-Cas9 complexes had triggered antiviral responses, a bane of many early attempts at RNAi?  

In the end, Cas9 function has turned out to be relatively straightforward, with none of these bugbears being real.  But was this obvious?  What degree of uncertainty is sufficient to make something non-obvious?  Is it obvious only after successful experiments, in which case success wasn't obvious? My phage integrase example is hardly unique: lambda-red recombination doesn't work even in every bacterial system. 

What Basecalling Is And Is Not Potentially Covered by 9,546,400?

PacBio's basecalling patent has 15 claims. I'm going to copy all of them here because it will make things more clear.  Or pretty unclear.  As always when I try to read patents, I feel like I'm a Perl programmer trying to debug Prolog.  It's a whole special language which isn't on the same plane as what I'm used to, so things which might be clear to someone trained in patent law are just plain nonsensical to me.  So here's the list:
1. A method for sequencing a nucleic acid template comprising:
    a) providing a substrate comprising a nanopore in contact with a solution, the solution comprising a template nucleic acid above the nanopore;
    b) providing a voltage across the nanopore;
    c) measuring a property which has a value that varies for N monomeric units of the template nucleic acid in the pore, wherein the measuring is performed as a function of time, while the template nucleic acid is translocating through the nanopore, wherein N is three or greater; and
    d) determining the sequence of the template nucleic acid using the measured property from step (c) by performing a process including comparing the measured property from step (c) to calibration information produced by measuring such property for 4 to the N sequence combinations.
2. The method of claim 1 wherein a property in step (c) comprises current.
3. The method of claim 1 wherein the translocation through the pore is driven by the applied voltage.
4. The method of claim 1 wherein the translocation rate through the pore is enzymatically controlled.
5. The method of claim 3 wherein the translocation through the pore is controlled by a polymerase, a helicase, a translocase, a viral genome packaging motor, or a chromatin remodeling complex.
6. The method of claim 1 wherein N corresponds to n-mers comprising 3-mers, 4-mers or 5-mers.
7. The method of claim 6 wherein N corresponds to n-mers comprising 3-mers.
8. The method of claim 1 wherein the method is carried out on an array of nanopores in the substrate.
9. The method of claim 1 wherein the sequencing comprises peak finding by heuristic decision-tree algorithms, Bayesian networks, hidden Markov models, or conditional random fields.
10. The method of claim 1 wherein the comparing process comprises examining a lookup table for each of the 4 to the N combinations, and keeping only those meeting a threshold value.
11. The method of claim 10 wherein threshold value is within 2 sigma of the expected value.
12. The method of claim 1 wherein some of the values for the 4 to the N sequence combinations are degenerate within the error of the measurement.
13. The method of claim 1 wherein after each single-nucleotide translocation through the nanopore, the possible n-mers for that measurement are looked up, and all the possibilities from the previous measurement that are not consistent with the most recent measurement are thrown away.
14. The method of claim 1 wherein N corresponds to n-mers comprising 4-mers.
15. The method of claim 1 wherein N corresponds to n-mers comprising 5-mers.
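
To make claims 10, 11 and 13 a bit more concrete, here is a minimal Python sketch of a lookup-table approach as I read those claims. Everything in it is an assumption for illustration: the function names, the signal levels (an arbitrary made-up permutation, not real pore data) and the sigma value are all hypothetical, and nothing here is taken from PacBio's or Oxford's actual software.

```python
from itertools import product

BASES = "ACGT"

def all_kmers(n):
    """Return all 4**n sequences of length n."""
    return ["".join(p) for p in product(BASES, repeat=n)]

def toy_calibration(n):
    """Hypothetical calibration table: k-mer -> expected signal level.

    The levels are invented. 37 is coprime to 4**n, so each k-mer simply
    gets a distinct, scrambled level; a real table would be measured.
    """
    kmers = all_kmers(n)
    return {k: 50.0 + 0.5 * ((37 * i) % len(kmers)) for i, k in enumerate(kmers)}

def candidates(measured, table, sigma, n_sigma=2.0):
    """Claims 10/11 as read here: keep k-mers within n_sigma * sigma."""
    return {k for k, mu in table.items() if abs(measured - mu) <= n_sigma * sigma}

def prune(previous, current):
    """Claim 13 as read here: drop k-mers inconsistent with the prior step.

    After a one-base step, the last n-1 bases of the previous k-mer must
    equal the first n-1 bases of the current k-mer.
    """
    suffixes = {k[1:] for k in previous}
    return {k for k in current if k[:-1] in suffixes}

table = toy_calibration(3)                      # 4**3 = 64 entries
first = candidates(table["ACG"], table, sigma=0.5)
second = prune(first, candidates(table["CGT"], table, sigma=0.5))
```

With these toy numbers, several 3-mers fall within 2 sigma of each measurement (the "degenerate" situation of claim 12), and the consistency check between successive measurements then discards most of them while keeping the true pair ACG followed by CGT.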

Note that claim #6 covers 3-mers, 4-mers or 5-mers but claim #7 is for the method of claim #6 with 3-mers.  Huh??  On Twitter, one respondent suggested that perhaps claim #6 covers models that mix N-mer sizes while #7 covers those specifically built around 3-mers.  Perhaps this is shades of the recent Oxford comma case.

But in any case, from a basecalling perspective the patent appears to cover training models of several types (lookup tables, decision trees, Bayesian networks, hidden Markov models or conditional random fields) with all possible nucleotide sequences of length N, where N is between 3 and 5.

I did a quick search for relevant papers published before the April 10, 2009 priority date on patent 9,546,400 and found a whole series of relevant papers with a common author.  "Nanopore cheminformatics" from Winters-Hilt and Akeson back in 2004 describes using support vector machines (not mentioned in the patent), EM algorithms and HMMs for analyzing nanopore data.  "DNA molecule classification using feature primitives" by Iqbal, Landry and Winters-Hilt (2006) also describes HMMs for classifying nanopore hairpins, and tosses out the idea that decision trees are often used in this problem space.  The abstract for "Cheminformatics methods for novel nanopore analysis of HIV DNA termini" (Winters-Hilt et al, 2006) specifically describes classification of dinucleotides with HMMs.  "Analysis of nanopore detector measurements using Machine-Learning methods, with application to single-molecule kinetic analysis" (Landry & Winters-Hilt, 2007) again mentions HMMs and SVMs.  "Duration learning for analysis of nanopore ionic current blockades" (Churbanov, Baribault & Winters-Hilt, 2007), "A novel, fast, HMM-with-Duration implementation - for application with a new, pattern recognition informed, nanopore detector" (Winters-Hilt & Baribault, 2007), "Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory" (Churbanov & Winters-Hilt, 2008) and "Clustering ionic flow blockade toggles with a mixture of HMMs" (Churbanov & Winters-Hilt, 2008) all cover using HMMs for nanopore signal interpretation, with at least dinucleotides as the training set.  Curiously, the Pacific Biosciences patent cites none of these papers.

So once you've established HMMs for nanopore base calling trained on 2-mers, is it truly non-obvious to try training them with longer N-mers? After all, isn't every practitioner of the art always tempted to build a bigger, more complex model when a simple one fails?  Some of the above papers hint at issues with computing the models, particularly in real time, but the PacBio patent has nothing that I would see as teaching how to get past such issues should they arise.  So what exactly is novel and inventive about the PacBio patent?
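
To illustrate how small the conceptual step from 2-mers to longer N-mers looks from the software side, here is a hedged sketch of an HMM-style decoder whose hidden states are N-mers. It is not anyone's actual basecaller: the names viterbi_kmers and toy_levels are hypothetical, the signal levels are invented, emissions are assumed Gaussian, and the four possible one-base shifts are assumed equally likely (so the constant transition term is dropped). Changing N only changes the size of the table, 4 to the N states.

```python
from itertools import product

BASES = "ACGT"

def toy_levels(n):
    """Hypothetical k-mer -> level table with distinct, invented values."""
    kmers = ["".join(p) for p in product(BASES, repeat=n)]
    return {k: 50.0 + 0.5 * ((37 * i) % len(kmers)) for i, k in enumerate(kmers)}

def viterbi_kmers(signal, levels, sigma):
    """Most likely base sequence for one signal level per single-base step."""
    def log_emit(x, mu):
        # Gaussian log-likelihood, up to an additive constant
        return -((x - mu) ** 2) / (2 * sigma ** 2)

    kmers = list(levels)
    score = {k: log_emit(signal[0], levels[k]) for k in kmers}
    back = []
    for x in signal[1:]:
        new_score, pointers = {}, {}
        for k in kmers:
            # A predecessor's last n-1 bases are this k-mer's first n-1 bases
            preds = [b + k[:-1] for b in BASES]
            best = max(preds, key=score.__getitem__)
            new_score[k] = score[best] + log_emit(x, levels[k])
            pointers[k] = best
        score = new_score
        back.append(pointers)

    # Trace back from the best final state
    path = [max(kmers, key=score.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    # First k-mer in full, then one new base per step
    return path[0] + "".join(k[-1] for k in path[1:])

levels = toy_levels(3)
truth = "ACGTAC"
signal = [levels[truth[i:i + 3]] for i in range(len(truth) - 2)]  # noise-free
decoded = viterbi_kmers(signal, levels, sigma=1.0)
```

On this noise-free toy signal the decoder recovers the input sequence. Swapping toy_levels(3) for toy_levels(2) or toy_levels(5) requires no change to the decoding function, only a longer slice when building the signal.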

Thinking in the other direction, suppose I train a model on 6-mers, which PacBio did not claim.  Suppose I successfully train that model and discover that the model has effectively zeroed out the last position, giving it no weight.  Am I now infringing on PacBio's patent by having a model that is effectively a 5-mer model, even though I intended it to be a 6-mer model?  Not obvious to me; perhaps someone with patent experience can chime in.
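
The technical half of that question, whether a trained 6-mer table has quietly become a 5-mer table, is at least easy to check. The sketch below is purely illustrative: ignores_last_base is a hypothetical helper, the level values are made up, and it says nothing about how a court would treat the result.

```python
from itertools import product

BASES = "ACGT"

def ignores_last_base(levels, tol=1e-9):
    """True if the expected level never depends on the k-mer's final base."""
    groups = {}
    for kmer, mu in levels.items():
        groups.setdefault(kmer[:-1], []).append(mu)
    return all(max(v) - min(v) <= tol for v in groups.values())

# Invented 5-mer levels, then a 6-mer table that copies them for every
# choice of sixth base (a 6-mer model that "zeroed out" the last position)
five = {"".join(p): 50.0 + 0.1 * i
        for i, p in enumerate(product(BASES, repeat=5))}
six_collapsed = {k + b: mu for k, mu in five.items() for b in BASES}

# A 6-mer table in which the sixth base does change the level
six_full = {"".join(p): 50.0 + 0.1 * i
            for i, p in enumerate(product(BASES, repeat=6))}

collapsed = ignores_last_base(six_collapsed)
full = ignores_last_base(six_full)
```

Here the first table has 4096 entries but only 1024 distinct behaviours, which is the situation described above; the second genuinely uses all six positions.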

One of these papers mentions decision trees, but mostly as a contrast to the authors' own method, which they say obviates the need for decision trees.  Is that simple mention in the open literature enough to torpedo patenting decision trees for basecalling?  And what algorithm is more obvious than decision trees?  I suppose the answer is lookup tables, but that is also a method named in the patent.

A number of commenters on Twitter and on my piece on GridION have made relevant points, along with general complaints about the U.S. patent office.  Just to head off one rumor, I will note that PacBio's filing US20100331194, which I think is what established the priority date, long predates Oxford's electrifying 2012 AGBT presentation, and this application mentions N-mer methods and the range of algorithms described above. So it would not appear the patent cribbed from Oxford's presentation.

What is the potential impact of PacBio's patent on Oxford's current platform?  First, of course, the issues of prior art and obviousness could well erode the PacBio claims to an ineffectual wisp.  But even if they stand, Oxford's current base caller uses recurrent neural networks, which do not appear to be in the patents.  The not-quite-released Scrappie basecaller uses "transducer" architectures; unless these resolve to one of the named algorithms ("conditional random fields"?), that is outside the patent.  Yes, prior basecallers from ONT used HMMs, but those have gone by the wayside.  So is PacBio again charging forward shooting blanks?  I'd love to hear the counter-argument, because right now that is the opinion I would be leaning towards.  Just trying to throw shade on a competitor with low-probability infringement lawsuits seems like a poor strategy and a waste of money on legal expenses.

[22-MAR-17 11:29 fixed bad URL for one of the patents]

19 comments:

Anonymous said...

I have (unfortunately) had to read quite a few patents lately. There is so much of what I would consider obvious crap out there, but our lawyers tell me that this is beside the point, that it is quite difficult to invalidate a granted claim. Yes, RNNs are likely not in the PacBio patent (which by the way has a single primary claim, everything else is secondary) but I am guessing PacB's lawyers would argue that RNNs do little more than matching patterns that are determined by n-mers. ONT's latest material, from just a few days ago, states that R9 generates data from 5-mers. Good luck trying to explain to a jury the difference between the two approaches.

Anonymous said...

"Just to head off one rumor, I will note that PacBio's filing US20100331194, which I think is what established the priority date, long predates Oxford's electrifying 2012 AGBT presentation, and this application mentions N-mer methods and the range of algorithms described above."

No, but their filing does follow hot on the heels of Illumina first investing in Nanopore. That must have really spooked them as the priority document looks like a brainstorm of anything they had seen, heard, or thought of relating to nanopores. They file it then they sit back and wait to see what Nanopore actually do in their product, from Nanopore's presentations and patent filings. Once PacBio know, they can then file a stream of continuations specifically targeted to key aspects. It is a well known trick and the USPTO is considering limiting the number of continuations to fight back against the patent trolls.

Anonymous said...

you also got the question of enablement. Pacbio have not implemented any of the claims.

Anonymous said...

"So it would not appear the patent cribbed from Oxford's presentation."

despite priority date, claims can be added post filing under some patent authorities, the US included. So narrower claims can be added provided the initial filing was broad enough to encompass them. The initial filing would establish the priority date. If the claims were added post ONT's AGBT12 talk, to a broad initial filing, that would not be unusual. Looking at the revision history would reveal if this was the case or not.

Anonymous said...

In Silicon valley, patent/trademark battle had always been a way to delay competitors. Oracle and Google had been fighting on whether others can create Java-like language (different name, same APIs), because Oracle owns Java trademark.

What I find interesting is that the battle is taking place entirely in Nanopore's side of the business. Even if ONT wins the battle, Pacbio will not have any impact on its own side of the business and IP ownership. OTOH, if ONT loses, its options in developing algorithms will get reduced.

Anonymous said...

"What I find interesting is that the battle is taking place entirely in Nanopore's side of the business"

Apart from legal bills which are astronomically high for both I imagine. District court actions take years so it always comes down to who has most cash in the bank to see it through. I bet when a decision comes one will be long gone (my bet is PacBio as this looks like a desperate act from someone sinking)

Anonymous said...

"if ONT loses, its options in developing algorithms will get reduced."

IF pacB has enough cash to last out the suit, and IF the patent is valid and holds up. The patent in question is only US although they are suing ONT on 2D elsewhere but that now looks irrelevant. Any of the supposedly patented base calling methods could be made by 3rd party academics under open source, in which case we assume pacB will then be suing US universities.

David Eccles said...

If ONT loses, its options in developing algorithms will get reduced.

An algorithm on its own is not patentable:

http://yaroslavvb.blogspot.co.nz/2011/02/how-to-patent-algorithm-in-us.html

https://en.wikipedia.org/wiki/Software_patent

It needs to be combined into an invention in order to be eligible for a patent (or protection). If ONT uses a new algorithm on an existing [patented, or publicly known] device, it shouldn't be possible for a patent infringement to occur for patents with the new algorithm and a priority date after the creation/use of the existing device. I'm using "shouldn't be" rather than "isn't", because I'm very much not a lawyer, and what matters is what the lawyers are able to argue, rather than what armchair bioinformaticians believe.

On the CRISPR outcome, according to Wikipedia "the Broad patents with claims covering the application of CRISPR/cas9 in eukaryotic cells was distinct from the invention claimed by University of California":

https://en.wikipedia.org/wiki/CRISPR#Patents_and_commercialization

To me, this means that the patent claims were not identical. This does not mean that the claims did not explore the same concept. While there would have been some overlap, it's the entirety of each claim that matters for patent infringement (presumably also including an obvious extension of a claim). If each patent had a non-obvious and different dependent component for all claims, it would be difficult to justify infringement.

Anonymous said...

> An algorithm on its own is not patentable:

This appears like an uninformed statement. Computer industry has been patenting cryptography algorithms and compression algorithms for a long time.

https://www.google.com/patents/US5533051

https://www.google.com/patents/US5724428

http://softwareengineering.stackexchange.com/questions/32482/can-an-algorithm-be-patented

> Any of the supposedly patented base calling methods could be made by 3rd party academics under open source, in which case we assume pacB will then be suing US universities.

What nonsense. "Open source" is not a license to violate patents. Once again, you need to check the history of RSA cryptography algorithms and compression algorithms.

Anonymous said...

"What nonsense. "Open source" is not a license to violate patents. Once again, you need to check the history of RSA cryptography algorithms and compression algorithms."

I think his point was that if the customer infringes then they are hardly going to sue them as it is likely they also use their product or may in the future. In Life Sciences I am not aware of any company being bold/stupid enough to sue a university.

One may argue that regardless Nanopore are still enabling infringement though so it is irrelevant.

Anonymous said...

Here is a totally unrelated quote:

Peter Detkin coined the term ‘troll’ to avoid more lawsuits: “We were sued for libel for the use of the term ‘patent extortionist’ so I came up with the ‘patent trolls’,” Detkin said. “A patent troll is somebody who tries to make a lot of money from a patent that they are not practicing, have no intention of practicing and in most cases never practiced.”

Just saying, obviously nothing to see here...

Anonymous said...

> "Apart from legal bills which are astronomically high for both I imagine. District court actions take years so it always comes down to who has most cash in the bank to see it through. I bet when a decision comes one will be long gone (my bet is PacBio as this looks like a desperate act from someone sinking)"

I find comments about Pacbio's seemingly desperate last acts surprising especially if you compare it to ONT in the context of paying customers. Pacbio sells at least 90mm USD of stuff every year. So it has paying customers and some of them have said they run all their machines non-stop 24hrs a day (Histogenetics etc), so it doesn't appear like these customers will just go away like that.

If one was to guess ONT's revenues based on the update from Clive Brown (he reported that there were now 4000 minions, there were about 3000 last year, so an addition of 1000 ~ 1mm USD, one can guess the revenues to be under 5mm, even assuming a very generous usage, about 10k flow cells at the highest price - please do correct if you have a better number). This is unlikely to be commercially viable and that's the reason why the GridIon was launched pricing effectively the same thing for 25x the price, with Promethion also in the works and the Minion also available, a number of offerings suggesting a confused business model or some ingenious plan that only insiders are aware of.

I think the 'desperation' that people read into Pacb probably has to do with it being publicly listed. Imagine ONT being publicly listed with revenues at 750k last year and about 5-10mm this year with operating margins around 25%.... On irrational (or rational for ONT bears) days the market might value it closer to 100mm than to 1000mm.

The patent action can be simply interpreted as trying to disadvantage a potential competitor. Especially one that they claim has copied a previous patent on the hairpin to its advantage.

It's probably fair to say, these are two closely matched competitors, ONT offering longer read lengths and promising higher throughput, lower accuracy with less reproducibility while Pacbio offers something more reliable, with not as exciting read lengths at a higher price for now.

It's certainly not netflix vs blockbuster, if anyone has an illuminating comparison for this competition for a poor second place (vs Illumina) with another industry that would be v useful.




David Eccles said...

> Computer industry has been patenting cryptography algorithms and compression algorithms for a long time.

Perhaps, but the people who write patents do attempt to structure them in such a way that it appears that they are more than just algorithms:

US5533051 -- a method of operating a digital processing device for compressing data represented in digital form... a method of extracting redundancy from a stream of electrically encoded binary data

US5724428 -- A simple encryption and decryption device has been developed ... a method of communicating information

To a casual observer these don't appear to be pure algorithm patents, and that's frequently all that matters to get it past the patent examiners. Until a patent is tested by the courts, it's not correct to use the awarding of the patent as a demonstration of validity. To quote the top answer on the stackexchange post you linked:

"""
Then there's also the problem that the people in the patent office are generally simply not qualified to determine whether a particular software invention is patentable or not, leaving it up to the courts to decide whether a patent was valid when the owner tries to assert their rights to it. That means if you're a small company and you "infringe" on an invalid patent, you likely don't have the resources to fight the patent anyway (even if it's invalid).
"""

And from the second-top answer:

"""
in the US Supreme Court case In re Bilski, the Court rejected the "machine-or-transformation test" as the sole test of patentability. (One of the Justices dissented from the opinion, stating that the Court did not go far enough in rejecting these kinds of patents wholesale.) The result is that many business method patents are now invalid, and the USPTO has begun denying software algorithms and other method patents - not all of them, but a few.

I'd suggest going to Groklaw's Bilski page and reading more about it.

It's worth adding that the more recent Alice Corp. v. CLS Bank International case, the Supreme Court recently overturned the CAFC's decision to affirm software patents. The patents cover what amounts to escrow, when done over the Internet. The Supreme Court held that merely adding "over the Internet" or "on a computer" is not enough to make a patent covering an abstract idea valid. This substantially narrows the field for software patents, but does not make them invalid.
"""

My own interpretation of all of this is that a patent can't depend entirely on a software algorithm; there needs to be a component of the patent claims that depends on a non-algorithmic process or inventive piece of hardware (and a general-purpose computer is not sufficient for that). If someone could theoretically carry out the method outlined in the patent using only a pen and paper, it shouldn't be patentable.

Anonymous said...

"If someone could theoretically carry out the method outlined in the patent using only a pen and paper, it shouldn't be patentable."

I think that's what used to be called a "mental act". However, it seems the USPTO has a much lower bar.

Anonymous said...

> Detkin said. “A patent troll is somebody who tries to make a lot of money from a patent that they are not practicing, have no intention of practicing and in most cases never practiced.”

That describes universities to a tee.

It is also funny that his next step was to start a patent troll company named Intellectual Ventures :)

Anonymous said...

"despite priority date, claims can be added post filing under some patent authorities, the US included. So narrower claims can be added provided the initial filing was broad enough to encompass them. The initial filing would establish the priority date. If the claims were added post ONT's AGBT12 talk, to a broad initial filing, that would not be unusual. Looking at the revision history would reveal if this was the case or not."

Added claims cannot claim the same priority date as the original filing.

Anonymous said...

"That describes universities to a tee."

You mean the people who put it into practice and are also funded by tax payers money? Idiot...

Anonymous said...

> You mean the people who put it into practice and are also funded by tax payers money? Idiot...

I would not call those running universities idiots. Crooks may be.

First they take huge cut (80-90%) from taxpayers money as "overhead" to pay the administrative overlords. Then they hire cheap slaves from India and China as graduate students to do the actual work. Counting nights and weekends those slaves work for, they may get paid $5/hr or less, while their liberal professors preach the virtue of $15/hr minimum payment to the sucker public. At the end, if the Chinese/Indian guy discovers something, the professors gets 1/3rd of the patent and the patent troll university administrators get another 1/3. So, the slave is essentially cut off from his discovery by 2:1 vote, because the professors always side with the administrators.

Anonymous said...

I was calling the guy who said that universities were patent trolls an idiot.