Sunday, March 28, 2010

Ridiculous Claims

An item last week in GenomeWeb covered a new analysis by Robert Cook-Deegan and colleagues of the Myriad BRCA patents. One passage in the article, quoted below, has stuck in my craw & I need to spit it out.

This finding, involving an expressed sequence tag application filed by the NIH, was published in 1992 and the NIH abandoned its application two years later. Based on USPTO examiner James Martinell's estimation at the time, a full examination of all the oligonucleotide claims in the EST patent would have taken until 2035 "because of the computational time required to search for matches in over 700,000 15-mers claimed."

According to Kepler et al., this comprises "roughly half the number of molecules covered by claim 5 of Myriad's '282 patent."

While improvements in bioinformatics and computer hardware have made sequence comparisons much easier than they were in the early 1990s, the study authors arrive at no conclusions about why the USPTO granted Myriad claim 5 in patent '282 and not NIH's EST patent.

The claim by the USPTO examiner is bizarre, to say the least. Counting from the early 1990s to 2035, the claim is more than 40 years to analyze 700,000 15mers for occurrence in other sequences. This works out to testing fewer than 50 oligos per day. What algorithm were they using??
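The rate is easy to check. The quoted passage gives only the end date (2035), so this sketch tries a few plausible spans of years rather than assuming one; the function name `oligos_per_day` and the spans are my own choices for illustration, not anything from the article.

```python
def oligos_per_day(n_oligos: int, years: float, days_per_year: float = 365.25) -> float:
    """Average number of oligos examined per calendar day over the whole span."""
    return n_oligos / (years * days_per_year)


N_CLAIMED = 700_000  # number of 15-mers given in the quoted passage

# Assumed spans in years; the exact start date of the estimate is not stated.
for years in (40, 43, 50, 55):
    rate = oligos_per_day(N_CLAIMED, years)
    print(f"{years} years -> {rate:.1f} oligos per day")
```

Whichever span you pick, the answer stays in the tens of oligos per day.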

To look at it another way, if you use 2 bit encoding for each base, then every 15mer is a 30-bit string, so there are only 2^30 possible 15mers. A table with one entry per possible 15mer is potentially storable in memory of the 32 bit machines available at the time: even at one byte per entry (1 GB) it fits in the 2^32 bytes such a machine can address, and at one bit per entry it needs only 128 MB. Furthermore, this is a trivially splittable algorithm -- you can break the job into 2^N different jobs by having each run look only at sequences with a given bit prefix of length N. When I started as a grad student in fall 1991, one of my first projects involved a similar trivial partitioning of a large run -- each slice was its own shell script which was forked onto a machine.
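Here is a minimal sketch of that idea in Python, purely as an illustration and not a reconstruction of anything the USPTO or I actually ran. The names (`encode`, `kmers`, `slice_of`, `find_matches`) and the toy sequences are hypothetical, and it makes simplifying assumptions: exact matches only, one strand only, and any window containing a non-ACGT character is skipped.

```python
from typing import Iterator, Set

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base
K = 15                                        # 15-mers -> 30-bit integers


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base, first base in the high bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


def kmers(seq: str, k: int = K) -> Iterator[int]:
    """Yield the encoded k-mer ending at each position, using a rolling update."""
    mask = (1 << (2 * k)) - 1
    value = 0
    valid = 0  # count of consecutive valid bases seen so far
    for base in seq.upper():
        code = BASE_CODE.get(base)
        if code is None:      # ambiguous base: restart the window
            value, valid = 0, 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            yield value


def slice_of(code: int, n_prefix_bits: int, k: int = K) -> int:
    """Which of the 2^N slices an encoded k-mer falls in (its top N bits)."""
    return code >> (2 * k - n_prefix_bits)


def find_matches(claimed: Set[int], database_seq: str,
                 slice_id: int, n_prefix_bits: int) -> Set[int]:
    """One independent job: claimed k-mers in this slice that occur in database_seq."""
    wanted = {c for c in claimed if slice_of(c, n_prefix_bits) == slice_id}
    return {c for c in kmers(database_seq) if c in wanted}


# Toy usage: 2 prefix bits -> 4 slices, each of which could run on its own machine.
claimed = {encode("ACGTACGTACGTACG"), encode("TTTTTTTTTTTTTTT")}
database = "GGACGTACGTACGTACGGA"
hits: Set[int] = set()
for slice_id in range(4):
    hits |= find_matches(claimed, database, slice_id, n_prefix_bits=2)
print(len(hits), "claimed 15-mer(s) found in the database sequence")
```

Each call to `find_matches` touches only its own slice of the claimed set, so the slices need no coordination beyond merging their results at the end.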

Furthermore, anyone claiming that a job will take decades really needs to make some reasonable assumptions about growth in compute power -- particularly since 64-bit machines were becoming available around that time (e.g. DEC Alpha). Sure, it's dangerous to extrapolate out 50 years (after all, progress in Moore's law from shrinking transistors will hit a wall at one atom per transistor), but this was a ridiculous bit of thinking.
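To put a rough number on that, here is a small sketch of how much a growth assumption shrinks a multi-decade estimate. Everything in it is an assumption of mine rather than something from the article: the doubling periods (1.5 and 2 years, commonly quoted Moore's-law figures), the idealization that the job always runs on current hardware, and the function name `calendar_years`.

```python
import math


def calendar_years(work_years: float, doubling_years: float) -> float:
    """Calendar time to finish a job needing `work_years` at today's speed,
    if available speed doubles every `doubling_years` (idealized, continuous).

    Work done by time t is the integral of 2**(s/d) from 0 to t,
    i.e. (d / ln 2) * (2**(t/d) - 1); setting that equal to W and solving for t
    gives t = d * log2(1 + W * ln 2 / d).
    """
    d = doubling_years
    return d * math.log2(1 + work_years * math.log(2) / d)


# Assumed inputs: a nominal 50-year job and two doubling periods.
for d in (1.5, 2.0):
    print(f"doubling every {d} years: {calendar_years(50, d):.1f} calendar years")
```

Under these admittedly crude assumptions, a job quoted at decades finishes in under a decade, which is why a fixed-speed estimate is so misleading.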
