Tuesday, January 29, 2019

Patent Dive: Genapsys

Here's a dangerous statement for me: I actually enjoyed reading some patents recently.  Now, before you get any ideas in your head about suggesting more patents for me to read, let me be clear that these were unusual patents -- they're written to be read! -- and were read under strict conditions. The patents in question are from Genapsys -- found via my good friend Justia.
I've done a few pieces on patents in the past, driven by various patent squabbles between sequencing vendors -- Illumina vs. Oxford Nanopore and Pacific Biosciences vs. Oxford Nanopore.  Those quickly become not fun.  Why?  Primarily because I am trying to compare patents, search for prior art and in particular figure out precisely what each patent is claiming.  That's hard work, tedious and requires understanding minutiae of patent law -- which makes me feel guilty trying to write anything since I am not a trained expert on patent law.  Plus patents tend to be written in a distinctive dialect that comes across as stilted and annoying to read.

But these Genapsys patents I was reading to just get windows into their thinking and the possible configuration of their sequencing box. A thought about prior art or possible encroachment on others' intellectual property might pop into my head, but it isn't the key focus  -- which makes reading much more relaxing.  And these patents were made to be read!  Sure, they still have "particular embodiment" and some of the other linguistic cruft required of the genre, but large portions were designed to enlighten.  Yeah, there's some extreme silliness -- or maybe not silly enough.  One section described portability and different forms of transportation one might sequence in -- I really wish it had ended "In a boat! With a goat! In the rain! On a train!"  But most importantly, they also explored a lot of ideas in useful detail on topics I now realize I hadn't thought enough about. 

Now just to make the ground rules clear, I'm trying to keep the reading fun.  So that means not obsessing over every detail, not trying to exhaustively compare the different patents and as noted before pretty much completely avoiding any thought about prior art or where it might overlap other patents.

The patents explore a wide range of topics and it is clear the Genapsys team is attempting to mark a lot of territory -- single molecule approaches (complete with sequencing circular templates, so treading on PacBio's CCS patents), terminator chemistries and more.  But given that there are sketches of the Genius instrument and most time spent on other topics, there is clear direction to where they are probably going -- which jibes with what I remember from their 2014 AGBT talk.

The instrument will use direct electronic sensing of extension reactions using native nucleotides à la Ion Torrent -- or sensing the extension products via their accumulated charge.  Or perhaps having modified nucleotides with something removable which would enhance their charge signature.  Templates will be on magnetic beads with clonal populations prepared using an isothermal scheme of virtual confinement using electric fields.  I'm still trying to wrap my head around this one, but I think it is roughly using pulsed field electrophoresis to confine PCR products to their respective beads.  The instrument will probably have DNA shearing, library preparation, bead enrichment (for beads with polonies as opposed to lacking them) and sequencing all in one box -- actually all on one flowcell.  So squirt purified DNA in and get data out -- that is at least the grand vision if not version 1.0.  The flowcells may be either field reusable or able to be sent back for reconditioning.  A graph in a patent shows signal linearity for homopolymers out to 35 bases, which would be amazing.  Reagents will probably be in flexible bags which can be coupled onto the flowcell, with tiny needles piercing the bags.

The fun stuff in the patent begins with detailed discussion of dephasing and how to deal with it.  Dephasing -- the loss of synchronized behavior across the molecules in a clonal population -- is what limits read lengths in any clonal sequencing system.  As detailed in the patents, there are two major types of dephasing: lagging dephasing in which a molecule fails to extend when it should and leading dephasing where a molecule extends inappropriately.  Each has different causes and some solutions will be specific to one or the other while other solutions may deal with both. 
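To make the two flavors of dephasing concrete, here is a toy simulation of a clonal population. Everything in it is an assumption for illustration rather than anything from the patents: the function names, the per-flow probabilities `p_lag` and `p_lead`, and in particular the simplification that leading dephasing comes only from carry-over of the previous flow's nucleotide.

```python
import random

def extend(template, pos, nucleotide):
    """Advance pos through the whole homopolymer run matching nucleotide."""
    while pos < len(template) and template[pos] == nucleotide:
        pos += 1
    return pos

def simulate(template, flows, n_molecules=1000, p_lag=0.01, p_lead=0.005, seed=0):
    """Return the fraction of molecules still in phase after each flow."""
    rng = random.Random(seed)
    positions = [0] * n_molecules
    ideal = 0        # where a perfectly behaved molecule would be
    previous = None  # nucleotide of the prior flow (possible carry-over)
    in_phase = []
    for nuc in flows:
        ideal = extend(template, ideal, nuc)
        for i, pos in enumerate(positions):
            wants_flowed = pos < len(template) and template[pos] == nuc
            # Lagging: the molecule should extend but sometimes fails to.
            if wants_flowed and rng.random() >= p_lag:
                pos = extend(template, pos, nuc)
            # Leading (assumed model): leftover nucleotide from the previous
            # flow is occasionally incorporated during this flow.
            if (previous is not None and pos < len(template)
                    and template[pos] == previous and rng.random() < p_lead):
                pos = extend(template, pos, previous)
            positions[i] = pos
        in_phase.append(sum(p == ideal for p in positions) / n_molecules)
        previous = nuc
    return in_phase

if __name__ == "__main__":
    fractions = simulate("ACGTGCATC" * 5, "ACGT" * 20)
    print(f"in phase after {len(fractions)} flows: {fractions[-1]:.3f}")
```

Even with small per-flow error rates, the in-phase fraction in this toy model generally erodes as flows accumulate, which is the read-length limit described above.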

For example, one cause of lagging dephasing could be that in the allotted time no DNA polymerase is in the vicinity of the extendable end.  A solution proposed in the patents is to "depot" polymerases on the DNA.  For example, if a random oligo which cannot be extended is hybridized to the single stranded region, it will bind polymerase but can't be extended.  So in this way, polymerases can be parked on the DNA waiting for an extendable end to be near.  This obviously does require the use of a strand-displacing polymerase.

Another cause of lagging dephasing could be simply too much jostling of polymerases -- if the DNA is attached to the beads too closely, then significant steric hindrance could come into play, particularly during early extension reactions.  The patents talk of using a physically larger DNA polymerase during attachment to the bead so that the strands are separated sufficiently for a later, smaller DNA polymerase to be comfortably used during sequencing flows.  Interestingly, they cite a 10-fold difference in physical size among known DNA polymerases. Alternatively, they discuss using other large DNA-binding proteins to bring some order to the DNA attachment phase. 

One mechanism of leading dephasing is simply incomplete washout of a previous nucleotide.  Another source can be incorrect incorporation of a mismatched nucleotide.  One solution proposed for this is to provide the polymerase more choices, as misincorporation events represent a failure to discriminate that is made more likely by presenting the polymerase with only a single type of nucleotide. However, the decoys can't be extendable, since for direct electronic detection all nucleotide incorporation events look the same.  So they would be dideoxys or other terminator chemistries.

In my piece on possible new entrants (and thank you for comments or emails noting ones I should look at further or should have highlighted), I suggested that Genapsys might need some file formats beyond FASTQ -- and perhaps even more sophisticated than the Standard Flowgram Format (SFF) often used with 454 and Ion Torrent.  That idea stems from some of the other schemes which are proposed for dealing with dephasing.

Ideally there would be a way to re-phase all the molecules.  That's a tough problem, but what about a partial solution?  One approach proposed in the patents is to periodically flood the system with three of the four nucleotides.  That would advance all molecules on the flowcell to just before the next instance of the missing nucleotide -- except of course a few perverse cases of new leading or lagging dephasing.  But now each read will have a region of undefined sequence of undefined length -- in some ways harking back to PacBio's strobe sequencing -- ending with a known nucleotide.  That's something valuable to capture in the output so that some future tools could understand what really happened.

In case that wasn't clear, and it took me a few reads to get it, imagine you have the sequence ACGTGCATC and a flow order of A,C,G,T.  In that case, after one round of flows we'd have read the first four bases (ACGT) and be waiting on the G in the fifth position -- except for some laggards back on the prior G and some leaders that have already incorporated that fifth-position G.  If we flood the chamber with T, G and C then every polymerase should advance up to just before the A in the seventh position.  Now, if another sequence is ACGTAGCAT it won't be quite as rosy -- the bulk population can't advance because it is already waiting on the A in the fifth position, and any laggard catches up to it, but any leader will race ahead to the second A.  So no free lunch, but a huge improvement.  And it requires no additional chemistry or plumbing -- just open three valves instead of one.
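The same two examples can be traced in a few lines of Python. This is only a sketch of the idea: the `flood` function and the representation of each molecule as the index of the next base it needs are my own assumptions, and the model ignores any new dephasing that happens during the flood itself.

```python
def flood(template, pos, missing):
    """Advance a molecule until the next base it needs is the withheld one."""
    while pos < len(template) and template[pos] != missing:
        pos += 1
    return pos

# Position = index of the next template base the molecule is waiting on.
# Laggard: stuck on the first G. Bulk: has read ACGT. Leader: one base ahead.
start = {"laggard": 2, "bulk": 4, "leader": 5}

rosy = {name: flood("ACGTGCATC", pos, "A") for name, pos in start.items()}
less_rosy = {name: flood("ACGTAGCAT", pos, "A") for name, pos in start.items()}

print(rosy)       # {'laggard': 6, 'bulk': 6, 'leader': 6}
print(less_rosy)  # {'laggard': 4, 'bulk': 4, 'leader': 7}
```

In the first template all three classes end up waiting on the same A. In the second, the laggard rejoins the bulk but the leader ends up three bases ahead instead of one.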

I'd like to note here a certain light bulb moment.  Ion Torrent in particular talked about using specific orders of nucleotide flows -- not just A, C, G, T, A, C, G, T -- in order to correct for dephasing.  I never understood that -- now I do.  Of course it won't fix everything, but by picking particular orders -- especially if you know that specific leading dephasing events are more likely -- you can get some of the benefit of the above scheme by just mixing up the order.
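Here is a deliberately tiny illustration of why flow order can matter. It is not Ion Torrent's actual flow-order design, just a hand-picked case: the `run_flows` helper, the starting positions and the alternative order "TACG" are all assumptions chosen to show the effect.

```python
def run_flows(template, pos, flows):
    """Apply single-nucleotide flows in order, with no errors along the way."""
    for nuc in flows:
        while pos < len(template) and template[pos] == nuc:
            pos += 1
    return pos

template = "ACGTGCATC"
# After the first A,C,G,T round: the bulk has read ACGT, but a laggard
# missed the T and is still waiting on it.
start = {"bulk": 4, "laggard": 3}

cyclic = {name: run_flows(template, pos, "ACGT") for name, pos in start.items()}
mixed = {name: run_flows(template, pos, "TACG") for name, pos in start.items()}

print(cyclic)  # {'bulk': 5, 'laggard': 4}
print(mixed)   # {'bulk': 5, 'laggard': 5}
```

With the plain cyclic order the laggard stays one base behind. Flowing T first gives it a second shot at the base it missed while the bulk, which needs a G, cannot move. Whether a given reordering helps depends entirely on the template, so this is one favorable case and not a general result.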

There are a few solutions proposed that require additional chemistry but also would generate multiple segments of sequence per template with some relationship between them.  With certain 3'->5' exonuclease activities in the presence of nucleotides the chewback will proceed until reaching a nucleotide present in the mixture.  So this is similar to the flooding idea but in the reverse direction -- and now instead of a variable length region of unknown sequence one gets a variable length walkback ending in a known nucleotide.  Another described way to accomplish this is to sometimes perform the sequencing reaction with phosphorothioate nucleotides.  So if you had a round of phosphorothioate C, a later exonuclease reaction could walk back to that C.  
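Both walkback variants reduce to trimming from the 3' end until some stopping condition is met. The sketch below is a toy model only: the function names, the plain-string representation of the synthesized strand and the index set marking exonuclease-resistant positions are assumptions of mine, not anything specified in the patents.

```python
def chew_back(strand, present):
    """Trim 3' bases until the terminal base is one supplied in the mix."""
    end = len(strand)
    while end > 0 and strand[end - 1] not in present:
        end -= 1
    return strand[:end]

def chew_back_to_protected(strand, protected):
    """Trim 3' bases until reaching an exonuclease-resistant position.

    protected: set of 0-based indices incorporated as phosphorothioates.
    """
    end = len(strand)
    while end > 0 and (end - 1) not in protected:
        end -= 1
    return strand[:end]

synthesized = "ACGTGCAT"
print(chew_back(synthesized, {"C"}))             # ACGTGC
print(chew_back_to_protected(synthesized, {1}))  # AC
```

The first call stops at the nearest C on the way back, wherever that happens to be. The second stops at one chosen position, which is what makes the phosphorothioate version attractive when you want to return to a specific spot.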

An interesting extension of the walkback idea -- particularly with phosphorothioate nucleotides -- would be in the context of PCR panels.  For many cancer mutation hotspots, the region of interest is very small -- often a single nucleotide.  If one designed primers carefully, the specific hotspots could all (or mostly) be within a certain distance range of the primers.  Phosphorothioate rounds just before that region would enable reading the region, digesting back to the phosphorothioate, and then reading the region a second time (and even a third and a fourth...) to enhance accuracy.
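One simple way repeated passes over the same region could be combined is to average the per-flow signals before calling homopolymer lengths. This is a minimal sketch under my own assumptions: every pass uses the same flow order, signals are already normalized so that 1.0 means one base, and `consensus_flowgram` is a hypothetical helper, not a described Genapsys algorithm.

```python
from statistics import mean

def consensus_flowgram(passes):
    """Average each flow's signal across passes, then round to a base count."""
    return [round(mean(signals)) for signals in zip(*passes)]

# Three noisy passes over the same region, same flow order each time.
passes = [
    [1.1, 0.0, 2.6, 1.0],  # alone, the third flow would be called as 3
    [0.9, 0.1, 1.9, 1.1],
    [1.0, 0.0, 2.1, 0.9],
]
print(consensus_flowgram(passes))  # [1, 0, 2, 1]
```

The first pass by itself would overcall the third flow, while the average of the three lands on a two-base homopolymer.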

Yet another idea is clamps: probes that would bind the DNA and block further elongation -- either via use of a polymerase lacking strand displacement activity or somehow making the clamp resistant to displacement.  Use of such a clamp would allow extension reactions to occur until the clamp is reached, followed by removal of the clamp to restart all the reactions in perfect synchrony.  It's a little tricky to see this used in practice -- after all, you must somehow want to read something, then know some conserved site to sit the clamp on and then have something after that worth reading too.

The patents describe yet another way to improve accuracy, which would be to melt off the synthesized material and add a new primer and repeat the run -- but perhaps altering run parameters.  That could include different orders of nucleotide flows, different timings of re-phasing approaches or even just running the chemistry differently.  They comment that the ideal ionic conditions for the polymerase may be very different than the ideal ionic conditions for their detection hardware.  An interesting bit related to this is that keeping ionic strength low may be a challenge for buffering -- buffers add to ionic strength.  So instead of buffering with a typical salt -- such as TRIS-HCl -- they propose buffering with TRIS-HEPES (I've long since forgotten my buffering equations, so I won't go further).  Curiously, while they describe atmospheric CO2 as a major source of acid in need of buffering, I don't see the idea of running reactions under a neutral gas (such as dry nitrogen) mentioned. Another little charge issue -- they are very concerned with controlling charge to maximize sensor sensitivity -- is that on the surface of the beads.  Negative charges on the beads help keep DNA from sticking to them, but too much negative charge interferes with sensing.

Just to emphasize again: there are no free lunches.  Many of the above require additional reservoirs, pumps and plumbing and will add time to generating data -- and the issue of capturing all these quirky nuances of runs.  But for clinical applications getting a read length or accuracy edge might be valuable.  

Off to a different direction.  Ion Torrent's progress towards a whole human genome chip stalled out with the S2 chip when they couldn't get the data off the chip fast enough.  The Genapsys patents discuss synchronizing data acquisition with movement of the liquid through the flowcell.  They also discuss various ways of incorporating sample flow information into the basecalling algorithms, explicitly modeling cross-talk between beads.  There's also discussion of potential value of having some null beads on the array, beads with control sequences or even beads that are extended on every flow.

Just reading the patents brings a new appreciation for the huge number of possible confounding factors in devising a practical sequencing chemistry.  And then they throw in other stuff -- the possibility of using the same detection electronics for reading out immunoassays or running qPCR.  They also touch on the possibility of controlling DNA synthesis or using the sensors for DNA-based data storage.

There are also a lot of attempts to cover wide ranges of possibilities.  A huge range of microfluidic possibilities is covered, including using the electronics to deliberately generate hydrogen or oxygen bubbles to cap wells or possible microdroplet schemes. Or PDMS valving.  And maybe using gels to retard fluid movement.  Or maybe using D2O so that the hydrogen (well, deuterium) ions diffuse more slowly.

Or going back to detection chemistries -- some are much like Ion Torrent and detect transient changes in pH, but they also describe using a redox-active surface that would be reduced by a DNA extension event and then could be re-oxidized by applying the appropriate charge.  That, and some other schemes, would have a stable signal which could be detected -- and one possibility suggested would be to run the extension reaction under one set of ionic conditions and the detection under a different set, each optimized for the particular step. 

And much more -- I've pretty much ignored the actual electronics schemes because that was never my forte.  And I'm sure I've mangled a few things and probably a number of my predictions of the box won't pan out.  But it's been fun -- dangerous fun, as I've started reading patents from some other startups.  But those are topics for some future post.
