Thursday, October 13, 2016

DNA: How Clean is Clean Enough?

Nick Loman has a new blog post nicely covering the current state of affairs for DNA preparation in the field.  Earlier this year I had some thoughts about DNA preparation and the degree to which our current methods reflect history rather than some ideal.  Even before Nick's post I had some new thoughts on the topic, but seeing his exposition helped me consolidate my own musings.
A key question for DNA purification is how clean does the DNA really need to be?  There are all sorts of standards out there, but which really matter?  For example, I've tussled with a very good sequencing shop over excess RNA in samples we shipped -- several attempts to reduce the RNA in the sample failed to be enough, so we cancelled the project.  Would the project have succeeded with the RNA at any stage?  We'll never know.

Potentially troublesome contaminants in a DNA prep can be divided into two categories: those that came in from your sample and those you added during the prep.  In the first category would be RNA, protein, lipids, humic acids, polysaccharides and what-have-you.  The second category is dominated by alcohol and detergents, plus phenol and chloroform if your tastes run old-school.  Of course, anything going into the prep, from ions to chelators, could potentially interfere downstream.

Which of these matter and how much?  Well, that's going to depend on the downstream assay.  Some are going to tolerate a lot of some contaminants but very little of others.  A few assays may be extraordinarily tolerant, while others could be hothouse flowers requiring the purest of pure.  But which are which for what?

To give two extremes from my experience: I've recently become more familiar with Nanostring's technology for profiling nucleic acids.  A true beauty of their approach is that it relies entirely on hybridization; no enzymes are involved whatsoever.  As a result, crude cellular lysates work quite nicely with Nanostring, greatly simplifying the effort to profile large numbers of samples.  At its base, Nanostring involves hybridizing two oligonucleotide probes adjacent to each other on the target molecule.  This complex is pulled down via a biotin tag, bound and aligned along an axis of the flowcell and finally imaged with a powerful microscope to read out target-specific barcodes.  In one version of the chemistry, one oligo itself carries the biotin and the specific labels are on the other; in the other version (Elements) one oligo has a universal barcode to bind a generic biotin-bearing oligo and the other oligo carries a barcode to bind to a specific label-bearing oligo.  Between relying only on hybridization and the biotin pull-down step, Nanostring has a very limited number of vulnerabilities to sample contaminants.

At the other extreme, we once had a very annoying problem with some of our Actinomycete samples that was specific to Pacific Biosciences.  We would see sporadic sequencing failures on PacBio, which appeared to be due to poor DNA preps.  These would not repeat; make a new prep and glorious long reads would come out.  But there were a few strains which would fail time after time.  Worse, they failed late: by BioAnalyzer or other means they would appear to be yielding perfectly good libraries, but on the sequencer they yielded only a few short reads.  We worried a lot about what might be doing this, given that we know these bugs are highly versatile at making very interesting compounds -- including a lot of DNA-damaging agents.  Plus, Actinomycetes are full of known unusual DNA modifications (Streptomyces beat humans to using phosphorothioate linkages in DNA by millions of years!), so who knew what unknown modifications might be tripping up PacBio?  Diligent effort by our sequencing partners revealed that extensive washing of the libraries, after binding them to beads, would yield passable (but not exceptional) PacBio data.  Tracking down the culprit would be an arduous task with uncertain success, so we didn't pursue it.  Another enigmatic feature of these preps: they sequenced just fine on Illumina using conventional shearing preps (I don't know if Nextera would be bothered; we never tried).

There are two solutions to the more general problem: either tune your DNA preparation methodology to deliver clean-enough DNA or engineer/evolve your enzymes to tolerate not-so-clean DNA.  There are a number of polymerases for PCR which are marketed by brands such as Kapa as being tolerant to such notorious contaminants as heme.  Since PCR involves only a single enzyme, that's somewhat easy to imagine, though it must still be a long (but interesting!) exercise in directed evolution.  For various sequencing platforms, a whole series of enzymes might be involved in a prep and the actual sequencing.  Some contaminants might be lost to dilution or in bead purification steps, so the relative risk is concentrated on the early enzymes -- but as my tale above of the DNA that resisted being SMRT illustrates, problems could arise anywhere.

If my readers know of papers on this topic, I'd be interested to hear of them.  The original transposase library prep paper claimed success at making libraries with crude colony preps, but this seems to have never been repeated.  Oxford Nanopore touted sequencing out of crude blood preps in their 2012 AGBT presentation and periodically refreshes the claim, but has yet to actually demonstrate this (or more importantly, have someone outside the company succeed).

Of course, a serious hindrance to exploring this is the high cost of library preparation and sequencing.  It is easy to imagine laying out all sorts of dilution series of contaminants within a 384-well plate, but if your library preps are $100 a pop, I just spent (on paper) nearly $40K before hitting the sequencer.  There are many claims of lower cost library preps out there, but even at a tenth that price this is still a somewhat pricey experiment.  A very worthy experiment, but who will take it on?  Furthermore, at a minimum such an experiment would need to be run for each sequencing platform -- and variants therein; ligation libraries on HiSeq X with exclusion amplification could well have a very different pattern of vulnerabilities than Nextera libraries on MiSeq with bridge PCR.  Such tests would probably need to be repeated each time chemistry changes, which is several times a year for Oxford Nanopore.  All this suggests that it is the sort of project which either must be undertaken by the manufacturers themselves, or by large laboratory consortia such as ABRF.
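To make that back-of-the-envelope arithmetic concrete, here is a rough sketch of how such a plate might be laid out and what the library preps alone would cost.  Everything specific in it is my own assumption for illustration: the list of twelve contaminants, the eight-step two-fold dilution series, the four replicates and the function names are all hypothetical, chosen only because 12 x 8 x 4 happens to fill a 384-well plate.  The only figures taken from the text above are the 384 wells, the $100 per prep and the "tenth that price" case.

```python
from itertools import product

ROWS = "ABCDEFGHIJKLMNOP"   # 16 rows on a 384-well plate
COLS = range(1, 25)         # 24 columns

# Hypothetical contaminant list, loosely following the two categories above.
CONTAMINANTS = [
    "RNA", "protein", "lipid", "humic acid", "polysaccharide", "heme",
    "ethanol", "detergent", "phenol", "chloroform", "chelator", "salt",
]


def plate_layout(contaminants, n_dilutions=8, replicates=4, fold=2.0):
    """Map well names to (contaminant, relative concentration, replicate).

    Wells are filled row by row (A1, A2, ... P24). Concentrations are
    relative to an arbitrary top concentration of 1.0 (an assumption).
    """
    wells = [f"{row}{col}" for row, col in product(ROWS, COLS)]
    conditions = [
        (name, 1.0 / fold**step, rep)
        for name in contaminants
        for step in range(n_dilutions)
        for rep in range(1, replicates + 1)
    ]
    if len(conditions) > len(wells):
        raise ValueError("more conditions than wells on the plate")
    return dict(zip(wells, conditions))


def prep_cost(n_preps, cost_per_prep):
    """Total library prep cost in dollars, before any sequencing."""
    return n_preps * cost_per_prep


if __name__ == "__main__":
    layout = plate_layout(CONTAMINANTS)
    for price in (100, 10):
        total = prep_cost(len(layout), price)
        print(f"{len(layout)} preps at ${price} each: ${total:,}")
```

A full plate at $100 per prep comes to $38,400, which is where "nearly $40K" comes from; at a tenth of that price it is still $3,840 per plate, per platform, per chemistry version, before any sequencing costs.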

Some of this also depends on who is doing the library prep.  As detailed in Nick's note, there are a number of early-stage portable devices for purification.  Automated purifications should show much less sample-to-sample variation than manual preps.  On the other hand, depending on the reagents involved, variation there might still be an issue -- or just getting the reagents when you need them.  Nick cites the example of pure ethanol, which isn't legal to fly on planes and isn't widely available in the field.  Would vodka work -- which would really solve the flying problem, as you could buy it from your flight attendant?  Or is there too much variation in consumer high-proof alcohols for that to work?

Even if automation solves the manual variation issue, it won't directly solve the sample-specific contaminant issue.  For a platform such as Oxford Nanopore, that's going to mean being tested with field-quality preps from a very wide range of sample types -- blood, soil, plants, feces, sludge, pond scum and all the other exciting places to explore biology.  Vetting a platform's tolerance for DNA sample preparation contaminants isn't the most glamorous space for sequencing R&D, but the potential payoff for enabling field use is enormous.

1 comment:

Nick Loman said...

Thanks Keith for the post - thought I'd drop a link to the blog post in case people wanted the necessary context!