Wednesday, March 02, 2011

Emulsion PCR: First Notes

One theme in some of the comments on my Ion Torrent commentary has been around the limitations of emulsion PCR. Some have made rather bold (and negative) predictions, such as Ion Torrent dooming themselves to short read lengths or users being unable to process many samples in parallel without cross-contamination.

Reading these really drove home to me that I didn't understand emulsion PCR. I've done regular PCR (once in a hotel ballroom, of all places!) but not emulsions. It seems simple in theory, but what goes on in practice? My main reasoning was based on the fact that emPCR is the prep method for both 454 and SOLiD; 454 clearly demonstrates the ability to execute long reads (occasionally hitting nearly a kilobase) and SOLiD the ability to generate enormous numbers of beads through emPCR. I also have a passing familiarity with RainDance's technology (we participated in the RainDance & Expression Analysis Cancer Grant program). But, I've also seen a 454 experiment go awry in a manner which was blamed on emPCR -- a small fraction of primer dimers in our input material became a painful fraction of the sequencing reads. Plus, there is that temptation to enter the Life Tech grand challenge on sample prep, or attempt to goad some friends into entering. So it is really past time to get more serious about understanding the technology.

So, off to the electronic library. Maddeningly, many of the authors in the field tend to publish in journals that I have less than facile access to, but between library visits, PubMed Central and those that are in more accessible journals, I've found a decent start.

First, the easy stuff. Emulsion PCR (or BEAMing, as one early group called a variant of it) involves generating a water-in-oil emulsion, in which the aqueous phase contains all the components for PCR amplification. For second-generation sequencing sample prep, the aqueous droplets also contain a solid bead with at least one primer type bound and also a single initial template DNA molecule. Well, that's the ideal, though in reality a population of droplets will have some with no DNA, some with one, and some with more than one; only the droplets with a solitary template will be useful. This is driven by Poisson statistics and is why it is important to know the concentration of a DNA sample going in, in order to set up the most favorable generation of single-template droplets.
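
To make the Poisson point concrete, here is a minimal sketch, assuming templates land in droplets independently of one another. The mean loadings tried are illustrative values, not figures from any kit protocol.

```python
import math

def droplet_occupancy(lam):
    """Return (P(0), P(1), P(>=2)) templates per droplet for Poisson mean lam."""
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multi = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multi

# Illustrative mean templates per droplet (assumed values)
for lam in (0.1, 0.5, 1.0, 2.0):
    p_empty, p_single, p_multi = droplet_occupancy(lam)
    print(f"mean={lam}: empty={p_empty:.3f} single={p_single:.3f} multiple={p_multi:.3f}")
```

The single-template fraction peaks at a mean of one template per droplet (about 37%), but at that loading roughly a quarter of all droplets hold more than one template, so lower loadings trade bead yield for purity.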

Droplet sizes and beads span a wide range. Comparing the two required dredging up the formula for the volume of a sphere (4/3 pi r cubed), which I hadn't needed since high school, as well as working with some hairy exponents (by happy chance, TNG is just learning about exponents so I could drive home the point that these are useful, since I'm currently using them!). It's also handy to know that a cubic millimeter is the same as one microliter. Please be kind in correcting any math errors I make: I've tried hard to get them right, but this is all stuff I've been able to allow to rust since at least my undergraduate days.
[Note: corrected the mistyped sphere formula after this was pointed out by a commenter]
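
For anyone else whose geometry has rusted, the diameter-to-volume conversion can be sketched as below. The function names are made up for illustration; the only facts used are the sphere formula and that one microliter is one cubic millimeter (so one picoliter is 1000 cubic micrometers).

```python
import math

def droplet_volume_pl(diameter_um):
    """Volume in picoliters of a spherical droplet, diameter in micrometers."""
    radius_um = diameter_um / 2.0
    volume_um3 = (4.0 / 3.0) * math.pi * radius_um ** 3
    # 1 uL = 1 mm^3 = 1e9 um^3, so 1 pL = 1e3 um^3
    return volume_um3 / 1000.0

def droplet_diameter_um(volume_pl):
    """Inverse conversion: diameter in micrometers from a volume in picoliters."""
    volume_um3 = volume_pl * 1000.0
    return 2.0 * (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)

print(f"50 um droplet: {droplet_volume_pl(50):.1f} pL")
print(f"10 nL droplet: {droplet_diameter_um(10_000):.0f} um across")
```

A 50 micrometer droplet comes out at about 65 pL, and a 10 nL droplet is about 267 micrometers across.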

Each droplet has all the reagents which will be available for the PCR run. This has two important implications.

First, the size of a droplet and the concentration of the reagents determine the length of the DNA which can be amplified. Imagine the PCR actually went to completion and consumed all of the primers. Clearly, the amount of nucleotides is fixed, so the number of nucleotides in the droplet divided by the number of primers is the absolute limit on the length of the amplifiable DNA -- to go to completion. In reality, you would never want the nucleotide concentrations to go anywhere near exhaustion, since the polymerase reaction kinetics will become unfavorable and more errors will be made if the four bases are not depleted uniformly.
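
The nucleotides-per-primer ceiling can be sketched as follows. Every number in the example call is a hypothetical placeholder (total dNTP concentration, droplet volume, primers per droplet); none comes from a vendor protocol, and the function only encodes the simple bound described above.

```python
AVOGADRO = 6.022e23  # molecules per mole

def max_amplicon_length(total_dntp_molar, droplet_volume_liters, n_primers):
    """Upper bound on product length (nt) if every primer is fully extended.

    total_dntp_molar: combined concentration of all four dNTPs, in mol/L
    droplet_volume_liters: aqueous volume of one droplet
    n_primers: number of primers consumed in the droplet
    """
    n_nucleotides = total_dntp_molar * droplet_volume_liters * AVOGADRO
    return n_nucleotides / n_primers

# Hypothetical inputs: 800 uM total dNTP, 65 pL droplet, 6e7 primers
small = max_amplicon_length(800e-6, 65e-12, 6e7)
# Same chemistry in a droplet of twice the volume
large = max_amplicon_length(800e-6, 130e-12, 6e7)
print(f"ceiling: {small:.0f} nt vs {large:.0f} nt")
```

With a fixed number of primers (for example, those bound to one bead), the ceiling scales linearly with droplet volume, which is why the droplet size matters.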

In the real world, if a template is larger than this maximum then it will stop amplifying prior to exhaustion of the primers, meaning that the initial signal in sequencing will be lower and will fade to noise sooner. This is another key value Ion needs to reveal: how much DNA needs to be on a bead to achieve a given read length. That would enable groups both to model the whole process computationally as well as test sample prep methods by easy fluorescence measurements of the beads (under a microscope or in a flow cytometer) rather than the time and expense of sequencing (obviously, final proof of a functioning system requires sequencing).

The corollary to the above is that the size distribution of droplets in the emulsion is very important. If the size determines the maximum fragment size amplified, then a highly varying population will encompass a range of such maximum sizes. In the worst case, there may be some droplets which can amplify only very small products -- those pesky primer dimers. Some of the papers and reviews have claimed that emPCR is resistant to primer dimer amplification, but I've seen (and cursed!) the counterexample.

A final implication I hadn't considered before is the volume. We are used to packing lots of molecules in solution into a small space. For example (leaping onto the thin ice of my arithmetic), if a basepair weighs 660 daltons then 100 pg of a 200 bp amplicon would be about 4.6e08 molecules of DNA. For the 318 chip delivering nearly 8e06 reads, that is more than fifty times the read count, so it should be ample input DNA (assuming I haven't botched anything), even with sizable losses. In conventional work, those 100 pg could easily be contained in a microliter of buffer.
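
The mass-to-molecules arithmetic is easy to check with a short sketch. It uses the 660 daltons per basepair figure from above; the function name is just for illustration.

```python
AVOGADRO = 6.022e23      # molecules per mole
DALTONS_PER_BP = 660.0   # average mass of one basepair, g/mol

def molecules_from_mass(mass_grams, length_bp):
    """Number of double-stranded DNA molecules in a given mass."""
    molar_mass = length_bp * DALTONS_PER_BP  # g/mol for the whole fragment
    return mass_grams / molar_mass * AVOGADRO

n = molecules_from_mass(100e-12, 200)  # 100 pg of a 200 bp amplicon
print(f"{n:.2e} molecules, {n / 8e6:.0f} per read on an 8e06-read chip")
```

For 100 pg of a 200 bp amplicon this gives about 4.6e08 molecules.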

But, suppose ten million droplets of ten nanoliter mean volume are generated instead. Not accounting for the empty space between packed spherical droplets, that would be 100 milliliters of emulsion. It's not surprising to stumble on a Life Technologies patent for emulsion PCR in flexible pouches; the volumes can get much too big for typical molecular biological tubes.

Now, I've picked some numbers above to be easy and reasonable, but one approach is to use smaller droplet sizes. RainDance routinely uses droplets in the 17 pL range, or about 1/600th of my example above. Still, the volumes can be sizable.
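
The emulsion volume for the two droplet sizes mentioned can be compared with a few lines. This is a sketch that counts only the aqueous droplets and ignores the oil phase and packing voids.

```python
def emulsion_aqueous_volume_ml(n_droplets, droplet_volume_nl):
    """Total droplet volume in mL, ignoring oil and the gaps between droplets."""
    return n_droplets * droplet_volume_nl * 1e-6  # 1 nL = 1e-6 mL

N_DROPLETS = 1e7  # ten million droplets, as in the example above

for label, volume_nl in (("10 nL", 10.0), ("17 pL", 0.017)):
    total_ml = emulsion_aqueous_volume_ml(N_DROPLETS, volume_nl)
    print(f"{label} droplets: {total_ml:.2f} mL")

print(f"size ratio: {10.0 / 0.017:.0f}x")
```

Ten million 10 nL droplets come to about 100 mL, while the same number of 17 pL droplets comes to about 0.17 mL, before any oil is counted.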

The simplest approach for emulsion generation is mechanical agitation. This is what the Ultra Turrax instrument recommended for the PGM does, with the added bit that it performs agitation in a sealed chamber to prevent aerosol generation and cross-contamination. By the looks of it, the EZBead system for SOLiD also relies on mechanics.

The problem with these mechanical approaches is the non-uniformity of the emulsion droplet sizes. The not-very-nice moniker used in the literature for these methods is "shake and bake".

An alternative to mechanical generation is microfluidic droplet generation. I found a bunch of interesting papers from Richard Mathies' lab on the subject (and did Europe's premier science journal really embarrass itself as that site suggests?) as well as a publication from Rothberg and colleagues at RainDance. With microfluidic generators, very tight control on droplet size is possible. The RainDance paper is slick in that PCR was run on the same chip whereas Mathies has published systems that have beads-in-droplets. Mathies has demonstrated amplification of nearly kilobase sized products to 50-100 amol on beads; they cite the original RainDance publication as achieving 10 amol (or 6e06 molecules) on beads. This gives a ballpark figure of what Ion might require, though it's clearly not much better than a rough guess.

The RainDance paper describes generating 1.8 million droplets per hour and then amplifying them with 34 cycles of microfluidic PCR in just over half an hour. That sounds great -- until you remember that the 318 chip would need somewhere north of ten times that. So such a chip might be a solution to sample prep for the 314 and 316 chips but not the big boy -- not if you crave speed. The RainDance paper also used only 50um diameter droplets (65 pL); whether this would comfortably fit the Ion Torrent beads is a very important question (the size of those beads is yet another key value that needs to be made public!). The Mathies paper generated much larger droplets (2.5nL, or about 38 times larger) at a much slower rate of 21,600 per hour. That would make for slow prep even for the 314, though a later paper packed 96 of these generators onto a single chip to yield 3.4 million droplets per hour.
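
A rough sketch of the generation times implied by these rates is below. The droplet target for the 318 chip is an assumption: it simply takes "ten times" the 1.8 million figure as a lower bound, since the real requirement has not been published.

```python
def hours_to_generate(n_droplets, droplets_per_hour):
    """Time in hours to make n_droplets at a constant generation rate."""
    return n_droplets / droplets_per_hour

# Rates quoted in the text above, in droplets per hour
RATES = {
    "RainDance chip": 1.8e6,
    "Mathies single generator": 21_600,
    "Mathies 96-generator chip": 3.4e6,
}

TARGET_DROPLETS = 1.8e7  # assumed lower bound for a 318 chip (10 x 1.8 million)

for name, rate in RATES.items():
    print(f"{name}: {hours_to_generate(TARGET_DROPLETS, rate):.1f} h")
```

Under that assumption the RainDance-style chip needs about ten hours and the 96-generator chip a bit over five, while a single Mathies generator would take weeks.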

Microfluidic emulsion generation, and perhaps emPCR on the same chip, would have a number of attractive features -- automation, low risk of cross-contamination in the emulsion generation stage (emulsion breaking is still in a centrifuge, which requires care), and very homogeneous droplet sizes to minimize preferential amplification of short fragments. The key question is how much these disposable microfluidic chips would cost and how much for the instrument to drive them. Most of the microfluidic devices I know of that have been commercialized are not cheap and start above the PGM's price ($80K-$200K). If the chips cost a few hundred each, that would add substantially to the cost of a PGM run.

Still, it might be an interesting premium option that some labs would prefer, given the automation and quality benefits. Also, perhaps the microfluidics instruments I'm using as a reference bear unusually high prices -- and this is potentially an application in which such devices would be produced on a large scale, which could bring prices down. The same goes for the chips; perhaps with mass production the unit costs could be reduced. But, I'm not terribly optimistic: any microfluidics products are going to be very high precision devices.

One final comment: it's unfortunate that the three different sequencing platforms have three different bead systems. That's perhaps inherent in the execution of their technologies, but if there were more uniformity it could encourage third parties to jump in with products aiming for all three platforms.

12 comments:

  1. Nice introduction, I'm new in the field as well. Don't miss the Nature Methods special on "the science of mayonnaise" from 2006:
    http://www.nature.com/nmeth/journal/v3/n7/index.html

    I've tried the protocol in Williams et al and it works well, but as you pointed out: The reaction volume becomes a real issue for high copy#.

  2. ARRRGH! 4/3 pi r cubed was in my notes (and what I rattled off for TNG) but I apparently couldn't type it straight.

  3. We have a 454 Jr and the short read 'spikes' (half the reads being ~60 bp) have been a sporadic, but significant problem. Half our amplicon runs to date have been affected. The way to alleviate it is to clean the PCR product and clean again; multiple AMPure bead extractions with a relatively long base pair cut-off help a lot. Even then, primer dimers undetectable by gel or high sensitivity bioanalyser DNA chip can still cause the issue (it is a 50 cycle PCR after all).

    The Ultra Turrax, in practice, also has its downsides (apart from being slightly annoying to program to the correct RPM and time). It is possible for the tube not to engage with the drive properly: the emulsion will start to be produced and turn milky, but the instrument will not actually continue to run. So, by walking away and leaving it to run you may miss the fact that the emulsion wasn't correctly made (milky being milky whatever size your droplets)! Luckily I sit in front of ours whilst carrying out the rest of the steps, so can keep an eye. It does take a firm hand to make sure it's properly engaged though.

    Finally, our amplicon is towards the upper end of the length of the stated maximum (600bp off the top of my head) for sequencing on the Jr, so we've been concerned about improving the emPCR to favour the longer amplicons. Roche have released an update to the emPCR methodology recently for long (and short) amplicons for FLX, though it doesn't quite work for the Jr kits as there isn't enough primer supplied. When it comes to the promised 1000 bp reads it'll be interesting to see what other changes are made as the emPCR is going to be the key step here.

    In summary: simple in theory, but a bugger to optimise!

    Mike

  4. I've just been reading senior theses about optimizing emPCR for the Ion Torrent. Doubling the number of bases over the standard kit should be trivial. Achieving the advertised number of bases or quality may be worth $1M

  5. so nucleotides/primers = limit of amplicon length?

  6. The Ion Torrent Spheres are hydrogel beads about 2 microns in diameter, I believe.

    The Ion Torrent error rates are quite high still, particularly after the first 50 bases of the read.

  7. In relation to classic PCR, here is an article on rolling circle replication. The process involves a circular plasmid that is nicked by an initiator protein and the 3′ end serves as a primer for unnicked strand replication. Replication proceeds around the remaining circular strand producing linear concatemers thereby amplifying the original signal many-fold. http://cbt20.wordpress.com/2011/03/06/pcr-amplification-circu/

  8. Actually, my error rates have been better than anticipated. I'm not sure where people are getting their numbers. They are pretty tight lipped about the contents of the Ion Spheres.

  9. As I understood it the 1kb "upgrade" will only be available for the GS Flx... not Junior. It seems they just cram more chemicals in there....

  10. ten millions of ten nanoliter drops pile up to 100 mL, not 10mL

  11. I like the accurate and briefly described information relevant to my search results. Thanks for putting up this post online.
