One theme in some of the comments on my Ion Torrent commentary has been around the limitations of emulsion PCR. Some have made rather bold (and negative) predictions, such as Ion Torrent dooming themselves to short read lengths or users being unable to process many samples in parallel without cross-contamination.
Reading these really drove home to me that I didn't understand emulsion PCR. I've done regular PCR (once in a hotel ballroom, of all places!) but not emulsions. It seems simple in theory, but what goes on in practice? My main reasoning was based on the fact that emPCR is the prep method for both 454 and SOLiD; 454 clearly demonstrates the ability to execute long reads (occasionally hitting nearly a kilobase) and SOLiD the ability to generate enormous numbers of beads through emPCR. I also have a passing familiarity with RainDance's technology (we participated in the RainDance & Expression Analysis Cancer Grant program). But, I've also seen a 454 experiment go awry in a manner which was blamed on emPCR -- a small fraction of primer dimers in our input material became a painful fraction of the sequencing reads. Plus, there is that temptation to enter the Life Tech grand challenge on sample prep, or attempt to goad some friends into entering. So it is really past time to get more serious about understanding the technology.
So, off to the electronic library. Maddeningly, many of the authors in the field tend to publish in journals that I have less than facile access to, but between library visits, PubMed Central and those that are in more accessible journals, I've found a decent start.
First, the easy stuff. Emulsion PCR (or BEAMing, as one early group called a variant of it), involves generating a water-in-oil emulsion, in which the aqueous phase contains all the components for PCR amplification. For second-generation sequencing sample prep, the aqueous droplets also contain a solid bead with at least one primer type bound and also a single initial template DNA molecule. Well, that's the ideal, though in reality a population of droplets will have some with no DNA, some with one, and some with more than one; only the solitary templates will be useful. This is driven by Poisson statistics and is why it is important to know the concentration of a DNA sample going in, in order to set up the most favorable generation of single-template droplets.
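To see why single-template droplets are the minority prize, here's a toy Poisson calculation (a sketch of the statistics only, not any vendor's actual loading model):

```python
import math

# Poisson model of template loading: if templates are distributed at an
# average of lam per droplet, the fraction of droplets with exactly k
# templates is exp(-lam) * lam**k / k!.
def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

for lam in (0.1, 0.5, 1.0, 2.0):
    p0 = poisson_pmf(0, lam)     # empty droplets (wasted reagent)
    p1 = poisson_pmf(1, lam)     # the useful single-template droplets
    p_multi = 1.0 - p0 - p1      # multi-template droplets (mixed beads)
    print(f"lam={lam}: empty={p0:.3f} single={p1:.3f} multi={p_multi:.3f}")
```

The single-template fraction peaks at 1/e (about 37%) when the mean loading is one template per droplet; in practice preps often load more dilutely, trading more empty droplets for fewer mixed-template beads.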
Droplet sizes and beads span a wide range. Comparing the two required dredging up the formula for the volume of a sphere (4/3 pi r cubed), which I hadn't needed since high school, as well as working with some hairy exponents (by happy chance, TNG is just learning about exponents so I could drive home the point that these are useful, since I'm currently using them!). It's also handy to know that a cubic millimeter is the same as one microliter. Please be kind in correcting any math errors I make: I've tried hard to get them right, but this is all stuff I've been able to allow to rust since at least my undergraduate days.
[Note: corrected the mistyped sphere formula after this was pointed out by a commenter]
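Since I'll be comparing droplet sizes below, here's the sphere formula turned into a little helper for converting droplet diameter to volume (pure unit conversion; no biology assumed):

```python
import math

def droplet_volume_pl(diameter_um: float) -> float:
    """Volume of a spherical droplet, in picoliters.

    V = (4/3) * pi * r**3; 1 cubic micron = 1 fL, so divide by 1000 for pL.
    """
    r = diameter_um / 2.0
    return (4.0 / 3.0) * math.pi * r**3 / 1000.0

print(droplet_volume_pl(50))   # ~65 pL -- the RainDance droplet size discussed below
```

Note how steeply volume scales: doubling the diameter gives eight times the volume, which is why droplet size distributions matter so much.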
Each droplet has all the reagents which will be available for the PCR run. This has two important implications.
First, the size of a droplet and the concentration of the reagents determine the length of the DNA which can be amplified. Imagine the PCR actually went to completion and consumed all of the primers. The supply of nucleotides is fixed, so the number of nucleotides in the droplet divided by the number of primers sets an absolute ceiling on the length of DNA amplifiable to completion. In reality, you would never want the nucleotide concentrations to go anywhere near exhaustion, since the polymerase reaction kinetics will become unfavorable and more errors will be made if the four bases are depleted unevenly.
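As a back-of-the-envelope sketch of that ceiling, using generic PCR concentrations I'm assuming purely for illustration (200 uM each dNTP, 0.5 uM each primer -- not anything Ion has published):

```python
# Toy nucleotide-budget bound on amplicon length.
# These are generic PCR concentrations assumed for illustration only.
dntp_total_molar = 800e-6    # 200 uM each dNTP x 4
primer_total_molar = 1e-6    # 0.5 uM each primer x 2

# Every completed strand starts from one primer, so if amplification ran
# all the way to primer exhaustion, each strand could average at most:
max_len = dntp_total_molar / primer_total_molar
print(max_len)   # ~800 nt under these assumptions
```

Conveniently, the droplet volume cancels out of this particular ratio, since the dNTP and primer counts both scale with it; droplet size instead governs the absolute number of product molecules, and hence signal.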
In the real world, if a template is larger than this maximum then it will stop amplifying prior to exhaustion of the primers, meaning that the initial signal in sequencing will be lower and will fade to noise sooner. This is another key value Ion needs to reveal: how much DNA needs to be on a bead to achieve a given read length. That would enable groups both to model the whole process computationally as well as test sample prep methods by easy fluorescence measurements of the beads (under a microscope or in a flow cytometer) rather than the time and expense of sequencing (obviously, final proof of a functioning system requires sequencing).
The corollary to the above is that the size distribution of droplets in the emulsion is very important. If the size determines the maximum fragment size amplified, then a highly varying population will encompass a range of such maximum sizes. In the worst case, there may be some droplets which can amplify only very small products -- those pesky primer dimers. Some of the papers and reviews have claimed that emPCR is resistant to primer dimer amplification, but I've seen (and cursed!) the counterexample.
A final implication I hadn't considered before is the volume. We are used to packing lots of molecules in solution into a small space. For example (leaping onto the thin ice of my arithmetic), if a basepair weighs 660 daltons then 100 pg of a 200 bp amplicon would be about 4.6e08 molecules of DNA. For the 318 chip delivering nearly 8e06 reads, that is a comfortable excess of templates over reads (assuming I haven't botched anything) -- if there were no losses. In conventional work, those 100 pg could easily be contained in a microliter of buffer.
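The arithmetic, spelled out (do double-check me; the 660 Da per basepair figure is the usual average):

```python
AVOGADRO = 6.022e23
BP_DALTONS = 660.0           # average mass of one base pair, g/mol

mass_g = 100e-12             # 100 pg of input DNA
amplicon_bp = 200
mol_weight = amplicon_bp * BP_DALTONS       # g/mol for the whole amplicon
molecules = mass_g / mol_weight * AVOGADRO
print(f"{molecules:.2e}")    # ~4.56e+08 molecules
```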
But, suppose ten million droplets of one nanoliter mean volume are generated instead. Not accounting for the empty space between packed spherical droplets, that would be 10 milliliters of emulsion. It's not surprising to stumble on a Life Technologies patent for emulsion PCR in flexible pouches; the volumes can get much too big for typical molecular biology tubes.
Now, I've picked some numbers above to be easy and reasonable, but one approach is to use smaller droplet sizes. RainDance routinely uses droplets in the 17 pL range, or about 1/50th my example above. Still, the volumes can be sizable.
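A quick helper puts those emulsion volumes on one footing (the droplet counts and sizes are just the illustrative ones from above):

```python
def emulsion_volume_ml(n_droplets: float, droplet_vol_pl: float) -> float:
    """Total aqueous volume in mL (ignoring the oil phase and packing gaps)."""
    return n_droplets * droplet_vol_pl * 1e-12 * 1e3   # pL -> L -> mL

# Ten million 1 nL (1000 pL) droplets, as in the example above:
print(emulsion_volume_ml(1e7, 1000))   # ~10 mL
# The same droplet count at RainDance-scale 17 pL droplets:
print(emulsion_volume_ml(1e7, 17))     # ~0.17 mL
```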
The simplest approach for emulsion generation is mechanical agitation. This is what the Ultra Turrax instrument recommended for the PGM does, with the added bit that it performs agitation in a sealed chamber to prevent aerosol generation and cross-contamination. By the looks of it, the EZBead system for SOLiD also relies on mechanics.
The problem with these mechanical approaches is the non-uniformity of the emulsion droplet sizes. The not-very-nice moniker used in the literature for these methods is "shake and bake".
An alternative to mechanical generation is microfluidic droplet generation. I found a bunch of interesting papers from Richard Mathies' lab on the subject (and did Europe's premier science journal really embarrass itself as that site suggests?) as well as a publication from Rothberg and colleagues at RainDance. With microfluidic generators, very tight control on droplet size is possible. The RainDance paper is slick in that PCR was run on the same chip, whereas Mathies has published systems that carry beads in the droplets. Mathies has demonstrated amplification of nearly kilobase-sized products to 50-100 amol on beads; they cite the original RainDance publication as achieving 10 amol (or 6e06 molecules) on beads. This gives a ballpark figure of what Ion might require, though it's clearly not much better than a rough guess.
The RainDance paper describes generating 1.8 million droplets per hour and then amplifying them with 34 cycles of microfluidic PCR in just over half an hour. That sounds great -- until you remember that the 318 chip would need somewhere north of ten times that. So such a chip might be a solution to sample prep for the 314 and 316 chips but not the big boy -- not if you crave speed. The RainDance paper also used only 50um diameter droplets (65 pL); whether this would comfortably fit the Ion Torrent beads is a very important question (the size of those beads is yet another key value that needs to be made public!). The Mathies paper generated much larger droplets (2.5nL, or about 38 times larger) at a much slower rate of 21,600 per hour. That would make for slow prep even for the 314, though a later paper packed 96 of these generators onto a single chip to yield 3.4 million droplets per hour.
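To put those rates side by side: the droplet count a 318 run would need is my guess (I've assumed 2e7 droplets to net ~8e6 usable reads after empties and losses -- Ion hasn't published this), but the generation rates come from the papers above:

```python
def hours_to_generate(droplets_needed: float, rate_per_hour: float) -> float:
    return droplets_needed / rate_per_hour

# Assumed requirement: ~2e7 droplets to net ~8e6 reads (my guess, not Ion's figure).
needed = 2e7
print(hours_to_generate(needed, 1.8e6))   # RainDance paper rate: ~11 hours
print(hours_to_generate(needed, 3.4e6))   # 96-generator Mathies chip: ~6 hours
print(hours_to_generate(needed, 21600))   # single Mathies generator: ~926 hours
```

Whatever the exact droplet requirement, the single-generator rate is clearly out of the running for the larger chips; parallelization is the only way forward.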
Microfluidic emulsion generation, and perhaps emPCR on the same chip, would have a number of attractive features -- automation, low risk of cross-contamination in the emulsion generation stage (emulsion breaking still happens in a centrifuge, which requires care), and very homogeneous droplet sizes to minimize preferential amplification of short fragments. The key question is how much these disposable microfluidic chips would cost, and how much for the instrument to drive them. Most of the microfluidic devices I know of that have been commercialized are not cheap, with instruments starting above the PGM's price ($80K-$200K). If the chips cost a few hundred dollars each, that would add substantially to the cost of a PGM run.
Still, it might be an interesting premium option that some labs would prefer, given the automation and quality benefits. Also, perhaps the microfluidic instruments I'm using as price references are atypically expensive -- and this is potentially an application in which such devices would be produced on a large scale, which could bring prices down. The same goes for the chips; perhaps with mass production the unit costs could be reduced. But I'm not terribly optimistic: any microfluidics products are going to be very high precision devices.
One final comment: it's unfortunate that the three different sequencing platforms have three different bead systems. That's perhaps inherent in the execution of their technologies, but if there were more uniformity it could encourage third parties to jump in with products aiming for all three platforms.