Saturday, February 01, 2020

A qPCR (aka RT-PCR aka rRT-PCR) Explainer

I've gotten into a number of Twitter threads and seen a lot of Quora questions about the qPCR test for the Wuhan coronavirus, and I realized they would be best handled by writing an explainer. I'm intending it for financial types, reporters and anyone from the lay public interested in learning a bit more.  For most regular readers of this blog, there won't be anything new to you.  If you'd check me for accuracy, I'd be grateful, but perhaps many will skip over this one.  That also means I'm going to try to resist my usual urges to make lighthearted references to popular culture; they're a good way to be confusing.

What Does qPCR Stand For?

qPCR is quantitative Polymerase Chain Reaction, a variant of Polymerase Chain Reaction (PCR) designed to determine the amount of a DNA or RNA target in a sample.  It is also known as RT-PCR, for Real Time PCR -- but as noted below there is another technique called RT-PCR which is confusing.  

How Does PCR Work?

PCR is a method enabling an exponential amplification of a targeted DNA.  It works by repeatedly heating and cooling the DNA in the presence of two short DNAs called primers that specify the region to be amplified, an enzyme called polymerase, the four nucleotides found in DNA, and some other chemicals, called a buffer, required to keep the polymerase happy.  The common components -- polymerase, the nucleotides and buffer -- are often supplied pre-mixed as a "master mix".  Each time the mixture is cycled, the number of copies of the targeted region doubles.  PCR is often run for 30 to 40 cycles; with 30 cycles each targeted DNA region in the input material should have generated over a billion copies.

At the beginning of each cycle, the sample is heated to nearly boiling to cause the two strands of DNA to separate.  The sample is then cooled to allow the primers to bind according to Watson-Crick base pairing; this is called annealing.  In the third phase of the cycle, the polymerase extends the primers by adding the correct nucleotide given the nucleotide on the opposite strand.  Then the next cycle begins.
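To make the doubling arithmetic concrete, here is a minimal sketch of the idealized copy count after a given number of cycles. The efficiency parameter is my own illustrative assumption (1.0 means perfect doubling every cycle); real reactions fall short of it, especially in later cycles.

```python
def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of copies after a number of PCR cycles.

    efficiency=1.0 means every target is copied in every cycle (perfect doubling).
    Values below 1.0 are an illustrative way to model imperfect cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule, perfect doubling, at a few cycle counts.
for n in (10, 20, 30, 40):
    print(n, f"{copies_after_cycles(1, n):.3g}")
```

With perfect doubling, a single starting molecule passes a billion copies at cycle 30, which is where the "over a billion" figure comes from.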

How Do You Do All That Heating And Cooling?

PCR is usually run using a special instrument called a thermocycler.  Most thermocyclers use Peltier elements, which are related to thermocouples; a thermocouple consists of two wires connected at each end, with each wire made of a different metal.  If you run a current through the wires, one junction will become colder and the other hotter.  The same technology is used in some portable refrigerators.  It also works the other way: apply heat to one junction and cold to the other and power will be generated.  This is how NASA has powered probes that cannot use solar energy, such as most probes sent to Jupiter and beyond.
There are thermocyclers aimed at hobbyists for a few hundred dollars.  Professional thermocyclers generally cost several thousand dollars, but various useful optional features can drive that price up.
[2020-02-7 -- per comment, fixed my conflating of thermocouples and Peltier elements.  My father worked on Peltiers; he'd be annoyed I was imprecise]

Why Doesn't the Polymerase Get Cooked by All That Heating?

The first PCR reactions required new polymerase to be added after each round, which was tedious and expensive.  A huge boost to PCR taking off was the purification of Taq polymerase from the bacterium Thermus aquaticus, which was isolated from near Old Faithful in Yellowstone National Park.  Taq polymerase can survive the repeated heat cycles.  Some PCR enzymes came from microorganisms living near deep sea vents where the water can be above what would boil on land (the pressure of the deep ocean prevents it).  Some PCR enzymes today are genetically modified versions of these, engineered for various superior properties.

What's Different Between qPCR and Regular PCR?

In standard  PCR, we just see the final result.  This isn't very easy to quantify, because for a number of reasons the later cycles of PCR may not fully double the mixture -- the reactions may max out or plateau.  qPCR adds to PCR the ability to watch the reaction and see how much double-stranded product is present in each sample at the end of each cycle.  If we plot the amount of product versus the cycle number, it forms a characteristic curve, and that curve can be used to derive the amount of DNA present in the original sample.  

Because of the addition of optics to watch the reaction, thermocyclers for qPCR are more expensive -- the least expensive ones appear to be about six thousand dollars and more typically they will be twenty thousand or more.  Again, there are different variables that affect this -- how many samples can be run at a time, is it designed to work with laboratory robots and so forth.

How Do You Watch the PCR in qPCR?

There are two basic types of qPCR, one using two primers just like regular PCR and one adding a third short DNA piece called a probe.

In two primer PCR, the master mix also supplies a dye, typically one called SYBR-Green.  The dye will bind to double-stranded DNA but not single-stranded DNA, and furthermore its fluorescence is much, much stronger when bound to double-stranded DNA.  So at the end of each cycle, the amount of fluorescence is related to how much  product has been made.  The downside is that any incorrect products, such as when the two primers meet up and make double stranded product (called a primer-dimer) will also be counted.

The scheme with two primers and a probe is often called TaqMan, which is a trademark of one manufacturer but also a reminder of how it works -- that it rhymes with "PacMan" is no accident.  In the annealing stage, not only will the primers bind to the target DNA but also the probe, which is situated so that extending one of the primers will cause Taq polymerase to bump into the probe.  The probe has two modifications, a dye which will fluoresce and a quencher that stops it from doing so if the quencher is very close to the dye.  So the probe will not fluoresce when you set the reaction up.  But when Taq bumps into the probe, it has the ability to destroy the probe DNA in front of it just like a snow blower chews up a snow drift.  Or a PacMan eats the dots in the maze.  Taq destroying the probe releases the dye and quencher, and now that the quencher isn't always nearby the dye can fluoresce.  Because the probe is specific to the target, non-specific products don't generate signal -- hence this is a much more specific assay.  But the probe is extra cost, so if not needed some will dispense with it.

How Are Primers and Probes Designed?

Given a target to amplify by PCR, there are computer programs to design the primers.  This has been going on for several decades, so the programs are extremely good at generating primers that work the first time.

A catch with a diagnostic for something like the Wuhan coronavirus is that the virus accumulates mutations and a mismatch from a mutation could cause the qPCR to fail as the primer or probe wouldn't bind.  That risk can be minimized by first comparing many coronavirus sequences to attempt to identify regions that don't tolerate mutation.  Second, one can actually put a mix of primers in the reaction, each primer covering a slightly different variant in the same region.  Third, one can design and run multiple primer sets - one might fail but it is unlikely they all will.
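The "mix of primers" idea is commonly written as a single degenerate primer using the standard IUPAC ambiguity letters. A small sketch of expanding one into the mix it stands for, plus a mismatch count against a target site, follows. The example sequences are made up for illustration and are not from any real assay.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete primer covered by a degenerate primer string."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def mismatches(primer, site):
    """Count positions where a concrete primer differs from an equal-length site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return sum(1 for p, s in zip(primer, site) if p != s)


# Hypothetical primer with one ambiguous position (R = A or G).
mix = expand_degenerate("ACGTRCCA")
print(mix)
print(min(mismatches(p, "ACGTGCCA") for p in mix))
```

Each ambiguous position multiplies the size of the mix, so in practice only a few positions are made degenerate.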

PCR Works With DNA But Don't Coronaviruses Use RNA?

The coronavirus test is technically a reverse transcriptase-qPCR.  RNA from the sample is first copied into DNA form by an enzyme called reverse transcriptase.  These come from a number of rare retroviruses; while coronavirus is an RNA virus it is not a retrovirus because it never copies its genome into DNA.

While there are some engineered and natural enzymes that can serve as both reverse transcriptase and the PCR polymerase, usually these are two separate enzymes. But, they can be put in the same master mix -- the first cycling of the reaction kills the reverse transcriptase.

[added 2020-02-01 22:52]
Because it uses reverse transcriptase and is also called Real Time PCR, or RT-PCR, the CDC is referring to the test as rRT-PCR.  That's their call, but it risks confusion, since in RT-PCR the RT can stand for Reverse Transcriptase or Real Time.  I like qPCR because it emphasizes it is quantitative.

How Hard Are qPCR Kit Components To Manufacture?

PCR and qPCR are used heavily in the worldwide biotechnology market as well as in other spaces such as DNA forensics. A number of suppliers worldwide can make PCR primers and the probes required for 3-primer qPCR.  For the primers themselves, in the continental U.S. and much of Europe one can order a primer one afternoon and receive it the next day by package express; the probes often take longer due to their special chemistry but still under a week.  The capacity of these manufacturers is enormous and the amount of primer required for each assay is quite tiny; there is effectively unlimited capacity to make the components of qPCR assays very quickly.

This is a powerful aspect of qPCR assays.  The specifications for an assay -- the sequences of primers and probes and the thermocycling recipe -- can all be transmitted electronically around the world.  The widely distributed manufacturing capacity means that assay components can be made in many places.  qPCR thermocyclers are quite common in molecular biology and diagnostics labs, so there is a lot of infrastructure to run the assays.  So a qPCR assay can be created very quickly once the virus sequence is known and distributed worldwide almost as quickly.

How Difficult Is It To Run qPCR?

qPCR assays are very easy to run -- mix the correct volumes of master mix and sample and pick the right thermocycler program, then analyze the data when it comes off.  I'm a biologist who works entirely with computers in part because I'm not very good in the lab -- but I've successfully run qPCR!  It isn't uncommon now for high school labs to have thermocyclers; setting up and running qPCR is really no harder than for regular PCR.  A medical technician or pathologist who has never set up a qPCR can be very quickly trained to do so, so there should be no shortage of individuals qualified to run these tests.

How Much Might A qPCR Test Cost?

I've done a rough back-of-envelope estimate that suggests a two primer assay should cost less than a dollar a PCR reaction for materials -- but remember, a full clinical test may have multiple PCR reactions.  The probe adds in probably a dime or so to that.  So qPCR can be a relatively inexpensive assay in terms of the ingredients; the time of the personnel running and interpreting the assay is not included here.

How Long Does a qPCR Assay Take?

The exact time required for each cycle is determined by the length of the PCR product and some characteristics of the thermocycler. Multiply that by the number of cycles, add in some setup time, and that's the time it takes to generate the data.  In general, it should be possible to keep qPCR cycles well under a minute by designing for very short amplification products.  So a qPCR test should be able to be performed in the lab in around an hour.  When first writing this I failed to account for the time to heat up and cool down the sample, and then overcorrected with a units error that gave 4-5 hours [some math below in the section on the CDC assay details].  With the ramp rates read correctly, ramping adds minutes rather than hours, so around an hour of instrument time remains a reasonable estimate.

But in a clinical setting, there may be other requirements to make sure that quality data is generated and that the correct data is attached to the correct patient file.  Those checks and re-checks can add additional time.


How Sensitive Are These Tests?  What Could Generate A False Negative? A False Positive?

In theory PCR methods can detect a single molecule of the target in the sample.  In real life, that is very hard to routinely achieve.  

In a clinical setting, there can be many things that could interfere with the assay, so properly extracting material from the sample can be essential.  For example, standard Taq polymerase is inhibited by the red component, heme, of red blood cells.  Some polymerases have been engineered to reduce this problem.

Good laboratory technique is critical for consistent and accurate results -- one doesn't want a false negative from a failure to add the sample!  But also, good technique is critical for preventing false positives.  A PCR reaction generates millions of copies of the target; should these be carelessly handled they could get into new samples.  Just opening a PCR reaction at the end of a run could spray DNA into the air where it could travel to other samples.  So rigorous labs have separate areas, often with separate ventilation systems, for setting up the reactions and for actually running them.  These often have a wall separating them with a tiny pass-through area with doors on either side, acting somewhat as an airlock.  There are also products to destroy DNA on surfaces and UV lights can be used when people are not present to destroy DNA on surfaces.

[added 2020-02-02 8:20]

What Are Some Caveats to qPCR Tests?

It is worth noting that a qPCR test detects RNA, not active virus.  The reasonable assumption is that detected viral RNA means active virus, as free RNA tends to be attacked by enzymes in your body which exist for the very purpose of destroying foreign RNA.

Similarly, qPCR assays cannot be expected to say anything about the virulence of virus in a given patient.  First we don't know enough and second qPCR isn't giving much information.  For that we need full DNA sequencing, which is more expensive, requires more skill and takes longer.

[Addition after initial release]
The CDC's test protocol gives a number of other reasons tests might not perform correctly: improper sample collection, improper materials in the swabs used (synthetic, not cotton, is recommended), freezing a sample, poor sample storage and so forth.

How Many Tests Could A Laboratory Run?

Many qPCR thermocyclers run either 96 or 384 reactions at a time.  Divide that by the number of PCR tests per sample and subtract out some for running controls (very important!!!) and you can estimate how many tests per run.

In real life, you can't just multiply that by 8 hours and get tests per operator -- there's time to set up the reactions, decontamination and so forth. It also depends on what automation there is for extracting samples.  So in the end this is like many processes -- how many operators, how many qPCR instruments and how many samples per qPCR instrument.  But qPCR is potentially a very high throughput testing method, which is another plus for testing large numbers of samples.
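The per-run estimate can be sketched in a few lines. The number of control wells and reactions per sample below are hypothetical inputs, not figures from any specific protocol.

```python
def samples_per_run(wells: int, reactions_per_sample: int, control_wells: int) -> int:
    """Patient samples that fit on one plate after reserving wells for controls."""
    if reactions_per_sample < 1:
        raise ValueError("need at least one reaction per sample")
    usable = wells - control_wells
    if usable < 0:
        raise ValueError("more control wells than wells on the plate")
    return usable // reactions_per_sample


# Hypothetical setup: 4 reactions per sample and 8 wells reserved for controls.
for plate in (96, 384):
    print(plate, samples_per_run(plate, reactions_per_sample=4, control_wells=8))
```

This is only the ceiling for a single run; setup, decontamination and reruns all pull the real daily number down.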

[2020-02-01 22:57 addition]
As was pointed out by a professional virologist on Twitter, further attrition can be expected from the ideal numbers due to the need to rerun samples from failed runs or samples that generated odd results.  The number of patients tested may be less than the number of samples due to multiple samples from a patient, such as an initial sample and then later follow-up samples.

[Second addition]

What Sorts of Controls Might a qPCR Assay Use?

Caveat: I haven't looked at either the BGI or CDC actual assays.  But here are the sorts of controls that a good design would include.

A Negative Control (NC) proves that your reagents aren't contaminated.  Negative Controls should never work.  So they would be master mix, primers (and probe, if 3-primer) but no sample.  They should always come up negative.

A Positive Control (PC) would be a known true sample.  This should always come up positive; it proves that the system is working.  The PC will also have a known amount of material and can be used for calibration.

Additional positive controls might have different subtypes of the virus to prove the assay is working on each of them.  They might also be different amounts of positive material to establish a standard curve and to prove your test is as sensitive as needed.
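A standard curve is usually built by fitting a straight line to threshold cycle versus the log of the known amount, then reading unknowns off that line. Below is a minimal pure-Python sketch of that fit. The dilution series values are made-up, idealized numbers, and the function names are my own.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares line: ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's starting amount off the fitted line."""
    return 10 ** ((ct - intercept) / slope)


def efficiency_from_slope(slope):
    """Per-cycle amplification efficiency implied by the slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Idealized two-fold dilution series: each doubling of input arrives one cycle earlier.
slope, intercept = fit_standard_curve([100, 200, 400, 800], [30.0, 29.0, 28.0, 27.0])
print(round(slope, 3), round(efficiency_from_slope(slope), 3))
print(round(quantity_from_ct(28.5, slope, intercept)))
```

A slope near -3.32 cycles per tenfold change corresponds to perfect doubling; real curves that are noticeably flatter or steeper are a sign something is off with the assay.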

Controls are always a sticky issue.  Each control sample is one less patient sample you can run.  But each control also is insurance against some sort of problem that would generate invalid results -- without a positive control, if every patient is negative do you believe it?  Without a negative control, if every patient is positive do you believe it?  So good assay designs often have many positive controls to check and re-check to assure the data is correct.
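The logic of trusting a run only when the controls behave can be written down directly. This is a generic sketch with hypothetical function names, not the acceptance rule of any particular assay.

```python
def run_is_valid(negative_control_signal: bool, positive_control_signal: bool) -> bool:
    """Trust a run only if the negative control stayed negative
    and the positive control came up positive."""
    return (not negative_control_signal) and positive_control_signal


def explain_controls(negative_control_signal: bool, positive_control_signal: bool) -> str:
    """Human-readable reason for accepting or rejecting a run."""
    if negative_control_signal:
        return "negative control gave signal: possible reagent contamination"
    if not positive_control_signal:
        return "positive control gave no signal: assay may not be working"
    return "controls behaved as expected"


print(explain_controls(negative_control_signal=False, positive_control_signal=True))
```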


[added 2020-02-02 8:20]

Details on the CDC's qPCR Test

The CDC has published a guideline to using their test which includes everything needed to run it except the specific primer sequences -- that way the document is flexible in case those sequences must be changed.  Primers can be ordered from multiple commercial suppliers.

The assay is a primers+probe design, with each reaction using a single primer+probe set and FAM as the dye.  Once reagents are loaded in the thermocycler, the protocol has a 2 minute enzymatic step at 25C that appears to be to reduce carry-over contamination, a 15 minute reverse transcription at 50C, a two minute step at 95C to activate the PCR enzyme and then 45 cycles of 3 seconds of denaturation (strand separation) at 95C and a 30 second combined annealing-extension step at 55C.

For those interested, at this temperature Taq polymerase adds about 24 nucleotides per second, which implies the primers are 720 bases apart if the assay is optimized for speed.

[2020-02-06 -- as per comments, fixing my botched units]
Editorial note: I really bungled below, as noted in comments, reading the ramp rates as seconds per degree C instead of the reverse. Stupid, stupid young student mistake.  Which I should have checked but didn't.

The hold times in the protocol add up to a bit under 45 minutes -- but we need to account for ramp time, the time to move between temperatures.  That depends on the thermocycler.  One manufacturer shows a range for heating of 3.4 to 4.6 degrees C per second, with cooling running from 2.8 to 4.2 degrees C per second.  With 45 cycles between 55C and 95C, this adds up to roughly 14 minutes of additional time on the fastest cyclers listed and about 20 minutes on the slowest.  Because you do this so many times it does add up, but to minutes rather than hours.  My original version read the rates as seconds per degree C and came out at 114 minutes of ramping up and 117 minutes of ramping down, for a mistaken total of 4.5 hours.
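For anyone who wants to check the arithmetic, here is a small sketch using the step times from the protocol as described above and the ramp rates read as degrees C per second. It is a simplification: it ignores the ramps between the initial hold steps and assumes every cycle ramps the full 40 degrees in each direction.

```python
# Hold steps from the protocol as described in the text, in seconds.
INITIAL_HOLDS_S = [2 * 60, 15 * 60, 2 * 60]  # 25C step, 50C reverse transcription, 95C activation
DENATURE_S = 3
ANNEAL_EXTEND_S = 30
CYCLES = 45
HIGH_C, LOW_C = 95.0, 55.0


def hold_minutes(cycles: int = CYCLES) -> float:
    """Time spent sitting at a set temperature, ignoring all ramping."""
    return (sum(INITIAL_HOLDS_S) + cycles * (DENATURE_S + ANNEAL_EXTEND_S)) / 60.0


def ramp_minutes(heat_c_per_s: float, cool_c_per_s: float, cycles: int = CYCLES) -> float:
    """Time spent moving between 55C and 95C across all cycles."""
    delta = HIGH_C - LOW_C
    per_cycle_s = delta / cool_c_per_s + delta / heat_c_per_s
    return cycles * per_cycle_s / 60.0


def total_minutes(heat_c_per_s: float, cool_c_per_s: float, cycles: int = CYCLES) -> float:
    return hold_minutes(cycles) + ramp_minutes(heat_c_per_s, cool_c_per_s, cycles)


print(hold_minutes())
for heat, cool in ((4.6, 4.2), (3.4, 2.8)):  # fastest and slowest rates quoted in the text
    print(heat, cool, round(ramp_minutes(heat, cool), 1), round(total_minutes(heat, cool), 1))
```

Under these assumptions the instrument time lands at roughly an hour, with ramping contributing on the order of a quarter of that.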

The CDC test has three primer+probe sets. All three must be positive for a positive result; if one or two are positive the result is classified as inconclusive.  If a test is scored inconclusive, the CDC guidance is to re-extract nucleic acids and rerun the panel. A repeated inconclusive would require consultation with the CDC.
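The calling rule described above is simple enough to sketch as code. The "negative" label for zero positive targets is my assumption, since the text only spells out the positive and inconclusive cases, and this sketch does not model the control checks.

```python
def classify_sample(target_results):
    """Classify one sample from its three primer+probe results.

    target_results: three booleans, True where that primer+probe set gave signal.
    """
    results = list(target_results)
    if len(results) != 3:
        raise ValueError("expected exactly three primer+probe results")
    positives = sum(1 for r in results if r)
    if positives == 3:
        return "positive"
    if positives == 0:
        return "negative"  # assumed label; not spelled out in the text above
    return "inconclusive"  # one or two positive: re-extract and rerun


print(classify_sample([True, True, True]))
print(classify_sample([True, False, True]))
```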

For controls, there is one run positive control and one negative control.  There is also a control for the extraction procedure that is recommended for each batch of extractions; this is also a negative control. In addition, they recommend a second PCR reaction for each clinical sample which detects a known human gene; this is a control for sample integrity.

Any Questions?

If you have further questions about qPCR in the context of the Wuhan coronavirus, please leave a comment or ping me on Twitter (@omicsomicsblog) and I will answer to the best of my ability with a reply in the comments.  Similarly, if you are concerned I have made an error of fact, please leave a comment.  Comments are moderated; it doesn't go public until I click a box.  Off-topic comments, particularly those spreading rumors and fear mongering, will not be made public -- there's too much garbage out there and I won't allow this space to be used to echo it.

7 comments:

Unknown said...

Very nice description.

Anonymous said...

Good overall summary but the rate/size description is off, for diagnostic qPCR amplicons are always very short, tending to be 100 bp or less. You want maximum speed and efficiency, no need to use the DNA product for anything afterwards so who cares how long it is. Theoretically you could use the rate of synthesis of a polymerase and extension time equal to number of seconds required...but in practice that's not how it works, there's no 100% efficient reaction, processivity means multiple association/dissociation events, etc. so all PCR uses way more extension time than it "should". Shorter amplicons always better for Dx. The CDC assay "N1" for example is 72 bp end to end with the hydrolysis probe in the middle.

Keith Robison said...

Thank you for the constructive comment.

I agree that diagnostic PCR amplicons should be kept short and generally are. I have a colleague who is a fiend on this point.

When I wrote that, I hadn't yet found the actual CDC primers and probes, which are indeed much shorter. I tried to get too clever doing the calculations.

It would be interesting to see how the test would perform with much shorter cycle times, though in the end it is the ramp times that dominate the overall time. Given the short amplicons, Taq polymerase should be processive* enough to finish such short amplicons most if not every time.


Definition for non-experts:
Processivity is the property of a polymerase describing how long it polymerizes before spontaneously stopping and falling off the template. Highly processive polymerases will rarely stop; less processive polymerases will stop more frequently.

Anonymous said...

The ramp rates you cite are in degrees C per second, not seconds per degree C.

Keith Robison said...

Much egg on my face -- thank you for noting that. It seemed odd but then I just rechecked my arithmetic, failing to check that I had the units rightside up! Oy!

Anonymous said...

"Most thermocyclers use Peltier elements also known as thermocouples"

No, Peltiers are not also known as thermocouples. These are two distinct pieces of hardware with distinct purposes. Thermocouples are usually used to control the operation of Peltier elements. The operation of both relies on a thermoelectric effect, but the construction is vastly different

yuri said...

Is this the same technology that is used for HIV testing?