One of the most electrifying talks at AGBT this year came from Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (it can be autoimmune, viral, protozoal, bacterial, and probably a few other things) and then ran down a gripping case history which seemed straight out of House.
The patient in the case history was a 14-year-old boy (which hit home; that's my son's age) born with Severe Combined Immunodeficiency Syndrome (SCID). He had previously received a bone marrow transplant from his mother. In summer 2013 he swam on vacation in Puerto Rico and at a Florida resort (nervous murmurs from the crowd, many of whom had just swum at a Florida resort). The household also had multiple cats (toxoplasmosis?). Given the setting and the preamble, this was clearly going to be a zebra hunt.
I can't do justice to all the stages of the progression of this patient's illness, so I won't try. It started with a bout of conjunctivitis, then later uveitis. Then came more medical visits, with elevated white cell counts. Given the patient's complex background, sometimes he was treated for an infection, other times for possible Graft-versus-Host Disease (GvHD), and other times for a recurrence of SCID. Clearly these approaches might not be simpatico: GvHD is treated with immunosuppressants, but for infections one might want to tune up the immune system. Various assays for infection, using cultures and PCR, repeatedly came back negative. Diagnostic tests became increasingly invasive, starting with spinal taps.
Eventually, the boy landed in the hospital and was there for over a month, his condition steadily worsening. A 1 cubic centimeter brain biopsy (shudders from the crowd) revealed inflammation, but no definite cause. DeRisi showed a picture of the boy, which his parents had made public; he was nearly encased in medical tubing. Due to worsening mental status, a coma was induced. In desperation, the doctors approached DeRisi to use sequencing as an unbiased search for an occult pathogen.
So, a protocol was quickly thrown together and approved by an emergency Institutional Review Board (IRB). DNA was purified from the brain biopsy and subjected to sequencing on a MiSeq. The entire process from DNA to results took 2 days, with the team painfully aware that their patient might expire at any time.
The sequence reads were mostly human, and after those were culled out, a lot were very pedestrian. But 400+ reads mapped to Leptospira, a spirochete which can cause encephalitis and which often presents with eye infections. Further sequencing fingerprinted it as a strain common in the Caribbean. The IRB reconvened to consider the issue of treating a patient based on research data from a non-CLIA lab, but given the general safety of the recommended course (high-dose penicillin) and the grave condition of the patient, treatment was initiated and the boy recovered. Further testing at the CDC with a specific PCR test for Leptospira, under CLIA conditions, was negative (but the test has a rated sensitivity of only 60%!) -- a reminder that CLIA is a set of procedural requirements, not a guarantee of analytical value.
It’s a great story (I hope DeRisi and team publish it), but it can also be seen as a great jumping-off point for designing a system to tackle such cases.
For example, DeRisi stated that most of the first day was taken up by library preparation. There are certainly faster library preps; Illumina claims 1.5 hours for a rapid Nextera XT protocol. Should the protocol include a step to deplete human DNA? NEB has a kit based on methylation, and Carlos Bustamante from Stanford talked about using RNA baits derived from human genomic libraries to enhance microbiome studies (or conversely, to enrich for ancient human DNA). Would the extra complexity and time gain added sensitivity? Could no-PCR or low-PCR library preps be used, to both accelerate the preparation and reduce the risk of carryover contamination? Or would other tools, such as selecting sample barcodes from a very large pool, be the best way to detect cross-contamination?
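One way to make cross-contamination detectable is to draw barcodes from a pool designed so every pair differs at several positions; an unexpected barcode in the data then flags carryover rather than a one-base sequencing error. A minimal sketch of the idea (the greedy selection and all the parameters here are my own illustration, not anything from DeRisi's protocol):

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_barcodes(length=6, min_dist=3, n_wanted=96):
    """Greedily pick barcodes so every pair differs at >= min_dist bases.

    With min_dist=3, a single sequencing error can never convert one
    chosen barcode into another, so an unexpected barcode in the data
    suggests carryover or cross-contamination rather than a read error.
    """
    chosen = []
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if all(hamming(candidate, c) >= min_dist for c in chosen):
            chosen.append(candidate)
            if len(chosen) == n_wanted:
                break
    return chosen

barcodes = pick_barcodes(length=6, min_dist=3, n_wanted=8)
```

Real barcode sets would also balance base composition per cycle and avoid homopolymers, but even this toy version shows why a large, well-separated pool turns contamination into a detectable signal.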
There’s also the choice of platform and read characteristics. DeRisi used MiSeq, which has some fast options. I’ve heard some vocal proponents of the Ion platform for rapid analyses such as this. How many reads are needed, and how long should they be? What are the effects on downstream informatics? And would this be a good target for some of the emerging platforms? The David Jaffe (Broad Institute) talk on assembling with Oxford Nanopore data was a bit short on statistics, but perhaps this platform would have enough firepower to detect a pathogen, with the advantage of no PCR. But is the goal simply to identify a bacterium, or should one be shooting to detect subtler features? The depth of data DeRisi described was far short of what would be needed to assemble a genome or detect some small feature, but should that be a goal?
If the key requirements for rapid pathogen identification are speed and read quantity, with relaxed demands on base accuracy or read length, then this field may represent a huge opportunity for emerging sequencing technologies. Several such companies were at the conference in different forms -- Genapsys presented, Genia & PicoSeq had posters, and a charming fellow from Quantum Biosystems was showing off the evolutionary history of their chips in the bar. If simple identification of pathogens is sufficient, then perhaps lots of really noisy reads would do -- and pathogen detection could be an early revenue opportunity for these companies, particularly if this sort of analysis becomes routine and expected at every major medical center.
The downstream informatics could be a rich source of innovation, as this took a significant amount of time (as long as the sequencing, if I recall correctly). Could reads be scanned for human-ness as they are generated, with the sequencer only exporting non-human reads? DeRisi used BLASTN versus all of GenBank after depleting the human reads, which is neither the fastest algorithm nor an ideal database. GenBank has both a lot of redundancy and a lot of uninteresting genomes; if you identify Ficus reads, will you care in this setting? Perhaps just matching the k-mer profile of the sequences would be sufficient. How much time could be saved by a tool which wasn’t actually aligning to the human genome, but simply finding fragments that could align to the human genome? Or do you just take their protocol and have ginormous compute resources on standby, using a large cloud very briefly to chew through the data? And for clinicians who are not genomicists, how do you best present the end results? DeRisi showed a taxonomy browser, but using such a tool implies a certain degree of training and background. Perhaps a list of bad actors reported in rank order of abundance makes more sense.
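The k-mer idea and the rank-ordered report can be combined in a few lines: drop any read sharing a k-mer with the host, assign the rest by k-mer hits against pathogen references, and tally. A toy sketch, where the single-label index, the k-mer size, and the reference set are all illustrative assumptions (a real classifier such as Kraken is far more sophisticated):

```python
from collections import Counter

K = 21  # k-mer size; long enough that exact matches are nearly taxon-specific

def kmers(seq, k=K):
    """All k-length substrings of a read or reference, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(reference_seqs, k=K):
    """Map each reference k-mer to the taxon it came from (toy, single-label)."""
    index = {}
    for taxon, seq in reference_seqs.items():
        for km in kmers(seq, k):
            index[km] = taxon
    return index

def classify(read, host_kmers, pathogen_index, k=K):
    """Return 'host', a pathogen name, or 'unclassified' for one read."""
    read_kmers = kmers(read, k)
    if read_kmers & host_kmers:  # discard anything that looks human
        return "host"
    hits = Counter(pathogen_index[km] for km in read_kmers if km in pathogen_index)
    return hits.most_common(1)[0][0] if hits else "unclassified"

def ranked_report(reads, host_kmers, pathogen_index):
    """Rank-ordered list of (candidate pathogen, supporting read count)."""
    calls = Counter(classify(r, host_kmers, pathogen_index) for r in reads)
    calls.pop("host", None)
    calls.pop("unclassified", None)
    return calls.most_common()
```

Because the host check is just a set intersection, it could run on reads as they stream off the sequencer, which is exactly the "only export non-human reads" idea; and the final `ranked_report` is the clinician-facing list of bad actors by abundance.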
A truly poignant question is whether this infection could have been detected by NGS earlier and less invasively, before the patient and family went through so much suffering and anxiety. Could it have been detected in the spinal taps? I don't believe DeRisi addressed that question, though specific PCR assays designed from the sequence data failed to detect Leptospira in the blood samples.
DeRisi is setting up a center at UCSF to routinely use sequencing to identify pathogens, as well as a rat brain slice assay for autoimmunity (I just caught myself before explaining that one to someone over lunch!). Centers such as this will presumably work out the sorts of questions above: what are the requirements for this space, and what are the best ways to meet them? There are some great opportunities here for the bioinformatics community to focus on something truly different and potentially life-saving, far better than dubious performance-improvement claims for short-read aligners. And also more challenging: real-time bioinformatics has not received much attention, and carries with it some strong programming issues. If releasing human datasets has privacy concerns, perhaps DeRisi could release some of the data from his ursine subjects? Diverse public datasets, and perhaps even CASP- or Assemblathon-style challenges, would seem very apropos for rapid pathogen detection, where the stakes are potentially very high.
15 comments:
Thanks for the awesome post. Glad that you beat me to it :) I thought this great story got completely swept away by the Nanopore talk at AGBT. Really glad that you wrote it. I have storified tweets from Joe DeRisi's talk here
http://nextgenseek.com/2014/02/ngs-in-critical-care-a-feel-good-story/
"In desperation, the doctors approached DeRisi to use sequencing as an unbiased search for an occult pathogen." This part is fascinating to me. Do we know how the doctors knew to contact DeRisi? How did they know who he was, or what he might be able to do? If this were fiction, this would be a deus ex machina. It's that missing link that explains so much of what goes on in the world. Sure, maybe he plays golf with one of the doctors and it was totally serendipitous. Or is there now a consciousness among clinicians of NGS (and knowledge of the people doing the sequencing and bioinformatics) such that stories like this actually happen all the time? I'm just curious.
Anonymous: that's how DeRisi presented it, in my memory -- a desperation call. Whether they were aware of his polar bear work he didn't say; I found that when checking whether the story had been published. Not being in an academic medical center, I can't comment on the degree to which this sort of communication occurs -- but it certainly needs to!
Many thanks Keith for this write-up. (You saved me the effort!)
There are a few items that I've thought about since that great talk, in order:
1) A bit of a custom-tailored situation to demonstrate the power of NGS for a critical-care case. You point out the multiple possible etiologies for encephalitis, and in this case, since the infectious diagnostics came back negative, the physicians presumed it was autoimmune in nature. And the patient's condition only grew worse. So here was a case where the patient was both immune-compromised and exposed to a possible environmental pathogen.
2) DeRisi focused his efforts on the 90 min of analysis and how it could be accelerated, and no detail about how the sequencing could have been sped up. Thanks for the acknowledgement that Ion Torrent PGM could have been used to shave at least 10h if not 12h from the sample to answer cycle.
3) Also, Joe did not mention anything at all about the 7h sample preparation. He knew he was working with an unknown causative agent, possibly including a fungus or virus. In the Q&A he was asked about it; he just mentioned it was a 'total nucleic acid prep', which presumably meant accounting for both RNA and DNA viruses, along with fungi (which could pose problems of their own) and Gram+ and Gram- bacteria. But DeRisi knew all this, and knew how to prepare separate libraries, equalize/pool them, and sequence.
Bonus point 4: Out of 1570 cases in the past 7y, a full 63% went undiagnosed. So there's a real unmet healthcare need here that NGS can solve.
Thanks again for the post.
Dale
Thanks for sharing this. When doctors were trying to use culture to identify the pathogen infecting my 2-month-old son a month ago, I strongly felt sequencing should be used for such purposes.
I also like your proposal for a CASP- or Assemblathon-style challenge. There was such a challenge organized by DTRA of the Department of Defense last year (the DTRA Algorithm Challenge, with a 1 million dollar prize) covering exactly this problem. We won the challenge by developing a series of new algorithms, such as a fast host read filter, a fast and sensitive GenBank alignment tool (it has to work with reads from MiSeq, Ion Torrent, 454, and PacBio), and an accurate taxa assignment algorithm. Hopefully those algorithms will be released soon, and hopefully sequencing will be routinely used in hospitals to detect pathogens soon.
Great post Keith - your comment on using the Ion platform for "rapid ID of pathogens" is spot on - some protocols in my lab are <12 hours at the moment (ampli-seq) for biosurveillance purposes. The MiSeq though - is a very capable fast, brute force "metagenomics" platform when it needs to be as well. (speaking from experience) But in a diagnostics/biosurveillance role - you're not _really_ doing metagenomics. The basic question is "Given a list of pathogens (maybe a fairly big list, but smaller than Genbank) - are any of these bugs in this sample?"
IMHO - this is where NGS is going for rapid pathogen diagnostics. In a few years, given the rate of NGS innovation, the days of PCR testing samples for pathogen ID will be dead.
The quality of the reads coming from an NGS platform is key for diagnostic purposes. The time is fine, but the quality is everything!
@Anonymous: I was a grad student in the DeRisi lab. Before switching over to NGS, the lab did similar pathogen hunting using a custom-designed microarray called the ViroChip. In fact, they (I say "they" because I worked on a completely different project) helped identify the causative agent for SARS, and there were several projects in the lab on identifying novel viruses in patient samples of unknown etiology from Bay Area hospitals. So, the lab is actually somewhat well-known for this type of thing.
Thanks for this story - it is particularly interesting to me having worked on Leptospira genomics since I started in bioinformatics 11 years ago!
The k-mer approach for identification is a good strategy, recently implemented in Kraken by Wood and Salzberg: http://ccb.jhu.edu/software/kraken/
Leptospira has now been at AGBT, on The Simpsons, on Mythbusters, and on The Big Bang Theory! :-)
Thanks for the interesting post: a triumph for diagnostic metagenomics! Readers might be interested in some of our recent publications on this approach:
Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections
http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9186805
A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4
http://jama.jamanetwork.com/article.aspx?articleid=1677374
Metagenomic analysis of tuberculosis in a mummy
http://www.nejm.org/doi/full/10.1056/NEJMc1302295
I think the bigger issue is the failure of the directed PCRs. What this shows is a disconnect in the curation of the PCR or similar techniques (qPCR, Sanger sequencing, fusion NGS): the sequence designs are not kept current with the correct specifications. NGS is a great tool and I use it quite often for specific cases like this; however, after reviewing what was done and not done, it almost always comes down to a failure to update the oligo design for current strain and region information, which causes a specific assay to fail. Other times, the time point at which the sample is collected and analyzed means the target is not present in the sample. NGS is a great tool, but at several hundred to several thousand dollars per sample, it would be more beneficial to the patient and more cost-effective to run an accurately designed and curated qPCR, or panel of qPCRs, in half a day than to spend the effort required by NGS. That said, each molecular analysis tool has its place. The key to successful molecular analysis is rapid, redundant, and repetitive bioinformatics curating the assay designs in silico on a regular and ongoing basis. Good to hear the causative agent was identified in this case.
PasserBy: thanks for the comment! While I agree with you that a good qPCR panel could be an option, the appeals of sequencing are that, on the one hand, it empowers detection of a very broad range of possible pathogens, and on the other, the cost of sequencing is still on a steady downward drop, whereas qPCR is a pretty mature technology. Shotgun sequencing also has the advantage of not requiring any assay-specific reagents to be pre-positioned at the point of care; in this setting, time is critical.