One of the most electrifying talks at AGBT this year was given by Joe DeRisi of UCSF, who gave a brief intro on the difficulty of diagnosing the root cause of encephalitis (as it can be autoimmune, viral, protozoal, bacterial and probably a few other causes) and then ran down a gripping case history which seemed straight out of House.
The patient in the case history was a 14-year-old boy (which hit home; that’s my son’s age) born with Severe Combined Immunodeficiency Syndrome (SCID). He had previously received a bone marrow transplant from his mother. In summer 2013 he swam on vacations in Puerto Rico and at a Florida resort (nervous murmurs from the crowd, many of whom had just swum at a Florida resort). The household also had multiple cats (toxoplasmosis?). Given the setting and the preamble, this was clearly going to be a zebra hunt.
I can’t do justice to all the stages of the progression of this patient’s illness, so I won’t try. It started with a bout of conjunctivitis, then later uveitis. Then more medical visits, with elevated white cell counts. Given the patient’s complex background, sometimes he was treated for an infection, other times for possible Graft-vs-Host Disease (GVHD) and other times for a recurrence of SCID. Clearly these are approaches which might not be simpatico: GVHD is treated with immunosuppressants, but for infections one might want to tune up the immune system. Various assays for infection, using cultures and PCR, repeatedly came back negative. Diagnostic tests became increasingly invasive, starting with spinal taps.
Eventually, the boy landed in the hospital and was there for over a month, with a steadily worsening condition. A 1 cubic centimeter brain biopsy (shudders from the crowd) revealed inflammation, but no definite cause. DeRisi showed a picture of the boy which his parents had made public; he was nearly encased in medical tubing. Due to worsening mental status, a coma was induced. In desperation, the doctors approached DeRisi to use sequencing as an unbiased search for an occult pathogen.
So, a protocol was quickly thrown together and approved by an emergency Institutional Review Board (IRB). DNA was purified from the brain biopsy and subjected to sequencing on a MiSeq. The entire process from DNA to results took 2 days, with the team painfully aware that their patient might expire at any time.
The sequence reads were mostly human, and once those were culled out, a lot of the remainder were very pedestrian. But 400+ reads mapped to Leptospira, a spirochete which can cause encephalitis and which often presents as eye infections. Further sequencing fingerprinted it as a strain common in the Caribbean. The IRB reconvened to consider the issue of treating a patient based on research data from a non-CLIA lab, but given the general safety of the recommended course (high dose penicillin) and the grave condition of the patient, treatment was initiated and the boy recovered. Further testing at the CDC with a specific PCR test for Leptospira, under CLIA conditions, was negative (but the test has a rated sensitivity of only 60%!), a reminder that CLIA is solely a set of procedural requirements and says nothing about analytical value.
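To put a rough number on why a negative result from a 60%-sensitive test should not have overturned the sequencing result, here is a back-of-the-envelope Bayes calculation. Only the 60% sensitivity comes from the talk; the prior (how convinced one was of Leptospira after seeing the reads) and the test's specificity are values I have made up purely for illustration.

```python
def posterior_given_negative(prior, sensitivity, specificity):
    """P(infected | test negative), by Bayes' rule."""
    # Infected, but the test missed it (false negative).
    false_negative = prior * (1.0 - sensitivity)
    # Not infected, and the test correctly says so (true negative).
    true_negative = (1.0 - prior) * specificity
    return false_negative / (false_negative + true_negative)


# sensitivity = 0.60 is the figure quoted in the talk.
# prior = 0.90 and specificity = 0.99 are hypothetical assumptions.
p = posterior_given_negative(prior=0.90, sensitivity=0.60, specificity=0.99)
print(f"P(infected | negative PCR) = {p:.2f}")
```

Under those assumed inputs, the probability of infection only drops from 0.90 to about 0.78 after the negative test, so the negative result is weak evidence. Different priors will of course give different answers.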
It’s a great story (I hope DeRisi and team publish it), but it can also be seen as a great jumping off point for designing a system to tackle such cases.
For example, DeRisi stated that most of the first day was taken up by library preparation. There are certainly faster library preps; Illumina claims 1.5 hours for a rapid Nextera XT protocol. Should the protocol include a step to deplete human DNA? NEB has a kit based on methylation, and Carlos Bustamante from Stanford talked about using RNA baits derived from human genomic libraries to enhance microbiome studies (or conversely, to enrich for ancient human DNA). Would the extra complexity and time gain added sensitivity? Could no-PCR or low-PCR library preps be used, to both accelerate the preparation and reduce the risk of carryover contamination? Or would other tools, such as selecting sample barcodes from a very large pool of barcodes, be the best way to detect cross-contamination?
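The barcode idea can be sketched in a few lines: if each run draws its barcodes from a big pool and never reuses one, then any read carrying a barcode from an earlier run is a flag for carryover. Everything here is hypothetical (the pool of all 6-mers, the run labels, the function names); real barcode sets are designed with error tolerance in mind, which this toy ignores.

```python
import itertools
import random
from collections import Counter

# Hypothetical pool: every 6-base barcode (4**6 = 4096 of them).
POOL = ["".join(p) for p in itertools.product("ACGT", repeat=6)]


def assign_barcodes(sample_ids, pool, already_used, rng):
    """Give each sample a random barcode that no earlier run has used."""
    available = [b for b in pool if b not in already_used]
    chosen = rng.sample(available, len(sample_ids))
    return dict(zip(sample_ids, chosen))


def carryover_report(read_barcodes, current, history):
    """Count reads whose barcode does not belong to the current run.

    history maps barcode -> label of the earlier run that used it;
    barcodes seen in no run at all are counted under "unknown".
    """
    expected = set(current.values())
    report = Counter()
    for barcode in read_barcodes:
        if barcode not in expected:
            report[history.get(barcode, "unknown")] += 1
    return report


rng = random.Random(0)                      # fixed seed for repeatability
history = {"AAAAAA": "earlier_run"}         # made-up record of past usage
current = assign_barcodes(["patient"], POOL, set(history), rng)
patient_bc = current["patient"]

# Simulated read barcodes: mostly the patient, plus two carryover reads.
reads = [patient_bc] * 5 + ["AAAAAA"] * 2
print(carryover_report(reads, current, history))
```

The larger the pool relative to the number of samples per run, the longer one can go without reusing a barcode, and the more informative a stray barcode becomes.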
There’s also the choice of platform and read characteristics. DeRisi used MiSeq, which has some fast options. I’ve heard some vocal proponents of the Ion platform for rapid analyses such as this. How many reads are needed, and how long should they be? What are the effects on downstream informatics? And would this be a good target for some of the emerging platforms? The David Jaffe (Broad Institute) talk on assembling with Oxford Nanopore data was a bit short on statistics, but perhaps this platform would have enough firepower to detect a pathogen, with the advantage of no PCR. But is the desire to simply identify a bacterium, or should one be shooting to detect subtle features? I doubt the latter; the depth of the data DeRisi described was far short of being able to assemble or to detect some small feature, but should this be a goal?
If the key requirements for rapid pathogen identification are speed and read quantity, but with relaxed demands on base accuracy or read length, then this field may represent a huge opportunity for emerging sequencing technologies. Several such companies were at the conference in different forms: Genapsys presented, Genia & PicoSeq had posters, and a charming fellow from Quantum Biosystems was showing off the evolutionary history of their chips in the bar. If simple identification of pathogens is sufficient, then perhaps lots of really noisy reads would do, and pathogen detection could be an early revenue opportunity for these companies, particularly if this sort of analysis becomes routine and expected at every major medical center.
The downstream informatics could be a rich source of innovation, as this took a significant amount of time (as long as the sequencing, if I recall correctly). Could reads be scanned for human-ness as they are generated, with the sequencer only exporting non-human reads? DeRisi used BLASTN versus all of GenBank after depleting the human reads, which is neither the fastest algorithm nor an ideal database. GenBank has both a lot of redundancy and a lot of uninteresting genomes; if you identify Ficus reads, will you care in this setting? Perhaps just matching the k-mer profile of the sequences would be sufficient. How much time could be saved by a tool which wasn’t actually aligning to the human genome, but simply finding fragments that could align to the human genome? Or do you just take their protocol and have ginormous compute resources on standby, using a large cloud very briefly to chew through the data? And for clinicians who are not genomicists, how do you best present the end results? DeRisi showed a taxonomy browser, but using one implies a certain degree of training and background. Perhaps a list of bad actors reported in rank order of abundance makes more sense.
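As a toy illustration of the k-mer idea and the ranked "bad actors" list, here is a minimal sketch. This is not what DeRisi's team ran (they used BLASTN); it is just one way such a screen might look. The three reference sequences are made-up strings that merely carry familiar labels, and the k-mer size, the 50% threshold and all function names are my own assumptions.

```python
from collections import Counter

K = 8  # toy k-mer size; a real screen would use much longer k-mers


def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=K):
    """Map each k-mer to the set of reference labels containing it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k=K, min_fraction=0.5):
    """Label a read by its best k-mer match, with no alignment at all."""
    read_kmers = kmers(read, k)
    votes = Counter()
    for kmer in read_kmers:
        for label in index.get(kmer, ()):
            votes[label] += 1
    if not votes:
        return None
    label, hits = votes.most_common(1)[0]
    return label if hits / len(read_kmers) >= min_fraction else None


def ranked_report(reads, index, host="human", watchlist=None):
    """Discard host and unclassified reads; rank the rest by abundance."""
    counts = Counter()
    for read in reads:
        label = classify(read, index)
        if label is None or label == host:
            continue
        if watchlist is not None and label not in watchlist:
            continue
        counts[label] += 1
    return counts.most_common()


# Made-up sequences, NOT real genomic data.
REFERENCES = {
    "human": "ACGTACGGTCAGTTCAGGCTAACGTTAGCCATGGCA",
    "Leptospira": "TTGACCGATAGCTTGACGGATCCATGCAATCGGTTA",
    "Ficus": "GGCATCGTTAACCGGTATCGCATGCCTAGGTCAACT",
}
index = build_index(REFERENCES)

reads = [
    "CGGTCAGTTCAGGCTAACGT",  # from the toy human sequence
    "TTCAGGCTAACGTTAGCCAT",  # from the toy human sequence
    "GACCGATAGCTTGACGGATC",  # from the toy Leptospira sequence
    "GCTTGACGGATCCATGCAAT",  # from the toy Leptospira sequence
    "GACGGATCCATGCAATCGGT",  # from the toy Leptospira sequence
    "CGTTAACCGGTATCGCATGC",  # from the toy Ficus sequence
    "AAAAAAAAAAAAAAAAAAAA",  # matches nothing
]
report = ranked_report(reads, index)
for label, n in report:
    print(f"{label}\t{n}")
```

Passing `watchlist={"Leptospira"}` to `ranked_report` would suppress the Ficus line, which is one way of not bothering clinicians with genomes they do not care about. Because `classify` handles one read at a time, the same check could in principle run as reads come off the instrument.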
A truly poignant question is whether this infection could have been detected by NGS earlier and less invasively, before the patient and family went through so much suffering and anxiety. Could it have been detected in the spinal taps? I don't believe DeRisi addressed that question, though specific PCR assays designed from the sequence data failed to detect Leptospira in the blood samples.
DeRisi is setting up a center at UCSF to routinely use sequencing to identify pathogens, as well as a rat brain slice assay for autoimmunity (I just caught myself before explaining that one to someone over lunch!). Centers such as this will presumably work out the sorts of questions above: what are the requirements for this space, and what are the best ways to meet them? There are some great opportunities here for the bioinformatics community to focus on something truly different and potentially life-saving, far better than dubious performance improvement claims for short read aligners. And also more challenging: real-time bioinformatics has not received much attention, and carries with it some serious programming challenges. If releasing human datasets raises privacy concerns, perhaps DeRisi could release some of the data from his ursine subjects? Diverse public datasets, and perhaps even CASP- or Assemblathon-style challenges, would seem very apropos for rapid pathogen detection, where the stakes are potentially very high.