Monday, October 19, 2015

MARC spots the Ox(ford)

Last week brought the initial report from MARC, the MinION Analysis and Reference Consortium, detailing a body of experiments intended to benchmark the performance and consistency of the Oxford Nanopore MinION sequencing device. The MARC paper is also the inaugural research article in F1000's new channel for nanopore papers.
First, all the necessary disclosures. I could have been part of MARC but wasn't able to carve out the time, though the MARC coordinators have been too polite to evict me from the mailing list. My sole contribution came after the release of the document: spotting a copy editing error that many elementary school students could catch. Ewan Birney led the consortium, and has always been assiduously transparent about the fact that he is a paid consultant to Oxford Nanopore, so I'll repeat that declaration here. MARC ran their work with the full knowledge of ONT, which provided free reagents for the effort.

The goal of MARC Phase I (this is intended to be an ongoing series of analyses) is to nail down sources of variability in the MinION platform.  To do this, they first ran 10 flowcells, each in a different lab, using E. coli DNA as the reference material.  After Oxford revised their protocol significantly, another 10 experiments were run by the same labs using the same material and the new protocol. So there are 20 flowcells of data, all of which have been deposited publicly at the European Nucleotide Archive as native FAST5 files.

One warning: don't try reading the paper by skipping straight to the Results section; the Introduction describes the MinION process in critical detail and introduces the technical vocabulary necessary to understand the rest of the paper. It is also important to note that the paper, with a few exceptions, looks only at the 2D reads (in which both strands of a library fragment were sequenced and the information combined).

I can't do justice to the wealth of analyses in the paper.  A quick count suggests the paper has about 130 distinct plot panels in the main text, with another 110 or so panels in the supplementary data.  Many possible sources of variability were explored, and quite a few of these analyses found little or no signal.  For example, variation in flowcell temperature observed within the experiments appeared to have no effect on performance.  However, a number of items of note were discovered.

Despite intense effort to standardize the process, deviations still occurred. Some of these were small deviations from protocol due to human error; others were just pure bad luck.  For example, during one experiment the computer running the MinION crashed.  All of the deviations are neatly tallied, though even with this accounting at least one run was distinct from the others for unknown reasons.  However, most deviations from wet lab protocol appeared to have little impact on instrument performance, with one strong exception: in the outlier run, too much fuel mix was added to the flowcell, which resulted in faster data generation but much poorer quality.

Perhaps the most important story out of MARC's analysis is a clear decline in base calling quality over the length of a run.  This is seen most strongly in overall 2D reads (Figure 9), but is less pronounced in the "2D pass" reads which have passed an additional filter -- but the fraction of 2D reads which pass the filter declines significantly over the course of a run (Figure 11).   Additional analyses of time-dependent performance drops can be seen in Figure 8 and Supplementary Figures S4, S5 and S8.  

Another important note from the report is that base quality scores from the caller really do correlate with error frequencies.  Confirming that the qualities are meaningful enables them to be used by downstream informatics. Error rates could also be reduced significantly using the Expectation Maximization (EM) realignment algorithm marginAlign, although marginAlign did change the mix of error types.
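For readers less familiar with what "quality scores correlate with error frequencies" means in practice: quality scores use the standard Phred convention, in which a score Q corresponds to a predicted error probability of 10^(-Q/10). A calibration check like MARC's amounts to binning aligned bases by reported quality and comparing observed error rates to that prediction. This toy sketch (not MARC's actual code; the simulated calls are invented for illustration) shows the idea on perfectly calibrated data:

```python
import random

def phred_to_error_prob(q):
    """Phred convention: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Simulate base calls whose qualities are well calibrated: a base reported
# at quality Q is wrong with probability 10^(-Q/10).
random.seed(42)
calls = []
for _ in range(200_000):
    q = random.choice([5, 10, 15, 20])
    is_error = random.random() < phred_to_error_prob(q)
    calls.append((q, is_error))

# Bin observed error rates by reported quality and compare to the prediction.
observed = {}
for q in (5, 10, 15, 20):
    errors = [e for (qq, e) in calls if qq == q]
    observed[q] = sum(errors) / len(errors)

for q in (5, 10, 15, 20):
    print(f"Q{q}: predicted {phred_to_error_prob(q):.4f}, "
          f"observed {observed[q]:.4f}")
```

On real data the "observed" column comes from alignments against a reference, and the interesting question (which MARC answers in the affirmative) is whether it tracks the predicted column at all.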

The paper makes clear that this is a start for MARC, not an end.  Bridging studies are planned to test new protocols as Oxford updates kits and flowcells (which has indeed happened since the last MARC Phase I experiments were conducted 6 months ago).  The declared goal of Phase 2 is to explore improved performance via protocol changes.

The pattern in sequencing and related technologies has been to organize such standardization consortia long after a platform has become widely accepted.  MARC gets this important work rolling early, completing its first report before the MinION had been available for even a single year.  The MARC work also provides a huge body of data for bioinformatics developers interested in tuning tools for nanopore data or building new ones.
