Just to repeat, hopefully correctly this time, my conflict-of-interest statement: Henry sits on the Board of my employer. In the original version of the last post my memory promoted him to Chairman (one of two sloppy errors of fact in that post), which isn't a position he has held. And I should probably note that Theral has been daring enough to let me on a couple of times.
Happy to See: PacBio and Invitae's Collaboration Parameters
It's clear PacBio and Invitae are an excellent matchup. Invitae gains access to technology sooner, and PacBio knows it has a committed customer for new improvements plus plenty of front-line insight into what should change in its platform. What is particularly interesting about this collaboration is that they apparently didn't fence off any of the IP generated into exclusive areas: PacBio is free to use what is developed together for anything in its space, and Invitae similarly has free rein. That's a great attitude, though understandably it carries risks. Invitae knows that competitors will be free to buy future PacBio instruments that benefit from Invitae's work. This "growing the pie" strategy won't work for everyone, but it is good to see.
Intrigued: PacBio's Mongo Sequel Under Development
Henry discussed two different hardware platforms under development. One would be for small research labs and wasn't described in any detail. But the other is an ambitious large-scale instrument, and he threw out a few teasers about it that are worth exploring.
First, let's sketch the current throughput of the Sequel II platform. If all goes well, it takes about three SMRTcells to generate enough data to cover a human genome. At 1.5 days (36 hours) per SMRTcell, that's less than 2 genomes per week per Sequel II and about 80 genomes per instrument per year if we assume nearly no downtime.
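The arithmetic above can be sketched in a few lines of Python. The names are mine, and the inputs (three SMRTcells per genome, 1.5 days per SMRTcell, cells run back to back with no downtime) are the rough figures from the paragraph above rather than PacBio specifications.

```python
# Back-of-envelope Sequel II throughput, using the rough figures from the text.
SMRTCELLS_PER_GENOME = 3   # "about three" per human genome (assumption from the text)
DAYS_PER_SMRTCELL = 1.5    # 36 hours per SMRTcell, run one after another


def genomes_per_period(days: float) -> float:
    """Human genomes one instrument could finish in `days`, assuming no downtime."""
    days_per_genome = SMRTCELLS_PER_GENOME * DAYS_PER_SMRTCELL
    return days / days_per_genome


per_week = genomes_per_period(7)
per_year = genomes_per_period(365)
print(f"{per_week:.2f} genomes/week, {per_year:.0f} genomes/year")
```

That works out to roughly 1.6 genomes per week and a little over 80 per year, which is where the "about 80" figure comes from.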
Henry suggested that the instrument being developed would have a capacity to generate over 10 thousand human genomes per year, or at least 100-fold more than the current instruments can. How could PacBio get there?
Sequel I to Sequel II saw an 8-fold increase in ZMWs per SMRTcell, so a 64M chip has often been projected as the next step, and that should deliver 8-fold more data. That still leaves about 13-fold to account for.
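As a quick check on that gap, here is a minimal sketch using only the round numbers quoted in this post (about 80 genomes per year today, a target of over 10 thousand, and an 8-fold chip step). The variable names are mine.

```python
# How much improvement is left after an 8-fold larger chip?
CURRENT_GENOMES_PER_YEAR = 80      # rough Sequel II estimate from earlier in the post
TARGET_GENOMES_PER_YEAR = 10_000   # "over 10 thousand" per year
CHIP_FOLD = 8                      # projected step to a 64M chip

# Fold improvement implied by the genomes-per-year target.
implied_fold = TARGET_GENOMES_PER_YEAR / CURRENT_GENOMES_PER_YEAR

# Remaining fold using the conservative "at least 100-fold" figure.
remaining_from_100 = 100 / CHIP_FOLD
# Remaining fold if the full implied improvement is needed.
remaining_from_implied = implied_fold / CHIP_FOLD

print(implied_fold, remaining_from_100, remaining_from_implied)
```

The conservative 100-fold figure leaves 12.5-fold after the chip, which rounds to the "about 13" above. Taking the 10,000-genome target at face value pushes the gap closer to 16-fold.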
Loading efficiency is always a target, but even if PacBio just had Poisson loading (I believe they do better) that would leave a ceiling for improvement of about 3-fold. Increasing the insert lengths is one route, but it runs the risk of fewer CCS reads having sufficient passes to generate "HiFi" (0.1% error, aka Q30) consensus reads. Pushing the DeepConsensus machine learning algorithm for CCS refinement into production might help both by boosting the number of reads meeting HiFi (perhaps by up to 40%, though such numbers from papers rarely survive the real world) and by enabling HiFi from even longer inserts. Relying on insert length for throughput will also generate impressive numbers that will be harder to repeat in the field.
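Two of the numbers in that paragraph are easy to verify. The sketch below assumes an idealized Poisson model of loading, where a well is productive only if it holds exactly one molecule, and uses the standard Phred definition of a quality score. The function names are mine, and nothing here reflects PacBio's actual loading chemistry.

```python
import math


def single_occupancy_fraction(mean_per_well: float) -> float:
    """Poisson probability that a well holds exactly one molecule."""
    return mean_per_well * math.exp(-mean_per_well)


# P(k = 1) = lam * exp(-lam) peaks at lam = 1, where it equals 1/e.
best = single_occupancy_fraction(1.0)
# Gain available if every well could somehow be singly loaded.
ceiling_fold = 1.0 / best


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


print(f"Poisson best case: {best:.1%} of wells, ceiling {ceiling_fold:.2f}-fold")
print(f"Q20 error: {phred_to_error(20):.1%}, Q30 error: {phred_to_error(30):.1%}")
```

At best about 37% of wells are singly loaded under pure Poisson statistics, so perfect loading would be worth roughly 2.7-fold, which I've rounded to 3. The second line confirms that 0.1% error corresponds to Q30, not Q20.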
Add all these together and 100-fold still seems very aggressive. Of course, one semi-cheat would be for each instrument to have multiple flowcells -- NovaSeq has two and Singular G4 is slated to have 4.
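To see how aggressive, here is a rough sketch that multiplies the best-case factors mentioned above. Treating them as independent multipliers is my simplification, each one is a ceiling rather than an expected gain, and the insert-length contribution is left out because no number was given for it.

```python
import math

# Best-case multipliers taken from the discussion above (all are ceilings).
factors = {
    "64M chip (8-fold more ZMWs)": 8.0,
    "loading: pure Poisson to perfect": math.e,   # about 2.7-fold
    "DeepConsensus HiFi yield (up to +40%)": 1.4,
}

# Combined gain per flowcell if every factor hit its ceiling at once.
per_flowcell_fold = math.prod(factors.values())

for flowcells in (1, 2, 4):
    total = per_flowcell_fold * flowcells
    print(f"{flowcells} flowcell(s): about {total:.0f}-fold")
```

Even with every factor at its ceiling, a single flowcell comes out near 30-fold, and it takes four flowcells to clear 100-fold. Since PacBio probably already loads better than Poisson, the real headroom is smaller still.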
Hyperskeptical: unified sample prep for long and short reads
Timpson asked an obvious question: if PacBio has spent so much time extolling the advantages of long reads, then why did they buy Omniome's short read technology? I can certainly agree with Henry's answer that there are applications where each tech has an advantage. His vision of having a unified bioinformatics platform for both makes complete sense.
But Henry also suggested that PacBio will unify sample prep for both long and short reads, and there I'm very skeptical. Perhaps some common prep instruments and perhaps some common reagents, but the requirements are just very different so much of the time.
For example, getting high molecular weight DNA for long reads is difficult, and you just don't need to go through that trouble for short reads. Conversely, if you do have long molecules and want short reads, you must fragment them somehow. There's been a flurry of preprints around concatemerizing inserts upstream of HiFi sequencing; PacBio's Sequel line is the only currently existing platform where this makes sense (and it would also make sense for me to write on this topic specifically!).
I suppose I should keep an open mind. Supporting the ambitious goals of the Invitae collaboration must mean some serious effort on sample prep automation, and if that means a microfluidics approach it could mean an adaptable platform suitable for both regimes. But, no microfluidic genomic sample prep platform brought to market has ever been high throughput -- Oxford Nanopore's VolTRAX and Illumina's discontinued NeoPrep were both in the sub-20 sample range and Miroculus' platform isn't high throughput either. Once you get into 96-well or 384-well pipetting robots, shear forces that fragment long DNA start being difficult to avoid.
PacBio and Invitae both present (virtually) next week at the J.P. Morgan conference; it will be interesting to see if any other details on the new hardware are revealed or any timelines for it being seen in the wild.
[2022-01-07 - as pointed out by a commenter, I erred in putting 1% instead of 0.1% as the HiFi error rate]
"HiFi" reads are defined as having a score of Q30 or 99.9% accurate not Q20 or 99% accurate as written in your post.
https://www.pacb.com/smrt-science/smrt-sequencing/hifi-reads-for-highly-accurate-long-read-sequencing/
Thank you for catching that!!! Fixed
Another potential route to more genomes per year: faster base synthesis. It's something like 2.5 bases per second at the moment, isn't it?
You still have "0.1% error aka Q20". 0.1% error is three nines of accuracy (99.9%), so q30
In the 'Hyperskeptical' section, did you mean to say "...extolling the advantages of [long] reads..." rather than saying short reads twice?