Thursday, January 06, 2022

Three Reactions from December's PacBio+Invitae Mendelspod

Theral Timpson hosted PacBio CEO Christian Henry and Invitae CEO Sean George for a Mendelspod podcast back on Pearl Harbor Day last month.  It's a fun, chatty interview that illustrates why the two companies are an excellent strategic fit.  I won't summarize all of it, but I did have strong reactions to three points.

Just to repeat, hopefully correctly this time, my conflict-of-interest statement: Henry sits on the Board of my employer.  My memory decided to promote him to Chairman in the original version of the last post (one of two sloppy errors of fact in it), and that isn't a position he's held.  And I probably should note that Theral's been daring enough to let me on a couple of times.

Happy to See: PacBio and Invitae's Collaboration Parameters

It's clear PacBio and Invitae are an excellent matchup.  Invitae gains access to technology sooner, and PacBio knows they have a committed customer for new improvements plus plenty of front-line insight into what should change in their platform.  What is particularly interesting about this collaboration is that they apparently didn't fence off any of the IP generated into exclusive areas: PacBio is free to use what is developed together for anything in their space and Invitae similarly has free rein.  That's a great attitude, though understandably it carries risks.  Invitae knows that competitors will be free to buy PacBio instruments in the future that benefit from Invitae's work.  This "growing the pie" strategy won't work for everyone, but it is good to see.

Intrigued: PacBio's Mongo Sequel Under Development

Henry discussed two different hardware platforms under development.  One would be for small research labs and wasn't described in any detail.  But the other is an ambitious large-scale instrument, and he threw out a few teasers about it that are worth exploring.

First, let's sketch the current throughput of the Sequel II platform.  If all goes well, it takes about three SMRTcells to generate enough data to cover a human genome.  At 1.5 days (36 hours) per SMRTcell, that's less than 2 genomes per week per Sequel II and about 80 genomes per instrument per year if we assume nearly no downtime.
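The arithmetic behind those throughput figures can be sketched in a few lines of Python.  This is only a back-of-envelope check using the numbers quoted above; the function name and the `uptime` parameter are illustrative assumptions, not anything from PacBio.

```python
# Back-of-envelope Sequel II throughput, using the figures quoted in the post.
SMRTCELLS_PER_GENOME = 3   # ~3 SMRTcells to cover a human genome
DAYS_PER_SMRTCELL = 1.5    # 36 hours per SMRTcell


def genomes_per_period(days, uptime=1.0):
    """Genomes one instrument completes in `days`, running cells back to back.

    `uptime` is a hypothetical knob (1.0 means no downtime at all).
    """
    days_per_genome = SMRTCELLS_PER_GENOME * DAYS_PER_SMRTCELL
    return days * uptime / days_per_genome


per_week = genomes_per_period(7)
per_year = genomes_per_period(365)
print(f"{per_week:.2f} genomes/week, {per_year:.0f} genomes/year")
# prints "1.56 genomes/week, 81 genomes/year"
```

Dropping `uptime` below 1.0 shows how quickly the yearly figure slips under 80 once any downtime is allowed for.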

Henry suggested that the instrument being developed would have a capacity to generate over 10 thousand human genomes per year, or at least 100-fold more than the current instruments can.  How could PacBio get there?

Sequel I to Sequel II saw an 8-fold increase in ZMWs per SMRTcell.  So a 64M chip has often been projected as the next step, and that should deliver 8-fold more data.  Against a 100-fold target, that still leaves about 13-fold to account for.

Loading efficiency is always a target, but even if PacBio just had Poisson loading (I believe they do better) that would leave a ceiling for improvement of about 3-fold.  Increasing the insert lengths is one route, but runs the risk of fewer CCS reads having sufficient passes to generate "HiFi" (0.1% error aka Q30) consensus reads.  Pushing the DeepConsensus machine learning algorithm for CCS refinement into production might help, both by boosting the number of reads meeting the HiFi threshold (perhaps by up to 40%, though rarely do such numbers from papers survive the real world) and by enabling HiFi from even longer inserts.  Relying on insert length for throughput will also generate impressive numbers that will be harder to repeat in the field.
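For anyone wondering where the roughly 3-fold loading ceiling comes from, here is a minimal sketch.  It assumes the textbook model in which the number of molecules landing in each well is Poisson-distributed and only singly loaded wells are useful; the function name is mine, and real loading chemistry may well beat this.

```python
import math


def single_occupancy(mean_per_well):
    """P(exactly one molecule in a well) under Poisson loading."""
    return mean_per_well * math.exp(-mean_per_well)


# The single-occupancy fraction peaks when the mean is one molecule per well.
best_fraction = single_occupancy(1.0)   # equals 1/e
ceiling = 1.0 / best_fraction           # gain if every well were singly loaded

print(f"best Poisson loading: {best_fraction:.1%} of wells")
print(f"ceiling for improvement: {ceiling:.2f}-fold")
# prints "best Poisson loading: 36.8% of wells"
# prints "ceiling for improvement: 2.72-fold"
```

So perfect loading could at most recover a factor of about e over ideal Poisson loading, which rounds to the 3-fold quoted above.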

Add all these together and 100-fold still seems very aggressive. Of course, one semi-cheat would be for each instrument to have multiple flowcells -- NovaSeq has two and Singular G4 is slated to have 4.  
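To put rough numbers on "add all these together", the sketch below multiplies the optimistic gains quoted in this post.  Treating them as independent multipliers is my simplifying assumption, the insert-length gain is left out because no figure was given for it, and the flowcell counts are simply borrowed from the NovaSeq and G4 comparison.

```python
import math

TARGET_FOLD = 100.0  # "at least 100-fold", per Henry's teaser

# Best-case gains quoted in the post; independence is an assumption.
gains = {
    "ZMWs per SMRTcell (8-fold, to a 64M chip)": 8.0,
    "loading (Poisson -> perfect ceiling)": 3.0,
    "DeepConsensus HiFi yield (+40%)": 1.4,
}


def combined_fold(gains, flowcells=1):
    """Multiply the individual gains, then scale by parallel flowcells."""
    return math.prod(gains.values()) * flowcells


single = combined_fold(gains)
print(f"one flowcell: {single:.1f}-fold, short by {TARGET_FOLD / single:.1f}x")
for n in (2, 4):
    print(f"{n} flowcells: {combined_fold(gains, n):.1f}-fold")
# prints "one flowcell: 33.6-fold, short by 3.0x"
# prints "2 flowcells: 67.2-fold"
# prints "4 flowcells: 134.4-fold"
```

Even with every best-case number granted, a single flowcell gets only about a third of the way there under these assumptions; it takes something like four flowcells to clear the target.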

Hyperskeptical: unified sample prep for long and short reads

Timpson asked an obvious question: if PacBio has spent so much time extolling the advantages of long reads, then why did they buy Omniome's short read technology?  I can certainly agree with Henry's answer that there are applications where each tech has an advantage.  His vision of having a unified bioinformatics platform for both makes complete sense.

But Henry also suggested that PacBio will unify sample prep for both long and short reads, and there I'm very skeptical.  Perhaps some common prep instruments and perhaps some common reagents, but the requirements are just very different so much of the time.

For example, getting high molecular weight DNA for long reads is difficult, and you just don't need to go through that trouble for short reads.  Conversely, if you do have long molecules and want short reads, you must fragment them somehow.  There's been a flurry of preprints around concatemerizing inserts upstream of HiFi sequencing; PacBio's Sequel line is the only platform currently existing where this makes sense (and it would also make sense for me to write on this topic specifically!).

I suppose I should keep an open mind.  Supporting the ambitious goals of the Invitae collaboration must mean some serious effort on sample prep automation, and if that means a microfluidics approach it could mean an adaptable platform suitable for both regimes.  But no microfluidic genomic sample prep platform brought to market has ever been high throughput -- Oxford Nanopore's VolTRAX and Illumina's discontinued NeoPrep were both in the sub-20 sample range, and Miroculus' platform isn't high throughput either.  Once you get into 96-well or 384-well pipetting robots, shear forces that fragment long DNA start being difficult to avoid.

PacBio and Invitae both present (virtually) next week at the J.P. Morgan conference; it will be interesting to see if any other details on the new hardware are revealed or any timelines for it being seen in the wild.

[2022-01-07 - as pointed out by a commenter, I erred in putting 1% instead of 0.1% as the HiFi error rate]


Anonymous said...

"HiFi" reads are defined as having a score of Q30 or 99.9% accurate not Q20 or 99% accurate as written in your post.

Keith Robison said...

Thank you for catching that!!! Fixed

David Eccles said...

Another potential route to more genomes per year: faster base synthesis. It's something like 2.5 bases per second at the moment, isn't it?

David Eccles said...

You still have "0.1% error aka Q20". 0.1% error is three nines of accuracy (99.9%), so q30

Simon B. said...

In the 'Hyperskeptical' section, did you mean to say "...extolling the advantages of [long] reads..." rather than saying short reads twice?