A maxim from the great but fictional advertising executive Don Draper: "if you don't like what people are saying, change the conversation". In an online strategy update presented two weeks ago (Slides / Replay), Illumina announced they'd like a new conversation around sequencing costs. No longer will they tout reagent cost per basepair; instead they will focus on the total cost of sequencing workflows. The obvious cynical response is that Illumina is conceding defeat on raw cost, having been severely beaten by Ultima Genomics (and Complete Genomics aka MGI, though that group continues to face stiff headwinds) and even matched - if you have the volume - by Element Biosciences. Total cost of ownership is what really matters, right? The catch is how it is being calculated and who is doing the calculating.
It has always been known that cost per gigabase or per million reads was a convenient fiction. Convenient because only simple arithmetic was required to convert performance specs and list prices into the metrics. But a fiction since all the other costs didn't magically go away. But which costs are we now counting? And how do you count them? For example, if the library prep requires 4 hours of hands-on time, whose hands? A Ph.D. paid at Boston rates or a fresh B.S. graduate paid at U.S. heartland rates? (Not knocking either - but cost-of-living in Boston is particularly painful for those starting out, and that is reflected in higher wages.) Illumina would particularly like to highlight the value of their DRAGEN computational acceleration platform - but when comparing it to conventional compute, what number do you pencil in? It all runs afoul of a dictum thrown out at a class on product financial modeling back at Millennium: keep it simple - "why spend the effort to invent a lot of numbers when you can just invent a few?"
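To make that concrete, here is a minimal sketch of the naive spec-sheet arithmetic versus a more loaded calculation - every number in it is invented for illustration, not actual vendor pricing:

```python
# Hypothetical illustration of why cost per gigabase is a "convenient fiction":
# every figure below is an invented placeholder, not real vendor pricing.

REAGENT_COST_PER_RUN = 10_000.0   # hypothetical reagent list price per run (USD)
OUTPUT_GB_PER_RUN = 3_000.0       # hypothetical run output in gigabases

# The headline metric: simple arithmetic on spec-sheet numbers.
naive_cost_per_gb = REAGENT_COST_PER_RUN / OUTPUT_GB_PER_RUN

# A "loaded" version that counts some of the costs that don't magically go away.
HANDS_ON_HOURS = 4.0          # library prep hands-on time per run
HOURLY_RATE = 75.0            # whose hands? Boston Ph.D. vs. heartland B.S. moves this a lot
COMPUTE_COST_PER_RUN = 500.0  # on-prem vs. cloud vs. DRAGEN: yet another invented number

loaded_cost_per_gb = (
    REAGENT_COST_PER_RUN
    + HANDS_ON_HOURS * HOURLY_RATE
    + COMPUTE_COST_PER_RUN
) / OUTPUT_GB_PER_RUN

print(f"naive:  ${naive_cost_per_gb:.2f}/Gb")
print(f"loaded: ${loaded_cost_per_gb:.2f}/Gb")
```

The gap between those two results is exactly the territory where each vendor gets to pick the assumptions.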
Illumina would like to calculate from a purified DNA sample on one end to results on the other, which fits with their strategy of offering - but not insisting on - vertical integration. So library prep, running the sequencer, primary bioinformatics and secondary bioinformatics. The same webinar teased two new library prep products that would further fit this model, though they are a year to a year-and-a-half away (if they keep schedule).
Other companies have already been taking potshots at Illumina on cost angles that might not make it into Illumina's official numbers. For example, Ultima Genomics' UG100 has a "daily care and feeding" arrangement which differs greatly from Illumina's "load a new run after the previous has finished" - since Illumina runs often annoyingly exceed an even multiple of 24 hours, full Illumina instrument utilization will ultimately require night and graveyard shifts. Oxford Nanopore would similarly tout the ability of PromethION to launch new runs at will. Element and Oxford would both point to lower capital costs. And so on.
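A toy calculation illustrates the clock-drift point - the 44-hour run length below is an invented figure, but any run time that isn't a clean multiple of 24 hours behaves the same way:

```python
# Toy illustration of run-start drift. The 44-hour run length is invented;
# the point is that any run time that isn't a multiple of 24 hours will
# walk the loading time around the clock.
RUN_HOURS = 44      # hypothetical run length
start_hour = 9      # first load at 9 am

for run in range(5):
    load_hour = (start_hour + run * RUN_HOURS) % 24
    print(f"run {run + 1}: load at {load_hour:02d}:00")
# -> 09:00, 05:00, 01:00, 21:00, 17:00 -- hello, graveyard shift
```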
Which also brings up the question of under what scenario we are calculating costs. One with enough samples arriving all at once to get maximum cost efficiency on a NovaSeq X 25B flowcell? Or a scenario favoring Element, where you must run now with a much smaller batch of samples - which seems to be a more practical model for the majority of core labs? So many ways for each company to frame the problem to favor themselves and prevent any sort of apples-to-apples comparison!
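A back-of-envelope sketch (all prices invented) shows how strongly the answer depends on which scenario you assume:

```python
# All prices invented -- the shape of the comparison is the point, not the values.
BIG_FLOWCELL_COST = 16_000.0    # hypothetical high-output (25B-class) flowcell price
SMALL_FLOWCELL_COST = 1_500.0   # hypothetical small-format flowcell price
SMALL_BATCH = 8                 # the samples a core lab actually has in hand today

# Fully loading the big flowcell is great economics -- if the samples all arrive at once.
for n in (8, 64, 512):
    print(f"big flowcell, {n:>3} samples: ${BIG_FLOWCELL_COST / n:,.0f}/sample")

# Running now on a small flowcell: worse headline cost per gigabase,
# better cost for the batch actually on the bench.
print(f"small flowcell, {SMALL_BATCH:>3} samples: ${SMALL_FLOWCELL_COST / SMALL_BATCH:,.0f}/sample")
```

With these made-up numbers, the big flowcell wins handily at 512 samples but loses badly when only 8 samples are ready - each vendor simply picks the row of that table that flatters them.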
Two New Library Preps -- in the Future
Illumina touted two new library prep approaches they are developing - one which they claim will perform library prep on the flowcell and another offering "5 base" sequencing, which would call 5-methylcytosine (5mC). No details were provided as to how either of these would be accomplished.
Element has been leading in moving library processes onto the flowcell, though in their case it isn't the initial library prep but hybrid capture enrichment. The Illumina prep won't be cost feasible without some sort of pre-instrument operation; the input DNAs must be tagged, because there are just about no applications which call for running an entire 25B flowcell on a single sample. Perhaps this would just be tagging with barcoded Nextera (Tn5), after which the samples can be pooled and placed on the flowcell to complete the process. Another speculation I've seen is that the PIPseq templating technology acquired from Fluent would somehow apply.
Illumina is not only promising a simplified workflow, but also that the quality of the final data would be better than any other solution out there - and they were clearly aiming at (but not naming) PacBio HiFi data. That is certainly in the category of "show me the data!", as that is a very hard challenge - particularly since good long-range contiguity requires high molecular weight preps going into the process. This claim might suggest they are using the PIPseq technology to generate linked reads a la the old 10X Genomics kit - but I remain skeptical that such data can deliver in the face of certain types of repetitive content, such as Variable Number of Tandem Repeat (VNTR) alleles where the repeat array is longer than the actual read length. And there are a range of applications - perhaps not yet as big as whole human genomes, but someday - which require high-accuracy single molecules, where each single-molecule read is the datapoint.
The other big promise is a 5-base reading chemistry. The first thing to note is that it isn't the same as the on-flowcell library prep. Illumina also didn't talk about reading 5-hydroxymethylcytosine (5hmC), the rarer but potentially buzzier additional mammalian epigenetic mark. The claim is their method will be a simple workflow with a single library, so not a case of running one bisulfite- or enzymatically-converted library to read 5mC and another native one to read the genome itself. A speculation I'll throw out is again around PIPseq - perhaps some partitions would have the enzymes to recode 5mC to something else (or all the non-5mC to U, as most modification methods do).
The most advanced approach in this space is Biomodal, which is overdue for a focused look here (and was founded by the creator of Solexa technology, Shankar Balasubramanian, originally under the name Cambridge Epigenetix). Biomodal creates libraries which are effectively duplexes, with one read covering one strand and the other read covering the other. By a clever series of enzymatic steps, the end result is that comparing the two strands can reveal both 5mC and 5hmC while still reading the underlying sequence - 6-base sequencing. Of course, there ain't no such thing as a free lunch - any advantages of having paired-end reads for mapping are no longer available, and there's always the danger of creating noise by the enzymes not always hitting their marks.
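As a purely schematic illustration of that duplex-comparison idea - the lookup table below is hypothetical and is not Biomodal's actual conversion code - resolving each original position from its two paired reads might look like:

```python
# Purely schematic sketch of the duplex-comparison idea. The lookup table is
# hypothetical -- Biomodal's actual enzymatic conversion rules and decoding
# scheme differ in detail -- but it shows how two observations of the same
# original position can separate C, 5mC, and 5hmC while still calling the base.
DECODE = {
    ("C", "C"): "5mC",    # hypothetical: both observations protected from conversion
    ("C", "T"): "5hmC",   # hypothetical: only one conversion route hits this mark
    ("T", "T"): "T",      # both observations agree: a genuine thymine
    ("T", "C"): "C",      # hypothetical: unmodified C converted on one strand only
}

def call_position(obs1: str, obs2: str) -> str:
    """Resolve one original position from its two paired strand observations."""
    return DECODE.get((obs1, obs2), "ambiguous - possible enzymatic miss")

for pair in [("C", "C"), ("C", "T"), ("T", "T"), ("G", "C")]:
    print(pair, "->", call_position(*pair))
```

The last case is the "free lunch" caveat in miniature: any pairing outside the expected code is indistinguishable from an enzyme that failed to hit its mark.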
Illumina didn't announce a purchase of Biomodal, so they must have found a different way of converting. They also promised a simple workflow - a knock I've heard on Biomodal is that the workflow is not simple.
One smaller tease from Illumina is a goal of putting XLEAP chemistry on the MiSeq - which would certainly tidy up their product line. But would this be existing MiSeqs or is a next generation MiSeq under development? That was left ambiguous - as well as what would happen to MiniSeq and iSeq in the process.
All in all, it is a welcome change to see Illumina acting as if competition exists - the webinar was full of claims that the company is listening to their customers and seeking input. So they are going to talk the talk of not being stuck in monopolist mode - but will they walk the walk? Let's see how the next few years play out.
Many thanks for this useful analysis, Keith. Shifting to a Total Cost of Ownership (TCO) model as you mention brings up what the assumptions are - labor (at what rate, and how much HOT - hands-on time - is required) and the compute costs (is it on-prem, is it cloud, is it from the vendor as in the case of DRAGEN)? Would welcome what some generalizable assumptions could / should be made here. IMHO count HOT + 20% fudge factor x $75/hour (that amount could go up to $100?), and would the compute then just be on a per-platform basis? (Any useful 'rule of thumb' to use here?) Cheers, Dale
Not sure how occupying a very expensive sequencer with library prep duties is time or cost effective.
Without knowing their scheme it's hard to say but perhaps there are modules. Think of the old cBot->HiSeq arrangement. Maybe there's an L-Bot (library Bot) that preps the library, flows it onto the flow cell and clusters it. Or maybe a sequencer with it built-in 🤷♂️
I don't really see how this marks a departure from Illumina's monopolistic MO of recent years. Isn't this just a new way to describe their vertical integration, just with a different target market? It's clear that they aren't going to be allowed to do this in the clinical space as they tried with Grail, but the up- and down-stream integration they've described still has the same aims, just with the RUO market.