Tuesday, May 22, 2018

Should PentaSaturn Buy An iSeq: A Hypothetical Scenario Illustrating Platform Picking

Editorial note: I wrote this in early January, then planned to slot it in after some other items.  Then life knocked me upside the head, then AGBT came along, and then it was forgotten.  Once I remembered it, I fretted it had gone stale.  But I had put a lot of effort into it and really nothing has changed with regard to iSeq, other than that it should be shipping now.  Besides, this week is London Calling and so having an Illumina-centric piece could be a bit of useful balance.  So, for your consideration:

Some of the online discussion around this January's iSeq announcement, springing from my piece or elsewhere, explores how the iSeq fits into the sequencing landscape.  In particular, how does it fit in with Illumina's existing MiniSeq and MiSeq, and how does it stack up against Oxford Nanopore's MinION?  For example, in Matthew Herper's Forbes piece, genomics maven Elaine Mardis compares iSeq unfavorably to MiSeq in terms of cost-per-basepair.  I'm a huge believer in fitting sequencing to one's scientific and practical realities and not the other way 'round: no one platform quite fits all situations, nor do even the same metrics fit all situations.  So in this piece, I'm going to illustrate what I believe is a plausible scenario in which iSeq would make sense.  Now, I have designed this to play to iSeq's characteristics, and realistically there are many dials I could turn to go in another direction, which I will try to note as I go along.
First, the ground rules.  Other than pretending that iSeq is available right now, only released hardware and chemistries are fair game.  That's a bit unfair to other platforms since only Illumina's closest friends will see iSeq this quarter and even next quarter there may be a bit of a wait (though not as bad as for a Tesla Model 3).  Given the relatively conservative design of the system, it seems Illumina will launch close to their projected schedule, whereas some companies are wONT to make many announcements which sometimes are a long time coming.

PentaSaturn LLC

I'm going to launch a fictional company, PentaSaturn LLC.  PentaSaturn's scientific braintrust believes that they can use synthetic biology to make a valuable fine chemical, codename LC39A.  Proof-of-concept experiments suggest that LC39A synthesis (or the closely related compound LC39B) is biochemically feasible via a multi-enzyme biosynthetic route, but at the moment only spectroscopic quantities have been made.  

The company has lined up financing, but the investors want to have some control over losses and so have broken the funding into three stages: success at one stage will enable the next one to fire.  The gate for the first stage is to demonstrate significant production of LC39A or LC39B.  The plan is to use a combinatorial gene assembly scheme to explore different enzymes, promoters and ribosome binding sites to generate multi-milligram quantities of the target compounds.  While the molecular biology will support generating ginormous combinatorial libraries, the assays (codenamed CSM and LEM) are of only modest throughput, enabling approximately 100-200 clones to be screened per week.  That process will go on for 20 weeks to screen approximately 4000 clones.  Those clones will need to be sequenced to determine which components and combinations appear to work and to identify which are duds.  Given the limited screening capacity, rapid course corrections are highly desirable -- sequencing can deliver huge forward thrust if results can quickly identify the most promising clones to fly through the assay bottleneck.

PentaSaturn's investors have stressed to the team the need to quickly come to a go/no-go decision.  Time is money: rent and salaries are major expenses.  Rent has been mitigated a bit by subleasing a small lab space from another biotech; bigger space will come with success.  The investors also want a very lean capital expenditure for the first stage and tell the team to build the first stage with quality but minimal expenditures; much more money will flow if that first stage gate is passed.


Outsourcing the sequencing has a number of appeals for PentaSaturn.  Personally, I've done a spectacular amount of sequencing with 99% of it via trusted outsource partners.  Outsourcing is capital lean, a big plus given the funders' preferences.  Outsourcing requires no space; not only does your own sequencer chew up space, but there is often ancillary equipment.  Most importantly, someone needs to run the sequencer, and when they're doing that they aren't doing something else.  Outsourcing is very space efficient!  Outsourcing also means not locking in equipment or a platform and also access to essentially unlimited sequencing capacity.  Those features aren't important for PentaSaturn, but can be valuable for other companies -- in my current position I've outsourced sample runs onto four different technologies (Illumina, Ion, 454 and PacBio) and at times had far more equipment running my projects (in excess of $2M on multiple occasions) than my sequencing budget could ever buy.

But there are serious cons.  The biggest in PentaSaturn's eyes is the turnaround time.  The fastest outsource partners I have worked with return data in roughly two weeks.  In a pinch they can be amazing, but that's the sustainable rate.  There's also the problem of work weeks -- I don't ship to domestic providers on Friday because then my samples will sit who-knows-where at who-knows-what temperature all weekend.  Overseas vendors are even worse.  National holidays also throw a wrench in schedules, as does severe weather.  Doesn't even need to be bad weather in your area -- thunderstorms in Memphis can wreck FedEx.  There's also a small risk of shipping failing; I gave a talk on sequencing outsourcing at BioIT one year and had a slide of a FedEx truck that failed to yield at a grade crossing to an Amtrak train.

So PentaSaturn decided outsourcing didn't look attractive and decided to review the options for sequencers.

Initial Considerations

PentaSaturn is looking to spend minimal capital and has minimal space.  That puts PacBio Sequel pretty far behind the eight ball, with a $300K price tag and a decent-sized footprint (Sequel is roughly the size of a deli refrigerator).  Reviewing the construct designs, there aren't any pressing needs for long reads.  Some synthetic biology vectors have repeats in them -- cos sites or Gateway components, but these don't.  Some people are crazy enough to play around with gene components that are themselves hideously repetitive, but PentaSaturn doesn't need that.  So for this application, long reads aren't an advantage.  

How much sequence will be generated how often?  How will it be batched?  The CSM and LEM assays are run in 48-well format four days a week (each takes two days to run, so you don't start runs on Friday).  So the ideal would be to run the sequencer daily, but perhaps nearly as good would be to batch things into 96-well sets for two runs per week.  Such batching could potentially lighten the burden of library preparation.  One 192-well run per week is also an option.

The constructs are at most 30Kb, so achieving 100X average coverage means generating 3 megabases of data per sample -- if the samples were truly clean.  Realistically, even with exonuclease treatment to remove host contamination there will be a background of around 1-2X coverage of E. coli (whose genome is roughly 4.6 megabases), so dial that up to 10 megabases of output per construct.  So if we run batches of 96, that means an overall output requirement just under 1 gigabase.  If we run 192-well batches, then double that to 2 gigabases.
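
As a rough check on that arithmetic, here is a minimal Python sketch.  The construct size, coverage target and 10 megabase planning figure come from the paragraph above; the E. coli genome size of roughly 4.6 megabases is a textbook value supplied for the sketch, and the function and constant names are purely illustrative.

```python
# Back-of-envelope sequencing output per sample and per batch.
CONSTRUCT_BP = 30_000          # largest construct, from the text
TARGET_COVERAGE = 100          # desired average coverage of the construct
ECOLI_GENOME_BP = 4_600_000    # ASSUMPTION: approximate E. coli genome size
PLANNING_BP = 10_000_000       # rounded per-construct figure used in the text


def per_sample_bp(host_coverage):
    """Bases needed per sample: construct reads plus host background reads."""
    return CONSTRUCT_BP * TARGET_COVERAGE + ECOLI_GENOME_BP * host_coverage


def batch_output_gb(samples, bp_per_sample=PLANNING_BP):
    """Total output in gigabases for a batch of barcoded samples."""
    return samples * bp_per_sample / 1e9


low = per_sample_bp(1.0)   # 1X host background
high = per_sample_bp(2.0)  # 2X host background
print(f"per sample: {low / 1e6:.1f}-{high / 1e6:.1f} Mb")

for n in (96, 192):
    print(f"{n} samples: {batch_output_gb(n):.2f} Gb")
```

Under those assumptions the per-sample need brackets the 10 megabase planning number, which is why rounding to it seems reasonable.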

So let's look at Illumina and MinION options for this project (sorry, Ion Torrent).


With iGenomX's RipTide kits we could actually pool even deeper than needed, and the list price is $12K to make 10 plates of libraries.  At 96 libraries per plate, roughly 4000 clones is about 40 plates, or four kits, so we need about $48K in library kits.  For comparing within Illumina we can just park that; library costs won't depend on platform.

Going with 192-sample batching and 2 gigabases of output would neatly fit MiniSeq's MidOutput kit.  That's a 17-hour run after library prep, so it is good we aren't planning to run every day.  We'll need 20 runs and the kit is $550, so that's $11K in reagents plus $49.5K for the instrument -- a total of $60.5K going Illumina's way.  This also covers my earlier dismissal of Ion Torrent, as the ancillary equipment runs up a bill that is already well north of $60.5K.

Suppose we try iSeq instead.  We're forced to run twice per week to fit the 1 gigabase output, so 40 runs.  Those kits cost $625 -- so we're paying 13% more to get about half the output.  So if you are looking at cost per basepair, iSeq is a real loser.  But the total project cost is $25K in kits and $19.5K for the instrument for a total of $44.5K -- or $16K less thrown into the sequencing operation. A 26% savings is nothing to sneeze at.  Tying up even less bench space than a MiniSeq is a nice bonus; it is stunning how every square centimeter of startup lab space can become a precious commodity. If library prep is batched into one day, then that means someone is tied up for one morning with library prep and starting the first run and then a quick loading of the second run on a second day.
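
The comparison can be laid out as a small Python sketch.  All prices and run counts are the ones quoted above; the nominal kit outputs (2 gigabases for the MiniSeq kit, 1 gigabase for iSeq) are taken from how the text sizes the batches, and the function name is just for illustration.

```python
# Project sequencing cost (instrument plus run kits), excluding library prep.
def project_cost(instrument_usd, kit_usd, runs):
    """Total spend for one instrument and a fixed number of runs."""
    return instrument_usd + kit_usd * runs


miniseq = project_cost(49_500, 550, 20)  # one 192-sample run per week, 20 weeks
iseq = project_cost(19_500, 625, 40)     # two 96-sample runs per week, 20 weeks
savings = miniseq - iseq

print(f"MiniSeq: ${miniseq:,}   iSeq: ${iseq:,}")
print(f"iSeq saves ${savings:,} ({savings / miniseq:.0%} of the MiniSeq total)")

# Kit cost per gigabase, using the nominal outputs assumed in the text.
miniseq_per_gb = 550 / 2   # ~2 Gb per MiniSeq kit
iseq_per_gb = 625 / 1      # ~1 Gb per iSeq kit
print(f"kit $/Gb: MiniSeq {miniseq_per_gb:.0f}, iSeq {iseq_per_gb:.0f}")
```

The same numbers make iSeq look worse per gigabase and better per project, which is the whole point of the example.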

The example emphasizes the fact that cost-per-basepair is not always the right metric to optimize on, even when comparing similar systems.  Sometimes cost-per-run is what matters.  Batching is often raised as a way to fit small projects into big sequencers, but the reality is that delaying results is frequently more expensive, in effect, than paying more per base.


The second option would be to consider MinIONs.  The iSeq has set a price to beat, which takes GridION off the table.

The first question is whether the data quality is sufficient.  That depends on the type of features one must be sensitive to.  Nanopore data quality continues to improve (see Ryan Wick's spectacular preprint masquerading as a README file), but it still plateaus out at an error rate of around 2 errors per kilobase.  Since the major error mode is homopolymers, that number may be worse on sequences with high nucleotide bias.  But in any case, for PentaSaturn that means about 60 sequencing errors per 30-kilobase construct.  If the team is mostly concerned with gross assembly errors, then that error rate won't be critical.  But if the plan includes PCR amplification of parts and flagging frameshifts or subtle changes in promoters is critical, that error profile may make MinION either unusable or borderline.

MinION has a wide array of library preparation protocols.  Clearly the most attractive is the Rapid kit, enabling multiple libraries to be made in under ten minutes.  If we assume a yield of 4 gigabases per flowcell, then in theory the project can be run in 10 flowcells.  Since we need only 1-2 gigabases per run, that implies washing flowcells.  But we actually need to plan to do far more of that, as the Oxford Rapid Barcoding kit supports only 12 barcodes (you can never, never have too many barcodes -- and 12 is very miserly!).  So to run 96 samples we need to split them into 8 batches and now more human time must be spent monitoring runs, stopping them when sufficient data is attained, washing the flowcell, loading the new library and restarting the flowcell.  Those restarts will be spaced at odd intervals due to the typical flowcell performance decay and library concentration.
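
To make the flowcell and barcode arithmetic concrete, here is a hedged Python sketch.  The 4 gigabase yield is the assumption stated above, the 10 megabase per-sample figure is the planning number from earlier in the post, and the constant and function names are hypothetical.

```python
# MinION planning numbers taken from the text; names are illustrative only.
TOTAL_SAMPLES = 4000
BP_PER_SAMPLE = 10_000_000           # 10 Mb planning figure per construct
FLOWCELL_YIELD_BP = 4_000_000_000    # ASSUMPTION from the text: 4 Gb per flowcell
BARCODES = 12                        # Rapid Barcoding kit limit per pool
BATCH_SIZE = 96


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


flowcells_needed = ceil_div(TOTAL_SAMPLES * BP_PER_SAMPLE, FLOWCELL_YIELD_BP)
pools_per_batch = ceil_div(BATCH_SIZE, BARCODES)
total_loads = ceil_div(TOTAL_SAMPLES, BARCODES)

# Share of one flowcell's assumed yield that a single 12-sample pool needs.
pool_fraction = BARCODES * BP_PER_SAMPLE / FLOWCELL_YIELD_BP

print(f"flowcells needed in theory: {flowcells_needed}")
print(f"pools per 96-sample batch: {pools_per_batch}")
print(f"library loads over the project: {total_loads}")
print(f"one pool uses about {pool_fraction:.0%} of a flowcell's yield")
```

Each pool needs only a small slice of a flowcell's yield, so under these assumptions the number of loads, rather than the number of flowcells, is what drives the hands-on burden.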

It must be noted at this point that one of the many missing capabilities in ONT's MinKNOW software (a topic able to consume at least several pages of ranting) is anything remotely resembling proper support for flowcell reuse.  During a run the voltage of the flowcell must be changed and in a complete run MinKNOW manages that.  But if you stop a run, then the total time a flowcell has been run must be tracked and then compared against a table found in the Nanopore Community and the number found there entered into a text box -- after you've edited the MinKNOW running script.  Plus, remember to make a copy of the edited script so it doesn't disappear in the next MinKNOW update.

Now, another solution would be to use the ligation library generation approach and add a step to ligate on some custom barcodes so that a single run can process 96 or 192 samples.  That's definitely possible, but now the beauty of the ultra-simple, ultra-fast rapid process has been thrown aside.  Of course, the library preps for Illumina will be similar so this is really a push between the two platforms.

The longer term solution proposed by ONT is the SmidgION/Flongle scheme, in which case, instead of sequentially loading pools of 12, an array of 8 Flongles could be used, ideally in a yet-to-be-discussed GridFlongle instrument.  After thinking, I realized one could take the same route with the current toolset: run 8 MinIONs simultaneously to process the 8 pools of 12 from a batch, then stop the runs.  It's a bit of a tracking nightmare (96-well plates of 96 barcodes are soooo much better from that standpoint), but would eliminate the more troublesome serial loading of a single flowcell.  Flowcells would still be washed and reused.

If we buy 8 basic starter packs, a 12-pack of flowcells, 54 Rapid Barcoding kits and 27 Wash kits, that comes out to $59K -- or about 64% of the cost of the iSeq solution once the $48K in RipTide library kits is added to the $44.5K for instrument and run kits.  Partly that is because ONT has a crazy low price on the Rapid Barcoding kit (but last week it was out-of-stock, which is worrisome) of roughly $9/sample ($672 per kit to make 6 batches of 12 samples each) versus the $12/library cost for RipTide.

So it would come down to this: is the higher level of sequence confidence worth $33K?  
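
For anyone wanting to retrace that figure, here is a short Python sketch of the totals.  It assumes, as my reading of the numbers above suggests, that the iSeq route is counted with its RipTide library kits while the $59K MinION total already includes its Rapid Barcoding kits; the variable names are illustrative.

```python
# All-in project totals, using figures quoted in the post.
LIBRARY_KITS = 48_000      # RipTide kits for the Illumina route
ISEQ_SEQUENCING = 44_500   # iSeq instrument plus 40 run kits
MINION_TOTAL = 59_000      # quoted MinION total, library kits included

iseq_total = LIBRARY_KITS + ISEQ_SEQUENCING
ratio = MINION_TOTAL / iseq_total
gap = iseq_total - MINION_TOTAL

print(f"iSeq route:   ${iseq_total:,}")
print(f"MinION route: ${MINION_TOTAL:,} ({ratio:.0%} of the iSeq route)")
print(f"difference:   ${gap:,}")
```

The difference works out to $33.5K, which the post rounds to $33K.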

Concluding Remarks

I've designed an example which I think is realistic, but I could easily design many other hypothetical companies or projects which could be worked through.  I hope I've made clear a number of the considerations that might go into such a decision and in particular how small variations in these could really change a decision.

Those considerations extend to your economic model.  I've constructed one here that uses project-based accounting, ignoring treating the iSeq as a capital item.  But more typically it would be seen as a capital item and depreciated over a longer time period.  That's a reason ONT has come up with some of their pricing models, with one model for those who want to tap into a capital budget and others who just see money flowing.  

Sequence quality continues to be a big factor.  ONT has made progress, but the prevailing sentiment is that no single conceptual breakthrough is likely to improve error rates in one huge step; what is left is a large number of small contributors.  Base modifications are a favorite candidate, but who knows what other phenomena can systematically alter the signal of a single strand passing through a pore?  For many applications, as I have long argued, it doesn't matter.  For other applications, it clearly is an issue.  I picked something decidedly in the middle just to emphasize the point -- though I suspect many will argue I'm being lenient and this sort of QC needs the highest quality possible.  Hybrid sequencing is one option, but obviously greatly increases costs -- imagine going to PentaSaturn's board saying we want to spend on both the iSeq and MinION options.

As noted before, I did pick an application in which I could wave away the advantages of long reads.  For other cases similar to this, that may be much tougher.  If you are working with highly repetitive genes and you are worried about getting them base-perfect, you have a problem.  That's an area where PacBio will shine, but now we get into the conflict between PentaSaturn's turnaround time (which isn't compatible with outsourcing) and low capital expenditure (not 

I'm not an accountant and my example has almost certainly missed some key costs.  For example, I just assumed there is a thermocycler available for making PCR-based libraries.  To me that's a given -- a lab without a thermocycler isn't a DNA lab -- but that point could be argued.  Certainly a synthetic biology company such as PentaSaturn would have a fleet of them.  I also didn't include the IT expenses for the MinION side.  Eight MinIONs are going to require some fleet of relatively inexpensive machines to run them, and that still might add up to $5-10K.  I've also assumed MinION basecalling can be performed on available compute resources that were bought for other parts of the effort.  A more persnickety accounting of those MinION-associated costs might significantly narrow the cost gap.

I found this exercise fun and enlightening.  I enjoyed designing a fictional company.  The example was designed a bit to highlight a regime in which iSeq might beat MiniSeq on cost, and the MinION analysis was actually a later thought -- and it was surprising how much lower it came in.  I hadn't really internalized the low cost of libraries with the Rapid 1D Barcoding kit.

If you do read this and think "well, that's nice but what about a company trying to do X?", feel free to mock up your own example! Or, pitch it to me and maybe I'll work it through if it sounds fun -- though I don't plan to run a free consulting service for all comers!

1 comment:

Leeloo said...

This is definitely one of the most useful posts I've ever encountered on this site actually. Realistically speaking, everyone needs to consider cost, time, easiness of use, etc in addition to the science behind their projects to pick between the systems. Unfortunately, my project seems to have an issue with the accuracy of ONT sequencing, but also an issue with the Illumina's need to batch samples (even for iSeq) therefore prolong the turn around time. We need very quick turnaround (a few hours), accurate sequencing. I hope one day that technology will become available.