Monday, February 05, 2024

Want to Build A Sequencer? 454 Bio Opens Up Their Plans

Just as the AGBT hype cycle was firing up (with me contributing multiple sparks), serial entrepreneur Jonathan Rothberg's latest sequencing startup, 454 Bio, fully de-stealthed their technology this weekend, going so far as to release open source plans to build an instrument prototype.  454 Bio is aiming to build a Keurig-sized device to retail for $100, with sequencing runs in the $20 range.  To accomplish this, they're attempting a novel twist on sequencing-by-synthesis.  It's an unconventional strategy by someone who has succeeded twice before in DNA sequencing (454 and Ion Torrent) and has multiple other companies going (if I've counted correctly) - QuantumSI in protein sequencing (a future topic for this space, I promise!), ButterflyNetworks with inexpensive, compact diagnostic ultrasound and Hyperfine with inexpensive, compact MRI diagnostic devices.  Then I went to the 4Catalyzer site - Rothberg's incubator - and discovered a bunch of companies I hadn't heard of or had forgotten about -- Protein Evolution in synthetic biology for plastics production, Detect for home-based diagnostics instruments, AI Therapeutics in the rare disease space and Liminal with what looks like consumer brain scanning.  That's quite a series of companies!  But the one closest to my heart (sorry QuantumSI :-) is 454 Bio, and their announcements have many interesting facets which I'll dive into.

[2024-02-06 01:41 - 'used"--> iSeq fix -- stupid autocorrect!]

New Wine in an Old Bottle

Rothberg loves the 454 story and the 454 moniker, and so went and bought all the trademarks back from Roche.  454, after all, launched the first commercial post-Sanger sequencer and also generated the first genome of a previously-named individual, controversial DNA pioneer (and knucklehead) James D. Watson (yes, everybody knew Celera sequenced J. Craig Venter, but there was a claim of anonymity at first).  Whether reviving the name was a great idea is good over-drinks conversation - already I've seen confusion as to why anyone would reboot 454 technology -- and 454 Bio's approach has essentially nothing to do with the original 454's approach.

In the molecular biology analytics space, Rothberg has been responsible for founding 4  companies before, starting with what I'll refer to after this as 454 classic.

Raindance was the least successful, due to never finding a good market for their really cool picoliter droplet technology.  They first tried to go after target enrichment for NGS, but soon were plowed over by in-solution technologies from Agilent and Nimblegen.  Later they pivoted to digital PCR, but that market was still developing and the Raindance instrument was too big and expensive.  Perhaps there was another product pivot or two in between.  Eventually it was sold for IP to BioRad.

Ion Torrent did not achieve the lofty goals it set out to hit - we never saw a single chip generate an entire human genome - but did create a working sequencer and was enough of a threat that Illumina brought out MiSeq to compete with the PGM.  Ion Torrent was also the first non-optical detection NGS instrument, and found at least some showcase uses (if not much market) in places where optical sequencers wouldn't perform well, such as on a boat.  And Ion carved out a lucrative niche in genetic panel testing, driven by being paired with AmpliSeq multiplex PCR technology - another great example (like Nextera which I cited in my previous piece) of locking down a complementary technology to drive market share.  But even that wasn't enough - eventually ThermoFisher capitulated and licensed AmpliSeq for Illumina platforms.  Ion is still on the market, but hasn't innovated in years in terms of performance.  Indeed, it was around the time Rothberg left that performance improvement tanked; he picked a good time to leave.

QuantumSI, which I'll cover sometime in the future, is the first (and so far only) entrant in next-gen protein sequencing.  There's an important parallel between the technical approaches of QuantumSI and 454 Bio: one pot operation.

One Pot Sequencing

Every short read sequencer to date has used a cyclic chemistry scheme.  Each base (or series of the same base for flow chemistries such as 454 classic, Ion Torrent, Genapsys and Ultima) requires adding reagents, and then there are stripping steps (for reversible terminator schemes) and washes.  Then another cycle happens.  Each cycle is distinct and requires fresh reagents.

Delivering those reagents requires microfluidic pumps and lines.  Delivery to the flowcell can be dogged by challenges of even dispersal; Ion Torrent raw signals had patterns of light and dark looking like flow lines.  The smaller Ion chips had almond-shaped gaskets which lost much surface area but had more uniform fluid flow.  Later, higher capacity chips tried to go for something more like a square to maximize area, but the corners could be troublesome -- and if reagent from one cycle hung up there it would contaminate the next cycle.  

Also troublesome is that the reagents are very highly concentrated, to drive the reaction to near completion.  But only tiny amounts of the volume actually go anywhere useful and very little of those expensive reagents actually are consumed by the sequencing reactions; most are just flushed away each cycle.  That adds to the expense.

454 Bio is introducing "One Pot Sequencing" - there are no cycles of reagents.  QuantumSI's chemistry has a similar notion, though that is a degradative not synthesis chemistry, but again no cycles -- the "one pot" of reagents runs the reactions continuously.

454 Bio is using a reversible terminator chemistry, except instead of using chemical cleavage to remove the terminator and fluorescent moieties, ultraviolet light is employed.  These "Lightning Terminators" were once the basis of LaserGen - a company that was acquired by Agilent and then somewhat quietly shuttered.  LaserGen was developing a more conventional cycling instrument, but with very fast cycle times.

Like all other optical readout short read sequencers (except 454!), 454 Bio employs Total Internal Reflection Fluorescence (TIRF) microscopy.  The basic idea of TIRF is that if you aim light at the right angle, by the precepts of conventional physics it will all reflect off the interface between the glass surface and the reagents above -- but in quantum land the light's field extends a bit beyond that surface - evanescent illumination.  So only a tiny volume of reagent will give any signal - the tiny volume that contains polonies.  PacBio uses a similar logic with Zero Mode Waveguides - when they are imaging, fluorescence can only occur close to the surface and so the huge background of unincorporated nucleotides and released labels is essentially invisible.

For 454 Bio, this is also how the removal of the fluorophore/terminating moiety occurs - a cycle of illumination with the appropriate wavelength of light to cleave the terminator moiety.  Lightning Terminators are 3' unblocked; the 3' hydroxyl is always there but steric effects prevent the polymerase from accessing it.  So the evanescent illumination is critical again - you don't want to be deblocking all the terminators that haven't been incorporated.  Of course, there will be free terminators that do get into the critical zone -- such as blank areas with no polony or diffusion into the polony and so forth.  So the pot will build up problematic unterminated nucleotide triphosphates after every deblock cycle.

And that is their current problem - read lengths are in the single digits because the polymerase they are using prefers the unterminated nucleotides to terminated ones, so after a few cycles of deblocking there are enough unterminated nucleotides to cause serious dephasing -- the molecules in the polony are no longer moving in lockstep.  In one of their unconventional steps (more on this below), 454 Bio has announced a contest for improving the discrimination of the polymerase.  If you win by delivering a much improved polymerase a year from now, you get $200K and can patent and/or publish - but you grant 454 Bio a royalty-free license to use your invention and they have the rights to further engineer based on your design.
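To make the dephasing argument concrete, here is a toy sketch of how quickly a cluster falls out of lockstep.  Everything in it is an assumption for illustration: the function names, the per-cycle lag and lead probabilities, and the idea that the lead probability grows linearly as unterminated nucleotides accumulate are all made up, not 454 Bio measurements.

```python
def phase_distribution(cycles, p_lag, p_lead, lead_growth=0.0):
    """Toy model: fraction of strands at each offset from the ideal position.

    Per cycle a strand either falls one base behind (p_lag, e.g. a failed
    deblock), jumps one base ahead (p_lead, e.g. an unterminated nucleotide
    was incorporated), or stays in step.  lead_growth is a hypothetical knob
    that raises p_lead each cycle as unterminated nucleotides build up.
    """
    dist = {0: 1.0}  # every strand starts in phase
    for k in range(cycles):
        lead = min(p_lead + k * lead_growth, 1.0 - p_lag)
        stay = 1.0 - p_lag - lead
        nxt = {}
        for offset, frac in dist.items():
            for step, p in ((-1, p_lag), (0, stay), (1, lead)):
                if p > 0:
                    nxt[offset + step] = nxt.get(offset + step, 0.0) + frac * p
        dist = nxt
    return dist


def in_phase_fraction(cycles, p_lag, p_lead, lead_growth=0.0):
    """Fraction of strands still at offset 0 after the given number of cycles."""
    return phase_distribution(cycles, p_lag, p_lead, lead_growth).get(0, 0.0)


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(in_phase_fraction(n, p_lag=0.05, p_lead=0.05, lead_growth=0.02), 3))
```

Note that in this simple model a strand that lags once and leads once counts as back in phase, which is generous; a real cluster signal would be messier.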

Here's a plot of the most recent sequencing run described on their blog.  I'd still prefer to see the error rate plotted log scale, but in this range it's not that important a point.  Claim is those first five bases are in the phred 20-25 range.  Plotting it log scale would help resolve to what degree is that jump from 4 to 5 truly a discontinuity or is it where the trend was going.  

And from the prior post we can see the degree of color chaos going on in a cluster.  If I understand the note on the sequence of the templates, this should read TCAGG which appears to map to red, blue, green, yellow, yellow -- but the majority color here is red red green yellow yellow.  So it would appear the deprotection of the cycle 1 T (red) did not go very well (again, I'd prefer to see a log plot of the intensities!!) as we see intense red in cycle 2 and still a bunch in cycle 3 and a bit in 4 and 5 -- and the next T in the template isn't until position 10!  I'd be tempted to design the molecules at this point to go a really long ways before the first base shows up again, so could measure very carefully the degree of lagging phasing.  We see the yellow prephasing as a blip in 2 but significant in 3 -- again, it would be interesting to design some libraries where there are more positions before the first instance of a given base.  If I were working on this, I'd probably be designing a whole library of templates to use in given runs -- complexity of four in each run but various choices of design to stress-test various aspects of the chemistry.  Okay, I'm getting sucked into the challenge here...

By the way, how are polonies (or clusters, if you prefer) formed?  454 Bio is using circular library molecules which are isothermally amplified by something I see as rolling circle amplification (RCA) crossed with bridge amplification.  The surface has forward and reverse primers covalently bound, and initial amplification is by one of those primers driving an RCA reaction - but those products can now find other bound primers to drive more reactions and so on.  The result is a hyperbranched structure.  The reverse primers all contain deoxyuracil, so a USER reaction destroys these to loosen up the snarl.  Note that because each cluster results from multiple priming events, this style of cluster generation will make copies of copies, unlike pure RCA cluster generation on platforms such as AVITI and Complete Genomics.

Unconventional Company

Going open source is certainly not the usual strategy for a company - most companies stay in stealth mode, then get some alpha sites without revealing much to the world - Ultima's technology first ran at the Broad in a locked room that only a select cadre of Broad employees had access to or even knew existed.  Oxford Nanopore went a bit differently by suddenly destealthing at AGBT 2012, then going relatively quiet for two years, but then executing the MinION Access Program (MAP) for many researchers around the world.  

Appealing to the idea of garage hobbyists has an almost romantic appeal.  When I was about 8, my brother and father got a kit-based computer for a few hundred dollars when there were many such kits and no fully functional home computers yet.  Ours was a DATAC-1000, the input method was a series of metal pads and the only output device was a row of LEDs above the metal pads.  Well, it had a cassette tape interface for storing programs.  There were others like the KIM-1 and the Altair and some tree fruit named one we started hearing about.  For at most a few thousand dollars you could flesh these out with calculator-style keypad+display or even simple video interfaces.  From that era came many early programmers and hardware tinkerers -- big brother put together our video display (he may have even designed it).   In terms of financial outlays, a serious computer building habit cost not much different than diving into 35mm photography.

Sequencing technology seems so digital and potentially as impactful, plus there is so much burnishing of the legend of the early computers or companies like Hewlett-Packard that started in garages.  Ion Torrent tried to tap into this, but the catch was that their claimed price of $50K isn't hobbyist money - and the real upfront outlay was $100K.  It's also a lot harder to get things running the first time - with some coaching an 8-year-old me could write a simple 6502 machine code program to add two numbers but making a sequencing library is a much bigger lift.  Plus all the accessories you need.  Those old computing days you mostly got by with pliers and a soldering iron (we did have not one but two oscilloscopes - didn't everyone in the 1970s have one in their house?).  But for even the most basic library prep, you must have temperature control of liquids, a complete set of pipettors, a minifuge and probably a few more bits.

There was a build-it-from-plans sequencer called the Polonator, but at the time it cost around $200K to put one together.   Some academics did so, but it never had a large user base - and then the supplier of reagents was bought by QIAGEN and that was the end of Polonator. 

Oxford Nanopore tried to tap into the vibe with MinION, but discovered it's not easy.  They certainly reaped large dividends from the MAP getting MinIONs into the hands of many early career folks who didn't have a vested interest in sustaining the Illumina ecosystem and could make a name for themselves pioneering nanopore sequencing.  So folks like Josh Quick, Nick Loman, Miten Jain, Matt Loose (and far too many I'll omit here - my apologies!) held on through ONT's initial unreliability and zigzagging platform changes and showed novel applications and strengths of nanopore sequencing and shared their protocols and software with the world.  But for better or worse, you didn't actually build any of the hardware.

Mid last decade I did have one sequencer startup offer to send me plans to build a very low cost instrument -- they claimed around $1000 in parts.  It never came to fruition - and I'm not sure sending me lab plans made much sense though I was game to try it out, but the idea has been out there before.  The NDA I had with them is probably enforceable, so I won't spill who it was (sorry!).

Now 454 Bio is really, truly releasing complete plans.  I'm not sure I'm capable of quite following them and I would expect that early users will be sending in pointers about where they can be improved.  But the directions are there as are the design files for the many 3D-printed parts in the instrument.

I tried my best to price out the most expensive components - it would have been nice if this happened about two years ago as I could have asked a favor of my nephew, who was in customer support at one of the optical houses listed in the parts inventory.  So please check yourself.  The biggest ticket item is the camera, which I have at $700 -- but the description in the parts list brings up many variants across a range of prices -- perhaps one thing 454 Bio could nail down a bit better.  There's a positioning table for $300, four different bandpass filters for $1000 total ($250 each), and 16 UV LEDs that add up to $130.  No other component seems to be more than $100, and my total is up to about $2500.  That doesn't include things that look inexpensive like nuts and bolts and such or the 3D printing, but suppose that somehow added $500.  $3K is less than it can be easy to spend on a mirrorless digital camera and a few lenses for it.  So not unreasonable.
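For anyone who wants to redo this tally with their own quotes, a minimal sketch of the arithmetic follows.  The prices are just the rough figures quoted above; the variable names and the split between "named" parts and everything else are only for illustration.

```python
# Big-ticket parts named in the text (rough prices, US dollars)
named_parts = {
    "camera": 700,
    "positioning table": 300,
    "bandpass filters (4 x $250)": 4 * 250,
    "UV LEDs (16 total)": 130,
}

named_total = sum(named_parts.values())          # the four named line items
rough_parts_total = 2500                         # the "about $2500" figure for all priced parts
smaller_parts = rough_parts_total - named_total  # implied spend on the sub-$100 components
misc_allowance = 500                             # guess for hardware and 3D printing

print("named parts:", named_total)
print("implied smaller parts:", smaller_parts)
print("all-in estimate:", rough_parts_total + misc_allowance)
```

Swap in real quotes for the camera in particular, since that line item has the widest price spread.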

The announcement shows some signs of being rushed - one image talked about "complimentary strands" of DNA in two different locations.  But worse, the original price in the store for the critical reversible terminator mix was $33,999!  That's a hell of a lot more than I ever spent on film, photo prints or photo paper!  After pushback, Rothberg declared "typo" and the price changed to $1299 - the Levenshtein distance between that and 33,999 engenders skepticism of the typo explanation.  That $1299 is said to enable 60 runs.  That was the advantage of the old kit computers - once you bought them the only consumable was blank cassette tapes, which were cheap, reusable and you didn't need many.
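For readers who haven't met Levenshtein distance: it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another.  Here is a small sketch of the standard dynamic-programming calculation applied to the two prices (digits only; the function name is mine).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]


print(levenshtein("33999", "1299"))  # 3
```

Three edits on a four-digit price means most of the number has to change, which is a lot to blame on a slip of the finger.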

You'll also need the sequencing reservoir components, $49.99 for a pack of 5.  As my colleague (and synthetic biology legend) Tom Knight pointed out on X/Twitter, the finishing directions for the reservoirs aren't for amateurs, involving several solvents and something he called "piranha".  Yikes! AFTER WRITING THIS I LEARNED THAT PIRANHA IS A MIXTURE OF HYDROGEN PEROXIDE AND SULFURIC ACID WHICH CAN BURN HOLES THROUGH FLESH! NOBODY WITHOUT FORMAL CHEMICAL TRAINING AND PROPER PROTECTIVE GEAR SHOULD BE USING PIRANHA! Rothberg responded that they'll work towards something more consumer friendly.  So that brings running costs to about $32 per run - $22 of terminators and $10 for the reservoir (note: unlike their store, I like to round!)
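The per-run consumables arithmetic, as a sketch you can rerun if the store prices change again.  It assumes one reservoir is used up per run, which is my reading and not something I've seen stated outright.

```python
terminator_kit_price = 1299.00   # reversible terminator mix, US dollars
runs_per_kit = 60                # runs the kit is said to enable
reservoir_pack_price = 49.99     # pack of sequencing reservoir components
reservoirs_per_pack = 5
reservoirs_per_run = 1           # assumption: one reservoir consumed per run

terminators_per_run = terminator_kit_price / runs_per_kit
reservoir_per_run = reservoir_pack_price / reservoirs_per_pack * reservoirs_per_run
cost_per_run = terminators_per_run + reservoir_per_run

print(f"terminators ${terminators_per_run:.2f} + reservoir ${reservoir_per_run:.2f} "
      f"= ${cost_per_run:.2f} per run")
```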

What's The Market?

In the short term, you'll need to be very interested in playing with a novel technology that doesn't really do very much.  The early kit computers, and even the first all-in-one home computers such as the PET and TRS-80, weren't much better - you could learn to program or you could play games, but not very much else no matter how many journalists tried to claim it could organize recipes or help prepare taxes -- that was all long in the future (and who organizes recipes when you can just google them?).  With the current tiny read length, you'll be very hard pressed to get much interesting biology out - though it will certainly be hyped as "go explore the biosphere".  With custom sequencing primers, it should be possible to make those 5 or so bases count, but the current instructions don't make the process for designing such primers obvious.

It may well be that many people will be excited to throw their machine learning skills at trying to improve the basecalling and deal with the rampant phasing.  It sounds like some diversity of the initial bases is required for good cluster finding - a common aspect of systems using unpatterned flowcells - but it would seem that a set of templates could be cloned (don't want oligo synthesis errors confounding) and sequenced conventionally, then used as targets.  Each one should have the first 5 bases as a barcode that defines the rest of the sequence -- I'll leave figuring out the number of distinct barcodes with different Hamming or Levenshtein distances as an exercise for the reader.  Perhaps some big open repositories will be built for data, to increase the training set sizes.  And perhaps an advanced machine learning algorithm could make the downstream snarl useful for some basic tasks like "which RNA virus is this" -- at least if you only want "flu A vs COVID vs RSV" level differentiation.
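For readers who'd like a head start on that exercise, here is a minimal sketch for the Hamming case.  It greedily walks all 5-mers in alphabetical order and keeps any that sit at least a chosen distance from everything kept so far.  Greedy selection only gives a lower bound on how many barcodes are possible, and it ignores practical constraints such as base diversity for cluster finding; the function names are mine.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(length=5, min_dist=3, alphabet="ACGT"):
    """Greedily pick k-mers so every pair differs in at least min_dist positions."""
    chosen = []
    for kmer in map("".join, product(alphabet, repeat=length)):
        if all(hamming(kmer, c) >= min_dist for c in chosen):
            chosen.append(kmer)
    return chosen


if __name__ == "__main__":
    for d in range(1, 6):
        print(f"min Hamming distance {d}: {len(greedy_barcodes(5, d))} barcodes")
```

At distance 1 every one of the 1024 possible 5-mers qualifies; at distance 5 only four survive (AAAAA, CCCCC, GGGGG, TTTTT), so the interesting trade-offs are in between.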

Longer term, if the phasing issue can be solved and reads could get in the 50 to 100 range, I could imagine diagnostics applications.  Maybe.  It will depend in part on how many reads per flowcell - I didn’t see that described but it’s likely to change over the evolution of the chemistry and software.  Clearly some evolution of the hardware will be required to get to the goal of $100 a box.

The low end sequencing market has neither gotten much love nor been very successful.  Ion Torrent pretended to go for this market, but an all-in upfront cost of $100K isn’t hobbyist territory.  Genapsys officially launched their $20K solution, but I never saw one in the field nor ever heard from a customer - and they’re gone.  Illumina has iSeq, but that instrument has never seen upgrades and is all but ghosted by Illumina management.

But the market does have MinION, a fully functional sequencing device you can get for $2K (not a typo!) that really works now.  In theory the Flongle (another $ for the adaptor) gets run costs in the low double digits.  In the home computer market, the arrival of fully featured machines was the death knell of the kit computers, only much later revived with concepts like Raspberry Pi.

So how many people will buy in to the concept early?  During the MAP, ONT saw significant attrition in the user base because the platform was still very buggy and reagent availability was erratic.  Others were hooked; for me it was seeing in our first run one very noisy but alignable 48 kilobase read with the entire lambda phage genome.  Maybe some novices who build a 454 Bio instrument will have a similar epiphany with their first sequence data, but I'm skeptical it will be a wide-spread phenomenon -- though it would make me happy to be proved wrong on that point.

Unconventional operations, an innovative technology at proof-of-concept and appealing to tech hobbyists -- should be fun to watch even if they don't succeed in carving out a major presence in the sequencing technology landscape.

[2024-02-06 Added warning about piranha]


Anonymous said...

This piece is educational

Anonymous said...

Have to raise Lynx as the first post-Sanger. Yes they weren’t very successful but they did ship instruments.

Anonymous said...

What’s the commercial plan. Nothing is really free to paraphrase Javier Milie, so how does an investor in this company get a return ?

Anonymous said...

Uh if this is what the "piranha" is meant to be this is not something even a hobbyist should be allowed, let alone encourage, to work with:

Anonymous said...

now my recollection of looking at Lightening Terminators years ago was the light based deblocking will always have bad phasing relative to bulk chemical deblocking because light will follow a different probability distribution. after say 1s of illumination 50% of your mols may be deblocked, then 5 mins 50% of whats left, then 30 mins 50% of whats left, its hard - or slow to drive to completion. These data somewhat bear that out. more light ? more time ? but flurophores, for example, sometimes just don't want to shine at all. Helicos needed two reads due to 'dark bases'.

Dale Yuzuki said...

What a walk down memory lane, to be thinking about the TRS-80 and the Apple I and I have never heard of the DATAC-1000. Yikes that must have been quite the DIY with soldering iron in hand...

Plenty of DIY biologists around, however that 'makerspace' idea never was able to get a lot of traction. I got involved in something oh 10 years ago in the mid-Atlantic, there was a decent one in Baltimore but it takes a huge amount of time, effort and fundraising to get something to actually work. As a matter of fact we spent most of our time trying to figure out financial sustainability. It's just too big a barrier to overcome.

Going the DIY route for sequencers seems like much too far a stretch. And when you are talking about a Piranha reagent, that sounds terrifying.