Wednesday, September 21, 2022

Notes From Coffee With MGI

A couple of weeks ago I sat down for coffee with a pair of MGI representatives - American Region CEO Yongwei Zhang and Director, Global Business Development Damon Zhang. Since I hadn’t been at AGBT 2022 (my 2023 application is already filed!), Yongwei and I had planned to try to catch up the next time he was in the Boston area, so I braved our current subway issues (not one, but two major lines shut for extended maintenance!) and we covered a range of topics.

Tuesday, August 23, 2022

SRA Entries Should Not Ever Disappear Into Thin Air

I ran into an annoying problem last night and was quite steamed, but had the discipline to wait until morning to vent publicly about it.  Now I'm more in a morose mood on the subject, not furious but still quite frustrated. The quick version of what happened is I'm belatedly trying to go through some nicely documented reproducible analysis code to explore some concerns I have with the analysis, and the code is working on an SRA entry -- and that SRA entry is the entire point of the analysis. And that SRA entry which I know once existed now doesn't - other than this code and the preprint to go with it, it's as though it never existed -- which is terrible.  And I'm irritated with everyone who contributed to that terrible result, starting with NCBI

Tuesday, August 16, 2022

Supply Stall Slows Singular

Singular Genomics reported earnings last week and delivered an unpleasant surprise: the inability of suppliers to make timely deliveries of key (but unspecified) hardware components has slowed G4 instrument production to a crawl.  Given the lively competition in the desktop short read space, this is a serious setback for Singular's commercial launch.

Thursday, June 23, 2022

AGBT 2022: Overhanging Questions

AGBT broke up a couple of weeks ago and I've failed to write anything here so far.  It was frustrating not attending, but not registering for a meeting in February seemed prudent given the pattern of COVID waves - I hadn't considered (nor would have wanted to bank on) AGBT organizers reacting so well and rescheduling the meeting.  It sounds like a number of attendees did catch the virus at the meeting -- though I'm presumably still quite protected by my infection a month earlier.  Anyways, I'm going to organize this around one to two questions that hover in my head for the different sequencing providers.  AGBT also had a strong spatial angle, but I feel ill-equipped to cover that in the absence of being on the scene -- I don't work with spatial data and so don't have a deep feel for it.  As always, please flag me here or on Twitter or by email for any errors I made -- or any juicy sequencing company gossip you wish to share!

Wednesday, June 08, 2022

Admin: Feedburner to Follow.it Switch

A bit over a year ago Google made one of their dreaded announcements that they would be slowly killing off one of their acquisitions, in this case FeedBurner.  Well over a thousand of you have been using FeedBurner to follow me via email.  Follow.it has a wonderful free plan that can take over all of the previous functionality and I could just import the old subscription list

Tuesday, May 31, 2022

Ultima Genomics Storms Out Of Stealth Promising $1/Gigabase Short Reads

To date, the new entrants targeting Illumina’s short read business have been aiming at the middle of Illumina’s range, trying to take on NextSeq.  Element Biosciences is touting high accuracy for a low price.  Omniome (now PacBio) also has positioned itself to tout accuracy.  Singular Genomics is claiming to enable great flexibility and fast runs.  But all are aimed at NextSeq.  As part of the run-up to AGBT another company is decloaking from stealth mode: Ultima Genomics.  However, they are going not after NextSeq but full throttle after Illumina’s pinnacle, the NovaSeq running the S4 flowcell.  The value proposition is a large sequencing device that delivers S4 output at S1 prices for an overall cost of $1 per gigabase.  Note that the interview for this piece was conducted under a CDA and Ultima reviewed my copy for accuracy and to ensure I didn’t disclose anything they had marked confidential.  They were nice enough to offer to have me fly out to their facility, but I was forced by the damn coronavirus to cancel those plans the night before the trip.  A preprint summarizing the technology is also out on bioRxiv.  A trio of additional preprints have popped up as well, describing its application to generate a huge methylation sequencing dataset around colorectal adenocarcinoma, a huge Perturb-Seq dataset and large scale single cell RNA-Seq.


Ultima isn’t planning on truly launching until early next year, but they’re well on the way with paying early access customers.  Indeed, AGBT will feature multiple posters and talks describing the use of the Ultima instrument for a variety of genomics tasks.  And Ultima is confident that their architecture will support significant increases in future throughput, enabling per base costs to go even lower.


Ultima’s chemistry is flow based - using unterminated but fluorescently labeled nucleotides.  Only a fraction of the nucleotides are labeled in each reaction, reducing the reagent costs and minimizing molecular scar accumulation.  The reactions take place on beads whose templates are amplified via emulsion PCR - though for all the ePCR-haters out there Ultima will include a fully automated benchtop ePCR robot.  Once primed, the beads retain the DNA polymerase, so this expensive component can be conserved between flows.  The instrument is a single end reader – no paired ends – but compensates with reads that have a modal length of around 300 bases, which should be enough to plow all the way across most short read inserts and their associated molecular indices.


The use of unterminated nucleotides has typically meant challenges in resolving homopolymers.  Ultima is tuning their system to call homopolymers of up to 12 bases; based on discussions with customers and their own experience, they deem accurate counting of longer homopolymers insufficiently valuable to focus on vs. other design tradeoffs.


But Ultima has found several ways in which unterminated flow chemistry can either have its weaknesses ameliorated or become downright boons.  First, while it can’t accurately measure long homopolymers it can go straight through very long ones in a single extension cycle – so poly-A tails in cDNA ends can be easily blitzed through.  This helps ensure reading all the way through inserts of things like single cell libraries.  Second, for short homopolymers Ultima embeds in the Q-scores a probability matrix of the length – basically the odds of minus one and plus one versions of the sequence.  This is leveraged by their customized version of GATK, developed with the Broad Institute.  Third is a clever approach of “cycle shift variant calling” that I’m still stunned has never appeared in the literature for any other flow chemistry – 454, Ion Torrent or Genapsys.  Cycle shift uses the known order of flows to increase the confidence in variant calls – particularly valuable for low coverage data such as cell-free DNA.
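To make the flow-based ideas above a bit more concrete, here is a minimal Python sketch of how a sequence maps onto a series of flows. It is my own illustration, not Ultima's algorithm: the flow order `TGCA` and the function name `to_flowgram` are assumptions made for the example. It shows two things from the paragraph above: a long poly-A run is consumed in a single flow, and some single-base substitutions push everything downstream into a later cycle, which is the sort of large, order-dependent signal change that a cycle shift approach could exploit.

```python
from itertools import cycle

# Assumed flow order, chosen only for illustration.
FLOW_ORDER = "TGCA"


def to_flowgram(seq, flow_order=FLOW_ORDER):
    """Return how many bases are incorporated in each successive flow.

    With unterminated nucleotides, a flow of base X extends through the
    whole run of X at the current position, so each entry is a
    homopolymer length (0 if the next template base is not X).
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    flowgram = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(seq):
            break
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flowgram.append(run)
    return flowgram


reference = "TGCATGCA"
variant = "TGTATGCA"  # single C>T substitution at the third base

ref_flows = to_flowgram(reference)
var_flows = to_flowgram(variant)
print("reference:", ref_flows)
print("variant:  ", var_flows)
print("extra flows needed by variant:", len(var_flows) - len(ref_flows))

# A 30-base poly-A run is read through in one flow.
poly_a = to_flowgram("TG" + "A" * 30 + "T")
print("poly-A:   ", poly_a)
```

In this toy case the substitution makes the read need four extra flows, one full cycle, to cover the same eight bases, so every downstream signal lands in a different flow than the reference would predict.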


Another key driver of low cost and high density is the use of a spinning, open “flow cell” (really a 200mm diameter wafer) for both reagent addition and imaging.  Centrifugal force generated by the spinning (fake force, ha!) distributes the reagents as a very thin film, minimizing wastage.  Imaging as the wafer spins enables shooting many tiles without having to repeatedly accelerate and decelerate the flowcell as a rectilinear scanning scheme must do.  The speed difference adds up: Ultima can generate in 20 hours the same 3 terabases (10 billion reads of roughly 300 bases each) as a NovaSeq S4, but an S4 requires 44 hours to run – and Ultima believes they can shave that down to 16 hours.  Faster cycle times mean more runs per instrument – and each instrument runs two wafers simultaneously, each with its own chemistry station but sharing an imaging path.  The instrument features tanks for reagents which can be refilled, with a 24 hour capacity of each reagent.  Six different wafers can be queued for running, with built-in automation removing spent wafers and swapping in new wafers with new library pools.
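The throughput figures above are easy to sanity check with a few lines of Python. This is back-of-envelope arithmetic using only the numbers quoted in this post; applying the headline $1 per gigabase to a single 3 terabase run is my own extrapolation, not a price quoted by Ultima.

```python
# Figures quoted in the post.
READS_PER_RUN = 10e9          # roughly 10 billion reads
READ_LENGTH = 300             # roughly 300 bases per read
ULTIMA_HOURS = 20             # current run time
ULTIMA_TARGET_HOURS = 16      # run time Ultima believes is reachable
NOVASEQ_S4_HOURS = 44         # run time for the same output on an S4
COST_PER_GIGABASE = 1.0       # headline overall cost, in dollars

gigabases_per_run = READS_PER_RUN * READ_LENGTH / 1e9


def gigabases_per_hour(gigabases, hours):
    """Average output rate over a whole run."""
    return gigabases / hours


ultima_rate = gigabases_per_hour(gigabases_per_run, ULTIMA_HOURS)
target_rate = gigabases_per_hour(gigabases_per_run, ULTIMA_TARGET_HOURS)
s4_rate = gigabases_per_hour(gigabases_per_run, NOVASEQ_S4_HOURS)

print(f"output per run: {gigabases_per_run:,.0f} Gb")
print(f"Ultima now:     {ultima_rate:,.1f} Gb/hour")
print(f"Ultima target:  {target_rate:,.1f} Gb/hour")
print(f"NovaSeq S4:     {s4_rate:,.1f} Gb/hour")
print(f"speedup vs S4:  {ultima_rate / s4_rate:.1f}x")

# Extrapolation: headline cost applied to one 3 terabase run.
print(f"implied cost per run: ${gigabases_per_run * COST_PER_GIGABASE:,.0f}")
```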


How might the system grow its output?  The patterned wafers place the beads at a very conservative pitch.  Larger diameter wafers are also a possible further option. Extending the read lengths is yet another possible expansion direction.


The instrument has onboard GPU compute power, which is currently used for basecalling and alignment and could ultimately also perform the variant calling work.  


Current accuracy is 0.1% error for substitutions and 0.5% for indels.  Most of the indel error is concentrated in homopolymers longer than 8 bases, with calling capped at 12.  When used with the specially modified GATK co-developed with the Broad, or other custom DeepVariant or Sentieon pipelines, SNP calling accuracy of 99.7% precision, 99.7% recall is achieved and indel recall and precision range from 96-98% for small indels (excluding long homopolymers and low complexity regions).  Accuracy suffers in low complexity regions, which Ultima believes is an amplification chemistry issue rather than a sequencing chemistry issue, and they believe they can significantly improve on the current performance.
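For readers who think in Phred scores or F1 rather than percentages, the quoted numbers convert as sketched below. The conversions are the standard formulas; the only inputs are the figures above, and the function names are mine.

```python
import math


def phred(error_rate):
    """Standard Phred scale: Q = -10 * log10(error probability)."""
    return -10 * math.log10(error_rate)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


substitution_q = phred(0.001)   # 0.1% substitution error
indel_q = phred(0.005)          # 0.5% indel error
snp_f1 = f1_score(0.997, 0.997)

print(f"substitutions: Q{substitution_q:.1f}")
print(f"indels:        Q{indel_q:.1f}")
print(f"SNP F1:        {snp_f1:.3f}")
```

So the substitution error rate corresponds to about Q30 and the indel error rate to about Q23, averaged over all bases rather than per call.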


Ultima plans to offer their own kits for PCR-free and PCR-based sheared genomic libraries.  Libraries for other systems can be converted by a simple indexing PCR scheme - this has been done for TruSeq libraries and proof-of-concept experiments have been run for Nextera libraries.  


What could be done with such an instrument?  A pending publication uses Ultima and Illumina in parallel on the same 4 million cell Perturb-Seq experiment and finds the results equivalent between the platforms.  A large fraction of the Phase IV ENCODE HiC data was generated on Ultima.  An internal proof-of-concept experiment used deep sequencing of RNA from COVID-19 infected samples, recovering complete viral genomes after only ribosomal RNA depletion.  One of the AGBT abstracts demonstrates the ability of Ultima WGS to detect minimal residual disease at low levels by deep WGS of cell-free DNA, an approach academia and startups are actively exploring.  Additional AGBT abstracts describe population genetics studies, oncology, and rare disease sequencing.  Ultima has 10 paying Early Access customers, with 7 instruments installed to date – and these run the gamut from large academic genome centers to biopharma to government labs.  They hope to have “well into double digits” of customers at the time of the official launch.


To get here Ultima has raised over $550 million and hired over 350 employees.  The company has made steady progress from their start in 2016.  Ultima CSO Doron Lipson previously was part of the teams at Helicos and Foundation Medicine, so he has extensive experience both in building a sequencing platform and applying it at scale.  CEO Gilad Almogy has spent many years in the semiconductor manufacturing field - Ultima’s reaction wafers are patterned atop silicon substrates and the semiconductor industry also uses very high precision optical methods for both manufacturing and quality control.


Illumina has for a long time held an unassailed position as leader in sequencing in the US market as well as others.  Now that position is under pressure from all sides: Element and Singular are trying to squeeze the NextSeq market while Ultima is aiming for the top; Oxford Nanopore thinks their “short fragment mode” can compete as well and the patent shackles are being lifted from BGI.  At JP Morgan in January Illumina said their “Chemistry X” would offer improvements in accuracy, read length and output, but absolutely no details have been forthcoming – in particular, whether new instruments will be required to access Chemistry X benefits.  Perhaps the entry of Ultima and the others will add some urgency to Illumina communicating their future plans, lest customers start planning in earnest to opt for the new platforms.


For us consumers of sequence data, more competition and lower prices are a pure good.  Projects can continue to be increasingly ambitious, and the number of different phenomena which can be converted into a sequence measurement constantly grows.  More for less is never, ever going to become boring – it will always be enabling.  After a long period of very shallow slope in the notorious “better than Moore’s Law” slide, we appear to be entering a new period of plunging sequencing costs.  Time to start making plans to take advantage of it!


[20220608 corrected really embarrassing millions typo (should have been billions) which has been requoted all over Twitter]

Monday, May 23, 2022

London Calling 2022: Peptide Sequencing

London Calling was last week and Clive Brown's big revelation was a peek at Oxford Nanopore's progress on enabling peptide sequencing on the platform.  Peptide sequencing and identification is a hot area right now, with multiple startups looking to provide alternatives to mass spectrometry approaches.  Clive stressed that the technology is very early in development.  It's definitely a clever fork of the existing DNA sequencing technology.  However, it also illustrates a significant organizational challenge which Oxford faces.  So I'm going to spend a post focused on this while I figure out how to slice up the rest of the meeting.