A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery
Friday, December 31, 2021
Reflecting on Anniversaries and Changes
As the year closes out for me (as I write this, it may well have closed out for some of you!) I'm reflecting on some anniversaries that were concentrated in this year, particularly those that are multiples of an early evolutionary developmental decision millions of years ago.
Monday, December 13, 2021
ONT Community Meeting 2021
Oxford Nanopore held their annual Community Meeting online at the beginning of this month. As is typical for this stage of the ONT news cycle, most topics were confirmations and updates of earlier projections, with little brand new material. There was one surprise, a new concept for running nanopore with little to no auxiliary lab equipment. Oh, and perhaps also in the surprise category: Oxford appears to be finally moving away from the R9 pore, which has been their mainstay for many years now.
Tuesday, October 26, 2021
A Look at Two HiFi Polisher Preprints
PacBio has made its reputation delivering very high accuracy long reads, which they have branded HiFi. These are based on their circular consensus technology: each template DNA molecule is converted into a single continuous circle of DNA which can be read in a rolling circle reaction. The "movie" is converted to raw base calls and the adapters are clipped out, leaving "subreads" which can be aligned together to generate a consensus (CCS) read. With many passes over the same molecule and its complement, the relatively high (~15%) error rate of the raw data can be brought down substantially using an HMM-based scheme. PacBio calls reads HiFi at a 1% error rate, but their model estimates an overall quality for each read, and that quality can keep getting better from there. Homopolymers still bedevil the technology, though not like they once did, and it turns out there is at least one more systematic error class. Consensus building is a powerful way to cut through error. But could you do better? Two recent preprints from large tech companies, with PacBio co-authors, apply deep learning to this problem and each comes up with the astounding result that they can do a bit over 40% better.
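For anyone who thinks in Phred scores rather than percentages, the error rates quoted above convert as in the minimal sketch below. The function names are purely illustrative and have nothing to do with PacBio's software; the only thing assumed is the standard Phred definition, Q = -10 * log10(p).

```python
import math

def error_to_phred(p):
    """Convert a per-base error probability into a Phred quality score."""
    return -10.0 * math.log10(p)

def phred_to_error(q):
    """Convert a Phred quality score back into an error probability."""
    return 10.0 ** (-q / 10.0)

# The two error rates mentioned in the text (labels are just for display)
for label, p in [("raw reads (~15% error)", 0.15),
                 ("HiFi threshold (1% error)", 0.01)]:
    print(f"{label}: Q{error_to_phred(p):.1f}")
# prints:
# raw reads (~15% error): Q8.2
# HiFi threshold (1% error): Q20.0
```

So the HiFi cutoff corresponds to Q20, and each further ten-fold drop in error adds ten to the score.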
Wednesday, August 04, 2021
PacBio Pulls Down Circulomics
I was on vacation early this week when the news broke that PacBio has acquired HMW DNA solid phase extraction kit maker Circulomics -- the kind of vacation that I need where the scenery is gorgeous and the internet access terrible. Where solid phase means monumental slabs of granite with diabase intrusions being attacked by a high salt liquid phase. Where I actually sighted Atlantic Puffins and didn't once think about sequencing their genomes ('til now!). But now I'm back to work and genomics.
Tuesday, July 20, 2021
PacBio Enters a Binding Agreement to Acquire Omniome
Pacific Biosciences announced today that they are slurping up short read sequencer startup Omniome for around $800M. Omniome has been developing an interesting clonal read technology. On the conflict-of-interest side, many years ago (and, I think, an entire management team ago) Omniome treated me and my family to a weekend in San Diego (it was my son's birthday weekend) so I could look at their technology back then -- my NDA has expired but so has most of my memory of what I saw at that meeting! Also the periodic reminder that PacBio CEO Christian Henry sits on the board of my employer, though we haven't met. Simon Barnett of ARK Investments (which is a major holder of PacBio stock) has a very nice explainer on the Omniome Sequencing-By-Binding (SBB) chemistry and his bullish perspective on the acquisition, and there is a proof-of-concept publication of the technology. I'll briefly explain the tech and then outline my somewhat more bearish view. It's also interesting to note that the FTC's actions on Illumina-PacBio and Illumina-Grail have analysts jumpy about this acquisition attempt.
Tuesday, June 29, 2021
ONT Sketches Paths to Long, Selective, Accurate Sequencing
Some sort of summary of London Calling in this space is grossly overdue after getting caught by multiple work firedrills and then several recursive rounds of procrastination. I'm not going to attempt to cover all the company announcements. I'm going to focus on a cluster of announcements that show a long range vision of inexpensive sequencing consisting of very accurate, very long reads. Well, a cluster of visions -- some parts can be mixed and matched and others cannot. This should be a prospect to grab the attention of any current or aspiring ONT competitors. Now before I'm accused of being a gullible shill for Oxford, I want to make it clear I think that running the table on these will be technically difficult and is many years in the future. But even if Oxford manages some of these but not all, they would substantially upgrade their platform.
Wednesday, June 02, 2021
New Clinical Human Genome Speed Record
I proposed last year that there should be a regular racing event for human genomics. The only real competitor in this interesting race seems to be Steven Kingsmore's group at Rady Children's Hospital. I was sent an embargoed press release from Illumina about a new record by that group, which clocks in at 13.5 hours from patient sample to clinical report. A New England Journal of Medicine paper (hence the embargo, ending just before I post this) reports on the advance but wasn't in the packet I received.
Tuesday, May 25, 2021
Matt Meselson Needs a Biographer!
Yesterday was Matt Meselson's 91st birthday. I have only met him a few times and he wouldn't know me from Adam, but he is a particularly interesting individual I've had the good fortune to converse with. I'm putting out a plea now for a skilled biographer to write his life, because it certainly has been an interesting and impactful one, with scientific work stretching from the early beginnings of molecular genetics to a preprint just recently posted on BioRxiv.
Thursday, May 20, 2021
My Latest London Calling Thoughts
The title really says it -- London Calling has actually already begun and here I am pretending to write a "before the conference" piece. Of course, since everything is virtual again this year I can actually do this since I haven't watched anything yet nor have I seen any tweets -- and the big technology announcement section isn't for a few hours so I have loads of time to write! Sadly, nor have I gone and looked at what I've written before. Nor have I defended these two days very well -- my schedule is cluttered with meetings and appointments. So I haven't prepared in any way, shape or form -- but here go some thoughts.
Sunday, April 25, 2021
GISAID Broken Down by Sequencing Hardware
The GISAID database has been the workhorse for storing and distributing SARS-CoV-2 sequences during the COVID-19 pandemic and recently passed one million entries. There was some Twitter chatter wondering about the hardware breakdown for this, as it isn't really easy to get out of GISAID. I had done a somewhat arduous partial take at this for my VIB talk last month, but in the meantime GISAID had granted me some additional access to metadata which I've been too busy to tackle. But knowing some others were curious, time to dive back in.
Tuesday, March 02, 2021
AGBT21: VizGen Unveils MERSCOPE
More spatial profiling news coming in from AGBT -- Harvard spin-out VizGen is launching in the U.S. an instrument implementing MERFISH technology. This sub-$300K instrument will initially enable panels of up to 500 genes to be profiled, with plans to expand that capacity to 1000. Users either pick from a menu of pre-designed panels or select genes using a Gene Panel Design Tool, and VizGen then manufactures the panel in around two weeks. VizGen CEO Terry Lo and Senior Director of Marketing Brittany Auclair were kind enough to give me a preview last Friday.
Monday, March 01, 2021
AGBT21: The LabRoots Presentation Platform is an Unmitigated Disaster
Rant is ON! I've been having an utterly miserable experience with the LabRoots conference software that AGBT is using for their virtual meeting. This year has exposed many of us to a wide variety of teleconference and virtual meeting software and many of the glitches are small and hard to pin down. Or matters of personal preference (though if you don't share mine, you are simply wrong!). But now on two major platforms I've come across major issues with LabRoots.
AGBT21: Rebus Esper for Spatial Sees Things You Wouldn't Believe
My prediction that spatial would be a hot topic at AGBT was easy to make knowing I was sitting on embargoed news in the spatial space. This morning Rebus Biosystems announced the launch of the Rebus Esper system for wide field spatial profiling of gene panels with subcellular resolution. Rebus is promising that this instrument will offer true walkaway automation from fluidics through imaging, and data processing, requiring only one hour of hands-on time.
AGBT21: A Few Pre-Conference Mutterings
Getting some miscellanea out before AGBT21 starts later this morning.
Sunday, February 28, 2021
AGBT 2021: A Spatial Foundation
I'll call it now -- the big buzz at this year's AGBT will be around spatial profiling. Trust me, it's not just a hunch. The two current players in the field -- nanoString and 10X Genomics -- both have significant presence in the virtual conference. Don't be surprised to see more players on the field -- just sayin'
Saturday, February 27, 2021
PacBio With SoftBank's $900M: How Might TheyWork?
Pacific Biosciences continued its roll of successful business development, snagging $900M from Japan's SoftBank two weeks ago. Combined with a recent secondary stock offering and a major deal with Invitae, PacBio has gone from their self-proclaimed near-derelict status during the Illumina acquisition attempt saga to rolling in cash.
Friday, February 26, 2021
More Details on 10X's Sample Profiling Trident
10X Genomics had an online event Wednesday called Xperience (as far as I could tell no Jimi Hendrix music was used, a missed opportunity!) to lay out their development roadmap. This largely paralleled the presentation given at J.P. Morgan, but there were a few new bits and of course much more technical detail to whet the appetites of scientists -- and judging from a number of very positive tweets I saw today they were successful in that goal. Some of the 10X management was kind enough to walk me through the deck earlier this week and to give me permission to borrow images from it, so this summary is based on that as well as watching the presentation. While their name is 10X, the company emphasized progress on three axes (scale, resolution and access) and showed that progress across their three different platforms.
Tuesday, February 09, 2021
Could I See Myself at J.P. Morgan?
There's a question that others pop my way pretty much every year around J.P. Morgan: would I ever attend myself? I'll confess it never occurred to me before I was asked, but that isn't necessarily a deal breaker. I foolishly didn't attend AGBT until 2013 when Alexis Borisy (then CEO of Warp Drive) suggested I go -- I think it was mostly because he thought it was a good investment and probably only secondarily to keep me off the ski slopes for a week -- I shattered my knee just after AGBT 2012 ended. It's an interesting but complex question which I will answer one way here, but freely admit that over coffee I could be nudged one way or the other.
Monday, February 08, 2021
Why I Hated One Genapsys Slide
I claimed in my Miscellanea piece that I was one post away from being done with J.P. Morgan -- oops, forgot I had drafted a minor screed on data display which I'll push out before the last piece - particularly since I hinted I would be taking Genapsys to task on this subject. Unexpectedly good timing too: maybe new Genapsys CEO Jason Myer's first big initiative can be to fix this plot!
Saturday, February 06, 2021
J.P. Morgan: Miscellanea
Before J.P. Morgan is truly a month ago I should clean up some loose ends as a penultimate post driven by this year's virtual conference (the last post isn't exactly time sensitive). In contrast to the single company focused items that preceded it, this is a grab bag of minor observations and notes.
Thursday, January 28, 2021
J.P. Morgan: NanoString
Almost done with my J.P. Morgan summaries -- this will be the last focused on a specific company: nanoString. They wish to emphasize that they are becoming the company for spatial analysis of DNA, RNA and proteins in biological samples. They also want us to differentiate that space into two segments: profiling and imaging. Profiling gathers spatial information from regions of multiple cells; imaging in their lingo covers spatial techniques with single cell or subcellular localization. In both cases nanoString is betting heavily on oligo-tagged antibodies to enable deep multiplexing of protein detection to be integrated with RNA and DNA detection.
Monday, January 25, 2021
J.P. Morgan: Genapsys
Genapsys' J.P. Morgan presentation by CEO Hesaam Esfandyarpour focused on their story of delivering a compact sequencer based on electronic detection that offers low capital, low cost sequencing. There were two bits of specific product news, but mostly general painting of a rosy picture.
Tuesday, January 19, 2021
J.P. Morgan: PacBio
PacBio CEO Christian Henry’s presentation at J.P. Morgan wasn't rich in technical specifics. But he gave a very bullish portrait of a company aiming for the stars. A conflict reminder: he’s a member of the Board of the Strain Factory that employs me, though I haven’t yet had the pleasure of meeting him.
The biggest news is a broad partnership with Invitae for clinical human genome sequencing. The only specific here is that this is not the whole enchilada; platform development will take place both within the Invitae collaboration and outside it. What might that development be?
Between Henry’s comments in the Q&A and a few info crumbs on slides, there will be pushes to further tune the current system. He mentioned efforts on dyes and further improving SMRTcell loading efficiency. There was chatter on Twitter about an overdue update to improve HiFi yields.
Henry talked of the importance of increasing ZMW packing, but gave no specifics other than to suggest this is more "development" than "innovation" -- this was in response to a question asking if technical breakthroughs are required. But we are left wondering on a timetable as well as what the next density might be; four-fold to 32M wouldn’t be surprising on naïve geometry grounds.
I suspect a huge area of joint effort with Invitae will be to automate HiFi library production. The current protocol is long, manual and labor intensive -- not at all appealing for large scale clinical use. How much of that will be retained as proprietary to Invitae remains to be seen. Henry claims that the Invitae effort will be separate but coordinated with existing development efforts; prior plans have not been shelved or diverted to support Invitae. A major software effort to support clinical operations is a given. PacBio has separate workflows for SNP and SV calling and those must be integrated and a clinician-friendly report generated.
Henry believes that the new Sequel IIe will be the dominant product shipped going forward. It will be interesting to see which of the older workflows PacBio updates and moves into the on-board compute. For example, if you want to call methylation you must export BAM files with kinetics data, which are predicted to be five-fold fatter. If the methylation calling happened on board, then that extra processing and extra data would be eliminated.
Similarly, workflows such as microbial assembly are still based around Continuous Long Reads (CLR). Henry didn't mention CLR once (I think). While I doubt they would ever dump it altogether like they did Strobe Reads, it would seem likely that it won't get much attention. Oxford Nanopore can beat CLR on very long reads and Oxford's single molecule accuracy is much higher; far better to focus on the CCS/HiFi reads where PacBio can deliver much higher accuracy. It will be interesting to see if PacBio pushes the HiFi fragment read length longer. On the one hand, it will be more challenging to work with longer fragments and to routinely get enough circuits around them to deliver HiFi quality data. On the other hand, while twenty-five kilobases is a nice size for many applications, there will always be incremental value in going to thirty or forty or beyond.
In response to a question about $1000 genomes, Henry described it as "just a number" around "where it makes sense" in high throughput applications. He says the Invitae collaboration will be able to drive prices below $1000. But he also pushed the idea that a PacBio genome is a truly clinical grade genome and has higher value than genomes produced on other platforms. He argued that this higher value, in terms of higher diagnostic yield for rare diseases, will be more attractive to payers and that there will be a net benefit to the healthcare industry by ending diagnostic odysseys sooner. He vowed to continue generating "diagnostic proof statements" to provide evidence to support the higher value claim.
Should be interesting to watch, particularly if you have a front row seat in front of a Sequel IIe.
Saturday, January 16, 2021
J.P. Morgan: 10X Genomics
As I attempt to collate various incomplete thoughts about the J.P. Morgan presentations I have read and listened to from genomics instrument shops, one thing stands out about 10X Genomics: they actually announced new gadgets and kits! I should thank the company for supplying the slides after I snarked on Twitter about how they weren't archived in the J.P. Morgan webcast -- but now they are there. So either my eyes failed again or I had a personal IT failure (I think the website doesn't like iOS and I may have forgotten that). The slides were presented by CEO Serge Saxonov.
Thursday, January 14, 2021
J.P. Morgan: Illumina
Illumina presented at J.P. Morgan on Monday, reminding us that they aren't just a sequencing instrument company but an interlocking set of businesses focused on genomics. CEO Francis deSouza spent much of his time discussing the Grail acquisition and some of the other ways in which Illumina is pushing rapidly to become an essential part of clinical medicine, but there was one slide on future improvements to sequencing technology and a few on the lineup of existing sequencers. Reminder: I'm working off public sources, as during the day we work closely with Illumina and they even sank some serious cash into my employer last May.
Monday, January 11, 2021
J.P. Morgan 2021
The J.P. Morgan Healthcare Conference has started this morning in virtual form, so I'd really better get this draft cleaned up and out (indeed, Roche is presenting as I hurriedly type, though about pharma not diagnostics). 2021 already feels like a darker continuation of 2020, between the appalling putsch attempt in my nation's center of government last Wednesday and the still buggy roll-out of the coronavirus vaccine. As I noted in my piece on the Oxford Nanopore Community Meeting, the many disruptions of 2020 make grading the progress of companies essentially impossible: many were disrupted by lockdowns, supply chain issues and the general distraction from the year of doomscrolling.
Sunday, January 03, 2021
Advent of Code vs. FizzBuzz
A bunch of coding types at the Strain Factory participated in The Advent of Code, a clever 25-day set of programming challenges that runs each year before Christmas. Each day a new two-part programming challenge was posted. Technically it is a speed contest, but you won't find me on the public leaderboard as I'm not nearly quick enough to ever rate a point there. One of my major official activities last month was contributing towards screening candidates for three different computational positions, one of which we threw open to general data science experience. As a result, I've been thinking far too much about the FizzBuzz problem and my prejudices towards it.
Saturday, January 02, 2021
Peri-New Year Nanopore Playing
Ever since the community meeting I've been toying with an idea, then never quite trying to code it.
So on New Year's Eve I started getting the dataset together and reducing it to a bunch of dataframes, and today I pushed that a bit further and started graphing some of it. It's very much a rough project -- some of the dataframes have some issues I'm still chasing down with redundant data not being initially collapsed, but I think the data is accurate. I also think I have my conventions consistent -- at one point I confused myself into inverting the labels on the plots! In other words, ApG would be labeled GpA -- not good! There are already some intriguing patterns, which are presumably the sort of signal tools like Medaka use to polish assemblies from FASTQ data aligned to draft references.
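To make the labeling hazard concrete, here is a minimal sketch of one way to keep dinucleotide labels consistent. It is not the code behind the plots; the function name and the toy sequence are hypothetical. It assumes the usual convention that "XpY" means X is the 5' base and Y is the base immediately 3' of it.

```python
from collections import Counter

def dinucleotide_counts(seq):
    """Count dinucleotides in a sequence read 5'->3'.

    Labels follow the "XpY" convention: X is the 5' base, Y the 3' base.
    Pairs containing anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        first, second = seq[i], seq[i + 1]
        if first in "ACGT" and second in "ACGT":
            counts[f"{first}p{second}"] += 1
    return counts

# Toy example: the sequence AGGA contains ApG, GpG and GpA once each
print(dict(dinucleotide_counts("AGGA")))
# prints: {'ApG': 1, 'GpG': 1, 'GpA': 1}
```

Building the label in exactly one place, right where the two bases are read, makes it much harder for a plot to end up with ApG and GpA swapped.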