Wednesday, January 22, 2025

Illumina & NVIDIA Team to Remake How to Train Your DRAGEN

If you've been in a movie theater recently, you may have seen a trailer for a mixed live action and animation spectacle called How To Train Your Dragon.  Having seen the purely animated original - and wished I had gone to a 3D showing as the flight scenes must have been amazing - I found it a bit unsettling, as the animated dragon in the new one looks exactly like the one in the old.  It's apparently a shot-for-shot remake of the original, but this time with live human actors.  So effectively a port of a script from one cinematic language to another.  In a similar vein, at last week's J.P. Morgan Conference, Illumina and NVIDIA announced they will start porting Illumina's DRAGEN applications onto NVIDIA GPU hardware.
DRAGEN is Illumina's suite of hardware-accelerated applications.  Early on they made gaudy claims of enabling much faster human genome variant calling than other pipelines, and also claimed their closed source code (the fact that it is closed, and therefore "trust us on quality and edge case handling" in nature, has raised some hackles) has performed very well for sensitivity and specificity in head-to-head variant calling competitions.  Not that the rest of the field has stood still: others claim similar speed and calling quality, and many of the newest tools - such as DeepVariant - can be substantially accelerated by running on GPUs such as NVIDIA's.

Currently DRAGEN runs on Field Programmable Gate Arrays (FPGAs), a specialized sort of chip.  One can order different types of silicon logic on an axis from most general to most task specific.  The CPU inside the laptop I'm writing this on represents the most general purpose and therefore most flexible; GPUs are a bit more limited but are better at those more limited tasks; FPGAs are still more limited but even more potent; and Application Specific Integrated Circuits (ASICs, such as those found in Oxford Nanopore devices) represent the ultimate in specialization - fully optimized for a single task.  More specialization enables more of the limited number of transistors to be devoted to the task at hand.

However, there are huge network effects with the less specialized hardware.  If I want to program a CPU, there are far more available programming languages and domain-specific libraries than one can ever hope to wrap one's head around.  One of the secrets to NVIDIA's success has been the development of the CUDA environment, as well as the downstream libraries built on top of it, such as PyTorch.  It doesn't hurt that GPUs keep being well-suited for a new hot task - gaming, then cryptocurrencies, and now AI.

It's also the case that GPUs are big money with huge demand, driving NVIDIA to continue innovating and also to put their GPUs on the latest silicon processes with the highest densities of transistors.  FPGAs and ASICs can't command the volumes to justify the more expensive processes - and the more profitable CPU and GPU makers are probably tying up all the available supply of the higher densities anyway.  So even if an FPGA has a theoretical edge in the fraction of transistors working on a given problem, GPUs may just have so many more transistors available that the edge is minimal to nonexistent in real life.
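To make that last point concrete, here is a back-of-the-envelope sketch in Python.  Every number in it is made up purely for illustration (these are not real chip specifications), and the function name effective_transistors is my own invention; the only claim is the arithmetic, that a small fraction of a very large transistor budget can match or beat a large fraction of a small one.

```python
def effective_transistors(total_transistors: float, fraction_on_task: float) -> float:
    """Return how many transistors are actually working on the problem."""
    if not 0.0 <= fraction_on_task <= 1.0:
        raise ValueError("fraction_on_task must be between 0 and 1")
    return total_transistors * fraction_on_task


# HYPOTHETICAL numbers, chosen only to illustrate the argument:
# an FPGA on an older process devoting most of its logic to the task...
fpga = effective_transistors(total_transistors=10e9, fraction_on_task=0.60)
# ...versus a GPU on a leading-edge process devoting a small slice of it.
gpu = effective_transistors(total_transistors=80e9, fraction_on_task=0.10)

print(f"FPGA (hypothetical): {fpga:.2e} transistors on task")
print(f"GPU  (hypothetical): {gpu:.2e} transistors on task")
print("GPU keeps pace despite lower efficiency:", gpu >= fpga)
```

With these invented inputs the GPU ends up with more transistors on the task despite using its silicon far less efficiently; different inputs would of course give a different answer, which is rather the point.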

Illumina acquired the DRAGEN technology by buying Edico Genomics in 2018.  Illumina pushed it forward into many aspects of their bioinformatics pipeline and now DRAGEN FPGAs are built into the newest Illumina boxes.  There are some small downsides - one catch about FPGAs is that the FP part - the field programming - is not a fast process; switching from the FPGA configuration that supports demultiplexing to the one for human variant calling isn't nearly as fast as just firing off a new process on a CPU box.  It's also easy to imagine that Illumina draws from a very small community whenever it wants to hire experienced FPGA coders - and outside Illumina there's at best a handful of academics or other companies trying to implement bioinformatics on FPGAs.  And if you want to give users the option of cloud compute instead of buying their own hardware, GPUs are definitely far more abundant in datacenters than DRAGEN FPGAs.  Embedded compute is a major cost component of modern sequencers, so riding the current wave of GPU innovation enables more flexibility and likely lower future instrument prices.

Some of this is familiar.  In the late 1990s, there were three companies building specialized hardware accelerators for modified Smith-Waterman and Hidden Markov Model searches.  We evaluated all three at Millennium and purchased one, then later a second, from Paracel, each at around a quarter million.  If my memory is correct, Paracel was ASIC-based - they had been originally funded by three-letter federal agencies to analyze communications traffic - but TimeLogic and Compugen were FPGA-based.  Compugen successfully pivoted from informatics to therapeutics development; the other two companies are gone with the wind.

NVIDIA has now wrapped up most of the sequencing instrument makers - Oxford Nanopore was first, which was logical because they invested heavily in machine learning the earliest, since their basecalling problem is the most complex.  PacBio was next in line, with GPUs built into Revio.  Element touts their integration with NVIDIA Parabricks tools.

NVIDIA is still a company to watch - for me particularly, since my big brother works there - so it was no surprise to unwrap a copy of the new book The NVIDIA Way this Christmas (which let me return the library copy that was sitting unread).  There are other companies trying to loosen NVIDIA's grip on the GPU market, but without a lot of success to date.  In the bioinformatics space, Parabricks is likely to provide yet another layer of moat to keep Intel or AMD from getting in.  AI is where the big money is, so it could be a long time before we see any of these companies making a serious run at biomedical-specific applications.  Alternatively, getting a different GPU within a sequencer could be a way to pry out some market share, and perhaps the mapping of important bioinformatics problems into GPU space is becoming a well-solved problem that can be easily ported onto any GPU architecture.  In any case, time - but not TimeLogic - will tell us the answer.
