Tuesday, January 24, 2017

Notes on a Conversation with 10X

I've been remiss in writing up a piece on 10X Genomics based on a phone discussion last week with Michael Schnall-Levin (VP Computational Biology and Applications) and Anup Parikh (Director, Product Marketing).  I always appreciate companies reaching out to me and spending time to educate me on their products and plans, and this was a very interesting and enjoyable conversation.

10X Genomics is built around microfluidic encapsulation technologies, which they have been building into two major analytical platforms.  Their linked reads technology encapsulates very small subpopulations of DNA, ideally single molecules, in order to capture long-range sequence linkage information on the short read Illumina platform.  Their other application is single cell RNA profiling, in which individual cells are captured and their poly-A RNA converted into barcoded Illumina libraries via reverse transcription.  

Linked Reads for Haplotyping and de novo Assembly

On the DNA front, 10X is excited by ongoing adoption of the platform and the concept by the genomics community.  The entrance of other companies in the space, such as Dovetail and iGenomX, is seen as enhancing awareness, but 10X is convinced that they hold a sizable lead on their competitors.  In particular, 10X sees their suite of analytical tools, such as LongRanger and Supernova, as superior to the offerings of other companies in this space.  10X also believes that many applications can rely on their technology alone to get high quality genome assemblies, rather than on a collection of three or four other genomics approaches.

10X is now expanding beyond their original focus on linked reads in human resequencing projects to using linked reads to assist in de novo assemblies of both humans and other species, some of which were highlighted in talks at the recent Plant and Animal Genomes conference.  The company recognizes that one barrier to use for de novo assembly is the resource requirements of their Supernova assembler.  Last year they trimmed Supernova's RAM requirement from 512 gigabytes down to 384 gigabytes; they hope to make further progress on this and get down to 256 gigabytes.  

It is worth noting here that the genome assembly community is beginning to support 10X (and other) linked read schemes.  For example, the preprint describing the ABySS 2.0 assembler has a recipe for scaffolding with 10X data.  Another preprint describes ARCS, a scaffolder which uses 10X linked reads, and yet another describes a scheme for structural variation detection.  10X believes they are fostering such developments with their open software architecture.  They are also starting to look at applying their technology to metagenomes.

By using hybridization capture upstream of 10X library generation, the technology can generate linked reads for targeted regions such as exomes or defined chromosomal regions.  10X said that this approach remains an area of growth, particularly for some clinical projects that have found whole genomes less useful than targeted regions; with 10X, those projects can add haplotyping information.  Since this approach works with existing targeted probe sets, switching to 10X library construction involves little process change.  

10X also stressed that linked reads not only enable haplotyping, but also improve the ability to call genotypes.  By binning reads by chromosome and haplotype, genotyping can be reduced to essentially a binary problem, even in the face of close paralogs or segmental duplications.

Since iGenomX is perhaps the most similar technology on the market to 10X's for linked reads, I asked for some more depth here.  The 10X team sees iGenomX as still too early to fully evaluate, but definitely believes 10X's strategy is superior and is not convinced that iGenomX's library chemistry generates significantly better distributed coverage than 10X's.  By strategy, what I mean is that iGenomX is sampling a smaller number of DNA molecules in greater depth, whereas the standard 10X approach generates fewer reads per molecule on a much greater number of molecules.

10X feels that getting high physical coverage from many molecules is key to accurately assessing structural variation in a sample, and that it is always possible to sequence their libraries more deeply to obtain higher coverage per molecule.  In particular, they believe that having too few compartments increases error via "shot noise" effects.  They noted that the iGenomX data presented at ASHG was sequenced more deeply, at 100X, than the 30X more typical of a sequencing project.  Clearly the community needs to generate some good head-to-head data for proper comparison.

I have this deep curiosity about sample prep methods for long DNA.  10X hasn't been focused on this, seeing good results from a variety of methods, but does believe they have seen extreme linked reads suggesting molecules around a megabase in size.

Single Cell 3' Expression Profiling

10X's other big push for their microdroplet technology has been profiling expression in single cells.  To me it is amazing that this system, and others like it, can successfully execute reverse transcription in lysates generated in droplets, given all the troubles that can crop up while performing molecular biology on purified nucleic acids.  Alas, 10X is keeping the details of their library chemistry close to the vest, seeing this as an edge.

With this technology, about 10K cells can be barcoded per library.  In addition to a droplet-specific barcode, RNAs are tagged with a unique molecular barcode, enabling the detection of PCR and optical duplicates.  Because this is a 3' assay, conventional means of calling duplicates don't work well: all reads from the same mRNA isoform should have the same 3' end.  10X often generates 18K reads per cell with this approach and finds that the data doesn't plateau until 200K reads per cell.  They've demonstrated the system in a huge way by generating a 1.3M neuron dataset (registration required to access), using 17 runs of 8 libraries (or channels) each.  10X supplies a tool called CellRanger to organize and analyze datasets such as these; as with linked reads, they see value in offering a complete analytical solution.
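To make the role of the molecular barcode concrete, here is a minimal sketch of how duplicates can be collapsed in a 3' assay.  This is my own toy illustration, not CellRanger's algorithm: the function name count_molecules and the (cell barcode, molecular barcode, gene) read tuples are assumptions made for the example, and a real pipeline would also need to correct sequencing errors in the barcodes.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecules per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, molecular_barcode, gene)
    tuples; this record layout is hypothetical.  Reads sharing all three
    values are treated as PCR/optical duplicates of one original mRNA.
    """
    seen = defaultdict(set)
    for cell_barcode, molecular_barcode, gene in reads:
        seen[(cell_barcode, gene)].add(molecular_barcode)
    # The number of distinct molecular barcodes is the molecule count.
    return {key: len(barcodes) for key, barcodes in seen.items()}


# Toy data: four reads, but only two distinct molecules of GeneA in cell 1.
toy_reads = [
    ("CELL1", "AAAC", "GeneA"),
    ("CELL1", "AAAC", "GeneA"),  # duplicate of the read above
    ("CELL1", "GGTT", "GeneA"),
    ("CELL2", "AAAC", "GeneA"),  # same molecular barcode, different cell
]
counts = count_molecules(toy_reads)
print(counts)
```

Read position is never consulted here, which is the point: in a 3' assay the alignment coordinates carry little information about which reads are duplicates, so the molecular barcode has to do that job.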

I asked about some of the competition in the space, and it is 10X's belief that they have a commanding lead.  In particular, while 10X is encapsulating 10K cells per channel with 50% efficiency (i.e. about 50% of the input cells are represented as libraries), their competitors are sampling a few hundred cells (new product from Bio-Rad/Illumina) or around 800 cells (newest Fluidigm product).  Also, because some of these (such as Bio-Rad/Illumina) use a double Poisson process, conversion efficiencies of competitors can be quite low.  If you aren't used to thinking about these droplet systems, they merge populations of droplets containing cells pairwise with populations of droplets containing barcodes.  If each population can be modeled with a Poisson distribution, with many droplets empty so that very few are multiply-filled, then the cross-product of these will be "double Poisson".  So only a small fraction of the merged droplet population is productive, as many result from merging two empties and others contain either a barcode or a cell but not both.
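A back-of-the-envelope calculation shows why the double Poisson case is so wasteful.  The sketch below is mine, not anything from 10X or its competitors, and the loading rates (an average of 0.1 cells and 0.1 barcodes per droplet) are purely hypothetical values chosen to keep multiply-filled droplets rare; the function names are likewise made up for the example.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k items in a droplet at the given mean loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def merged_droplet_fractions(cell_mean, barcode_mean):
    """Outcome fractions when cell and barcode loading are independent Poissons."""
    no_cell, one_cell = poisson_pmf(0, cell_mean), poisson_pmf(1, cell_mean)
    no_bc, one_bc = poisson_pmf(0, barcode_mean), poisson_pmf(1, barcode_mean)
    return {
        # Merged droplets with neither a cell nor a barcode.
        "empty": no_cell * no_bc,
        # Merged droplets with exactly one cell and exactly one barcode.
        "productive": one_cell * one_bc,
        # Of droplets holding a single cell, the share that also got
        # exactly one barcode (a rough proxy for cell conversion).
        "single_cells_barcoded": one_bc,
    }


# Hypothetical loading rates, not figures from any vendor.
fractions = merged_droplet_fractions(cell_mean=0.1, barcode_mean=0.1)
for outcome, fraction in fractions.items():
    print(f"{outcome}: {fraction:.4f}")
```

Under these assumed rates roughly 82% of merged droplets are empty on both sides, fewer than 1% are productive, and only about 9% of singly encapsulated cells meet exactly one barcode.  Raising either loading rate improves those numbers, but at the cost of more droplets with two cells or two barcodes.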

There have been some very clever publications around this system recently using an approach called Perturb-seq.  In this process, vectors are constructed that contain a CRISPR cassette targeting a gene of interest as well as a sequence barcode which will be transcribed into a polyadenylated mRNA.  This library can be transfected into a cell population, the population exposed to some stimulus, and then the system profiled using the 10X single cell 3' expression assay.  The truly clever twist in this is that the assay will read not only the expression profile of the cell, but also the molecular barcode identifying which CRISPR cassette (or cassettes; some of the designs multiplex three cassettes) the cell received.  Thus is gene expression made a phenotype for a high-throughput forward genetics screen.  The Perturb-seq effort from Aviv Regev and colleagues at the Broad Institute led off with back-to-back papers, one profiling the unfolded protein response and the other the response of dendritic cells to lipopolysaccharide.  (Note: a raft of papers with similar approaches has been published recently; I don't have access to most and so can't see what cell encapsulation system they used.)

A future product from 10X, scheduled for the first half of this year, will use the same cell encapsulation scheme, but instead of targeting the 3' poly-A end it will be designed to read out immunoglobulin VDJ combinations from both the heavy and light chains.  
