As others have noted, a significant theme at
AGBT this year was sequencing at length.
While this year lacked true bombshells, PacBio impressed many by
making single-contig bacterial genome assemblies look easy. Moleculo had been the object of much
pre-meeting excitement, and while very few additional details emerged about
their process, several talks showed what could be done. As I have discussed previously,
Nabsys demonstrated their “positional sequencing” system to select invitees in
a hotel suite. Optical mapping from
OpGen and BioNano Genomics featured in a few posters, but did not attract much
attention. Oxford Nanopore had no
physical presence, beyond a somewhat secretive suite, but several ONT staffers
were happy to reiterate their confidence that they will launch their system –
when it is good and ready.
In the end, three things will drive adoption of these
technologies and the extent to which each one succeeds, which I will explore in
detail below. First, there are the
applications; there are different strengths and weaknesses to each, and some
systems will be ill-suited for some (or completely unusable, with a lack of
commercial availability being the ultimate in unusability). Second, there are cost considerations, though
none of the presenters seemed to even touch on this, leaving pundits such as
myself to do back-of-envelope estimates (some of which threaten to pop
eyeballs). Finally, there are just
preferences. For example, as noted elsewhere, Moleculo will be attractive
to shops already heavily invested in Illumina, particularly if they are averse
to shipping some work elsewhere.
For application space, most examples given at AGBT were
genome assembly (including gap filling and other improvements),
structural variant discovery or haplotyping.
One poster showed the use of PacBio for cDNA sequencing, which will
certainly be a boon to cataloging splice variants. Metagenomics applications came up in Q&A,
but I don’t believe any talks or posters actually showed this.
As for matching technologies to applications, it’s useful first to
explore who is just plain absent and who is a pretender to the throne. Ion Torrent simply has ignored this area, and
their AGBT presentation was no different.
Rothberg’s plenary talk, which apparently was a near copy of the one he
gave at the earlier Ion sequencing symposium at an adjacent hotel, was big on
“Moore’s Law” and enjoying the spotlight (and also inducing many eye rolls),
with lots of projections of the capacity improvements coming on Proton (PGM
users were only referenced in terms of number of runs; nothing was promised
here) and discussion of amplicon, capture and RNA-Seq applications, but no
mention of long range information. The
pretender to the throne is clearly Roche/454.
They presented one nice talk in the bioinformatics session describing
valuable work closing gaps in a human cell line sequence, but cost was
completely ignored. No surprise: with
454 running north of $10K per gigabase, a 10X genome would be upwards of a
quarter million dollars. PacBio’s per
gigabase cost is at worst half that – so their 10X human genome was only about $100K (apologies again for posting a much higher number on Twitter previously). The contrast is
that 454 seems stuck with incremental improvements in modal length and no
significant changes in density, whereas a doubling of PacBio throughput should
be rolled out this spring and perhaps another 2X squeezed out of the RS
platform over the rest of the year.
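The back-of-envelope arithmetic behind those figures can be sketched as follows. This is only a rough illustration: the 3 Gb haploid human genome size is my assumption (the post does not state it), and the function names are made up for the example. The per-gigabase prices are the round numbers quoted above.

```python
# Rough cost arithmetic for a 10X human genome.
# ASSUMPTION: a haploid human genome of about 3 gigabases.
HUMAN_GENOME_GB = 3.0


def genome_cost(cost_per_gb, coverage, genome_gb=HUMAN_GENOME_GB):
    """Total sequencing cost in dollars for a given fold coverage."""
    return cost_per_gb * coverage * genome_gb


def implied_cost_per_gb(total_cost, coverage, genome_gb=HUMAN_GENOME_GB):
    """Per-gigabase price implied by a total project cost."""
    return total_cost / (coverage * genome_gb)


# 454 at $10K per gigabase, 10X coverage
cost_454 = genome_cost(10_000, 10)

# PacBio at "at worst half that", i.e. an upper bound of $5K per gigabase
cost_pacbio_upper = genome_cost(5_000, 10)

# Per-gigabase price implied by a roughly $100K 10X genome
pacbio_implied = implied_cost_per_gb(100_000, 10)

print(f"454, 10X: ${cost_454:,.0f}")
print(f"PacBio upper bound, 10X: ${cost_pacbio_upper:,.0f}")
print(f"PacBio implied price: ${pacbio_implied:,.0f} per Gb")
```

Under that genome-size assumption, the 454 figure lands at the "upwards of a quarter million" mark, and a roughly $100K PacBio genome implies a per-gigabase price somewhat below the "at worst half" bound.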
Also noticeably absent was any serious mention of BGI,
Complete Genomics or Complete Genomics’ Long Fragment Read technology (LFR),
covered previously. In their
Nature paper, Complete and collaborators demonstrated much longer haplotyping
than is possible with Moleculo, though the underlying approaches are similar. If BGI wants to be part of this new push for
long range information, as they seem to have suggested, they need to get the
merger distraction out of the way and start making it clear whether they will
roll out LFR as a service (likely) or as a kit. Nor were the cool "library-on-an-Illumina flowcell" approaches to long range information in evidence at AGBT, but that remains an interesting approach as well.
Moving to applications, for de novo assembly, my bias would
be towards PacBio. Because Moleculo
performs de novo assembly, albeit on individual fragments, it can run into
problems with long direct repeats and also with any extreme base bias regions
which the underlying Illumina technology chokes on. PacBio had a poster demonstrating reading
through a very long VNTR in a mucin gene.
PacBio might have problems getting the exact number of bases correct on
a simple repeat array, but should be able to give relatively tight bounds. In contrast, if Moleculo must deal with a
repeat array longer than the fragment size, only some guesstimation based on
read depth is going to yield the number of repeats.
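To make the read-depth guesstimation concrete, here is a minimal sketch of the generic idea: if reads from a repeat array pile up on a single collapsed repeat unit, the ratio of depth over that unit to depth over unique flanking sequence approximates the copy number. This is not Moleculo's or anyone's published method; the function name and the depth values are hypothetical, and real data would need correction for base-composition bias and mapping artifacts.

```python
from statistics import mean


def estimate_repeat_copies(unit_depths, flank_depths):
    """Estimate repeat copy number from read depth.

    unit_depths:  per-base depths over one collapsed repeat unit
    flank_depths: per-base depths over unique flanking sequence
    Both inputs are hypothetical illustrations, not real data.
    """
    flank = mean(flank_depths)
    if flank <= 0:
        raise ValueError("flanking depth must be positive")
    return mean(unit_depths) / flank


# Made-up depths: the collapsed unit is covered about ten times
# more deeply than the single-copy flanks.
copies = estimate_repeat_copies([300, 310, 290], [30, 28, 32])
print(f"Estimated copies: {copies:.1f}")
```

The estimate is only as good as the evenness of coverage, which is exactly why this is guesstimation rather than a direct read of the array.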
In their AGBT presentation, PacBio made snapping bacterial
genomes to a single contig, well, a snap. I’m in the process of testing that
for myself, but if this is the case PacBio is likely to become the standard
approach to high quality bacterial genomes.
Illumina will still be valuable for surveying large numbers of genomes
at much lower cost, but for high resolution PacBio could rule. However, Illumina makes some strong claims
around their new Nextera mate pair kits in this space, and so there may be
three grades of genomes: highly fragmented Illumina paired end, good but not
single contig Nextera mate pair versions of those and finally PacBio. If there is much cost differential, then some
investigators will settle for that middle ground, which may be useful for most
studies.
For other classes of ugly sequence, the two technologies are
probably so close that only a very carefully designed head-to-head would flag a
clear winner. For example, talks featured nasty
regions such as the mammalian MHC, which are characterized by lots of
repeats but not necessarily long simple repeat arrays.
On the other hand, for haplotyping I suspect Moleculo will
be more popular than PacBio. First, if
Illumina is to be believed there will be a sizable cost difference, with
Moleculo on a human genome perhaps adding around $10K per genome to a
project. Illumina stated in their
presentation that a substantial amount of haplotype information could be
obtained using low coverage Moleculo, so that may be popular in studies with
lots of samples. As noted above,
Moleculo may also simply be popular for those heavily invested in Illumina.
For large genomes, it appears that there will still be
challenges. That will remain the area of
opportunity for mapping companies such as OpGen, BioNano Genomics and soon
Nabsys. But as the long read sequencing approaches improve, they will be continually chewing upwards into the mapping companies' space. It's a long way until all the dust settles, which means it will be an interesting space to watch for quite a while into the future.
I'm eagerly awaiting the next post :)