Sunday, April 03, 2016

Mosquito Genomes: Chance for Long-Range Companies to Shine

Friday's New York Times carried a front-page illustration of the current status of the Aedes aegypti genome, accompanying an Amy Harmon story on efforts to improve the highly fragmented state of this genome.

The piece has drawn a lot of opinion on Twitter with regard to its value and other issues (such as calling an assembly a map -- which to me is correct, as the perfect genome sequence is the ultimate physical map!).

Why is the assembly so fragmented?  Well, here is a visual example (from Anopheles) posted by Adam Phillippy:

Yeah, that's repeat hell.  And apparently these genomes are full of such hairballs.  The biological relevance of such complexity isn't clear, but based on the Harmon article, efforts to tune up the genome have already identified genes missed by the original assembly.  There's also interest in identifying truly mosquito-specific regions for possible generation of "gene drives" to do something useful (ideally eliminate their ability to carry nasty diseases, but it would be remarkable if enough is known about that biology to accomplish such a feat).

I recently posted my own grandiose idea for what could be done with these mosquitoes, namely generating a huge exome sequencing database on mosquitoes phenotyped for carrying different pathogens.  Some useful comments ensued from those who actually work on mosquitoes, such as the low frequency at which mosquitoes may be carriers -- though whether that is genetically driven is precisely what my moonshot is conceived to address.  Of course, any such scheme assumes the key genetic variation can be found in exomes, as opposed to (say) structural variants.  And building the chips to pull down the exome relies on having a good picture of that exome -- so missing genes are a bit worrisome.

In any case, I do believe there is value in getting a better genetic foundation for mosquito work.  Furthermore, this would be a very worthy proving ground for advanced long-range sequencing methods.  The Harmon piece makes it clear that Pacific Biosciences is in this game and suggests two other technologies as well, but doesn't name them.  Getting BioNano optical maps would seem very logical, and in a previous Twitter conversation with Adam I think he suggested that was on the menu.  But if I were at Dovetail Genomics, I would certainly be pushing to get samples to try scaffolding with that technology as well -- this would be an opportunity to show its capabilities on a very high-profile scientific effort.  10X Genomics is another technology that could shine here, perhaps more towards generating haplotype information than tying up the existing genome better (though it might well help there too).

That would be the round-up of proven long-range technologies.  Oxford Nanopore would be one on the edge of capability: reports are starting to appear of truly monster alignable reads, even bigger than the 100Kb range I think is the biggest in the literature.  But MinION just doesn't have the throughput to make a real go at a large (~1Gb) genome, certainly not without fast mode.  Perhaps one of the early PromethION customers will try out Aedes aegypti or Anopheles, another bad actor mosquito genus -- or perhaps Aedes albopictus, the tiger mosquito that has the potential to transmit Zika within large swathes of the United States.

Who else is out there?  Nabsys 2.0 is the other company that comes to mind as having a technology that could apply.  Are there any dark horses?  Stealth mode companies that would see mosquito genomes as a way to uncloak with a big splash?

Contributing to a high-profile effort on an urgent problem would, in my opinion, be a great way for genomics companies to do good while getting favorable publicity.  Demonstrating value on a truly difficult genome such as Aedes would serve as a benchmark for other diploid genomes.  Most importantly, by depositing such data in public repositories, companies can enable and encourage software developers to build tools that leverage such data, which is ultimately beneficial to these companies.


JohnUrbanGenome said...

Do you know how much 10X and/or Dovetail are charging for scaffolding data? I cannot seem to find much info on that. I suppose I could always ask them directly :P

Keith Robison said...

Genomics Services Laboratory at Hudson Alpha will generate a 10X library and run 1 lane of HiSeq for about $3300.

Full-on service for a mammalian genome at Dovetail is, I think, quoted at $35K.