There was one slide in Genapsys' J.P. Morgan presentation that just didn't seem right, and the more I looked at it the more annoyed I got at the exhibition of poor graphical design choices. I'm going to walk through my complaints in the hope that others might learn from it. Before I tear it apart, here is the figure -- what do you not like?
We can quickly spot violations of the minimalist ethic of Robison pere and Dr. Tufte: why are there these extra unlabeled datapoints on the swirl graph that I've marked with red arrows? Actually, that whole grey swirl has issues, but we'll get to that later -- and it really isn't important to Genapsys' point. Indeed, it is something you might expect to be lifted from an Illumina plot.
As a scientist, my first concern is accuracy. If a graph is deliberately deceiving, it is unacceptable. The Genapsys plot doesn't appear to have any gross violations of this ethic, though the ambiguity from plotting with giant markers isn't ideal. No, what really toasted my cookies here is that this is supposed to visually argue a point -- and the execution of the plot utterly botches that objective.
The axes are the first sign of serious trouble -- both have breaks in them (red arrows). Those are sometimes unavoidable, but you really, really want to try avoiding them. The Y-axis is scaled as a log axis, which I am fond of, but with the break it's now neither linear nor continuous, potentially inducing confusion. Plus the swirl is unbroken -- plotting a continuous trend line on discontinuous axes is starting to get into misleading territory -- though here it appears to be cluelessness rather than malice.
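To make the log-axis complaint concrete, here is a minimal sketch of what an unbroken log axis promises the reader: every tenfold step covers the same distance on the page. The function name and the 1-to-1000 axis range are my own illustrative assumptions, not values read off the Genapsys slide.

```python
import math

def log_axis_position(value, axis_min, axis_max):
    """Fraction (0 to 1) along an unbroken log axis at which `value` is drawn.

    Hypothetical helper for illustration; axis_min and axis_max are the
    values at the two ends of the axis and must be positive.
    """
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# Assumed axis running from 1 to 1000 (made-up range, not the slide's).
for v in (1, 10, 100, 1000):
    print(v, round(log_axis_position(v, 1, 1000), 3))
```

On a continuous log axis the gaps between those positions are all equal. Once a break is inserted, a reader can no longer assume that, which is why a trend line drawn straight across the break is hard to interpret.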
Of course, the artist hasn't done themselves any favors with the big blue text box eating space (orange arrow) -- particularly when there is so much empty plot space in the upper right corner.
If one wants to just focus on the desktop instruments, we can toss NovaSeq to get this plot, which perhaps better emphasizes how the 144M chip -- if successfully launched -- would be radically different for these metrics.
Now that I've plotted it, I realize the effect isn't as huge as I once thought -- but I'll double down and say that makes the Genapsys graph execution even worse! Their overly complex presentation of what is really simple still obscured their message -- or at least gave an opening for someone to claim that the presentation was distorting things. By cleaning up the plot they can make their point more cleanly and remove an angle for a competing salesperson to try to discount it.
In reality, the cost question is much more complex and nuanced. The slide doesn't include the newest NextSeq instruments and their price/performance. Comparing cost per gigabase makes sense if there aren't significant other costs. For example, if I must spread my run across multiple Genapsys chips that would fit on a single Illumina flowcell for the machine I have, there is a cost in terms of multiple loadings (lab-side labor!) and in making sure the right data is amalgamated (compute-side labor!). There's also the ongoing advantage of less expensive hardware: after the first year there are service contracts (and/or licenses) that are percentages of the purchase price. But that gets into really messy territory, as it depends on the sizes of your projects. So it's more a reminder that in making these decisions you should consider your projects and their sizes and quality demands and only use published numbers as starting guides.
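As a rough illustration of how per-run labor can change the picture, here is a minimal sketch that folds a fixed handling cost per run into the cost per gigabase. Every name and number in it is a made-up placeholder, not actual Genapsys or Illumina pricing or throughput, and it ignores instrument purchase price and service contracts entirely.

```python
import math

def effective_cost_per_gb(consumable_cost_per_gb, project_gb,
                          gb_per_run, labor_cost_per_run):
    """Cost per gigabase once per-run labor is included.

    Hypothetical model: consumables scale with the data generated, and
    each chip or flowcell loading adds a fixed labor cost.
    """
    runs = math.ceil(project_gb / gb_per_run)  # loadings needed for the project
    total = consumable_cost_per_gb * project_gb + runs * labor_cost_per_run
    return total / project_gb

# Placeholder numbers only: a 100 Gb project on two imaginary platforms.
small_chip = effective_cost_per_gb(10, 100, gb_per_run=10, labor_cost_per_run=50)
big_flowcell = effective_cost_per_gb(12, 100, gb_per_run=100, labor_cost_per_run=50)
print(small_chip, big_flowcell)
```

With these invented inputs, the platform with the cheaper consumables per gigabase ends up more expensive overall because it needs ten loadings instead of one. Change the project size and the ranking can flip, which is the point: plug in your own projects rather than trusting a single published number.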
Good graphic layout -- it matters!
1 comment:
Of all "How to display data" nerds, this guy is my favorite. Saw him in grad school, and had UC Santa Cruz pay to bring him multiple times. Was worth every penny. https://www.principiae.be/X0000.php