I was looking through part of my collection of papers using Illumina sequencing and discovered an unpleasant surprise: more than one does not seem to state the read length used in the experiment. While to some this may seem trivial, I had a couple of interests. First, it's useful for estimating what can be done with the technology, and second since read lengths have been increasing it is an interesting guesstimate of when an experiment was done. Of course, there are lots of reasons to carefully pick read length -- the shorter the length, the sooner the instrument can be turned over to another experiment. Indeed, a recent paper estimates that for RNA-Seq IF you know all the transcript isoforms then 20-25 nucleotides is quite sufficient and you are interested in measuring transcript levels (they didn't, for example, discuss the ideal length for mutation/SNP discovery). Of course, that's a whopping "IF", particularly for the sorts of things I'm interested in.
Now in some cases you can back-estimate the read length using the given statistics on numbers of mapped reads and total mapped nucleotides, though I'm not even sure these numbers are reliably showing up in papers. I'm sure to some authors & reviewers they are tedious numbers of little use, but I disagree. Actually, I'd love to see each paper (in the supplementary materials) show their error statistics by read position, because this is something I think would be interesting to see the evolution of. Plus, any lab not routinely monitoring this plot is foolish -- not only would a change show important quality control information, but it also serves as an important reminder to consider the quality in how you are using the data. It's particularly surprising that the manufacturers do not have such plots prominently displayed on their website, though of course those would be suspected of being cherry-picked. One I did see from a platform supplier had a horribly chosen (or perhaps deviously chosen) scale for the Y-axis, so that the interesting information was so compressed as to be nearly useless.
I should have a chance in the very near future to take a dose of my own prescription. On writing this, it occurs to me that I am unaware of widely-available software to generate the position-specific mismatch data for such plots. I guess I just gave myself an action item!