Scientific Easter Eggs

Tonight, of course, is Halloween, one of the many U.S. holidays with a serious sweet tooth. After taking Version 2.0 around for the day's traditional gentle extortion, I indulge in my own rituals -- listening to Saint-Saëns & reading The Raven. It isn't exactly the right season, but the confectionery angle got me thinking about other sweet holidays, and from there about Easter Eggs -- of the scientific kind.

There was a recent complaint in Nature about the growing shift of information from the printed versions of articles to the Supplementary Online Material (SOM). I can definitely sympathize -- as the writer complained, key details have been migrating to the SOM, meaning that sometimes you can't read the print version and really tackle it scientifically. In particular, Materials & Methods sections of many papers have been eviscerated, with the key entrails showing up in the SOM. Most of the point of my print subscriptions to Science & Nature is to be able to read them during my Internet-free commute. Worse, the SOM becomes an appendage in danger of being lost or misdirected -- such as in a recent manuscript I reviewed which showed up without the supplements.

For better or worse, editors & authors have a shared interest in shifting things from print to the SOM. For editors, online is cheap. For authors, it is a way to cram more into fixed page limits. Clearly some material (such as videos) can only go into SOMs, and lots of supporting data really does belong there.

In computer code, an Easter Egg is a hidden surprise -- if you know the right combination of keystrokes or commands or such, something interesting (and generally irrelevant to the program) will show up. I'm not sure I've actually ever seen one -- I'm generally too impatient to deal with such things, but I do recognize they exist. Granted, perhaps some of that programming effort would be better spent wringing a few more bugs out, but it is a way for coders to blow off steam.

I propose that a scientific Easter Egg is the inclusion in Supplementary Online Material of valuable scientific data which is peripheral to the main thrust of the paper, but is nevertheless a significant advance. Such events are probably rare, as it requires a certain mindset to bury a possible Minimal-Publishable-Unit in another paper's SOM, but on the other hand it beats something never being published -- and perhaps it is interesting to some but viewed as too minor to merit a paper.

I'll give you an example, from the Church lab. George has long been burying stuff in papers -- for example, one of the footnotes to the original multiplex sequencing paper declared that the technology was being used to shotgun sequence Salmonella typhi AND Escherichia coli! Alas, the project was ahead of the technology & never completed. But a much better Easter Egg is in the first large-scale polony sequencing paper. Supplementary Figure 2 is really an in-depth study of the site preferences of the Type IIS restriction enzyme MmeI -- driven by about 20K sequencing examples. This is really a bit of restriction enzymology hiding in a sequencing paper. Because the enzyme is used in the method, it is relevant -- but not quite critical. The enzyme's preferences are important because they could create biases in sequence sampling, but they are hardly the main point of the paper -- which is why the study is in the SOM.

I'm sure there are even better examples out there. What is the most interesting tangential information you have seen in an SOM?

I think that the increasing Least Publishable Unit size, especially in top journals, is a big part of what's going on. Materials and Methods, figure legends and the like in Nature or Science papers frequently contain work that would have been a publication in its own right a generation ago.