Thursday, February 21, 2008

History better learned late than never

I've always been interested in history, and the history of science is no exception. I thought I knew a bit about the history of DNA sequencing, so it was a bit of a rude surprise to read the obituaries on Wed for Dr. Ray Wu and discover that he had published one of the first DNA sequencing methods, a method that is credited with being the forerunner of Sanger sequencing. I was totally unaware of this history.
Sadly, the Wikipedia article on DNA sequencing doesn't cover this at all.

A bit of Medline hunting, aided by Dr. Wu's page at Cornell, found a few articles in PubMed, most sans abstracts and few with full text (Somebody PLEASE arrange legally to get classic J Mol Biol as free full text!). Luckily there are a few papers -- this NAR paper and an earlier PNAS one. If I'm reading it correctly, then it involved 2D analysis of digestion maps, which I had heard of so perhaps my historical knowledge isn't totally deficient.

The one question that occurs is why didn't Dr. Wu stay in the DNA sequencing business. I wonder if he left any thoughts -- was it just not interesting enough or did Sanger & Gilbert just jump ahead so he felt like it wasn't the right place to be. Whatever his reasons it can't really be criticized -- Wu had quite a publication record and appeared to be active essentially to the end of his life. It would just be interesting to understand why he took the direction he did.

Friday, February 01, 2008

Microsoft's irritating limits

Okay, it’s really time to face facts – there’s no avoiding the issue. I am a creature of habit, but the good ones tend to wax and wane with a rarely changing backdrop of the bad ones. Posting regularly to this forum was a good habit, but one which has been scarcely seen for a while.

For the explanations, you can round up the usual suspects. Work pressures. Holiday madness. Routines once disturbed being hard to reform. But in the end, those are just excuses which must be actively combated. Or perhaps it is a lack of passion? And how can passion be rekindled? What will put a new burr in the saddle – well (to badly mix metaphors), perhaps something to get the blood boiling: a good old-fashioned screed.

Now, there are whole websites devoted to griping about Microsoft. Perhaps everything I say has been said before. And, perhaps it is all obsolete carping – for reasons good and perhaps not-so-good I’m still using the previous version of Office. But, given that most of these problems have persisted through so many versions during a decade-and-a-half, I’m not optimistic.

Now, I’ve picked a very broad target. Easy to hit. Like an archer trying to hit the ocean with an arrow while standing in the foam. But I’ll try to focus.

I could gripe almost endlessly about Bill’s Army’s graphical choices. How poor contrast, cluttered charts are the defaults. I won’t (today, at least) go into that, but suffice to say it would be unhealthy for everyone concerned if I were to be trapped in an elevator with the Excel Charts programming team.

No, today I won’t focus on cluttered vision but on a limited vision, when Microsoft has a good idea but then shackles it bizarrely. When it gives the user choices, but insists on giving them very few.

Excel is a workhorse for myself and my colleagues as a data organization, filtering & delivery tool. This has been a habit acquired way back in graduate school and carried through many iterations. I often gravitate towards Excel for taking structured notes, with it’s relentless row-and-column layout forcing useful order. It’s also easy to take those tables and move them into other programs, and in my current post I’ve discovered Excel’s facility for slurping data out of relational databases. All great stuff.

Now, once you’ve got a mess of data in Excel you’d like to sort, filter & highlight it. Naturally, Excel can sort data, the handy Auto-Filter can filter it, and conditional formatting can highlight cells. But they are all broken in the same way: for reasons numerological or otherwise you can do any of these on only 3 criteria each. Sort on three columns. Have three different states to filter on format on.

Now, to pretend to be fair, with Auto-Filter that restriction is per-column, so in a many columned spreadsheet I can filter each column with three criteria -- except when it gives me only two - -I swear it happens, though I can’t remember when , as I’m usually too busy swearing. But, if you wish to see in your dining spreadsheet when you ate Thai, Italian, German or Fusion, you are out-of-luck.

A work-around, of course, is to compute columns with groupings. This can be useful, especially since you can’t save those filter settings for reuse later, but such columns quickly become a visual headache. A few can be useful, but in my hypothetical dining spreadsheet I really won’t want one column for every possible combination of cuisines!

The visual issue brings up another half-way measure: pane splitting. A useful way to deal with so many columns or rows is to split the panes horizontally or vertically (or often both) – but you can have only one split! So if I want to compare data in columns A,Q and BC, I’m out of luck – unless I copy columns or laboriously hide (and later unhide) the columns in between. One help is to make the top row or first column so it doesn’t scroll offscreen – except that blocks out pane splitting!

Now, somebody might be churlish enough to suggest that these limits are rationally chosen and have to do with tradeoffs. Hogwash! Spotfire has a spectacular interface for limiting a field all different ways, with different idioms for choosing (checkboxes, sliders, radio buttons). And if you are having problems with splitting too many ways not working well with various windowing components (such as scrollbars), it’s not like the company doesn’t have its paws in both areas.

A related class of shortcomings, but not quite in the same category, comes around Excel’s functions. If you want to add or count things conditional on some column, there are the COUNTIF and SUMIF functions. But what if I want a conditional AVERAGE? Well, that at least can be gotten from COUNTIF and SUMIF – but no such luck if I want QUARTILE or MEDIAN or STDEV such ways. Why isn’t there a function APPLYIF?

Similarly, I’ve recently gotten hooked on Pivot Tables, which are handy for summarizing data. Basically, a Pivot Table can give you summary statistics on some cross-reference of fields in rows – for example, I could easily see in my eating out spreadsheet the sums of cuisines versus the months of the year. Naturally, these statistics include sum, count, average, stdev, min, mean – but no MEDIAN or QUARTILE . Aargh!

Alas, for the moment I must let my blood boil. When in a company environment, there is an advantage to having a lingua franca, even if it is a flawed one. Plus, what alternative do I have? Perhaps some of this is better in OpenOffice, but I’m not optimistic – my previous experience (though at least a year old) with OO was that it generally aped Microsoft’s flaws & introduced some new ones. Yeah, with more time and energy I could fix that myself, given that it’s open source, but wasn’t the limits of personal time and energy how I opened this article?