Saturday, May 22, 2010

Just say no to programming primitivism

A consistently reappearing thread in any bioinformatics discussion space is "What programming language should I learn/teach?". As one might expect, just about every language under the sun has some proponents (I'm still waiting -- hopefully forever -- for a BioCobol fan), and the responses tend to cluster into a few camps; someone could probably carefully classify the arguments for each language into a small number of bins. I'm not going to do that, but each time I see one of these threads I do try to re-evaluate my own muddled opinions in this space. I've been debating writing one long post on some of the recent traffic, but the areas I think worth commenting on are distinct enough that they can't really fit into a single body. So instead: another of my erratic & indeterminate series of thematic posts.

One viewpoint I strongly disagree with was stated in a thread on SEQAnswers:
    Learn C and bash and the most basic stuff first. LEARN vi as your IDE and your word processor and your only way of knowing how to enter text. Understand how to log into a machine with the most basic of linux available and to actually do something functional to bring it back to life. There will be times when there is no python, no jvm, no eclipse. If you cannot function in such an environment then you are shooting yourself in the foot.

Yes, there is something to be admired in being able to be dropped in the wilderness with nothing but a pocketknife and emerging alive. But the reality is that this is a very rare occurrence. Similarly, it is a neat trick to be able to work in a completely bare-bones computing environment -- but few will ever face that situation. In nearly twenty years in the business, I have yet to encounter it.

The cost of such an attitude is what worries me. First, the demands of such a primitivist approach to programming will drive a lot of people out very early. That may appeal to some people, but not to me: I would like to see as many people as possible get a taste of programming, and to do that you need to focus on stripping away the impediments and roadblocks that trip up newcomers. From this viewpoint, a good IDE is not only desirable but near essential; having to fire up a command-line debugger and learn its terse syntax for exploring your code's behavior is far more daunting than working in a good graphical IDE. Similarly, the sort of down-in-the-machine-guts programming that C entails is very undesirable; you want a newcomer to be able to focus on program design, not on tracking down memory leaks. Also, I believe Object Oriented Programming should be learned early, perhaps from the very beginning -- but that's easily the subject of an entire post. Finally, I strongly believe the first language learned should have powerful built-in support for advanced collection types such as associative arrays (aka hashtables or dictionaries).
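To make that last point concrete, here is a minimal sketch in Python (the language I generally recommend below) of the kind of counting task that comes up constantly in sequence analysis; the sequence and window size are invented for illustration. With a built-in dictionary, the whole job is a few transparent lines, with no memory management to distract a newcomer:

    # Count k-mer frequencies in a DNA sequence using a dictionary
    # (an associative array); keys are k-mers, values are counts.
    def kmer_counts(sequence, k=3):
        counts = {}
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        return counts

    for kmer, n in sorted(kmer_counts("ATGCGATGCATGA").items()):
        print(kmer, n)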

Once a language has passed those tests, I get much less passionate. I increasingly believe Perl should be taught only as a handy text mangler, not as a language in which to develop large systems -- though I still break that rule daily (and will probably use Perl as a core piece of my teaching this summer). Python is generally what I recommend to others, though I simply am not comfortable enough in it to teach it. I'm liking Scala, but should it be a first language? I'm not quite ready to make that leap. Java or C#? Not bad choices either. R? Another one I don't feel comfortable enough to teach (though there are some textbooks that could help me get past that discomfort).
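When I say "handy text mangler," I mean quick jobs like the sketch below -- shown in Python for consistency with the rest of this post, though the Perl version would be just as short; the file name in the usage line is invented for illustration:

    import sys

    # Tally the length of each sequence in a FASTA file.
    # Usage: python fasta_lengths.py example.fasta
    lengths = {}
    current = None
    with open(sys.argv[1]) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                current = line[1:].split()[0]  # ID is the first word after '>'
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)

    for name, size in lengths.items():
        print(name, size)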

7 comments:

Jonathan Badger said...

Well, I'm certainly not one to suggest that C be used for tasks better handled by a scripting language, and I tend to agree with Richard Stallman's quip about vi: "Using vi is a sin -- but one that's its own penance."

And yet I really have to question the value of an IDE. It isn't that I work off of dumb terminals -- I mostly use OS X, which comes with an IDE (Xcode), and I've installed and played with things like Eclipse and NetBeans. Yet I keep going back to Emacs -- not out of "primitivism" but simply because it *works* better than an IDE while providing the useful bits of one: syntax highlighting and automatic indenting.

Noah Fahlgren said...

It seems to me that what you learn should follow from what you want to accomplish. In my experience, the majority of scientists working on computational projects just want to get something done -- a question answered -- and want to get it done fast. In most of these cases it's probably hard to beat a scripting language in terms of programming speed and ease. If someone is actually developing software, then it might be a different story.

Conrad Halling said...

I programmed exclusively in C for ten years and then switched to object-oriented Perl. I'm far more productive in Perl than I ever was in C. For the analysis of data, I use a mix of Perl and R.

If I were starting fresh, I would learn Python. Perl has many weaknesses, including that the philosophy of the language encourages you to write code that is difficult to maintain. Python has the advantages that it is object-oriented from the start and that most if not all of the text-munging capabilities of Perl are also available in Python.

I have made several efforts to be productive in C++, but I hate having to do my own memory management in that language.

If I had to write more than web interfaces, I would learn Java, which is OS-independent, or C#, which locks you into Windows.

I taught myself emacs a few years ago because I was working with programmers who used it exclusively. But I find emacs terribly old-fashioned and limiting, and when I learned it, emacs still had inadequate support for UTF-8. With emacs, I found I had to memorize too many things to become productive. I prefer a modern application in which you can discover functionality by experimenting with menus and buttons.

I don't mean to say that everyone should use the languages and editors that I prefer. People should experiment and find what works for them.

Anonymous said...

There are 300 programs written in C/C++ on your computer right now.

Your beloved scripting language is written in C. Your browser. Your word processor. Your OS. Your spreadsheet. Your R stats package. Your short-read aligner. Your BLAST program. Your email client. Your shell. Etc., etc.

Deal with it. It's the foundation of all computing. If you know the foundations, the rest is easy; if you only know some lame abstraction, you're befuddled. Bottom line.

Keith Robison said...

WRT the last comment, I'm afraid either I haven't expressed my opinion well or you didn't read it carefully. I have no objection to code written in C/C++; I've even done some myself. They are still really useful languages & many programs I use are written in them.

BUT, is C/C++ a good first language? Is it good for the masses of people who view themselves as biologists first and programmers second (if at all)? That's the question I was trying to raise, and one you appear not to have addressed at all.

Anonymous said...

How can one be productive with malloc, calloc, or new and delete? You need Purify and other tools just to see the leaks -- it's programming insanity.

rdf said...

Sorry for the late comment -- but I think that if you're trying to learn bioinformatics you would want to concentrate on languages that give you insight into the algorithms rather than the software per se -- e.g., "if I do this, what comes back as being similar?" I would think R would be a strong contender here, plus you would have the toolset to determine whether a result is statistically significant.

I'm a software guy, not a bioinformatician by any stretch, but I honestly don't see how learning C (and by learning I mean studying the language spec, understanding how structure member order impacts memory layout and how that works with or against your machine's memory hierarchy, etc.) helps with those issues, which I think are the more important ones for bioinformatics.