Sunday, June 10, 2007

Temptation Beckons!

For the last decade plus my main programming language has been Perl. It has served me often and served me well, but during that time I submerged an important thought.

I really don't like Perl!

I started with Perl 4, and was slowly seduced. The huge key was the facile text manipulation, particularly the regular expressions. I had previously been using C++, and doing any text processing in it at that time was a bear; in Perl it was so trivial. Plus, Perl had no arbitrary variable size restrictions (and was garbage collected), versus the weird problems which most versions of sed & awk ran into with large text strings -- such as big DNA sequences. Perl 4 also had these wonderful hash tables, another feature rarely found in languages at that time, which could be so useful. It lacked any sort of complex data structures, but I was just using Perl as my specialized text processing language, so I just didn't do anything requiring fancy data structures. The text processing was what I really needed, and with a little bit of learning how to ping internet servers from Perl I was hooked.

I even made a little spare change, developing the original web interface to Flybase, the grand Drosophila database. Flybase's computer guru was a big fan of the Gopher+ system and skeptical of the World Wide Web, and so Flybase had a strong Gopher+ interface and no Web access. I was starting to play around with hyperlinking the heck out of BLAST reports and database entries, and was frustrated that my browser couldn't be used to talk to Flybase -- early web browsers did Gopher but not Gopher+. At some point the lightbulb went off: what if I wrote a system that translated between the two protocols! My 'relay server' concept was born (which, of course, others had thought of first and named 'gateways'). A bunch of toying around & poking through documentation, and the thing worked! Well, mostly -- periodically something would change at Flybase to expose my incomplete understanding of Gopher+, and the system would break. But, I was the only customer & it wasn't bad to maintain. The curation center for Flybase was in the Biolabs at Harvard, and one day I saw one of my friends from there in the hall and the ego switch was thrown: "Hey, look what I've done!". I showed it off, just to show off. But it turned out, the Flybase advisory board was also frustrated at a lack of WWW access & wanted one NOW! My good luck! For a modest fee (but princely by graduate student terms) I would maintain the gateway as a public access point, until a permanent system was built. I enjoyed the extra spending money -- and enjoyed seeing the permanent system relieve me of maintenance duties.

Java came out & I took a look and there was lots to like. Strongly typed; I like that. Lots of Internet-friendly stuff; nice also. So I played around and then built a useful tool, one of the first graphical genome browsers delivered over the Internet (as an applet). That experience helped raise my awareness of Java's deficiencies, at least from my standpoint. The graphics model was horribly primitive. The security model meant I couldn't print the diagrams -- my users had to do screen capture to get hard copy. But quite importantly, they hadn't included any regular expression support! In an Internet-targeted language! Perl was still my go-to language (though never with a goto) for any text mangling -- and most of what I did was text mangling.

I got to Millennium and found a thriving Perl community. Perl 5 had come out while I was slogging through my thesis writing, so I hadn't learned it yet. The other choice of languages at MLNM at the time was Smalltalk, which I meant to learn but never quite did. I did learn Perl 5 -- and now you could do everything! All sorts of advanced programming concepts: object-orientation, references, complex data structures. Yippee!

Except, it was the dog's breakfast. Almost perfect backwards compatibility had been maintained, at the cost of importing lots of the weaknesses of Perl 4. The object-orientation was a particularly weird veneer, with lots of traps for the unwary. But it worked, and I had lots of people around me using the same language. We could (and did) share quite a bit of code, and there were dedicated folks willing to solve Perl conundrums.

Now, I'm in a different boat. I am the lone Perl programmer in an environment that again is split between two languages. Both of those languages are strongly typed & carefully designed, though not perfect (what is?). While to date I have mostly been a data analyst, and so my code could live in its own world, increasingly I wish to fold those analyses into better code. Much of the back end is in Python, and most of the code that delivers results to the users is in C#. Trying to learn both at once seems insane, but that is the road I am now headed down.

I've also been reminded of what I don't like about Perl. One hates to complain about hard work that others have given away to the world, but many Perl gurus write utterly unreadable code! When I was trying (ultimately successfully) to get the Perl SOAP modules working, so that my Perl could speak a common language (Web services) with our C# and Python, I had some difficulty and was trying to figure out how to change one $*(&(*& character in the generated XML. Digital hieroglyphics!

I had also tried to hone my skills a bit by reading a book of 'Perl Hacks'. The stuff is very clever -- but then I realized how too clever it is! Perl is a language whose culture revels in doing things any-which-way. Yes, I enjoy some of the silliness -- the Bleach module that executes code written all in whitespace (which Bleach can generate from normal code), the write-perl-in-Latin module. But the best 'hacks' were on par with this, turning the language into something entirely different. Some may like that, but it doesn't go with my grain.

I've written two useful pieces of Python so far, and I generally like the language. I'll probably do a longer Python-centered post, but in short it has the quick-development flavor of Perl with a much cleaner design. It's awkward though; I'm still diving into the books for just about everything & I'm sure any seasoned Python programmer would say 'you are writing Perl code in Python!'. I've written one toy application in C#. But, of course, the problem is that I have built a good code base in Perl (and of course my Perl is readable!), so if I need to solve a problem quickly I fall back on what I know. Breaking up will certainly be difficult, long & painful.
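
A minimal sketch of what 'writing Perl code in Python' can look like, next to a more idiomatic version. Everything here is hypothetical and chosen only for illustration: the toy sequence, the function names, and the choice of counting occurrences of the EcoRI site GAATTC. It is not code from any of the projects mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy DNA sequence, for illustration only.
SEQ = "ATGGAATTCTTGAATTCAA"


def count_sites_perl_style(seq, pattern):
    """Perl habits carried over: manual index loop, hand-rolled hash bookkeeping."""
    counts = {}
    i = 0
    while i < len(seq):
        # Try to match the pattern anchored at position i.
        m = re.match(pattern, seq[i:])
        if m:
            site = m.group(0)
            if site not in counts:
                counts[site] = 0
            counts[site] = counts[site] + 1
        i = i + 1
    return counts


def count_sites_pythonic(seq, pattern):
    """Same result using re.finditer and collections.Counter."""
    # The lookahead lets overlapping matches be found, as the loop above does.
    lookahead = "(?=(" + pattern + "))"
    return Counter(m.group(1) for m in re.finditer(lookahead, seq))


perl_style = count_sites_perl_style(SEQ, "GAATTC")
pythonic = count_sites_pythonic(SEQ, "GAATTC")
print(perl_style, dict(pythonic))
```

Both functions give the same counts; the second simply leans on the standard library instead of re-creating Perl idioms by hand.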


neilfws said...

But, of course, the problem is that I have built a good code base in Perl (and of course my Perl is readable!), so if I need to solve a problem quickly I fall back on what I know.

So why change? Perl might be perceived as messy and weird by some programmers but it works, it's great for text processing (which is most of bioinformatics) and its long, rich history means plenty of libraries and modules, such as the Bioperl project.

I know a lot of Python converts who like the syntax and design of the language but ultimately, it's about getting the job done in a way with which you're most comfortable. By all means teach yourself a new language - it can be fun and it's good to acquire new skills, but I think it makes sense to do it because you need it, not because it's the new vogue or whatever.

Keith Robison said...

Well, the big driver is that by using an idiosyncratic language for my current professional environment, I lose a lot of opportunity for code reuse -- there is a large existing code base that is difficult to access because it is in Python & C#. Web services helps bridge things a bit, but adds new layers of complexity and points-of-failure. It also means that my code tends to be a dead-end -- I'm the only one improving it, so I don't get the benefit of others' development. There's also been expressed some gentle managerial opinion that having a third code base to maintain is less than desirable.

But that is the dilemma: take a hit on productivity while re-wiring my brain in exchange for being able to leverage the other programmers' work, or keep in Perl mode but be treading a solo coding path.

Rick said...

...keep in Perl mode but be treading a solo coding path.

I think that neilfws pointed out that -- for bioinformatics -- there is a long, rich history [which] means plenty of libraries and modules. So using Perl -- for bioinformatics -- should not doom one to a solo coding path. That assumes that other people in your place of work write Perl, which it appears they do not.

On the other hand your comments about the unreadability (and thus un-maintainability) of Perl code and the hacks of object orientation ring all too true. Perl 6 may bring relief but it appears that 6 will be almost an entirely new language and, at that point, why not look towards Python, Java, or another language?

Unfortunately my workplace (a genomics facility) is almost exclusively Perl, which means that it is hard to branch out. Sort of like your problem of being the sole Perl programmer, my problem is becoming the sole non-Perl programmer. :-(

Pekka said...

This was a really interesting read because I've had a similar language learning path in a totally different domain. I do software test automation and have done a lot of text processing using Perl, also some Java, and about three years ago I learned about Python. With this background I agree with all the Perl cons and Python pros you wrote about, and nowadays most of my programming is in Python.

Here are two links to make your Python learning curve easier and to give you some other possibly interesting pointers.

* Dive Into Python
An excellent Python book targeted at people who already know how to program. This is where I originally learned the language.

* Python Programming for Bioinformatics
I've used information here for learning how to integrate Python with C but since it's about your domain you may find something else that interests you also.