Sunday, September 24, 2017

Why Is LISP So Rare in Bioinformatics?

LISP is one of the oldest computer languages and perhaps one of the most influential of the early ones.  Some of the other well-known Eisenhower-era languages (Fortran, COBOL and ALGOL) have certainly left their mark, but LISP and derivatives such as Scheme or Common LISP carry more cachet among "serious" programmers.  COBOL has always been a bit of an easy joke and Fortran tends to mark you as old-school; use of APL (once a language of mine) would mark you as dangerously reactionary.  ALGOL begat Pascal and Modula-2 and clearly had an impact on the C syntax family of languages (including bioinformatics mainstays Python, Perl and Java).  As I'll detail below, learning LISP has embarrassingly ended up stuck, seemingly permanently, on my future plans queue.  But that's also because life never forced the issue:  while LISP has certainly been used in bioinformatics (as covered in a review from 2016), its mindshare in the community would seem to be minimal.

When I was a wee lad, my father would sometimes invite me to accompany him to Philadelphia Area Computer Society (PACS) meetings if he thought there was a topic of interest.  Later I would become a regular and even a contributor to their newsletter, The Data Bus.  Anyway, one time he spotted a talk and demonstration on Logo which he thought I would find interesting and I did.  Logo was intriguing because it incorporated a simple graphics programming system, Turtle graphics, but ultimately was a derivative of LISP.  Big brother had said good things about LISP, so the idea of playing with graphics then easing into a serious language had instant appeal.

Alas, at the time we didn't have a name brand computer and in any case the only consumer computer with Logo was the TI-99/4A.  We soon had an IBM PC clone, but the first Logo environment for that required the less-favored CP/M-86 operating system and was slow to launch, and I think by the time it did I was hooked on Turbo Pascal.  So I didn't learn any LISP then and sadly I've never learned any since.

On early personal computers LISP had a reputation for being slow, which I think even extended to non-personal computers.  Indeed, there was an effort for a while to develop microprocessors that would run LISP, or something close to it, as machine code.  But given the crazy compute power I slip into my pocket every day, it is hard to believe that is an issue any more.  So why so little LISP?

One possibility is that so many bioinformatics programmers are self-taught, and LISP does not seem to be a popular first computing language.  Conversely, programming courses at the high school and undergraduate levels focus on a small number of programming languages.  My high school offered a three-year ladder of BASIC, COBOL and FORTRAN, which is one reason I didn't take any programming classes in high school!  Pascal became a much better replacement for BASIC and more recently (and perhaps not necessarily helpfully) Java took over.  Now Python seems to be a favorite first language, which isn't a bad choice.

Similarly, core libraries for bioinformatics tend to be built for a small number of languages, so perhaps LISP is always under critical mass.  Samtools at least once had LISP bindings, but that is the only library I remember seeing that mentioned LISP support.

The counter to all this is that there are certainly plenty of clever people lured out of Computer Science departments into computational biology, and surely many of them must be familiar with LISP.  I had several Millennium colleagues who I know are fluent, but when I started there the focus was on Smalltalk.  That's another language the aficionados love but which has had minimal impact on bioinformatics, though Millennium had a huge investment in it and built some very powerful applications.

The review I linked to above makes a strong case for LISP as a bioinformatics language; it is open access so please dive in.  They point out very interesting tools developed in LISP such as the Pathway Tools that are part of MetaCyc and the BioBike cloud bioinformatics platform.

There's also Greenspun's Tenth Rule to consider:
Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
According to Wikipedia, this is somewhat pointed humor about how many programs have idiosyncratic languages for configuring complex options.  That sounds like many of the de novo genome assembly programs I use, as well as a number of other tools.  Each has a wonderful degree of configurability but also its own funky syntax for doing so.
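
To make the rule concrete, here is a minimal sketch in Python of how a tool's homegrown configuration syntax can drift into a small LISP reader.  The option names (kmer, min-coverage, libraries) and the file names are hypothetical and not taken from any real assembler, and the parser handles only parentheses, integers and bare symbols.

```python
def tokenize(text):
    """Split an S-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Recursively build nested lists from a token list (consumed in place)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # discard the matching ")"
        return items
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    try:
        return int(token)  # integers become numbers
    except ValueError:
        return token  # everything else stays a bare symbol


# Hypothetical assembler options, not the syntax of any real tool.
config_text = (
    "(assembler (kmer 31) (min-coverage 5) "
    "(libraries (paired reads_1.fq reads_2.fq)))"
)
config = parse(tokenize(config_text))
```

Parsing the sample string yields nested lists such as ['kmer', 31], which is roughly the point at which a configuration format has started reimplementing a piece of LISP.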

Okay, the comments section is now open for disputing or expanding on my claims.  I would honestly enjoy any exposition of hidden ways LISP has driven bioinformatics or independent pontificating on why it has been largely on the sidelines.


Scholander said...

As you say, most bioinformatics is self-taught, for better or worse. The vast majority of the time, a nascent bioinformatics programmer just needs to convert some file types, or do some string processing, and when they start Googling they get steered well away from Lisp or Fortran, and for good reason, in that regard. And then they find Bioperl, or Biopython or Bioconductor, and they're off to the races.

I suspect that more bioinformatics comes from other scientific disciplines than it does from direct computer science. At NERSC, we do still have to support a lot of Fortran. We get at least a few help desk tickets a week looking for Fortran compilation help. Quite a lot of the monolithic physics and astronomy code is Fortran. I've never seen anyone ask for Lisp support. I think it's just not really used by anyone at this point in science, though I'm obviously not certain. (Could be the guys using it just don't need help desk support...)

gasstationwithoutpumps said...

I did some LISP programming, back in the 1970s when it was fashionable. LISP is a great theoreticians' language, as programs can be manipulated and reasoned about easily. It is a terrible language for writing code that has to be maintained by people other than the original author, as its readability is low even by the standards of programming languages. It is more suited for toy examples than for production code.

Jonathan Badger said...

Well, there's the general resurgence of LISP-like languages over the last few years spawned by Clojure/ClojureScript. I'm not sure if Clojure "officially" counts as being Lisp, as some diehards refuse to see even Scheme as being a version of Lisp, but it clearly is very Lisp-inspired at any rate. There's a BioClojure for the typical Bio* needs (Plieskatt et al., 2014).