Thursday, October 11, 2007

Opus #173, Programming on the Dark Side (C#)

I had commented a while back that I was contemplating shifting my programming focus from Perl to another language. The existing code base is split between C# and Python, with more C# but with a lot of code I need to think about in Python. I gave both a bit of a trial and also took some suggestions, and did come to a decision.

Hands down, C# is my language.

Now, language choice is a personal matter, and I don't dislike Python -- at some point I'll write down more impressions -- but C# is a great match. I really do like a strongly typed language, both because it catches lots of silly mistakes at compile time rather than at runtime and because the typing provides lots of cookie crumbs for trying to reason out someone else's code (or old code of your own). That could also make for a long separate post.

There are really three powerful things to like about C#. First, the language itself. While I can't claim to have figured out anywhere near everything, for the most part I can't argue with it. Lots of powerful concepts and a general feeling of consistency (as opposed, for example, to Perl's kitchen sink collection of stuff).

Second, there are the .NET class libraries. There is an awful lot there to cover many things you'd want to do, and again there is a reasonably strong sense of consistent design. Here I might find more to quibble over, but it generally hangs together.

Third, there is Visual Studio, a very slick integrated development environment (IDE). The help facility is very powerful for exploring the language, the error messages are generally good, and the ability to browse data in a running program is superb. Furthermore, you can perform a remarkable degree of editing on a running program -- there are many things not allowed, but a lot of runtime errors can simply be edited away and the program continued from where the exception occurred.

However, there is one key drawback to C# from a bioinformatics standpoint: you are not going with the crowd. There appear to have been at least two efforts to create bioinformatics libraries for C#, and both appear to have been stillborn. If you Google for "C# bioinformatics" or ".NET bioinformatics" you find stuff, but more idle talk than solid work. And I think there is an obvious reason for that.

All three of the legs are controlled, or at least perceived to be controlled, by the Emperor Gates. If you do click around some of the Google links, it's not hard to find disdainful comments about the perceived Microsoftity or Windowsosity of C#/.NET. There is an effort called MONO to port the whole slew over to UNIX boxes, but it's not clear this is perceived as more than a fig leaf. The name certainly isn't going to win friends among undergraduates -- "Have you gotten MONO yet?".

On the other hand, there is definitely corporate interest. Microsoft has been making increasing noises about bioinformatics, though perhaps focused further downstream than where I usually work. Spotfire, which is really useful for data exploration, I've heard provides a .NET API. Certainly during my interviews last year I saw C# books or heard mention of it at many of the companies.

So, it's a locally packed but globally lonely world to be a C# bioinformaticist. Luckily, it wasn't hard to build the critical tools I needed -- but I needed only a modest subset of what BioPerl, BioPython or BioJava would provide. However, there are some interesting ways to leverage those tool sets -- though that will have to be a subject for another time.


D. Caplan said...

Eww, gross :)

(sorry, I couldn't help myself)

Maverator said...

Is there anything specific about Java that takes it out of consideration for you?

Not that I'm a Java apologist. I am considering languages to move to beyond perl, and Java is the current front-runner.

Keith Robison said...

You may not have seen the previous post: the language choice was driven largely by what is already in use at my company. By picking one of these languages I get to share code, get questions answered, etc.

It would be interesting to see a good comparison of C# vs. Java. I did some Java programming when it was very new (circa 1995), and it was missing a lot of important stuff that C# has (e.g. generics, delegates), and the support libraries had nothing like what .NET has (the collection classes were anemic, the graphics model appalling, no database support, etc.). But that is, of course, a ridiculous comparison -- Java has not stood still in the decade plus since then. The only thing I know is that the person who picked C# for my shop had extensive Java experience and strongly believes he made the right choice, but that is one opinion.

naptime rocks said...

Check out the NCBI C++ toolkit- lots of standard functions for use there...

Keith Robison said...

Thanks for the suggestion. Any idea how much of the NCBI toolkit will compile under C# -- I'm guessing a lot, but C# is not purely upwards compatible.

Anonymous said...

Java is very much like C# these days - yes it supports Generics since v.5 though there are some limitations because of backwards compatibility.
C# has that nice IDE & nice graphics / interface libraries all together as standard. This makes studying simpler. You can throw a nice app together easily and yes the debugging is top notch.
Java can offer similar libraries but not as standard, so immediately there's a question about which to use - do you pay or go with freebie stuff?
I'm currently working with Java and JIDEA IDE (4.5.4). To be honest I don't know which of our libraries are free & which aren't. There's a lot of JAXB/jxb stuff but I still haven't got the kind of graphics features I like so much in .NET 2.0.
So to summarise, I think C# wins, until you want to deploy to anything except a Windows box with .NET 2.0 installed, when you might as well take the plunge and go the Java route because it's more versatile than porting through MONO.
I know a man who's big in Bioinformatics who says that C++ is best unless you're limited for time/geekery, in which case Java (done correctly) is almost as fast to execute and much faster to develop.
He doesn't know much about C#. I think that speaks volumes. Most Bioinformatics software has appalling user interfaces in my opinion.

Keith Robison said...

Thanks for the perspective on Java.

The deployment issue is an important one. We don't have plans to deploy outside the company, so it isn't an issue, but certainly if you are an academic wanting widespread use of your software Java is probably a better platform.

I haven't worked with C++ in a decade & as with Java I haven't attempted to keep up with its evolution. On the other hand, I'm most interested in solving biological problems & not in being a computer geek, so to me it is the programming environment that is the big win. I really don't have much patience for low-level gruntwork & am not fond of others being in love with it -- unless you really have a high-performance application that needs the speed & you have the time to make it leak-free.

Anonymous said...

I found this very interesting. I am a professional bioinformatician working both in an academic setting and on commercially distributed software. Our commercial software had an “outside” requirement that it be in .NET, so for the last several years I’ve been leading a team mixed between Perl and C# programmers (composed mostly of undergraduate and graduate Computer Science majors along with one other professional programmer). The reason I push C# these days (call me old fashioned) is that it sells. The last four students who left the team found work based on their C# skills – both in and out of bioinformatics.

Anonymous said...

The creation of multiple programming languages to do largely the same thing is a reflection of the natural biodiversity that occurs in computer science.

Ironically it is in itself an inefficiency. Instead of learning different syntax to do the same function, what could have been better is to teach, reuse and enrich the one single language.

I am not sure about the rest, but I certainly favour C++ for the richness and availability of its many libraries (Boost, OpenGL, etc.) and wrappers to other languages such as Perl, Python and Java. I would favour Linux for the cheaper replication and expansion of systems that solve and answer questions.

It is ok to be different. It is ok to explore. Just be reminded that a project will fail if its community fails to rejuvenate itself and maintain attention; it will cease to exist. Some languages will start, some will go. Let's see how C# evolves.

Max said...

C# is by far not as cross-platform as Python. Python is installed on most Unix machines these days - just send out your script to a colleague and he/she can run it.

If you're doing analyses and not GUI development then you don't need the huge class libraries.

Python is very easy to read; I doubt that you would find C# easier to read than Python if you compared them side by side.

There is tons of documentation for Python, and chances are that you can solve your particular problem just by googling for it. There are also some problems quite particular to scientific computing (e.g. parsing large text files, fast hashes, matrices) that are often not covered by class libraries.

The IDE is an advantage of C#. However, I usually don't need a debugger or helpfiles when programming, as the language Python is quite small, there is tons of help on the internet and I don't use class libraries.

For scientific programmers, Windows is not the OS of choice, so you will probably stay alone with your C#, and bio-libraries will not appear that soon, nor will wrappers for third-party tools, parsers, etc.

Keith Robison said...

I certainly appreciate the actual/perceived restriction of C# to Windows, and if running lots of places was an issue, I wouldn't pick C#.

I do respectfully disagree on the IDE & readability issues.

If you've never used a good IDE, you just don't realize what you are missing -- I know I didn't all the years I worked without one. Being able to inspect in detail the state of your program at any point in its execution is a powerful tool for figuring out why the code isn't behaving properly.

On readability, I'd agree that for just glancing over the code Python tends to be very readable and certainly most Python code is more readable than most Perl code (trying to decipher the Perl web services library ignited my discontent).

But, when you go into a library to figure out just how the code is flowing, strong declarative typing gives a huge edge up -- if I have function X I'm trying to figure out and it spits out an object of class Y, I can easily find all the possible consumers of class Y. Similarly, I could also figure out all the generators of class Y should I have a consumer of that class. It's a layer of required documentation which in the ideal world you might not need, but in my experience I find very useful.

I certainly agree that all languages are easier to learn nowadays due to all the stuff on the Web -- Google is an amazing tool for learning a new environment.

Also, there is a secret weapon for solving the libraries issue. Gotta get working on that post...