I recently found out that I've received a summer undergraduate intern slot. I have a soft spot for summer internships -- my own was a great experience -- and the company runs a very nice program, with specific social and learning experiences for the cadre. Anyone interested in applying should do so through the company website (and not here!). I do promise not to fill this space with "can you believe what the intern did today?!?!?", though executing "sudo rm -r /" might earn a slot!
I'm still trying to sketch out a grand scheme for the internship, but it will certainly combine some data analysis with some programming. One person I've phone-screened has already asked for suggestions of programming problems to practice on. It's a great show of initiative, which I like, but I discovered I wasn't really prepared for the question.
The challenge for me is to rewind my brain back to an early stage and remember what makes a good -- but doable -- problem. In my head, everything either seems too trivial or potentially discouragingly difficult. So, I'd be very interested in examples of programming challenges given to early programmers with a significant bioinformatics angle -- no bubble sorts or games of Wumpus!
I did find a couple of links with some examples: one from MIT and another from Duke (though these are really a level above what I have in mind). I'd love to find other examples -- and mostly don't care about the language used in the examples. I'm probably going to nudge my intern towards Java/Scala (leveraging BioJava as much as possible), if only to encourage me to put some more time in on my own retraining project.
So, any suggestions?
I'm actually a big fan of the MathWorks MATLAB programming competition. They keep an archived index of previous contests. There are numerous biologically inspired competitions, such as protein folding and gene rearrangements. The contests are designed so an adequate programmer can produce a reasonable answer in a few hours. They come complete with test data, example programs, pretty visualizations, etc.
The only disadvantage is that you actually need MATLAB, but if it's a college student they probably have a site license.
HMMs of DNA sequences. Try to recognise coding sequence vs non-coding. Will learn about exons, introns, pseudogenes and HMM methods which are incredibly useful.
Should get something up and running quite quickly, but it will take at least a couple of weeks to walk it all the way through.
Ideal problem for a very bright CS undergrad that's done some algorithms modules.
But then this isn't for anyone pre-first year. I remember my first summer at Bell Labs they got me doing compilers and it was a bit beyond me then. Great experience though.
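To give a flavor of what the first day of the HMM suggestion above might look like, here is a minimal sketch of Viterbi decoding for a two-state model (coding vs. non-coding). Everything numeric in it is an assumption made up for illustration: the transition and emission probabilities are toy values, and the idea that coding sequence is simply GC-rich is a deliberate oversimplification. A real exercise would estimate the parameters from annotated sequence and probably model codon structure.

```python
import math

STATES = ("coding", "noncoding")

# Toy parameters, invented for illustration only (not estimated from real data).
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
# Assumption: "coding" is slightly GC-rich, "noncoding" slightly AT-rich.
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state path for a DNA string (log-space Viterbi)."""
    if not seq:
        return []
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best previous state from which to enter s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    seq = "ATATATATATGCGCGCGCGC"
    for base, state in zip(seq, viterbi(seq)):
        print(base, state)
```

With these toy numbers, a long enough GC-rich run gets labeled coding and an AT-rich run non-coding. The interesting part for an intern is everything the sketch leaves out: training the parameters, adding states for introns and codon positions, and checking predictions against real annotation.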
Forgive me; this isn't exactly relevant (or perhaps it is!). We are looking to hire a computational biologist for our HiSeq facility. http://seqanswers.com/forums/showthread.php?p=12004#post12004
Perhaps your intern is a bit too green, but if you know anyone, could you send him/her my way?
Seth Crosby
Washington University
scrosby at wustl dot edu