Monday, August 31, 2020

Self-Assessing My Python

I've been programming 90+% in Python now for over a year and a half -- when I joined the Strain Factory I vowed to finally make the break from Perl.  Partly this was disgust at so often finding the libraries I wanted missing or broken, and partly it was recognizing that the Factory is primarily a Python shop and I would have the most impact if I worked in the lingua franca.  I was first exposed to Python back at Codon Devices, but there was a strong C# faction there and I fell in love with that language, so my primary dabbling in Python was learning enough to glue the key Python code into my C# with IronPython.  I strongly considered changing over at the start of Warp Drive, but gave it too weak a try and quickly went back to churning out Perl.  I still use that language for basic text munging, but have avoided writing nearly anything in it that occupies more than one screen.

But it is useful to figure out how far I've come and to try to do so honestly.  What do I do well and what not so well?  Where are the gaps?  Thinking about this has also meshed with the whole question of how to assess candidates for our open positions, and what to suggest to people who say they wish to break into computational biology and similar avenues.  There's also a question of scope: any modern language is really a core plus a set of key libraries, so which of those to include?

The hardest part of self-assessment would be to judge what I am completely ignorant of but shouldn't be.  Perhaps a bit more tractable is recognizing libraries I have plumbed insufficiently.  For example, until last week I really hadn't started using itertools, a library of advanced iterator tools that can eliminate many loops, particularly nested loops.  There are two collection classes in the collections library I have mastered;  I probably even overuse one of them (defaultdict, a dictionary/hashtable that you tell how to create entries for missing keys).
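
As a minimal sketch of both ideas (the sequences and sample names are invented for illustration): itertools.product can stand in for a pair of nested loops, and defaultdict(list) creates an empty list the first time a key is seen.

```python
from collections import defaultdict
from itertools import product

bases = "ACGT"

# The nested-loop way to enumerate every dinucleotide
dinucs_loop = []
for first in bases:
    for second in bases:
        dinucs_loop.append(first + second)

# The same enumeration with itertools.product, no explicit nesting
dinucs = ["".join(pair) for pair in product(bases, repeat=2)]

# defaultdict(list): a missing key is created as an empty list on first access
reads_by_sample = defaultdict(list)
for sample, read in [("s1", "ACGT"), ("s2", "GGCA"), ("s1", "TTAG")]:
    reads_by_sample[sample].append(read)
```

Both lists hold the same 16 dinucleotides in the same order, and reads_by_sample ends up with two reads under "s1" and one under "s2" without any "does this key exist yet?" check.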

Then there are places where I know I am climbing the learning curve, and never as fast as I'd like.  I'm pretty happy with my ability to sling dataframes around with Pandas, reading them and writing them, slicing them and combining them.  I recently mastered joining them.  But aggregating them with groupby or pivot_table?  That's still very much a work in progress, usually requiring a look at some online tutorials to get the execution correct.  I did find a tutorial that explicitly showed the same grouping operations in both Pandas and SQL, but I haven't studied it properly.
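
For readers in the same spot, here is a small sketch of the two aggregation styles side by side. The dataframe, its column names, and the SQL in the comment are all made up for illustration; this is not taken from the tutorial mentioned above.

```python
import pandas as pd

# Hypothetical measurements: one row per strain/medium replicate
df = pd.DataFrame({
    "strain": ["A", "A", "B", "B", "B"],
    "medium": ["rich", "minimal", "rich", "rich", "minimal"],
    "titer": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Roughly: SELECT strain, AVG(titer) FROM df GROUP BY strain
mean_by_strain = df.groupby("strain")["titer"].mean()

# pivot_table: strains as rows, media as columns, mean titer in each cell
wide = df.pivot_table(index="strain", columns="medium",
                      values="titer", aggfunc="mean")
```

groupby returns a Series indexed by strain, while pivot_table spreads the second grouping key across the columns, which is often the shape wanted for a quick look or a heatmap.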

Where can I pat myself on the back?  Well, I was slow to pick up list and dictionary comprehensions, but now use them pretty routinely.  I'm now far less likely to write a for loop to initialize a list and only afterwards rewrite it as a comprehension.  Even better is where I replace some levels of a defaultdict with a dictionary explicitly initialized using a dictionary comprehension.
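
A short sketch of those two habits, with invented sample names: first a loop-built list next to its comprehension, then a two-level counter whose outer level is an explicit dictionary comprehension instead of another defaultdict.

```python
from collections import defaultdict

# Loop-then-append versus a list comprehension
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)
squares = [n * n for n in range(5)]

# Hypothetical sample names, known up front
samples = ["s1", "s2"]

# All-defaultdict version: any key, even a typo, silently springs into existence
counts_dd = defaultdict(lambda: defaultdict(int))

# Outer level made explicit with a dictionary comprehension;
# only the inner level is still a defaultdict
counts = {sample: defaultdict(int) for sample in samples}
counts["s1"]["A"] += 1
```

One possible payoff of the explicit outer level is that a mistyped sample name raises a KeyError instead of quietly creating a new, empty entry.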

How else might I grade myself?  I don't work on a lot of group code, but that is always instructive.  Even just getting asked a question I can't answer can be a path forward.  Someone asked me about the fastest way to add a certain type of computed column in Pandas, and it turned out that apply(), which I've mastered, isn't it.  That taught me about some of the other methods, like map() for dataframes, though I haven't yet used that.  I am on a few Python tips mailing lists, but the signal-to-noise has gotten low -- partly that's a good thing because I've learned a lot, but partly it's because much of what these cover isn't important to what I do.
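
The kind of computed column in that question isn't specified, so the following is only a sketch under assumed column names (length, gc_count). It shows three ways to add a column: row-wise apply(), plain column arithmetic, and map() on a single column (map() here is the Series method).

```python
import pandas as pd

# Hypothetical per-sequence statistics
df = pd.DataFrame({"length": [100, 250, 400], "gc_count": [40, 125, 300]})

# Row-wise apply: the lambda is called once per row
df["gc_apply"] = df.apply(lambda row: row["gc_count"] / row["length"], axis=1)

# Column arithmetic: one vectorized operation over whole columns
df["gc_vector"] = df["gc_count"] / df["length"]

# Series.map: element-wise transformation of a single column
df["size_class"] = df["length"].map(lambda n: "long" if n >= 250 else "short")
```

The first two produce the same values; on large frames the vectorized form is generally the faster of the two, since it avoids a Python function call per row.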

Another ruler is what did I once do frequently in Perl and how well can I do that in Python?  Data structures: nailed them.  Regular expressions: have that pretty much on par.  Manipulating files or executing operating system commands: uh oh, not very facile.  Writing classes?  That's the worst -- I did a bit last year but didn't quite master it, and I've since let the knowledge erode.
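
For the file and operating-system items, a minimal sketch using only the standard library (pathlib and subprocess). The file name and contents are invented, and the child command is just the Python interpreter so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    # Write a small file (hypothetical FASTA content)
    fasta = workdir / "demo.fasta"
    fasta.write_text(">seq1\nACGT\n>seq2\nGGCC\n")

    # Roughly Perl's glob("*.fasta")
    fasta_files = sorted(p.name for p in workdir.glob("*.fasta"))

    # Roughly Perl's while (<FH>) loop: iterate over the file line by line
    with fasta.open() as handle:
        headers = [line[1:].strip() for line in handle if line.startswith(">")]

    # Roughly Perl's backticks: run a command and capture its output.
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(
        [sys.executable, "-c", "print('hello from a child process')"],
        capture_output=True, text=True, check=True,
    )
    child_output = result.stdout.strip()
```

Passing the command as a list rather than a single string sidesteps shell quoting issues, which is one of the places Perl habits tend to carry over badly.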

So overall, not bad -- but still so much to learn.  That was always the barrier to switching once Perl dug deep furrows in my brain: the frustration of flailing a bit while moving to a better language.  But perhaps the most important symptom that the switch has really taken hold is now apparent: these days I far more often erroneously use Python syntactic conventions when dashing off Perl than the converse!  By my bugs thee shall know me.


Dan Udwary said...

Fun to read about your Python conversion. I've also gone down that road, as JGI is largely Pythonic. Once I realized how easy it was to just deal with data read from GFF3 as a dataframe I was sold. I still think too much in Perl, but I'm getting better!

gasstationwithoutpumps said...

One thing we do early on with our bioinformatics students is to teach the use of generator functions for reading input. It allows separating the input from the processing (if you can process as you go) in a clean way, without having to read everything in at once.