Monday, May 31, 2010
At the beginning of the weekend I received a real shock: one of the other fathers in TNG's Cub Scout Pack had died of sepsis after a failed endoscopic procedure. His son is a year younger than mine, but we interacted a bunch on various outings & in my mind's eye I can still see his smiling face illuminated by a lantern at a recent campout. He was a mathematician & I have a bit of a natural draw to anyone in a technical field. Plus, it is always unsettling to have a near contemporary pass so suddenly and from such an unexpected source.
That someone relatively young, and being treated in a hospital widely acclaimed as one of the world's best, could still die this way illustrates the grim terror of sepsis. I have never worked directly on it, though when I was an intern at Centocor the lead agent in their therapeutic pipeline was directed against gram-negative sepsis, that is, sepsis resulting from infection by gram-negative bacteria. At the time I was there, Centocor and their rival Xoma were cross-suing each other over patent issues for their sepsis drugs (both monoclonal antibodies) and the Department of Defense was accepting the unapproved drug for possible use in the First Gulf War.
Both Xoma & Centocor unfortunately ended up following the path which has so far characterized the field: both drugs failed in the clinic, nearly pulling both organizations down with them, and numerous other sepsis drugs have failed there since. A significant challenge is that in sepsis one must somehow prevent the immune system from causing collateral damage to the body while not preventing it from combating the grave infection which triggered the reaction.
Clearly, this is a tough nut. The Centocor & Xoma drugs both tried to target a toxin (endotoxin) which is released by dying gram-negative bacteria. One thought I had at the time was that a diagnostic would be valuable which could distinguish patients with gram-negative infections, who could potentially benefit, from those with gram-positive infections, who could not. In retrospect, even such a diagnostic is a tough challenge -- to be of any clinical value it would need to return results in a matter of minutes or at most a few hours. That's a hard problem. Other therapies tried in the clinic have attempted to modulate the immune system and proven no more effective.
Even running a sepsis trial is clearly a greater challenge than your average serious-disease trial. Obtaining proper informed consent from patients who are at risk of dying in a very short timespan cannot be easy. Challenges in running trials in other emergency-medicine situations have bedeviled another biotech horror land: blood substitutes.
Quite likely a key part of the problem is that we just don't understand this area of biology well enough. Perhaps intensive proteomic and metabolomic analysis on collected samples will yield new markers which will guide better management. Perhaps better animal models can be developed and exploited to understand the complex series of events which occur in sepsis.
I wish I had some answers; on this I'll declare complete defeat. That, and a haunting image in my mind of a cheerful face which now exists only in memories and photographs.
Tuesday, May 25, 2010
Guesting over at MolBio Research Highlights
I have an invited piece on the various sequencing instruments over at MolBio Research Highlights. The fact that much of it is an extended riff on the Winter Olympics suggests how long ago the invite came; I was not diligent about turning around revisions quickly. Alejandro Montenegro-Montero there was nice enough to liven up my text with some images and also to put up with my inattention to schedule.
I'll make a public pledge here to do better the next time -- if anyone is daring enough to give me a next time.
Bike in the Commuting Fold
Okay, a little bragging: I biked in to work last week for Bike to Work Week. I actually bike the last leg of my commute a lot of days now on a folding bike, but this was the whole enchilada on my new 24-speed road bike. By Google Maps it's 23 miles each way, but with my accidental deviations the morning was definitely more like 24. I did meet my family for dinner part way home, so the last few miles were sans backpack.
I've always enjoyed riding a bicycle but am very sporadic about doing so. My previous distance record for one day was 42 miles, but that was for a charity fundraiser, was nearly dead flat (South Jersey; though we did cross the Ben Franklin Bridge first, which is a climb) and my mitochondrial DNA donor insisted on regular practice runs for several weeks beforehand. This ride lacked that level of preparation, so the next day I was a bit saddlesore -- though thankfully none of my joints were complaining.
My more typical commute now is a 3-speed folding bike for the 4+ miles from North Station to Infinity. Infinity's location finally pushed me last year to contemplate this option, as it really is awkward from North Station. The choices by transit are the EZRide bus followed by a 10+ minute walk (through a pleasant neighborhood), or the Orange/Green Line to the Red Line to Central and then either walking or catching a shuttle provided by the landlord. No matter how you slice it, it is a bunch of connections and timing. Plus, the Red Line grows more unreliable and slow every year.
So last year I picked up a folding bike on Craigslist for $120. I had contemplated a bunch in different price ranges but ended up with this bike. I asked a lot of folks with bikes about theirs (there are two more in my railroad car tonight). Most folder owners are quite willing to answer intelligent questions about their gear, a practice which I try to uphold. This one was a good trial, though I am now monitoring Craigslist again looking for an upgrade -- more gears & bigger wheels please!
The T is at best lukewarm to the biking community. Most buses now have bike racks, but I haven't used them. A few stations now have bike cages which require special activation of your T-pass (or something similar), but unfortunately North Station has only an unprotected set of bike racks. The idea of leaving my machine exposed to the elements, vandals & thieves isn't pleasant -- especially since my college bike first rusted out a chain and later disappeared, though it's possible I just forgot where I parked it. Some conductors are quite nice, but one barks at me every time I yank the bike on unfolded -- one minor issue with mine is a balky knurled nut that locks the frame. One colleague was refused entry to the subway with hers folded, which indicates not everyone at the T understands the long-time policy.
The folding bikes do have some downsides. Mine has very small (12" I think) wheels, which makes potholes and root-lifted sidewalks quite scary. For my commute I really could use a couple more gears, for the occasional hill and to get speed on flat ground. But overall it works.
For North Station to Infinity there are two obvious routes. The fastest is along the Cambridge side of the river, but it is also narrower and rougher. The Boston side is a touch slower, but is shadier and has a higher density of dog walkers. Both offer great views of sculls and sailboats on the Charles. Both are heavily used, which is generally manageable but you are sometimes left pondering how a single runner can obstruct the path far more efficiently than a pack of four leashed dogs. Some stretches are badly lifted by roots -- and one stretch I don't routinely use upstream of Harvard Square is downright scary, with the path seriously undermined by erosion. In any case, I end up with a very predictable travel time (it's the variance which kills you when you need to catch a train!). Plus, very little interaction with Boston traffic! However, due to 2 years of a paper route I am averse to darkness and bad weather, so I all too often find excuses to skip the bike and retire it altogether during Standard Time. Plus, I do end up standing on the train more (as there are few places to store a bike -- and the seats nearby are usually taken) and so can't read or work on the commute sometimes.
The other great advantage of a bike is greater range off the commuter rail and T. I've used it to get to seminars, meet folks for lunch and run errands, as well as to meet family for dinner or run errands near home.
A major thought that hits me: why didn't I think about this sooner? During any of my Millennium or Codon days it would have made sense, especially in our old house where I had an annoyingly short drive to the station (where sometimes parking was not to be had). Especially when I was at 640, the folding bike would have been great. But I never seriously considered the idea. What an opportunity missed!
Tonight I'm going to look at an upgrade on my folding bike. A major investment in a better commute.
Saturday, May 22, 2010
Just say no to programming primitivism
A consistently reappearing thread in any bioinformatics discussion space is "What programming language should I learn/teach?". As one might expect, just about every language under the sun has some proponents (still waiting -- hopefully forever -- for a BioCobol fan), but the responses tend to cluster into a few camps & someone could probably carefully classify the arguments for each language into a small number of bins. I'm not going to do that, but each time I see these threads I do try to evaluate my own muddled opinions in this space. I've been debating writing one long post on some of the recent traffic, but the areas worth commenting on are distinct enough that they can't really fit into a single body. So begins another of my erratic & indeterminate series of thematic posts.
One viewpoint I strongly disagree with was stated in one thread on SEQAnswers:
Learn C and bash and the most basic stuff first. LEARN vi as your IDE and your word processor and your only way of knowing how to enter text. Understand how to log into a machine with the most basic of linux available and to actually do something functional to bring it back to life. There will be times when there is no python, no jvm, no eclipse. If you cannot function in such an environment then you are shooting yourself in the foot.
Yes, there is something to be admired about being able to be dropped in the wilderness with nothing but a pocketknife and emerging alive. But the reality is that this is a very rare occurrence. Similarly, it is a neat trick to be able to work in a completely bare bones computing environment -- but few will ever face this. Nearly twenty years in the business, and I have yet to encounter such a situation.
The cost of such an attitude is what worries me. First, the demands of such a primitivist approach to programming will drive a lot of people out very early. That may appeal to some people, but not me. I would like to see as many people as possible get a taste of programming. In order to do that, you need to focus on stripping away the impediments and roadblocks which will trip up a newcomer. From this viewpoint, a good IDE is not only desirable but near essential. Having to fire up a debugger and learn some terse syntax for exploring your code's behavior is far more daunting than a good graphical IDE. Similarly, the sort of down-to-the-compute-guts programming that C enables is very undesirable; you want a newcomer to be able to focus on program design and not on tracking down memory leaks. Also, I believe Object Oriented Programming should be learned early, perhaps from the very beginning. That's easily the subject of an entire post. Finally, I strongly believe the first language learned should have powerful inherent support for advanced collection types such as associative arrays (aka hashtables or dictionaries).
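To make this concrete, below is the sort of three-line win built-in associative arrays give a newcomer -- a minimal sketch in Scala (only because that's the language I've been playing with lately; the data is made up):

import scala.collection.mutable.HashMap

// Tally word frequencies -- the classic associative-array warm-up.
val counts = new HashMap[String, Int]()
for (word <- "the quick brown fox jumps over the lazy dog the end".split(" "))
  counts(word) = counts.getOrElse(word, 0) + 1
println(counts("the"))  // prints 3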
Once you have passed those tests, then I get much less passionate. I increasingly believe Perl should only be taught as a handy text mangler and not as a language in which to develop large systems -- but I still break those rules daily (and will probably use Perl as a core piece of my teaching this summer). Python is generally what I recommend to others -- I simply am not comfortable enough in it to teach it. I'm liking Scala, but should it be a first language? I'm not quite ready to make that leap. Java or C#? Not bad choices either. R? Another one I don't really feel comfortable enough to teach (though there are some textbooks to help me get past that discomfort).
Thursday, May 20, 2010
The New Genome on the Block
The world is abuzz with the announcement by Craig Venter and colleagues that they have successfully booted up a synthetic bacterial genome.
I need to really read the paper but I have skimmed it and spotted a few things. For example, this is a really impressive feat of gene synthesis but even so a mutation slipped in which went unnoticed until one version was tested. Even bugs need debuggers!
It is also a small but important step. Describing it as a man-made organism is in some ways true and in some ways not. In particular, any die-hard vitalists (which nobody will admit to being, though there are clearly a huge number of health food products sold using vitalist claims) will point out that there was never a time when there wasn't a living cell -- the new genome was started up within an old one.
It is fun to speculate about possible next directions. For example, they booted a new Mycoplasma genome within another Mycoplasma cell -- a different species, but very similar to the host. Clearly one research direction will be to try to create increasingly different genomes. A related one is to try to bolt on entire new subsystems. A Japanese group tried fusing B. subtilis (a heavily studied soil bug) with a cyanobacterium to see if they could build a hybrid which retained the photosynthetic capabilities of the cyano; alas, they got only sickly hybrids that didn't do much of interest. Could you add photosynthesis to the new bug? Or a bacterial flagellum? Or some other really complex more-than-just-coupled-enzymes subsystem?
But as someone with a computer background -- and someone who has thought off-and-on about this topic since graduate school (mostly off, to be honest) -- to me a really interesting demonstration would be a dual-boot genome. Again, in this case the two bacterial species were very similar, so their major operational signals are the same. Consider two of the most important systems which do vary widely from bacterial clade to clade (the genetic code is, of course, near universal -- though Mycoplasma do have an idiosyncratic variation on the code): promoters and ribosome binding sites. Could you build the second genome to use a completely incompatible set of one of these (later both) and successfully boot it? Clearly what you would need is for the host genome -- or an auxiliary plasmid -- to supply the necessary factors. Probably the easier one would be to have the synthetic genome use the ribosomal signals of the host but a different promoter scheme. In theory just expressing the sigma factor for those promoters would be sufficient -- but would it be? To me this would be a fascinating exercise!
Now, I did claim dual-boot. A true dual-boot system could use both. That is much trickier, but particularly on the transcriptional side it is somewhat plausible -- just arrange the two promoters in tandem. Ribosome binding sites would need to be hybrids, which isn't as striking a change.
There are even more outlandish proposals floating out there -- synthetic bugs with very different genetic codes (perhaps even non-triplet codes) or the ultimate synthetic beast -- one with the reverse handedness to all its chiral molecules. Those are clearly a long ways off, but today's announcement is another step in these directions.
Tuesday, May 18, 2010
Journey to Atlantis
I've only seen it a few times, but the sight of the iconic cavernous building always makes my heart race. But this time even more so, as it meant the end of a race against the clock. We had reached our position for the big event with just minutes to go.
Attempting to be speedy but efficient, I assembled the fancy digital SLR rig atop my tripod. Except the autofocus wouldn't work. Removing the tele-extender restored autofocus (in retrospect, probably applying another newton of force would have too), and then I got the camera in the wrong shutter mode -- timer instead of multi-fire. A cheer rises from the crowd and the dark smudge to the left of the building emits a shape trailing a brilliant blaze of red-orange, a color which no photograph seems to capture remotely well. Below that is a growing, intricately braided cloud of smoke. I don't get my camera remotely under control until it is tilted about 45 degrees, and only then do I realize that in my fumbling I had it at minimum zoom! A photographic opportunity dreamed about for nearly a quarter century, almost utterly botched! The crowd's sound builds again as the rumble of the engines finally reaches us.
But, I was there. We all were -- TNG will remember it for his entire lifetime. Atlantis punched right through a cloud (don't believe the reports of a cloudless sky!) and soared. All too quickly it was out of sight, leaving for many minutes the detailed smoke tail.
Our plans had been too optimistic, trying to squeeze the trip in with minimal disruption of other schedules, plus a final hesitancy to pull the trigger on plane tickets. What seemed like a plan with a little room for delay was undone by a rental car company that apparently stocks the break room with Protoslo and a traffic jam stretching from Orlando International Airport to the Cape.
I grew up with Apollo. I remember the last moon launches and moon walks. I do not remember the early manned Apollo missions, though I was technically around for all of them. Indeed, it is a great disappointment to me that none of those who were there can remember whether I was toddling in front of the TV when Neil Armstrong made his first steps. I devoured all the books in the school library and then the public library on space and watched many an early shuttle launch and landing (we had a school assembly for the 1st landing!). I remember precisely what I was doing when the news of Challenger's loss came & again with Columbia. I sometimes dreamed of being an astronaut, though never enough to force my academic path in that direction -- but I certainly spent more than a few nights as a kid lying in bed before sleep, on my back with my knees bent, imagining what liftoff must feel like (I still sometimes close my eyes on airplane takeoffs to try to return to those youthful fantasies). But I had never seen a launch. There are the near-mythical VIP tickets my family once had for a payload my father worked on, but that would have launched in May 1986. After the Challenger-imposed hiatus, somehow we didn't get the tickets again.
When I announced to some of my co-workers that I might try to go for this launch, I got a lot of support. That camera was a very generous loan from one colleague. But the most interesting reaction was the number of individuals who were shocked that the shuttle program was coming to an end. "What do you mean the third to last flight?". And it hit me -- for many of these folks, the shuttle IS the manned space program simply because it is older than they are.
I have a complex love for the shuttle program. It is one of the most amazing devices ever realized from human imagination. It is capable of so much and has contributed so many wonderful images. But it is also a mishmash of design requirements, resulting in a tool not optimal for any task and a design which has proven deadly twice and nearly so on other occasions. The shuttle also sucked so much post-Apollo, post-Vietnam funding that could have gone into some spectacular unmanned missions.
But now I have finally seen a launch. It is spectacular, and I am hungry for more. Alas, I wasted my youth in not making plans and now have that laundry list of responsibilities which come with adulthood. We were lucky that the launch occurred precisely on schedule; too few have stuck to their assigned time. I probably won't be able to do better than a giant screen TV for the last two -- you do get a better view, but it just isn't the same. But you can bet I'll be cross-referencing future vacations against unmanned launch schedules.
Of course, if anyone has some VIP tickets they aren't using, I won't claim I would resist temptation...
Thursday, May 06, 2010
Sales
Two weeks ago I participated in a roundtable sponsored by the Massachusetts Technology Leadership Council (MassTLC) titled "R&D IT Best Practices for Growing Small/Mid-Sized Biopharmas". It was a nice intimate gathering -- about a half dozen panelists, a few dozen audience members and NO SLIDES! A chance for some real discussion -- moderator Joseph Cerro would throw topics out or take them from the audience and the panel would address them as they saw fit. Nice and free-flowing.
I expected this event to be attended by a lot of biotech executives, and while there were more than a few, a large fraction of the audience were actually in software sales. One of them expressed their interest in the topic quite succinctly: "Why aren't you guys buying from us?" In his view, his company offered excellent products that met his potential customers' needs, yet they too rarely bought.
One aspect, of course -- or perhaps THE aspect -- is that we don't have infinite budgets. In my current role, I can spend money on a variety of things -- I can buy software, order consulting or have a CRO generate data for me. I'll confess: my tastes tend to run towards the last of these, data generation.
One reality which anyone trying to sell me software or databases must face is that it is guaranteed that their software (a) solves some of my problems, (b) fails to solve some others and (c) overlaps with other solutions I already have or am strongly considering. When I brought this up, one sales guy accused us of not having an overall software vision. That's a tricky subject -- part of me agreed and part wanted to yell "them's fighting words!". I have often had software visions; I have also often given up on them in despair. The truth is that any grand vision would require far too much custom work to be practical or to ever get done. Grand visions don't go well with compromises, and any off-the-shelf solution will involve compromises.
But one does try to have an overall plan for how things will fit together. Again, one challenge is figuring out what constellation of imperfect yet overlapping pieces to assemble. At a more detailed level, it is deciding which desired features are critical and which are dispensable. Plus, generally you aren't starting with a tabula rasa; there is already a set of tools in place, or tools too near-and-dear to someone important to be ignored.
I'm sure trying to sell to me is exasperating. I want detailed technical information on a moment's notice. I'm routinely throwing out projects or configurations to be priced, with few if any actually going forward. At Codon I played exactly that part of the sales game; it was a lot of work and very frustrating to see so little ever come of it. I'm also a pain on software products and databases in insisting on hands-on trials. One database vendor never understood this, which is why I won't bother ever talking to that company again. Perhaps my only virtue is that I attempt to be unfailingly polite through the whole process. I suppose that counts for something.
Saturday, May 01, 2010
Asymptotically Approaching A Grok of Scala
When learning a new language, it is tempting to fall back on the patterns of a previous language. This isn't always a bad thing, but it is worth being aware of. For example, when I did a little bit of Python at Codon I realized that, compared to someone else who had just learned Python, I tended to use dictionaries in my code quite frequently. That's a pattern coming from Perl. This was also reflected in my C# code, except there (to my glee!) I could use typesafe dictionaries. My code at Codon, in comparison with some other programmers', tended to be very dictionary-rich (and the dictionaries were always typesafe!). That's not saying my style was better, just distinctive and influenced by prior experience.
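(A small illustration of the pattern -- in Scala, since that's the language at hand for this post; the data is invented:)

// A typesafe dictionary: the compiler knows keys are Strings and values are Doubles.
val expression = Map("TP53" -> 8.2, "BRCA1" -> 5.1)
expression.get("MYC")     // None -- a missing key yields None, not a runtime surprise
expression("TP53") + 1.0  // 9.2 -- no casting needed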
Now in some language transitions, there's very little of this -- because the new language is too different. SQL is an obvious example -- it's just not a procedural language and so I can't easily identify any of my SQL programming patterns which are influenced by prior languages.
But if a programming language not only supports but encourages a different style of programming, it is useful to recognize this bias and try to go outside it, and when you have a breakthrough it is wonderful. For me, to intuitively understand a subject is to "grok" it; Heinlein's invention is too rarely used.
I had that moment tonight with Scala. The assignment was to read genotype data out of a bunch of Affymetrix 6.0 CHP files from a vendor. Now, Affy makes available an SDK for this -- but it is a frustrating one. The C++ example code is all but a printf statement away from converting CHP to tab-delimited.
But I decided to make this a Scala moment. There's a Java SDK, but it is very spartanly documented -- there's really no documentation beyond what individual methods and classes do -- no attempt to help you grok the overall scheme of things.
Worse, the class design is inconsistent. One case: the example Java code parses an expression file, and one key piece of information to get out is the number of probesets in the file, which is obtained via the getHeader() method. Unfortunately, it turns out getHeader is defined in the specific class and not the base class, so code working on genotyping information needs to use a different approach. Personally, I'm already annoyed because I'd rather have an enumerator to step over the probesets than get a count and ask for each one in turn -- but that is a point of style.
Okay, problem solved. The main part of the code reads the data into a big HashMap (the dictionary-type generic class in Scala) -- that pattern again! Now I want to write the data out -- listing each genotype in a separate column, with the 0th column containing the probeset name. So I need to create a row of output values and then write it as a line to my file.
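(For the record, the table is just a nested map -- something like the sketch below; the Byte value type is my guess for the SDK's genotype call codes:)

import scala.collection.mutable.HashMap

// probeset name -> (sample name -> genotype call code)
val genotypes = new HashMap[String, HashMap[String, Byte]]()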
Version 1 is the straight old style, what I used in Perl/C# and pretty much everything before: I initialize a Queue to hold the values I want to write on one line. Here out is a Java BufferedWriter which is writing to a file. The one significant Scala-ism is the code to write the line -- the reduceLeft call is the equivalent here of a Perl join command to create the tab-delimited line:
import scala.collection.mutable.Queue

val q = new Queue[String]()   // accumulate one row of output values
q.enqueue(probesetName)       // column 0: the probeset name
for (sample <- sampleNames)   // one genotype call per sample
  q.enqueue(ProbeSetMultiDataGenotypeData.genotypeCallToString(genotypes(probesetName)(sample)))
out.write(String.format("%s\n", q.reduceLeft(_ + "\t" + _)))  // join with tabs, a la Perl's join
Now, at this point I had working code, which should have been the signal to stop. But could I take it to a more Scala-ish form? That's a challenge which I'm happy to report I succeeded at.
out.write(String.format("%s\t%s\n", probesetName,
  (for (sample <- sampleNames)   // for/yield builds the list of genotype calls...
     yield ProbeSetMultiDataGenotypeData.genotypeCallToString(genotypes(probesetName)(sample)))
    .reduceLeft(_ + "\t" + _)))  // ...which reduceLeft joins with tabs
This version eliminates the queue -- a for/yield comprehension simply generates a list which the reduceLeft trick consolidates. I had cheated before and loaded the probeset name onto the queue, so here I need to tweak the String.format stuff to get that in.
Now, the question is: is this better? One metric might be readability, and I'm not sure which I find more readable. The first is a style I'm used to reading and I tend to recognize the pattern -- or do I? If I revisit that code 6 months from now, will I say "What is this queue for?"? The second one is terser -- but is it a good terser? Perhaps if I start using that pattern repeatedly it will become second nature to read.
Another metric would be performance -- which is tedious to measure, but my guess is that since I am following the form suggested by the language, the compiler is likely to optimize it better.
Ah, but after writing this entry I saw I could do better -- definitely cleaner. Instead of the explicit loop in the code, I'll use the map function, which takes a series of values and applies a transformation to each. So I still have a long way to go before I can claim to grok Scala! I could blame this on being diverted away from Scala for a month plus (I'd actually written some code like the below before, now that I think about it):
out.write(String.format("%s\t%s\n", probesetName,
  sampleNames.map(sample =>    // map transforms each sample name into its genotype call...
      ProbeSetMultiDataGenotypeData.genotypeCallToString(genotypes(probesetName)(sample)))
    .reduceLeft(_ + "\t" + _)))  // ...and reduceLeft joins them with tabs
It is worth noting that this final style is actually largely available in Perl, which has a map function and some other stuff to support this. I never really tried to work that way and personally I foresee all sorts of bugaboos from a lack of type safety. But I could have worked this way in the past.
One final note: I'm getting to like the way Scala can do a lot of compile-time type checking without my needing to clutter the code with lots of type annotations. C# is particularly bad about most type annotations being written twice, but even after cleaning that up, Scala goes one further and infers many types. "sample" in both examples is strictly a String, but I don't have to declare that -- and so the code is stripped to nearly the bare essentials while I still get a bit of proofreading.
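(A trivial made-up illustration of that inference:)

val samples = List("NA06985", "NA06991")  // inferred as List[String] -- no annotation needed
val lengths = samples.map(s => s.length)  // s is known to be a String; lengths is List[Int]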