Tuesday, June 26, 2012

Qiagen Buys A Sequencer

Today's news (take your pick of Bio-IT or GenomeWeb for original stories) is that Qiagen has bought out Intelligent Biosystems (IBS), the company in nearby Waltham developing a platform using sequencing-by-synthesis chemistry licensed from Columbia University.

Tuesday, June 19, 2012

Out Damned Spot Instance! Out I say!

A piece of advice for anyone in the bioinformatics world: get working knowledge of the Amazon EC2 cloud computing system.  Now, there is a lot of controversy over whether EC2 (or other cloud services) eliminates the need for a mongo local compute resource, but even if you like doing things at home there are multiple niches that Amazon can fill.  It can be your experiment sandbox, which can be quickly shut down if something goes haywire.  It can be overflow capacity, dealing with a sudden surge of compute need.  It can also be a reserve for when your main system is experiencing hardware issues.  Or it can be a neutral zone for collaborating with someone outside your organization's walls.  Or it can be where you do your consulting work (approved, of course!) that is independent of your organization (or between organizations).  Or anything else; it's there and it's likely at some point you could use it, if not now then in the future.  Better yet, you can get your feet wet with their Free Tier of services, which of course is a hook to try to get you to consume more.

Don't be scared off by the wide array of different services Amazon offers.  You need to understand only a few to get started, and many are really intended for e-commerce sites and the like.  I've probably used fewer than half a dozen services from the menu, though I'm sure there are a few more I could use profitably.

One catch with EC2 is that you are going to do a lot of low level UNIX systems administration, something I've generally avoided in my career.  I've been able to because I usually have a few UNIX gurus close enough by to do all that, and besides it's been in their job description and not mine!  The few times I have dabbled, the results have been mixed.  At Harvard I once burned a day getting a printer back on the network, but was compensated by that lab's PI with a gift certificate for yummy bread.  On the other hand, at one of my employers I succeeded in disabling my server, which could only be restored by re-installing the OS.  Again, one reason to consider Amazon for a sandbox!

What do I mean by low level?  Well, with the nice web GUI you fire up a machine.  Note that any disk attached to that machine by default is (a) too tiny for real work and (b) will go away when you kill the machine.   If you want big, persistent storage you need to create an "EBS Volume".  With the GUI you create the volume and then attach it to the machine, but at that point it is useless.  Using low level UNIX commands you need to now format the drive, create an attach point, mount the drive and set the permissions.  If you want password-less SSH between nodes, that's a few more configuration file tweaks.  Not rocket science, but tedious to do time after time.
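For anyone who hasn't done this before, here is a rough sketch of the four steps just described (format, create a mount point, mount, set permissions), written as a small Python helper that only builds the commands rather than running them.  The device name `/dev/xvdf`, the mount point `/data`, the `ubuntu` user and the ext4 filesystem are all assumptions for illustration; check what your own instance actually reports before running anything, since formatting the wrong device destroys its contents.

```python
import shlex


def ebs_setup_commands(device, mount_point, owner, fs_type="ext4"):
    """Return, in order, the commands that make a freshly attached volume usable.

    All arguments are placeholders supplied by the caller; nothing here
    inspects the machine or runs a command.
    """
    return [
        ["mkfs", "-t", fs_type, device],             # 1. format the drive
        ["mkdir", "-p", mount_point],                # 2. create the mount point
        ["mount", device, mount_point],              # 3. mount the drive
        ["chown", f"{owner}:{owner}", mount_point],  # 4. set ownership (assumes a group named like the user)
    ]


def as_script(commands):
    """Render the commands as sudo-prefixed shell lines for review."""
    return "\n".join("sudo " + shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical device, mount point and user.
    print(as_script(ebs_setup_commands("/dev/xvdf", "/data", "ubuntu")))
```

Printing the script instead of executing it is deliberate: you can read the lines over, confirm the device name, and paste them in by hand.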

A past colleague and friend of mine recently let me know about STAR::Cluster, and this free software is amazing.  It automates not only the UNIX toil and trouble I found tedious, but other low level stuff I hadn't gotten around to yet.  For example, every EBS volume in the cluster is NFS-mounted to all the nodes, which is critical for some operations (though other tools, such as MIRA, are positively allergic to such setups, as the extra IO traffic kills performance).  Plus, your cluster comes loaded with useful cluster tools such as OpenMPI and the Sun Grid Engine job queuing system.

Each of these is useful for bioinformatics.  For example, OpenMPI is the framework for the nifty Ray assembler.  Ray can handle your really big de novo assembly jobs, as it allows you to spread the job out across multiple nodes.  In contrast, on Amazon you are very limited by tools such as Velvet because they can work in the memory of only a single machine, and the biggest machines at Amazon aren't very big (about 68 GB).  Furthermore, under Amazon's pricing model to get big memory you must rent a lot of cores, and for a single core tool that's a bit of a waste.  Celera Assembler, for its part, can use the Grid Engine, which is pretty much essential with that assembler.

So for now, I'm loving STAR::Cluster but forsaking spot clusters.  That is, until I figure out a way to divine the correct bidding strategy, which may require the services of a cauldron and some eye of newt.

STAR::Cluster has mostly behaved for me, but I have had a few hiccups in which nodes didn't quite come up as planned.  I don't know why, and in one case I reverted to doing the low level work to fix it (indeed, I finally learned how to NFS mount a volume).  In the other case, I couldn't figure out a solution and had to kill the damaged nodes.  Still, most times everything has gone as planned.

However, STAR::Cluster also tempts you with spot instances, which have not been productive for me.  Amazon's pricing is a 3-dimensional grid: where the machine is, what its capability is and which pricing scheme applies.  On the where side, most times you probably just want cheap, which tends to mean one of the US sites (their Asian sites are about 10% more expensive to use).  It is useful to stay in one location, as only when EBS volumes are in the same zone as a compute instance can you attach (and then mount) that volume on that machine.

As noted above, capability spans a number of machine classes.  I tend to go for two of them.  The 32-bit instances are cheap (about the cost of a newspaper per day) and useful for maintaining a permanent presence for uploading & downloading files, but are under-powered for much else.  At the other end, I tend to use the premium-priced high-memory quadruple extra large instance, because this gets the most compute power and memory for the standard instances, which tends to be needed for the projects I'm offloading to Amazon like huge short read assembly or mapping efforts.  I haven't tried out the cluster compute instances yet, which are even pricier but may yield higher performance (faster networking and power) nor have I tried the GPU instances; both are likely in my future.

After these, Amazon offers three pricing schemes.  On demand instances are simple to use: you fire one up and pay for each hour you use it; make it go away and the meter stops turning (rounding up to the next hour, of course).  If you are using the system heavily, then a reserved instance involves an upfront payment but a lower per-hour cost.  Catch is, for each instance you want simultaneously you'll need to reserve another one.  The third scheme is interesting but can easily scorch your fingers: spot instances.
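To make the on demand versus reserved trade-off concrete, here is a small sketch of the arithmetic.  It follows only the rules described above: on demand bills each started hour, and a reserved instance swaps an upfront fee for a lower hourly rate.  The function names and every dollar figure in the example are invented placeholders, not Amazon's actual prices.

```python
import math


def on_demand_cost(hours_used, hourly_rate):
    """Cost of one on demand instance, rounding usage up to the next full hour."""
    return math.ceil(hours_used) * hourly_rate


def reserved_cost(hours_used, upfront_fee, reserved_hourly_rate):
    """Cost of one reserved instance: upfront payment plus discounted hours."""
    return upfront_fee + math.ceil(hours_used) * reserved_hourly_rate


def break_even_hours(hourly_rate, upfront_fee, reserved_hourly_rate):
    """Hours of use at which reserving is no more expensive than on demand.

    Returns None if the reserved hourly rate offers no saving.
    """
    saving_per_hour = hourly_rate - reserved_hourly_rate
    if saving_per_hour <= 0:
        return None
    return math.ceil(upfront_fee / saving_per_hour)


if __name__ == "__main__":
    # Placeholder prices, chosen only to keep the arithmetic easy to follow.
    print(on_demand_cost(2.25, 2.0))            # 2.25 hours is billed as 3 hours
    print(break_even_hours(2.0, 3000.0, 0.5))   # hours needed to recoup the upfront fee
```

Remember that the break-even point applies per instance: a ten-node cluster means ten upfront payments.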

A spot instance is charged the current market rate for an instance of that type.  Much of the time, it's half the cost of an on demand instance, and when you have a cluster of big instances running at $45/day per node, that's not trivial.  However, you put in a bid for the maximum price you are willing to pay.  Should the spot price exceed that price, your instance can die instantly with no warning.  You can browse the prior history of a spot instance in your selected zone and get some idea, but so far I've been very unlucky.  Despite putting in spot prices well above the apparent previous price spike, new price spikes have bumped off my instances.
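The bidding rule above can be sketched as a replay of a price history against a bid.  This is a deliberately simplified model under stated assumptions: one price per hour, the instance dies the moment the price exceeds the bid, and you pay the market price (not your bid) for each hour that ran.  The function names and the price list are made up for illustration, and Amazon's real billing has details this ignores.

```python
def first_termination(spot_prices, bid):
    """Return the index of the first hour whose price exceeds the bid, or None."""
    for hour, price in enumerate(spot_prices):
        if price > bid:
            return hour
    return None


def spot_run_cost(spot_prices, bid, hours_needed):
    """Replay a job of hours_needed hours against an hourly price history.

    Returns (finished, cost). finished is False if a price spike above the
    bid killed the instance before the job completed.
    """
    if hours_needed > len(spot_prices):
        raise ValueError("price history is shorter than the job")
    cost = 0.0
    for hour in range(hours_needed):
        price = spot_prices[hour]
        if price > bid:
            return False, cost  # instance dies; earlier hours are still paid for
        cost += price
    return True, cost


if __name__ == "__main__":
    # Invented history: mostly cheap, with one spike in the fourth hour.
    history = [1.0, 1.0, 1.5, 4.0, 1.0]
    print(first_termination(history, bid=2.0))
    print(spot_run_cost(history, bid=2.0, hours_needed=5))
    print(spot_run_cost(history, bid=5.0, hours_needed=5))
```

Replaying history this way shows what a given bid would have survived in the past; as my own luck demonstrates, it says nothing about the next spike.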

The big problem for me is that none of my applications can tolerate croaking in mid-operation.  Apparently there is a way to do this with Grid Engine, and apparently Ray can work off Grid Engine and probably Celera Assembler can be restarted automatically, but I'm not yet to the point of understanding how to do these.  So, having a cluster die late in a process is an expensive disaster, with the clock completely reset.  So, after multiple misadventures I've sworn off spot instances for now, which is probably costing the company significant dollars but now I'm not losing sleep -- and those aborted runs weren't free.

So STAR::Cluster lets you boil your data without a lot of toil and trouble.


Friday, June 15, 2012

States Funding Companies: Always a Bad Idea

The annual BIO meeting will be in Boston next week, so the media is looking a little bit more at our sector.  Today's Globe has an Op-Ed from Geoff MacKay, the CEO of Organogenesis, titled "Keeping Mass in biotech race" (or, online, "Fueling the next wave of biotech growth").  Reading it, I've been kicked out of my posting inertia, as it touts exactly the sort of strategies which I believe are mistakes, with recent events nearby providing evidence.

MacKay's lead-in is that Massachusetts has a thriving biotech industry and that this contributes in important ways to the economy of the state.  No argument there.  He goes on to describe how other states and countries are eagerly trying to lure companies away, which is certainly true.

"When BIO Nebraska offers convention-goers Omaha steak tips at the end of a long day, or when BIOTECanada supplies Tim Horton’s doughnuts and coffee for thousands at breakfast, they are not just showcasing their hospitality. They are firing the opening salvo in an aggressive marketing campaign that may include strong financial incentives, tax breaks, lower labor costs, and, in some cases, a fairly convincing argument about quality of life benefits outside of our state."

MacKay knows this well because, as he goes on to explain, his own company was heavily recruited a number of years ago, and only when Massachusetts laid out a package of financial incentives did Organogenesis commit to stay.

Now, I do not believe I'm a Mass biotech snob, and there are certainly many companies that do very well outside hub areas such as Boston, the Bay Area or the D.C. Metro area.  But, my suspicion is that many of those companies do well because they grew organically in their location, and I also believe there are many real costs to a business in operating outside of the hubs.  For some companies, the benefits outweigh the negatives, but that is even less likely if you move to such an environment.

A key advantage of being in a hub is a supply of experienced staff.  Biotech has its ups-and-downs, but because there are so many companies in the Boston area there are lots of folks who are looking for positions.  When you are the hiring manager, that means you can look carefully for the right fit yet find someone relatively quickly.  Big tech hubs like Boston also mean that there are more opportunities for spouses and life partners who are technically oriented, and while that doesn't describe my situation it does seem that an awful lot of senior biotech folks share their lives with other senior biotech folks.

I also suspect that many CEOs know all this, and simply rationally play states against each other to their company's benefit.  However, there are plenty of examples of CEOs who aren't so sophisticated, with the former management of BiogenIdec serving as one data point.  A few years ago someone had the brilliant idea of moving the non-R&D folks out of the Cambridge headquarters to the tony town of Weston, which happened to be where the then-CEO lived.  Those who had to live with this decision rapidly discovered the sheer inefficiency of it: while the organization chart might have shown a clean split, the reality is that communication must flow throughout an organization, and communication is often most effective face-to-face.

As far as incentives go, there is a danger that Massachusetts politicians will soon forget the recent blow-up of 38 Studios (indeed, I thought about titling this "38 Reasons States Shouldn't Bribe Companies to Stay", but who's going to read something that threatens you with 38 bullet points?).  Curt Schilling began his professional career in the Red Sox organization and ended his major league career in Boston, and played a key role in the 2004 drive to a World Series victory, most memorably while playing with a jury-rigged ankle repair that visibly bled during the playoffs.  At the tail end of his career, he became known both for loudly espousing a free-market approach to the economy (he was even mentioned as a possible candidate for political office) and for a deep interest in sophisticated video games.

Schilling founded a company which ended up being named for his uniform number, 38 Studios.  The goal was to produce a new generation of online multi-person video games.  After starting in Massachusetts, he successfully wangled a $75M loan guarantee from Rhode Island on the condition that the company move there and create a set number of jobs.  Amazingly, Massachusetts officials refused to make a sweeter offer.  38 Studios' first game arrived this past winter and was greeted with critical praise but relatively modest sales.  More importantly, in the meantime the governor's chair in Rhode Island switched from someone who had spearheaded the deal to someone who publicly opposed it.

This spring, it all blew up rather spectacularly.  Rhode Island realized that 38 Studios was in serious trouble, and was asking for an advance on some additional tax credits (which it could in turn sell).  Governor Lincoln Chafee was clearly not in favor of further aid, though he did consider it.  38 was also due to make a payment on one of the loans, and looked like it might miss and default on the loan.  To much fanfare, the company delivered a check for the amount -- but it soon came out that they had tried to quietly tell RI not to cash it immediately, because it would bounce!  Later, it would come out that the company hadn't been making payroll for several weeks.

The whole incident underscored multiple reasons why governments should not be directly funding businesses.  Schilling had a passion for the business, but no experience.  A good VC (I'm not claiming these are universal or common, but this is my experience) would help guide the company; politicians are completely wrong for this role.  Schilling would later complain that Chafee's dire pronouncements on the company had killed a potential financing deal, but one of the Business 101 lessons apparently never taught to #38 by his agent is that if you don't have contractual control over who says what, then anybody can say anything.  In any case, if public money is at risk then the governor is ethically obligated to speak up.

The state also had no business investing so much in a single risky company.  Just the financials looked ugly; 38's next game was at least a year from delivery and they were carrying hundreds of employees.  Given media reports, it seems not unlikely that 38 Studios was going to go $150M-$200M in the hole to develop the game, which could only be justified by absolutely chart-topping game sales.  RI was effectively also helping to fund economic activity in other states, as 38 Studios had acquired other companies along the way.

Massachusetts can't be smug about 38 Studios, as it bet a similarly outsized amount on a green energy company called Evergreen Solar, which also went bust. Solyndra in California has been a national example.

If the pols really feel they must spend, there are good ways to spend that money.  Unfortunately, that spending is much more spread out and doesn't quickly generate a bunch of jobs which can be claimed to be the direct result of the legislation.  Of course, many such jobs really aren't -- many would have been created already.

One core area in need of investment is transportation.  The MBTA system is creaking along due to deferred maintenance and delayed rolling stock replacement, and probably needs billions of dollars.  A recent Globe front page graph showed clearly that MBTA ridership growth continues at a steady clip, but there hasn't been service growth to match.  Indeed, due to equipment shortages the spacing of trains on several lines is growing.  After the T, there is the expansion of the Hubway bicycle system beyond the small core region served now.  If you never use the T or Hubway, just remember that riders on these systems would otherwise probably be in cars competing with you on the roads.

Housing continues to be expensive in the Boston area for a variety of reasons.  While some of this is deliberate and chosen (such as towns wanting to manage their population densities), there are many opportunities for government to foster housing development in ways that will encounter little resistance.  Cleaning up brownfields and streamlining permitting processes are obvious ones.

There's also the local environment.  Workplaces are more appealing when they are near parks, restaurants and other amenities.  The Globe today also had an item on the conversion of individual parking spots to miniature parks, an idea that has a lot of merit as long as it isn't overdone.  Exploring ways to add more food and entertainment options, such as expanded night hours, additional liquor licenses for very small trendy bars and food truck zones, is the obvious purview of government.

Finally, there is that issue of training.  The area's premier universities and colleges play a key role in providing scientists, and many of the existing companies groom new executives.  Where the state should play a role is fostering training programs at the high school and community college level to prepare a healthy cadre of research technicians, who during the boom years at the end of the last century were in short supply.  Many of the skills that go into such training, such as keeping meticulous records, following procedures to the letter and learning how to capture and report data, are valuable in non-biotech fields as well.

Perhaps that is my overall theme.  If government wishes to enhance the biotech industry, or the robotics industry, or the computer services industry or the myriad of other exciting sectors in the local economy, the wrong way is to try to throw money at individual industries or companies.  Targeted tax breaks and loan guarantees are rarely deployed well, are often timed badly, and benefit few outside those targeted.  Broad improvements to the transportation, park and educational infrastructures will pay dividends for every industry and all of the citizenry.  Let gullible states throw their money away luring a few gullible companies; invest instead in across-the-board improvements which will both boost the current crop of companies as well as enable new exciting industries that don't even exist yet.