Don't be scared off by the wide array of services Amazon offers. You need to understand only a few to get started, and many are really intended for e-commerce sites and the like. I've probably used fewer than half a dozen services from the menu, though I'm sure there are a few more I could use profitably.
One catch with EC2 is that you are going to do a lot of low-level UNIX systems administration, something I've generally avoided in my career. I've managed that because I usually have a few UNIX gurus close enough by to do all that, and besides, it's been in their job description and not mine! The few times I have dabbled have been mixed. At Harvard I once burned a day getting a printer back on the network, but was compensated by that lab's PI with a gift certificate for yummy bread. On the other hand, at one of my employers I succeeded in disabling my server, which could only be restored by re-installing the OS. Again, one reason to consider Amazon for a sandbox!
What do I mean by low level? Well, with the nice web GUI you fire up a machine. Note that any disk attached to that machine by default is (a) too tiny for real work and (b) will go away when you kill the machine. If you want big, persistent storage you need to create an "EBS Volume". With the GUI you create the volume and then attach it to the machine, but at that point it is useless. Using low-level UNIX commands you now need to format the drive, create a mount point, mount the drive, and set the permissions. If you want password-less SSH between nodes, that's a few more configuration file tweaks. Not rocket science, but tedious to do time after time.
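For the curious, that dance looks roughly like this. The device name and mount point here are hypothetical (on EC2 a freshly attached volume often shows up as something like /dev/xvdf, but check your own system), and the script only acts if the device actually exists:

```shell
DEVICE=/dev/xvdf       # hypothetical: where the attached EBS volume appears
MOUNT_POINT=/mnt/ebs1  # hypothetical mount point

if [ -b "$DEVICE" ]; then
  sudo mkfs -t ext3 "$DEVICE"          # format the raw volume (destroys any data on it!)
  sudo mkdir -p "$MOUNT_POINT"         # create the mount point
  sudo mount "$DEVICE" "$MOUNT_POINT"  # mount the drive
  sudo chown "$(whoami)" "$MOUNT_POINT"  # set permissions so you can write there
else
  echo "no block device at $DEVICE; commands shown for illustration only"
fi
```

Four commands, but four commands per volume per cluster, every time.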
A past colleague and friend of mine recently let me know about STAR::Cluster, and this free software is amazing. It automates not only the UNIX toil and trouble I found tedious, but other low-level stuff I hadn't gotten around to yet. For example, every EBS volume in the cluster is NFS-mounted on all the nodes, which is critical for some operations (though other tools, such as MIRA, are positively allergic to such setups, as the extra IO traffic kills performance). Plus, your cluster comes loaded with useful cluster tools such as OpenMPI and the Sun Grid Engine job queuing system.
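As a sketch of what that automation buys you, a typical session looks something like the following. The cluster name is whatever you defined in your STAR::Cluster config file, and you should consult the STAR::Cluster documentation for the real details; this is just my understanding of the workflow:

```shell
CLUSTER=mycluster   # hypothetical cluster name from your STAR::Cluster config

if command -v starcluster >/dev/null 2>&1; then
  # one command boots the master and nodes, attaches and NFS-mounts the
  # EBS volumes, and wires up password-less SSH and Grid Engine
  starcluster start "$CLUSTER"
  starcluster sshmaster "$CLUSTER"   # log in to the master node
  starcluster terminate "$CLUSTER"   # tear everything down; the meter stops
else
  echo "starcluster not installed; commands shown for illustration only"
fi
```

Compare that to the per-volume, per-node fiddling above and you can see why I'm a fan.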
Each of these is useful for bioinformatics. For example, OpenMPI is the framework for the nifty Ray assembler. Ray can handle your really big de novo assembly jobs because it can spread the work across multiple nodes. In contrast, tools such as Velvet are very limiting on Amazon: they must work within the memory of a single machine, and the biggest machines at Amazon aren't very big (about 68 GB). Celera Assembler can use the Grid Engine, which is pretty much essential with that assembler. Furthermore, under Amazon's pricing model, to get big memory you must rent a lot of cores, and for a single-core tool that's a bit of a waste.
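To make the Ray case concrete, launching it under OpenMPI looks roughly like this. The read files, k-mer size, and rank count are all illustrative, and Ray's flags have shifted between versions, so check the manual for yours:

```shell
READS1=left.fastq    # hypothetical paired-end read files
READS2=right.fastq

if command -v mpirun >/dev/null 2>&1 && command -v Ray >/dev/null 2>&1; then
  # spread the assembly across 64 MPI ranks, potentially spanning many nodes
  mpirun -np 64 Ray -k 31 -p "$READS1" "$READS2" -o ray_assembly
else
  echo "mpirun and/or Ray not installed; command shown for illustration only"
fi
```

The point is that the 64 ranks need not share one machine's memory, which is exactly what single-machine assemblers like Velvet cannot do.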
I'll give away the ending now: I'm loving STAR::Cluster but forsaking spot clusters, at least until I figure out a way to divine the correct bidding strategy, which may require the services of a cauldron and some eye of newt. More on that below.
STAR::Cluster has mostly behaved for me, but I have had a few hiccups in which nodes didn't quite come up as planned. I don't know why. In one case I reverted to doing the low-level work by hand to fix things (indeed, I finally learned how to NFS-mount a volume); in another I couldn't find a fix and had to kill the damaged nodes. Still, most times everything has gone as planned.
However, STAR::Cluster also tempts you with spot instances, which have not been productive for me. Amazon's pricing is a three-dimensional grid: where the machine is, what its capabilities are, and which pricing scheme you choose. On the where side, most of the time you probably just want cheap, which tends to mean one of the US sites (the Asian sites run roughly 10% more expensive). It is also useful to stay in one location, as an EBS volume can be attached (and then mounted) only to a compute instance in the same zone.
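With Amazon's EC2 API tools, pinning a new volume to the same zone as your instance looks something like this (the zone, volume ID, and instance ID are all placeholders; check the EC2 command-line reference for your tool version):

```shell
ZONE=us-east-1a   # placeholder: must match the zone of the target instance

if command -v ec2-create-volume >/dev/null 2>&1; then
  # create a 200 GB volume in the chosen zone, then attach it; the
  # vol-/i- identifiers below are placeholders for your own IDs
  ec2-create-volume --size 200 --availability-zone "$ZONE"
  ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
else
  echo "EC2 API tools not installed; commands shown for illustration only"
fi
```

Create the volume in the wrong zone and the attach step simply fails, which is why staying put in one zone saves grief.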
As noted above, capability spans a number of machine classes. I tend to use two of them. The 32-bit instances are cheap (about the cost of a newspaper per day) and useful for maintaining a permanent presence for uploading and downloading files, but are under-powered for much else. At the other end, I tend to use the premium-priced high-memory quadruple extra large instance, because it offers the most compute power and memory among the standard instances, which is what the projects I'm offloading to Amazon (huge short-read assembly or mapping efforts) tend to need. I haven't tried the cluster compute instances, which are even pricier but may yield higher performance (faster networking and more compute muscle), nor the GPU instances; both are likely in my future.
After these, Amazon offers three pricing schemes. On-demand instances are simple to use: you fire one up and pay for each hour you use it; make it go away and the meter stops turning (rounding up to the next hour, of course). If you are using the system heavily, then a reserved instance involves an upfront payment in exchange for a lower per-hour cost. The catch is that each instance you want to run simultaneously requires its own reservation. The third scheme is interesting but can easily scorch your fingers: spot instances.
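The on-demand versus reserved trade-off is just arithmetic. With made-up numbers (these are not Amazon's actual rates), the break-even point is the upfront fee divided by the hourly savings:

```shell
# hypothetical prices, NOT Amazon's actual rates
ONDEMAND=2.00         # $/hour, on demand
RESERVED_HOURLY=0.80  # $/hour after reserving
UPFRONT=4000          # one-time reservation fee, $

# hours of use at which the reservation pays for itself
BREAK_EVEN=$(awk -v u="$UPFRONT" -v od="$ONDEMAND" -v r="$RESERVED_HOURLY" \
  'BEGIN { printf "%d", u / (od - r) }')
echo "reservation breaks even after $BREAK_EVEN hours of use"
```

Run heavier than that and the reservation wins; lighter and on demand is cheaper. And remember, each simultaneous instance needs its own reservation, so multiply accordingly.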
A spot instance is charged the current market rate for an instance of that type. Much of the time that's half the cost of an on-demand instance, and when you have a cluster of big instances running at $45/day per node, that's not trivial. However, you put in a bid for the maximum price you are willing to pay, and should the spot price ever exceed that bid, your instance dies instantly with no warning. You can browse the recent price history for an instance type in your chosen zone and get some idea what to bid, but so far I've been very unlucky: despite bidding well above the apparent previous price spike, new spikes have bumped off my instances.
The big problem for me is that none of my applications can tolerate croaking mid-operation. Apparently Grid Engine offers a way to restart interrupted jobs, Ray can run under Grid Engine, and Celera Assembler can probably be restarted automatically, but I'm not yet at the point of understanding how to do any of these. So a cluster dying late in a process is an expensive disaster, with the clock completely reset. After multiple misadventures I've sworn off spot instances for now, which is probably costing the company significant dollars, but at least I'm not losing sleep -- and those aborted runs weren't free.
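The economics of those aborted runs are easy to sketch. Using the $45/day figure from above, and assuming spot runs at half price with all other numbers hypothetical:

```shell
NODES=8       # hypothetical cluster size
DAYS=3        # length of one complete run, days
ONDEMAND=45   # $/node/day, the on-demand figure mentioned above
SPOT=$(awk -v od="$ONDEMAND" 'BEGIN { print od / 2 }')  # assume spot is half price

# total cost of the run under three scenarios
SAFE=$(awk -v n="$NODES" -v d="$DAYS" -v p="$ONDEMAND" 'BEGIN { print n * d * p }')
LUCKY=$(awk -v n="$NODES" -v d="$DAYS" -v p="$SPOT" 'BEGIN { print n * d * p }')
# killed at the end of day 2, then the whole run redone from scratch on spot
UNLUCKY=$(awk -v n="$NODES" -v d="$DAYS" -v p="$SPOT" 'BEGIN { print n * (2 + d) * p }')

echo "on demand: \$$SAFE   spot, no kill: \$$LUCKY   spot, one kill: \$$UNLUCKY"
```

With these numbers one mid-run kill still beats on demand, but a second kill does not, and none of that accounts for the lost calendar time or lost sleep.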
So STAR::Cluster lets you boil your data without a lot of toil and trouble.