Unideal Uncritical Thinking
Less cowbell.

Thoughts on EC2


The group I work with is a technology incubator within a larger company. We are tasked with making new products with revenue roadmaps, but we also consider ourselves a testbed for new technology that can be spun off and used by other divisions.
Clouds
When it came time to decide on a hosting model for our shiny new business information platform, EC2 was just hitting a tipping point in terms of reliability. In this case, tipping point means there was about a year of data showing the service works as advertised, and there was a shortage of Fear, Uncertainty and Doubt hanging over the thing. Our project was complicated but finite, with a modern API design, so it was a good test case for cloud hosting. I did some testing, talked with smarter-than-me peers, and decided it was stable enough to run with. We examined it from a business perspective and came up with this hedge logic: “if for some reason we don’t think it will work, then we just move it on to local Unix boxes, nothing lost”.

My experiences have been very good. I’ve found, more than I expected, that this way of doing things really is a different paradigm from in-house hosting or typical outsourced hosting.

In the EC2 approach we think of machine instances as transient. We can build them up or take them down as needed, as opposed to a model that assumes a fixed number of physical machines on hand. I’ve had machines running non-stop for over a year on EC2 without issue, but the generally advised development approach on EC2 is to assume that your machine instance can disappear at any moment and will need to be reconfigured from scratch, possibly at a different geographic location. At first this seemed like a burden, but in hindsight I find it has added a level of rigor to our systems administration approach that we would not have otherwise developed. In our target production environment we can assemble or destroy any machine image at will. We take a lot of care in crafting our machine images: in the event of a massive traffic increase we will be able to simultaneously launch many instances of our database and front-end web servers, which can configure themselves and start answering requests. If you have predictable traffic loads you can use this to your advantage and schedule the expansion and contraction of your computing needs.
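To make the scheduling idea concrete, here is a minimal sketch of how a predictable load could drive instance counts. The schedule, the instance counts, and the function names are all made up for illustration; this is not our actual tooling, and the code only computes how many instances to add or remove rather than talking to EC2.

```python
from datetime import datetime

# Hypothetical schedule: (start_hour, end_hour, web_servers), 24h clock.
# These numbers are invented for illustration only.
SCHEDULE = [
    (0, 7, 2),    # overnight: minimal footprint
    (7, 18, 6),   # business hours: expected peak
    (18, 24, 3),  # evening: tapering off
]

def desired_instances(now, schedule=SCHEDULE):
    """Return how many instances should be running at the given time."""
    for start, end, count in schedule:
        if start <= now.hour < end:
            return count
    raise ValueError("schedule does not cover hour %d" % now.hour)

def scaling_delta(running, now, schedule=SCHEDULE):
    """Positive: launch that many instances. Negative: terminate that many."""
    return desired_instances(now, schedule) - running
```

A cron job could call `scaling_delta` every few minutes and launch or terminate instances accordingly. Because the images configure themselves, the new machines need no hand-holding once they boot.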

We found by accident that this flexibility has some interesting side effects. If we find a need for another machine in play on our cluster we can launch it nearly instantly and get rid of it just as fast. Example: we had a massive one-time data processing task that we needed to run. We rented an expensive ($1 per hour) “High-CPU Extra Large” instance from the cloud, then launched and configured it within 15 minutes. We spared our other systems and flogged this thing for 3 days straight to churn exclusively on this problem. Then we returned it to the cloud like a rental car with smoking brakes. Total cost of $75 to rent a $5k machine for as long as we needed it. In this sense it adds an interesting tool to the toolbox: if you need something custom you can make it happen with low impact on schedule or budget.
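The arithmetic behind that rental is simple enough to sketch. The hourly rate, the three-day run, and the $5k hardware price come from the example above; everything else (power, hosting, admin time for a purchased box, and any storage or bandwidth charges on the rental) is ignored, so treat this as napkin math.

```python
HOURLY_RATE = 1.00        # dollars per hour, as quoted above
PURCHASE_PRICE = 5000.00  # dollars, rough price of comparable hardware

def rental_cost(hours, hourly_rate=HOURLY_RATE):
    """Cost of renting one instance for the given number of hours."""
    return hours * hourly_rate

def break_even_hours(purchase_price=PURCHASE_PRICE, hourly_rate=HOURLY_RATE):
    """Hours of rental that would cost as much as buying the machine."""
    return purchase_price / hourly_rate

three_days = rental_cost(3 * 24)   # 72.0
break_even = break_even_hours()    # 5000.0 hours, roughly 208 days
```

Three full days of compute come to $72, which lines up with the roughly $75 we paid. You would have to keep the instance busy around the clock for about seven months before buying the box started to look cheaper on hardware cost alone.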

Along these same lines: if we decide one of our existing machines needs more CPU or memory, we can detach the permanent storage, launch a version of that image with more juice, let it assume the identity of the old machine, and move on. This is interesting because we can avoid buying overkill up front and upgrade hardware in place. New solutions like these present themselves as we get further into the project and run into new problems.
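Here is a rough sketch of that detach, relaunch, and take-over sequence. It assumes an `ec2` object that behaves like a boto3 EC2 client (a library that postdates this post), and it assumes “identity” means an Elastic IP. The function name and every ID passed in are placeholders, and this is not the script we actually ran.

```python
def relaunch_with_more_juice(ec2, old_instance_id, volume_id, device,
                             image_id, new_type, zone, allocation_id):
    """Move a persistent volume and an Elastic IP to a freshly launched,
    larger instance. Returns the new instance ID."""
    # The filesystem should already be unmounted on the old machine;
    # that step happens on the host itself and is not shown here.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=old_instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Launch the bigger machine in the same zone, since an EBS volume
    # can only attach to an instance in its own availability zone.
    reply = ec2.run_instances(ImageId=image_id, InstanceType=new_type,
                              MinCount=1, MaxCount=1,
                              Placement={"AvailabilityZone": zone})
    new_id = reply["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # Hand over the storage and the public address, then retire the old box.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_id, Device=device)
    ec2.associate_address(InstanceId=new_id, AllocationId=allocation_id)
    ec2.terminate_instances(InstanceIds=[old_instance_id])
    return new_id
```

Passing the client in as a parameter keeps the sequence easy to dry-run against a fake object before pointing it at real machines, which is worth doing given how nervous this kind of swap can make you.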

EC2 instances do not become obsolete; they are essentially just blocks of computing potential rented by the hour. As the price of hardware drops, the price of a given computing unit will drop. Amazon has a pretty good market, but lots of players will be coming along with similar products to keep pricing honest.

There are a lot of interesting sub-services you can tap into with the Amazon stuff. Elastic Block Store (essential), S3, the Content Delivery Network, multiple availability zones: they launch additional services all of the time. Like any outsourced model, we are at the mercy of their reliability and service. We are purchasing their ‘silver’ support plan, which gives us some guaranteed support channels to talk directly with technicians; this adds 10% on top of our fees. The Gold plan is 20% and that includes guidance and some consulting, I believe. People are building entire businesses on it and existing businesses are leveraging it in hybrid models. I think it is a stable long-term bet.

Initially my calculated-on-a-napkin cost analysis dictated that EC2 might be cheaper for the first 12-24 months but a traditional model could likely save money over time. Recently Amazon introduced the concept of reserved instances, which should cut costs further (30% to 60% by committing to instances over time). There are burgeoning projects popping up to abstract the well-designed EC2 API layer for usage outside of the EC2 network. I think this will really shut down a lot of my personal reservations about tying my horse to EC2. If I had the option to transparently launch my application on multiple alternative virtualization infrastructures, including local hardware, arguments against doing so are tough to make.
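For the curious, the napkin math looks something like the sketch below. The dollar figures are invented placeholders chosen only to illustrate the shape of the comparison, and the reserved-instance saving is modeled as a flat percentage discount, which is a simplification of how the real pricing works.

```python
def first_month_in_house_is_cheaper(cloud_monthly, upfront, in_house_monthly,
                                    discount=0.0, horizon=120):
    """Return the first month in which cumulative in-house cost drops below
    cumulative cloud cost, or None if that never happens within the horizon."""
    effective_cloud = cloud_monthly * (1.0 - discount)
    for month in range(1, horizon + 1):
        cloud_total = effective_cloud * month
        in_house_total = upfront + in_house_monthly * month
        if in_house_total < cloud_total:
            return month
    return None

# Hypothetical inputs: $1000/month on EC2 versus $12000 of hardware up front
# plus $400/month to host and run it.
on_demand = first_month_in_house_is_cheaper(1000, 12000, 400)
reserved_30 = first_month_in_house_is_cheaper(1000, 12000, 400, discount=0.30)
reserved_60 = first_month_in_house_is_cheaper(1000, 12000, 400, discount=0.60)
```

With these made-up numbers, in-house hosting pulls ahead in month 21 at on-demand prices, not until month 41 with a 30% discount, and never within ten years at 60%. The exact months mean nothing; the point is how quickly a discount of that size pushes the break-even out.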

Negatives? New stuff to learn means new mistakes to make. Even up-to-date sysadmin skills need a bit of polishing to get this stuff right. How much do I trust Amazon? I must trust them a lot in order to stake my product’s uptime completely on their ability to keep EC2 running. I’ve got 10 instances of various sizes all churning in their cloud. No serious problems so far, but I always get a little nervous whenever I need to reboot an instance.

I saw a panel at SXSWi ’09 where Werner Vogels spoke alongside reps from Microsoft Asia and Google Gears. Google Gears seems too confining for a ninja usage scenario and Microsoft never makes anything I want to use. Werner was interesting, to say the least. He re-emphasized that one of the greatest benefits of this model is forced rigor in image creation. I agree completely. I know I’m too often a duct-tape Perl hacker, and this method of machine creation just forces you to Do The Right Thing rather than glue things together and handhold them.

There are some interesting businesses springing up in this ecosystem. RightScale has been plugging away from the beginning and has innovated some terrific turnkey images that cool me out. The Scalr project is also a winner.



© 2010 Unideal = Steve Berry, Austin, TX. All Rights Reserved.
