We’re just at the tail end of shifting our infrastructure from a static VPS to Amazon EC2. Someone much smarter than I designed a server self configuration process. Here’s what we came up with.
The final product
Here’s all it takes for us to spin up a fully functioning web node now :
- Take existing AMI, create it with the default groups (web, worker, etc), and the default size
- The AMI includes a startup script to do a git clone of the repository.
- The startup script runs exactly 1 script within that repository, which can be used for anything. We use it for a bundle install, installing a couple of server dependencies that aren’t in the AMI (yet), mount the EBS block if it’s a db_role, assign the elasticip (or load balancer) if a web role
- That script also launches god for each role, which is how we make sure our web servers, background processing tasks, and database engines are running
- Less maintenance overhead : Deploy new server == Deploy new code. There is only one set of scripts to maintain. There is no way to end of with the “oops, I forgot to tell you I would always have to run rake blah blah on each node after reboot” problem
- Auto scaling : Using tools like scalr or amazon auto-scale, we can trigger new nodes to spin up or shut down without human intervention.
- Identical environments for test, deploy, staging, and production
- Everything is in git : Everything is a rake task or config script in the one repository / project for now. Bringing a new engineer on board is less painful than it used to be. As we grow, we may need to split this out so that the environment and application can be managed independently, but for now this is a positive
- Capistrano deploy via symlink switching. We do our best for now and don’t reboot all servers at the same time, but it’s still riskier than the capistrano approach which deploys then switches the symlink at the last moment. We need to be careful not to suffer downtime during a deploy. We’re may end of using capistrano or our own process for this to deploy minor tweaks without a reboot (by dynamically querying all running instances to get the :roles).
- Capistrano rollback. We can’t easily rollback if a deploy was a bad idea. We love TDD so hope this isn’t a problem, but issues always slip through so we need to be careful to deploy via branches.
- Painful linux startup scripts. Writing code in a fully loaded ruby environment is easier than shell scripting, especially when the shell script is run at startup (without the environment configured). This took a while to get right. As we add complexity I hope this doesn’t slow us down.