In my previous post I talked about how we started out with only one dedicated server. This was of course a big risk. The first step we took to reduce this risk was to use multiple servers instead of only one. If there is one thing we’ve learned, it’s that a single point of failure is evil. And we had a bunch of those.
OK, but how do you split up a running system with a small team? We couldn’t do everything at once. It had to be done gradually. The idea was to tackle the biggest problems, the low-hanging fruit, first. In our case that was our application server.
We used Mongrel to run our Ruby on Rails app, and that didn’t run very smoothly. It leaked memory heavily, causing the system to crash every now and then. Before anything else, we needed to make this part more stable. How? By running two application servers on separate machines and by moving away from Mongrel.
We figured that if the app server itself is unreliable, it’s better to have multiple app servers running. If one crashes, the others can handle the load, which gives us time to get the crashed one back up. That might keep the service online, but it didn’t fix the underlying problem: Mongrel. We had trouble managing its memory usage. The apps at 37signals ran on Phusion Passenger, so we gave that a try. While battling the memory load, we also switched from Apache to Nginx, because Nginx had a smaller memory footprint.
We decided to build a very simple and lightweight application server (called an app node): one that ran Linux, Nginx, Passenger, and our Rails app. We’d start with two nodes and spin up new ones as traffic grew. The next question was: where are we going to run this setup?
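For the curious, here’s a rough sketch of what an app node’s Nginx config might look like with Passenger in the mix. This isn’t our actual config; the paths, hostname, and Ruby location are placeholders for illustration, and it assumes Nginx was built with the Passenger module:

```nginx
# Sketch of an app node's Nginx config (excerpt). Paths and hostname
# are hypothetical; assumes Nginx compiled with the Passenger module.
http {
    passenger_root /opt/passenger;    # where the Passenger installation lives (placeholder)
    passenger_ruby /usr/bin/ruby;     # Ruby interpreter used to run the Rails app

    server {
        listen 80;
        server_name app1.example.com; # placeholder hostname for this app node

        root /var/www/app/public;     # Rails apps are served from their public/ directory
        passenger_enabled on;         # let Passenger spawn and manage the Rails processes
    }
}
```

The nice part of this setup is that Passenger manages the Rails worker processes itself, so there’s no separate Mongrel cluster to babysit.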
We didn’t know what kind of servers we needed for our lightweight app nodes. And as a startup, we didn’t want the big upfront cost of buying actual machines, which also carried the risk of buying the wrong ones. That’s why we gave the cloud, the AWS cloud, a try.
The ‘small instance’ seemed like a good choice. But it wasn’t. We spent countless hours figuring out why everything came to a halt when the nodes had to do some actual work. After a while we concluded that the instance was simply too small: its 1.7 GB of memory wasn’t enough to run our app. The 3.75 GB of the ‘medium instance’ did a much better job.
Unfortunately, the small instances weren’t our only problem. Now that we had two app nodes running, we needed to spread the requests between them. We needed a load balancer. The problem with the load balancer from AWS, Elastic Load Balancing (ELB), was that it supported neither naked domains nor HTTPS. ELB supports both now, but it didn’t at the time. We fixed this by firing up a small instance running Nginx and using its load balancing feature.
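In Nginx terms, that load balancer boils down to something like the sketch below. Again, the addresses and domain are placeholders, not our real ones:

```nginx
# Sketch of the Nginx load balancer: one small instance spreading
# requests across the two app nodes. Addresses are placeholders.
http {
    upstream app_nodes {
        server 10.0.0.1:80;  # app node 1 (placeholder address)
        server 10.0.0.2:80;  # app node 2 (placeholder address)
    }

    server {
        listen 80;
        server_name example.com;  # a naked domain is no problem here

        location / {
            proxy_pass http://app_nodes;            # round-robin between the nodes by default
            proxy_set_header Host $host;            # preserve the original Host header
            proxy_set_header X-Forwarded-For $remote_addr;  # pass the client IP along
        }
    }
}
```

By default Nginx round-robins across the upstream servers, and if one app node goes down it stops sending traffic there, which is exactly the failover behavior we were after.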
Luckily, it was worth all the effort. We had a load balancer in place that spread the traffic between the two app nodes, which were connected to our database. Nginx and Passenger ran nicely together and used a lot less memory than the previous setup. However, the database was still running outside of the AWS cloud. That’s the topic for my next post.