Techblog

Floorplanner's adventures in foam

Scaling our product – moving tasks to background

In my previous post about scaling our product I talked about how a CDN helped us deliver static content faster to our users, especially since we have users all over the world. In this post I’m going to talk a bit about how moving tasks to the background helped us scale our product further.

2D & 3D floorplan images

On the floorplanner.com website people can easily create floorplans in an online editor. Once they are finished creating a floorplan they can export it as a 2D image or as a 3D image.

Synchronous setup

Our floorplan images are created server side. In the beginning we generated them on the same server instances that run our application nodes. There were two problems with this setup. First, when the process of creating an image got stuck or leaked memory (which was certainly not rare), it could take down the whole server instance. Not cool.

Secondly, the system was synchronous. Creating a floorplan image takes a bit of time (minutes, not milliseconds), and such a long-running process locks an application instance until it’s finished. For example, if we have 12 application instances and they are all busy creating floorplan images, no new requests can be served, making the whole website unavailable.

Moving tasks to background

We had to handle these long-running processes in a different way. The standard practice is to move them to the background. How? By introducing a queueing system that allows us to handle tasks asynchronously.

For this queueing system we chose to use Celery with AMQP as a backend. AWS offers Simple Queue Service (SQS), but I can’t remember why we didn’t pick that…

Asynchronous setup

The system now works like this. When a user requests a 2D or 3D floorplan image, an app node adds a message to a queue; we use our own Ruby gem celerb for that. A worker monitors the queue (using Celery) and processes the message when it finds one. The resulting image is stored on S3 and a link to it is emailed to the user.
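To give an idea of the enqueueing side, below is a minimal Ruby sketch using the bunny AMQP client. Our actual code goes through celerb, and Celery’s real message protocol carries more fields and message properties than shown here; the broker URL, queue name and task name are made up for illustration:

```ruby
require "bunny"
require "json"
require "securerandom"

# Connect to the AMQP broker and declare the queue the workers listen on.
conn = Bunny.new("amqp://guest:guest@localhost:5672")        # hypothetical broker URL
conn.start
channel = conn.create_channel
queue   = channel.queue("floorplan_renders", durable: true)  # hypothetical queue name

# A Celery-style task message: task name, unique id, positional and keyword args.
payload = {
  task:   "tasks.render_floorplan",  # hypothetical task name on the Python worker side
  id:     SecureRandom.uuid,
  args:   [42, "3d"],                # floorplan id and requested output format
  kwargs: {}
}

# Publish to the default exchange; the routing key selects the queue.
channel.default_exchange.publish(
  payload.to_json,
  routing_key:  queue.name,
  content_type: "application/json",
  persistent:   true
)

conn.close
```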

Still no ideal solution

If we had to build a queueing system now, we would probably look at other options, like Redis or SQS. Why? First, the system is not obvious, AMQP in particular. We only need a simple queue, but we have to deal with server-side objects like ‘queues’, ‘exchanges’ and ‘routing keys’, and every app instance maintains a number of persistent connections to the AMQP broker. Celery adds some complexity on top of that, and the way it uses AMQP is not ideal either (a separate queue object for each result).

Secondly, Celery needs workers written in Python, another language to deal with, and, as you know, we use Ruby for our applications. Last, and most important, we lack good monitoring. We only have logs from the workers; there is no way to see whether a particular task is in the queue or how many tasks are ahead of it.
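For comparison, here is a rough sketch of what the ‘simple queue’ we actually need might look like on top of Redis, one of the options mentioned above. The queue name and payload are made up, and a real implementation would still need error handling and retries:

```ruby
require "redis"
require "json"

redis = Redis.new(url: "redis://localhost:6379")   # hypothetical Redis instance

# Producer side: a plain list is the whole queue.
# No exchanges, bindings or routing keys to declare.
redis.lpush("renders", { floorplan_id: 42, format: "2d" }.to_json)

# Monitoring the backlog is a single LLEN call.
puts "jobs waiting: #{redis.llen('renders')}"

# Consumer side: block until a job arrives, then process it.
_list, raw = redis.brpop("renders", timeout: 5)
job = JSON.parse(raw) if raw
```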

Although the system is not ideal, it’s good enough at the moment. We have plans to change the way we create the 2D and 3D floorplan images in the future. And when we do that, we will probably change the queueing system too.

The most important thing to take away from this post is that long-running processes should be handled asynchronously.

Previous posts

This post is the fourth in a series about my experiences in scaling our product.

  1. Application nodes
  2. Cloud database
  3. CDN
  4. Moving tasks to background

Scaling our product – CDN

I’m writing a series of posts about how we scaled our product at Floorplanner.com. We started with a single server, then we created application servers in the cloud, and after that we moved our databases to the cloud. This post is about our next step: how we improved the delivery of our static files.

What files?

The floorplan drawing tool (the Editor) is a very important part of our product. It’s the place where all the floorplans are created. It comes with a 3D view, which is actually a separate application. Both apps are Flash applications, so they run client side in the browser with the Flash Player.

Inside the 2D editor a floorplan can be decorated with furniture items and floor textures. These are all separate files in our system and we have thousands of them. We use separate files to keep the system flexible. A disadvantage is that loading a floorplan with 50+ furniture items takes a lot of HTTP requests, but that’s another subject. Let’s move on.

Our file server

All these files were stored on the one machine we had. Being a file server was just another task it had to do. After we moved the website to the application nodes and the databases to the cloud, the old machine was still serving our files. The problem with this setup was that it was slow and it was a single point of failure.

The machine was located in the east of the US. For visitors inside the US the time it took to download the 2D editor and the related files was acceptable. In Europe it was a bit slower, but still good enough. South America (especially Brazil) and Asia were a completely different story: it took 10-20 times longer to get the files to our users there. Distance matters. There was only one solution: we had to bring the files closer to our users.

CloudFront & S3

Luckily this problem had already been solved by others: we had to use a content delivery network, aka CDN. Amazon has its own, called CloudFront. Before we could use it, we had to put our files on S3 (Simple Storage Service), the file storage solution from AWS.

“Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 Region.” This directly solved our single-point-of-failure issue, and it has held true in our experience, although there have been some reports of S3 outages. “The number of objects you can store is unlimited.” That’s a nice plus, no need to worry about that; the number one reason for a crashing server used to be a full hard drive.
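Getting the files onto S3 is mostly a matter of walking the asset directories and uploading each file with sensible cache headers. A rough sketch using the current aws-sdk-s3 gem (the bucket name and paths are hypothetical; at the time we used an older version of the AWS SDK):

```ruby
require "aws-sdk-s3"   # current SDK; the setup described here predates it

s3     = Aws::S3::Resource.new(region: "us-east-1")
bucket = s3.bucket("floorplanner-assets")   # hypothetical bucket name

# Upload every furniture item and texture file, letting browsers and the CDN cache them.
Dir.glob("public/items/**/*.*") do |path|
  key = path.sub("public/", "")
  bucket.object(key).upload_file(
    path,
    cache_control: "public, max-age=86400"
  )
end
```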

CDNetworks

However, we were seeing rapid growth in Brazil and Australia and CloudFront didn’t have any edge servers there (it now has one in São Paulo). So we still had a latency problem in those countries. Eventually that made us switch to CDNetworks, which has 100 edge locations worldwide on six continents.

Scaling our product – Cloud database

This is the third post in a series about how we scaled our product at Floorplanner. The first one was about how it all started with one dedicated server. In the second post I talked about how we reduced some of the risk by running two application servers on separate machines in the cloud.

This post is about our database. The application servers and the load balancer were running in the AWS cloud, but the MySQL database was still on the dedicated server outside of it. Not really ideal. The database had to move to the AWS cloud too, but we wanted a bit more.

Now that our app servers were running very smoothly, our biggest remaining risk was our database. I can’t remember if we made backups at all back then, but if we did, they were probably stored on the same disk anyway (because we only had one). We needed a proper backup system too.

Percona Server

At that time, AWS didn’t have the Relational Database Service yet (or it was very expensive …), so we had to do it ourselves. We got a very good tip from Valery Visnakovs from Ask.fm about using Percona Server for MySQL for our setup. Impressed by Percona Server and their MySQL Performance Blog, we chose to take it for a ride.

At the same time, we got some other good advice. Cody Faustner from Shopify told me: “The very worst part of AWS is disk I/O, so whatever you can do to minimize disk activity will go a long way.” Related, and even more important, was his tip to get a server with lots of memory, enough for MySQL to load the whole working set into memory. More info about that on the MySQL Performance Blog.

To meet the memory demand we picked a High-Memory Extra Large Instance and installed Percona Server on it.

Backup strategy

With the new database server in place we had to think about our backup strategy. We started out with the idea of backing up our database on a daily basis. But Floorplanner was being used by more and more people every day, and it would be a disaster if we lost a whole day of data. So we needed a better way to back up our data, a real-time way. That meant replicating our data to another (mirror) server.

We created the first database server in the US East region, the same place as all our other servers. To protect our data, we didn’t want our database backup to be stored in the same region. Therefore we created a second database server in the EU West region and replicated our data there. I’ll write a short post soon about the steps we took to set up this replication.

If for some reason our main database were to lose or corrupt data, those changes would be synced directly to our backup database, destroying important data there too. So we figured we still needed a daily backup to reduce this risk. Amazon’s Elastic Block Store (EBS) has the perfect feature for this: snapshots. With it you can easily create a snapshot of a whole EBS volume. Therefore we used an instance with EBS storage for the backup database server. Don’t forget to clean up your snapshots after a while, say a couple of days, otherwise your AWS bill will grow.
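Taking and pruning those daily snapshots is easy to script. A rough sketch using the current aws-sdk-ec2 gem (the volume id and retention period are hypothetical, and for a consistent MySQL backup you would also flush or briefly stop the database around the snapshot):

```ruby
require "aws-sdk-ec2"   # current SDK; the original setup predates it

ec2       = Aws::EC2::Client.new(region: "eu-west-1")
volume_id = "vol-0123456789abcdef0"   # hypothetical EBS volume of the backup database

# Daily snapshot of the database volume.
ec2.create_snapshot(
  volume_id:   volume_id,
  description: "daily-mysql-backup-#{Time.now.utc.strftime('%Y-%m-%d')}"
)

# Prune snapshots older than a few days so the AWS bill stays under control.
ec2.describe_snapshots(owner_ids: ["self"],
                       filters: [{ name: "volume-id", values: [volume_id] }])
   .snapshots
   .select { |s| s.start_time < Time.now - 3 * 24 * 3600 }
   .each   { |s| ec2.delete_snapshot(snapshot_id: s.snapshot_id) }
```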

The scheme below shows the new setup, including the database changes.

[Image: Scaling our product – cloud database]

Scaling our product – How it all started

Scaling is tough. At least, that is my conclusion after doing it for a while. According to the Startup Genome, you must be able to grow your company along five core dimensions if you want to be successful: customers, product, team, business model and funding. I’m going to write a bit about one of those: how we scaled our product.

I assume there are companies out there struggling with scaling their product, just like we did (and still are). I’m starting a series of posts about how we scaled our product: the ideas we had, the mistakes we made and the things we learned. And although every company and every product is different, I hope this information will be helpful.

Floorplanner started on one machine, one dedicated server somewhere in the US. It ran Linux, Apache, Ruby on Rails and MySQL. Besides that, it was also our file server, serving the 2D and 3D apps, all the furniture items and other content. In short, it simply did everything.

We all knew that this was far from ideal, but we were in the early stage of our company. We had just launched the first version of our product and we were working hard to make it better. Keeping everything on one server was a risky decision, but it was one we made consciously. We didn’t want to spend our time building a very scalable system, since we didn’t even know if we could make money with our product.

Yes, we wanted to build a sustainable business from the start. First a product, then finding our market fit, then scaling. It cost us some sleepless nights when the system was acting up, but I think it was the right approach for us.

In the next post I’ll talk about our first improvement: what we did to reduce the risk of having only one server.
