In my previous post about scaling our product I talked about how a CDN helped us deliver static content faster to our users, especially since we have lots of users from all over the world. In this post I’m going to talk a bit about how moving tasks to the background helped us scale our product further.
2D & 3D floorplan images
On the floorplanner.com website people can easily create floorplans in an online editor. Once they are finished creating a floorplan they can export it as a 2D image or as a 3D image.
Our floorplan images are created server side. In the beginning we used the same server instances that run our application nodes for the creation of our floorplan images. There were two problems with this setup. First, when the process of creating an image was stuck or leaked memory — which was certainly not rare — it could take down the whole server instance. Not cool.
Secondly, this system is synchronous. Creating a floorplan image is something that takes a bit of time — minutes not milliseconds. Such a long running process locks an application instance until it’s finished. For example, if we have 12 application instances and they are busy creating 12 floorplan images, then no new requests can be served — making the whole website unavailable.
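To make the arithmetic concrete, here is a minimal sketch of that situation. It models the application instances as a fixed pool of slots; the `AppPool` class and its method names are made up for illustration and are not part of our stack.

```python
import threading


class AppPool:
    """Models a fixed number of application instances (hypothetical helper)."""

    def __init__(self, size):
        self._free = threading.BoundedSemaphore(size)

    def try_accept(self):
        """Claim an idle instance; return False if every instance is busy."""
        return self._free.acquire(blocking=False)

    def finish(self):
        """Mark one instance as idle again."""
        self._free.release()


def demo():
    pool = AppPool(12)
    # Twelve image exports arrive; each one occupies an instance for minutes.
    accepted = [pool.try_accept() for _ in range(12)]
    # A thirteenth, ordinary page request finds no idle instance.
    blocked = not pool.try_accept()
    # Only when one export finishes can a new request be served.
    pool.finish()
    served_after_finish = pool.try_accept()
    return all(accepted), blocked, served_after_finish


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that a slow task and a fast page view compete for the same slots when both are handled synchronously.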
Moving tasks to the background
We had to find a way to handle these long running processes in a different way. The standard practice is to move them to the background. How? By introducing a queueing system that allows us to handle tasks asynchronously.
The system now works like this. When a user requests a 2D or a 3D floorplan image, an app node adds a message to a queue. We use our own Ruby gem celerb for that. A worker monitors the queue (using Celery) and when it finds a message, it processes it. The image is stored on S3 and the link to the image is sent to the user by email.
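The flow above can be sketched in a few lines. This is not our actual celerb or Celery code: it uses Python's standard `queue` and `threading` modules as a stand-in for the broker and the worker. The function names, the example URL and the `sent_emails` list are all assumptions made for illustration, and rendering, the S3 upload and the email are reduced to placeholders.

```python
import queue
import threading

tasks = queue.Queue()   # stand-in for the message broker
sent_emails = []        # stand-in for the outgoing mail system


def request_image(floorplan_id, kind, email):
    """Called by an app node: enqueue the job and return immediately."""
    tasks.put({"floorplan_id": floorplan_id, "kind": kind, "email": email})
    return "queued"


def render_and_upload(floorplan_id, kind):
    """Placeholder for the slow rendering step and the upload to storage."""
    return f"https://example.invalid/floorplans/{floorplan_id}-{kind}.png"


def worker():
    """Runs outside the app nodes: take messages off the queue one by one."""
    while True:
        message = tasks.get()
        if message is None:          # sentinel: shut the worker down
            tasks.task_done()
            break
        link = render_and_upload(message["floorplan_id"], message["kind"])
        sent_emails.append((message["email"], link))
        tasks.task_done()


def run_demo():
    thread = threading.Thread(target=worker)
    thread.start()
    status = request_image(42, "2d", "user@example.com")
    tasks.put(None)                  # ask the worker to stop after the job
    thread.join()
    return status, list(sent_emails)


if __name__ == "__main__":
    print(run_demo())
```

The app node only pays the cost of `tasks.put`, so it is free to serve the next request while the worker does the slow part.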
Still not an ideal solution
If we had to build a queueing system now, we would probably look at other options, like Redis or SQS. Why? First, the system is not obvious, AMQP in particular. We need a simple queue, but have to deal with server-side objects like ‘queues’, ‘exchanges’, ‘routing keys’ etc. Every app instance maintains a number of persistent connections to the AMQP broker. Plus Celery adds some complexity on top of that. The way Celery uses AMQP is also not ideal (a separate queue object for each result).
Secondly, Celery needs workers written in Python, which is another language to deal with since, as you know, we use Ruby for our applications. Last but most important, we lack good monitoring. We only have logs from the workers, and no way to see whether a particular task is in the queue or how many tasks are in front of it.
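To show the kind of visibility we are missing, here is a small sketch of a queue that can answer both questions. It is a plain in-memory illustration, not a feature of Celery or AMQP; the `InspectableQueue` class and its method names are hypothetical.

```python
from collections import deque


class InspectableQueue:
    """A FIFO queue that can report where a task currently sits."""

    def __init__(self):
        self._tasks = deque()

    def enqueue(self, task_id):
        self._tasks.append(task_id)

    def dequeue(self):
        """Remove and return the oldest task, as a worker would."""
        return self._tasks.popleft()

    def tasks_ahead(self, task_id):
        """Return how many tasks are in front of task_id, or None if absent."""
        for position, queued_id in enumerate(self._tasks):
            if queued_id == task_id:
                return position
        return None


if __name__ == "__main__":
    q = InspectableQueue()
    for task_id in ("a", "b", "c"):
        q.enqueue(task_id)
    print(q.tasks_ahead("c"))
```

With something like this, a support question such as "where is my floorplan image?" could be answered from the queue itself instead of from worker logs.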
Although the system is not ideal, it’s good enough at the moment. We have plans to change the way we create the 2D and 3D floorplan images in the future. And when we do that, we will probably change the queueing system too.
The most important thing to take away from this post is that long running processes should be handled asynchronously.
This post is the fourth in a series about my experiences in scaling our product.