MongoDB and GridFS versioning
December 28th, 2011 by VvanGemert
A few months ago we were still using Amazon's S3 service to store Floorplanner's designs in an XML-based file format, which we call FML (Floorplanner Markup Language). This approach had a few big issues. First of all, what happens when S3 goes down? We lose access to the data, and our whole application becomes useless. Another problem with S3 is that it's slow, especially when called inside a Rails action, because every design has to be fetched with an HTTP GET request.
For the future of Floorplanner we needed another way to store the design files. After a lot of research and trying out multiple file stores, XML databases and document databases, we decided to go with MongoDB and, in our case, GridFS. MongoDB is a NoSQL document database. GridFS is a way of storing files inside MongoDB: each file is split into chunks that are stored as documents. The ideal scenario would have been to store our XML-based files as JSON documents in MongoDB. That way we could query all our designs for things like which assets are used most, or how many rooms each project has on average.
In our situation this didn't work well: we would have had to convert XML to JSON and JSON back to XML, which is quite heavy on the servers. Another problem is our own format, which relies on some hacks to work properly. Fortunately MongoDB offers an alternative called GridFS (Grid File System). It lets us store our FML files in Mongo as they are while still benefiting from what Mongo has to offer. Mongo makes our lives easier with versioning, replication and better speed than S3.
For now a Master/Slave replication setup is good enough for us, and it was incredibly easy to set up (take that, MySQL!). We had over 4 million files in our database, and the slave was completely synced within a couple of hours. I wouldn't generally recommend Master/Slave replication, because Replica Sets are the better option, but it works fine for our situation.
Versioning is built into GridFS by default. Every time you write a file with the same filename into GridFS, it is stored as a new file and the old ones are kept. When you retrieve the file by name, you get the latest one. That's magnificent. It lets us retrieve old versions of a design if, for example, a user accidentally clears a design and saves it. Versioning works through the uploadDate timestamp: the latest version is the one with the newest timestamp.
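To make the "new file per write, newest uploadDate wins" rule concrete, here is a toy in-memory model in plain Ruby. It does not use the MongoDB driver, and the class and method names (`ToyGridStore`, `write`, `versions`, `read`) are hypothetical. It only mimics the behaviour described above: writes never overwrite, and reads pick the version with the newest timestamp.

```ruby
# Toy in-memory model of GridFS-style versioning (hypothetical, not the driver).
# Each write appends a new file document; reads return the newest uploadDate.
StoredFile = Struct.new(:id, :filename, :upload_date, :data)

class ToyGridStore
  def initialize
    @files = []
    @next_id = 0
  end

  # Writing never overwrites: it appends a new version with its own timestamp
  # and returns the new document's id.
  def write(filename, data, upload_date)
    @next_id += 1
    @files << StoredFile.new(@next_id, filename, upload_date, data)
    @next_id
  end

  # All stored versions of a filename, newest first.
  def versions(filename)
    @files.select { |f| f.filename == filename }.sort_by(&:upload_date).reverse
  end

  # Reading returns the data of the version with the newest uploadDate,
  # or nil if no file with that name exists.
  def read(filename)
    latest = versions(filename).first
    latest && latest.data
  end
end

store = ToyGridStore.new
t0 = Time.utc(2011, 12, 28, 12, 0, 0)
store.write("design.fml", "<design>v1</design>", t0)
store.write("design.fml", "<design>v2</design>", t0 + 60)
puts store.read("design.fml")            # => <design>v2</design>
puts store.versions("design.fml").size   # => 2 (the old version is kept)
```

Because old versions are never removed in this model, storage grows with every save. That is exactly the problem a limit on versions addresses.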
There is a fantastic Rails gem that lets us use GridFS in just a few lines of code. There is just one small problem with that Ruby MongoDB driver gem: it doesn't let us keep a limited number of versions. You can keep all the old versions or none at all; those are the only two options. To solve this, I've created a monkey patch that lets you keep a given number of versions. In our system we keep the last 10 versions of each design to make sure our database size doesn't grow out of control, but you can set the number of versions to keep yourself.
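# Monkey Patch for handling X number of versions
module Mongo
  class GridFileSystem
    def open(filename, mode, opts={})
      opts = opts.dup
      opts.merge!(default_grid_io_opts(filename))
      # :versions is our own option; strip it so it is never passed on to GridIO
      keep = opts.delete(:versions)
      # Number of *older* versions to keep besides the one being written now
      # (clamped at 0 so :versions => 0 can't produce a negative skip)
      versions = (keep && mode == 'w') ? [keep - 1, 0].max : nil
      file = GridIO.new(@files, @chunks, filename, mode, opts)
      return file unless block_given?
      result = nil
      begin
        result = yield file
      ensure
        # In write mode, close returns the _id of the file just written
        id = file.close
        unless versions.nil?
          # Delete every older version beyond the newest `versions` ones
          self.delete do
            @files.find({'filename' => filename, '_id' => {'$ne' => id}},
                        :fields => ['_id'],
                        :sort   => ['uploadDate', -1],
                        :skip   => versions)
          end
        end
      end
      result
    end
  end
end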
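The pruning step is the part of the monkey patch that is easiest to get wrong. It finds the other versions of the file, sorts them by uploadDate descending and skips the newest `keep - 1`; whatever remains is deleted. Here is a pure-Ruby model of that rule that runs without MongoDB, assuming versions are ordered only by uploadDate. `versions_to_delete` and `VersionDoc` are hypothetical names used for illustration, not part of the driver.

```ruby
# Pure-Ruby model of the patch's pruning query (hypothetical, no MongoDB needed).
# Mirrors: find(filename, _id != current), sort uploadDate desc, skip(keep - 1).
VersionDoc = Struct.new(:id, :filename, :upload_date)

# Returns the documents that would be deleted when `keep` versions are requested.
def versions_to_delete(files, filename, current_id, keep)
  skip = [keep - 1, 0].max
  # Every other version of this file, excluding the one just written
  candidates = files.select { |f| f.filename == filename && f.id != current_id }
  # Newest first; the first `skip` survive, the rest are returned for deletion
  candidates.sort_by(&:upload_date).reverse.drop(skip)
end

base = Time.utc(2011, 12, 28)
files = (1..12).map { |i| VersionDoc.new(i, "design.fml", base + i * 60) }
# File 12 is the one that was just written (it has the newest uploadDate).
doomed = versions_to_delete(files, "design.fml", 12, 10)
p doomed.map(&:id)   # => [2, 1]
```

With `keep = 10`, the just-written file plus the 9 newest older versions survive, so exactly 10 remain. With `keep = 1`, every older version is returned, which matches the old `delete_old` behaviour.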
To use this monkey patch, the syntax changes a little. The driver used to have a delete_old option, which I removed because you can get the same behavior with the new versions option by specifying 1 version. Here's an example of how to use the new option:
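require 'mongo'

# Writing a new version, keeping at most 10 versions of "filename"
versions = 10
# GridFileSystem expects a database object; 'mydb' is a placeholder name
@grid = Mongo::GridFileSystem.new(Mongo::Connection.new.db('mydb'))
@grid.open("filename", "w", {:versions => versions}) do |f|
  f.write "filecontent"
end

# Reading the latest version (open returns the result of the block)
content = @grid.open("filename", "r") { |f| f.read }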
That's how easy it is. To restore an older version, I use the Mongo console to update the uploadDate timestamp of the version I want to bring back, so that it becomes the newest one. If you have any questions or remarks, leave a comment or send me an email at: vincent [at] floorplanner.com.
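The restore trick works because "latest" is defined purely by uploadDate: give an old version a fresh timestamp and it becomes the current one, with no data copied. Here is a toy in-memory sketch of that idea in plain Ruby, assuming versions are chosen only by timestamp. `GridDoc`, `latest` and `restore!` are hypothetical names; in the real setup this is a manual update of the `uploadDate` field on the chosen file document in the console.

```ruby
# Toy model of restoring an older version by bumping its uploadDate
# (hypothetical, not the MongoDB driver: versions live in a plain Array).
GridDoc = Struct.new(:id, :upload_date, :data)

# The "current" version is simply the one with the newest uploadDate.
def latest(docs)
  docs.max_by(&:upload_date)
end

# Restoring = giving the chosen old version a fresh timestamp;
# no data is copied and no other version is deleted.
def restore!(docs, id, now = Time.now.utc)
  doc = docs.find { |d| d.id == id } or raise ArgumentError, "no version #{id}"
  doc.upload_date = now
  doc
end

t = Time.utc(2011, 12, 28)
docs = [
  GridDoc.new(1, t,      "<design>full</design>"),
  GridDoc.new(2, t + 60, "<design>accidentally cleared</design>")
]
restore!(docs, 1, t + 120)
puts latest(docs).data   # => <design>full</design>
```

One consequence worth keeping in mind is that the restored version now has the newest timestamp. A later save with a version limit will therefore treat it as recent and keep it, while the oldest versions by timestamp get pruned first.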