Tech Blog

Contributions by Willem van Bergen

About Willem van Bergen

My website, online since 1999: www.vanbergen.org. I can also be found on Flickr, GitHub, LinkedIn and - of course - on Google.

15 August Rails log analyzer

My friend Bart from movesonrails.com just blogged about Rails log analyzer, a command line tool to get performance statistics for your Rails application by parsing its log file.

What started as an exercise for me to write a command line Ruby program has been extended and improved by Bart to be actually useful! We decided to release it under an MIT license. You can find the source on GitHub. The project’s wiki contains usage information and an example of the output it will produce.

29 July Active OLAP released

Remember my post about easy OLAP queries in Rails? I rewrote it almost completely and published it as a Rails plugin for anyone to use on GitHub! It is now called Active OLAP.

Although it is a complete rewrite, the API I demoed in my previous post should still work with some small changes. The most important one: you have to enable it for every class you want to use it on with the enable_active_olap method. You can pass a block with dimension definitions to this method, but this is not mandatory:

class User < ActiveRecord::Base
 
  enable_active_olap do |olap|
 
    # create a simple dimension on the account_type field
    olap.dimension :account_type
 
    # create a dimension with custom categories
    # the order of the categories will be kept in the results 
    # if you use an array to define the categories.
    olap.dimension :nationality, :categories => [
      [:usa, { :country => 'US' }],
      [:china, { :country => 'CN' }]
      # other is automatically added
    ]
 
    # Easily create a trend dimension
    olap.dimension :created_daily, :trend => {
      :timestamp_field => :created_at,
      :period_length => 1.day, 
      :period_count => 20
    }
  end
end

Now, we can use these dimensions for our OLAP queries. Multiple dimensions are supported too!

# simple query
@result = User.olap_query(:nationality)
# @result[:usa] == 123, @result[:china] == 456, @result[:other] == 789
 
# do drilldown using will_paginate to paginate the results
# olap_drilldown is implemented as a named_scope
@users = User.olap_drilldown(:nationality => :china).paginate(:page => 1)
 
# multiple dimensions!
@result = User.olap_query(:nationality, :created_daily)
@users = User.olap_drilldown(:nationality => :china, 
                        :created_daily => :period_19)

I am working on a generic controller that can easily be added to your Rails project. Just define dimensions for your models and the controller will let you execute OLAP queries and display the results as a table or a graph.

Keep an eye on this weblog or the github project if you want to stay up-to-date! Or, contact me if you have questions, suggestions or want to help out.

26 July Easy search with ActiveRecord

A couple of minutes ago I released scoped_search, a Rails/ActiveRecord plugin that makes it easy to search your models. It is very easy to use:

  1. Install the plugin in your vendor/plugins directory from http://github.com/wvanbergen/scoped_search
  2. Define in what fields your model should be searched by calling
    searchable_on :some, :field, :names
  3. Find your records by calling search_for("query keywords")

That’s all! A short example:

class Project < ActiveRecord::Base
  searchable_on :name, :description
end
 
Project.search_for("search keywords").each do |project| 
  puts project.name
end
 
# SELECT * FROM projects WHERE 
#      (name LIKE '%search%' OR description LIKE '%search%') 
#  AND (name LIKE '%keywords%' OR description LIKE '%keywords%')

This functionality is built completely upon named_scope. The search_for method is actually a named scope that was created by the call to searchable_on. Because these scopes can be chained, this offers some great possibilities.
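To make the SQL in the comment above more concrete, here is a minimal sketch of how a query string could be turned into ActiveRecord-style conditions. It is not the plugin's actual implementation: the search_conditions helper is a hypothetical name, and the whitespace-only keyword splitting is an assumption that mirrors the "very basic" splitting of this first release.

```ruby
# Hypothetical helper, not part of scoped_search: builds a conditions array
# (an SQL string followed by bind values) from a query and a list of fields.
def search_conditions(query, fields)
  keywords = query.split # naive split on whitespace

  # One group per keyword: the keyword may match any of the fields.
  field_group = '(' + fields.map { |field| "#{field} LIKE ?" }.join(' OR ') + ')'

  # Every keyword has to match, so the groups are AND-ed together.
  sql = keywords.map { field_group }.join(' AND ')

  # Each keyword is bound once for every field in its group.
  binds = keywords.map { |keyword| ["%#{keyword}%"] * fields.size }.flatten

  [sql, *binds]
end

conditions = search_conditions('search keywords', [:name, :description])
puts conditions.first
```

Using bind values instead of interpolating the keywords into the SQL string leaves the quoting to the database layer.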

For example, in Floorplanner, we only want you to search on the projects you have access to. We have implemented this access logic in another named scope. The calls can simply be chained:

class Project < ActiveRecord::Base
  searchable_on :name, :description
 
  named_scope :accessible_by, lambda { |user| ... }
  named_scope :published, :conditions => 'published_at IS NOT NULL'
end
 
@projects = Project.accessible_by(current_user).published.search_for('query')
@projects.each { |project| ... }

This plugin is released under the BSD license, so please use it for any purpose you see fit. There are some TODOs: you currently cannot search on fields in other tables, and splitting the search string into keywords is very basic. Please contact me if you have implemented any of these features and are willing to share them! Do not hesitate to contact me in case of problems either.

Update: I added support for quotes and the minus sign to the query language:
Project.search_for('willem -"van bergen"').count

19 July Snack 2.0

Earlier this week we discussed a fast food snack that is available in Rotterdam called “Kapsalon”, literally “Hairdresser’s”. According to the urban legend, the name came into existence after employees from a hairdresser’s composed their favorite meal at the shoarma place next door. The “calorie bomb” contains french fries, shoarma, cheese and lettuce, all thrown together. Unboxing pictures of it can be seen here.

Within a couple of months, it became rather popular in Rotterdam and most shoarma places include the dish on their menu, next to döner kebab and Turkish pizza. At Floorplanner we hope it will spread and become a national phenomenon. Not because the dish is so tasty or healthy, because it is not. It is, however, a very buzzword compliant meal:

  • It could be described as a mashup, as it just consists of some existing dishes thrown together in a unique manner.
  • It is user-generated content, as the customers of the shoarma place invented the dish instead of the shoarma shop itself.
  • Its popularity is because of a grassroots campaign, instead of a major marketing undertaking by one of the big Dutch snack producers like Mora or Beckers.

It’s almost a shame that the birth of this phenomenon is taking place in the city of Feyenoord instead of the city of AJAX. ;-)

14 July Easy OLAP queries in ActiveRecord

Because I love statistics so much, I decided to add some neat statistics functionality to the Floorplanner administration interface, so we can get better insight in what is going on. Instead of writing complete OLAP SQL queries myself and adding a custom interface for each one of them so our management can use them (yes Jeroen, that means you!), I built an ActiveRecord extension to ease the work. Right now, I only have to define some categories, and it automagically generates the right SQL query to generate charts and tables with the number of records that fall in each category. Moreover, by clicking on these numbers, I can drill down to the individual records.

I can define the categories like this:

olap_definition = { :categories => {
  :project_is_private   => { :public => false, :published_at => nil },
  :project_is_public    => { :public => true,  :published_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}}

Not too hard, was it? Now, I can easily feed this to Project.olap_query:

@query_result = Project.olap_query(olap_definition) 
# @query_result == {
#   :project_is_private   => 123,
#   :project_is_public    => 456,
#   :project_is_published => 3,
#   :other                => 2
# }

Note that the category other is added automatically, but can be omitted if you wish. (I found that the other-category is nice for spotting data integrity problems in your dataset that you didn’t think of beforehand.) The result can be used to create a table with the results, or to plot a pie chart with the Google Charts API. Because this setup is completely generic, this functionality only has to be written once. DRY!

The SQL for the other-category is “calculated” by OR-ing all the categories and checking whether the result is false or NULL. The check for NULL is necessary if you have NULL-values in your table: this is a weird characteristic of SQL, which defines that TRUE AND NULL equals NULL, and that NOT NULL is NULL as well (see Wikipedia).
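As a rough illustration of that calculation, the fragment for the other-category can be assembled from the category fragments with plain string manipulation. This is only a sketch and not the extension's real code; the other_fragment helper is a hypothetical name.

```ruby
# Hypothetical helper: combines the SQL fragments of all categories into the
# fragment for the "other" category.
def other_fragment(fragments)
  # A record is in "any category" if at least one fragment is true.
  any_category = fragments.map { |fragment| "(#{fragment})" }.join(' OR ')

  # "other" means the OR-ed expression is either false or NULL.
  "NOT (#{any_category}) OR (#{any_category}) IS NULL"
end

fragments = [
  'projects.public = 0 AND projects.published_at IS NULL',
  'projects.public = 1 AND projects.published_at IS NULL',
  'projects.published_at IS NOT NULL'
]

puts "SUM(#{other_fragment(fragments)}) AS other"
```

The parentheses around the second copy of the OR-ed expression matter: without them, IS NULL would only apply to the last fragment.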

The actual SQL query for this example would be:

SELECT 
  SUM(projects.public = 0 AND projects.published_at IS NULL) AS project_is_private,
  SUM(projects.public = 1 AND projects.published_at IS NULL) AS project_is_public,
  SUM(projects.published_at IS NOT NULL) AS project_is_published,
  SUM( NOT (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) OR (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) IS NULL) AS other
FROM projects

Some notes about this query:

  • It is completely built using the fragments from the categories. The fragment for the other-category is a little verbose, but what do I care? It works and is generated automatically! :-)
  • Note that a record can be in multiple categories, depending on the category definitions. The other category only contains records that conform to none of the provided categories.
  • SUM is used instead of COUNT. This way, I can query all the categories at once and it solves the problems with NULL-values, while keeping my WHERE and GROUP BY clauses nice and clean :-)
  • The query is built completely with ActiveRecord’s find method by using anonymous scopes. Therefore, Rails 2.1 is required, but this makes some neat tricks possible as well.

I also have a Project.olap_drilldown method that I can use to find the individual projects in a category:

@projects = Project.olap_drilldown(olap_definition, :project_is_public)
# SELECT projects.* FROM projects 
# WHERE (projects.public = 1 AND projects.published_at IS NULL)
 
@projects.each do |project|
  puts project.name
end

Because this functionality is built on anonymous scopes, it offers some interesting additional functionality. You can use your own scopes to limit the input dataset:

class Project < ActiveRecord::Base
  named_scope :recent, lambda { { :conditions => 
              ['created_at > ?', Time.now - 7.days]} }
  ...
end
# This will add a WHERE-clause to the OLAP query
results = Project.recent.olap_query(olap_definition)
 
# Or, use :conditions for the same effect
results = Project.olap_query(olap_definition.merge(
            :conditions => ['created_at > ?', Time.now - 7.days]))

As I noted before, the GROUP BY clause is not used. I already built an extension that uses the GROUP BY clause to group the results in periods of a given timestamp field of the model (e.g. created_at). When I pass the result of such a query to the Google Chart API, I can generate trend graphs to see how my dataset is evolving.

If I have time and there is any interest, I may release this extension as a gem or Rails plugin.

UPDATE: I rewrote it and released this project on github.

11 July Using git-svn

I personally am a fan of the git version control system. The best part of git is its speed, and the simplicity of using local branches.

Local branches are very helpful if you are working on different features at the same time but want to keep them apart. An example: it happens all the time that I am working on some feature and then have to put my current work aside to work on a high priority issue. Once this issue is solved, I need to commit the changes and usually deploy the web application, so that the problem is solved as soon as possible. With Subversion, I sometimes commit files that were part of the unfinished feature I was working on before I started on the high priority issue. If I am not careful and deploy those files, unfinished work will be put into production and this can go horribly wrong, like every page request on our high-traffic site returning a 500 error :(.

Using git, I can put my current work aside easily by using git stash. When I am finished with the high priority issue, I can revert to my previous work with git stash apply.

Another option, if the feature I am working on is invasive: branch the project (using git branch feature) and then switch branches for high priority issues using git checkout master. I can go back to the feature branch with git checkout feature, followed by git merge master to merge in the changes I just made in the master branch. Branching and merging are very fast in git, and merging is not the PITA it is in Subversion.

However, our main code repository will probably remain in SVN for now. Luckily for me, I can use git-svn locally to profit from these advantages. I found an informative page on installing and getting started with git-svn on OSX. If you know Subversion, this page is helpful to translate Subversion commands to their git alternatives.

22 May Developing RESTful APIs in Rails

As you may have read on this blog, we are working on a RESTful API for Floorplanner. This post contains some random observations I have made and questions I had (and still have) during the development.

to_xml incompatible with to_json
ActiveRecord#to_json does not seem to be fully compatible with ActiveRecord#to_xml. With to_xml, it is possible to override the to_xml method of your models. The overridden method will be called, even if an instance of such a model is :included within the XML of another model. For to_json however, only the overridden method of the instance you call to_json on will be executed; for every included model, the default implementation is used.

A workaround in most cases is to pass every option to the initial call of to_json:

@object.to_json(:except => [:id],
    :include => {:related_objects => {:except => [:id, :object_id]}})

Weird behavior in to_xml called on an array of objects
One model in our project seems to have some caching issues in production mode. If to_xml is called on a collection of instances of this model, the result seems to get stored in a cache. On every call, the new result is appended to this cached value. The outcome is a lot of repetition of the same XML, which makes the document invalid. The weird part is that it works OK if it is only a single instance of this model or if config.cache_classes == false (a development environment). to_json does not seem to have this problem either. All other models are unaffected as well. A more complete write-up of the problem can be found here.

I am still not able to figure out what causes this behavior and I am currently working around this issue by using some String#split-magic on the result of the to_xml-call. I know this is extremely ugly, so if anybody has experienced a similar problem, please let me know! It’s driving me nuts!

Testing an XML API
What is the best way to test a REST API, besides the Unit-tests that are already in place? Currently, I have an integration test suite, with a lot of testing code that looks like this:

def test_create_project
  post projects_path(:format => :xml),
      {:project => {:description => 'Original description'}}
  assert_response :created
  @new_project_location = headers['location'].first # array?
 
  get @new_project_location
  assert_response :success
  project_doc = REXML::Document.new(response.body)
  assert_equal 'Original description',
      project_doc.elements['//description'].text
 
end

It works, but it isn’t very elegant in my eyes. Moreover, all that XML parsing is making the test suite slower and slower. Does anybody have suggestions to build a cleaner and faster test suite for a RESTful XML API? By the way, is there an easy way to POST an XML document rather than “normal” POST-parameters in these calls?

3 April Finding the correct IP address in Rails

Today, I have added a server switch to Floorplanner.com. Now, it will load the floorplanner elements from the server that is nearest to you, which can yield a significant improvement in the initial loading time of your floorplans.

To determine your location, your IP address is matched against a table of locations. This worked fine in our development version, but it didn’t work at all on the production server. After some searching, I found that our server configuration was causing this. We use Apache as our web server, which uses mod_proxy to send the request to our Mongrel cluster. This intermediary step caused the IP address that Rails would receive to always be the IP address of the Apache server: 127.0.0.1. Therefore, the location matching did not work.

However, I found that mod_proxy adds an additional header to the request with the original IP address: HTTP_X_FORWARDED_FOR. This header can be used for our purpose. Now, I use the following function to determine the correct IP address:

def determine_ip(request)
  # use HTTP_X_FORWARDED_FOR if available
  # otherwise fall back to default header
  request.env["HTTP_X_FORWARDED_FOR"] || request.remote_addr
end
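One caveat that the function above does not handle: when a request passes through more than one proxy, the X-Forwarded-For value can be a comma-separated list of addresses, with the original client first. The following is a minimal sketch of how that could be dealt with. The client_ip helper is a hypothetical name and takes plain strings, so it can be tried outside of Rails. Keep in mind that the header is supplied by the client side of the proxy chain and can be spoofed.

```ruby
# Hypothetical helper: picks the client address from an X-Forwarded-For
# value, falling back to the address of the direct connection.
def client_ip(forwarded_for, remote_addr)
  # The header may be absent (nil), empty, or a comma-separated list.
  first = forwarded_for.to_s.split(',').first
  first = first.strip if first

  (first.nil? || first.empty?) ? remote_addr : first
end

puts client_ip('1.2.3.4, 10.0.0.1', '127.0.0.1')
puts client_ip(nil, '127.0.0.1')
```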

On a related note: to match an IP address against ranges of IP addresses in our location table, it must be converted from a string ("1.2.3.4") to a number (16909060). I use the following one-liner, which uses some nice functional programming tricks and an application of bit-shifting:

def numeric_ip(ip_str)
  ip_str.split('.').inject(0) { |ip_num, part|
            ( ip_num << 8 ) + part.to_i }
end

Yes, I am really proud of this function! :-)
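To see the conversion at work, the sketch below repeats the function, compares it with the written-out arithmetic, and then matches an address against a range table. The table contents, the server names and the nearest_server helper are made up for illustration; they are not our real location data.

```ruby
def numeric_ip(ip_str)
  ip_str.split('.').inject(0) { |ip_num, part| (ip_num << 8) + part.to_i }
end

# Shifting left by 8 bits multiplies by 256, so "1.2.3.4" becomes
# 1 * 256**3 + 2 * 256**2 + 3 * 256 + 4.
puts numeric_ip('1.2.3.4')
puts 1 * 256**3 + 2 * 256**2 + 3 * 256 + 4

# Hypothetical location table: [first address, last address, server name]
LOCATIONS = [
  [numeric_ip('1.0.0.0'),  numeric_ip('1.255.255.255'),  'server-a'],
  [numeric_ip('80.0.0.0'), numeric_ip('80.255.255.255'), 'server-b']
]

# Hypothetical lookup: returns the server whose range contains the address.
def nearest_server(ip_str, default = 'server-default')
  ip = numeric_ip(ip_str)
  row = LOCATIONS.find { |first, last, _name| ip >= first && ip <= last }
  row ? row[2] : default
end

puts nearest_server('80.1.2.3')
```

Because the addresses are plain integers, the range check is a simple numeric comparison, which a database can do in a WHERE clause as well.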

UPDATE: I just found out that request.remote_ip does the same as my determine_ip-function. Unfortunately, it only works in Rails 2.0.

28 March Rounding errors in practice

In college I followed a course in numerical analysis. The main point of the course was to be careful with floating point arithmetic, because it is vulnerable to rounding errors that can significantly influence the result of complex computations. Until yesterday I had never encountered such a problem. Now that I have lost my innocence in this matter, I would like to share my tale of nasty debugging and frustration.

After receiving some bug reports of Floorplanner designs that failed to save properly, we dove into the code to see what was going wrong. After some time, we found that the errors were caused by the script that loads the design after it has been saved with a unique name. This unique name is passed to the script to be able to find the design. We used the current timestamp as a unique name for the design. The current timestamp simply is the number of seconds passed since January 1, 1970 and looks something like this: 1206712028. As a design name, this number was passed to different scripts, both client-side and server-side. However, at some point in this chain of scripts, the number was changed slightly to 1206712030 and because of this the associated design could not be found, resulting in an error.

At first, we investigated the possibility that the stored timestamp was overwritten by a newer timestamp, as this could explain the slight increase in the number. However, we were not able to find this anywhere in the code and sometimes, the number was decreased a bit instead of being increased.

Finally, we monitored the data being sent between the different scripts, and we found that ActionScript automatically converted the numeric design name into a number in scientific notation. In our case, this would be 0.1206712028 x 10^10. Unfortunately, this number was rounded to 0.120671203 x 10^10 because computers use floating point arithmetic to store numbers in scientific notation. This number would eventually be converted back to normal notation, but it was now 1206712030 because of the rounding error.
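The effect can be imitated in Ruby by sending a timestamp through scientific notation with a limited number of significant digits and back. This is only an imitation of what we observed, not the ActionScript code itself; the through_scientific helper is a hypothetical name, and the nine significant digits are an assumption that matches the numbers in this post.

```ruby
# Hypothetical helper: formats a number in scientific notation with a limited
# number of significant digits and converts the result back to an integer.
def through_scientific(number, digits = 9)
  # "%.8e" keeps one digit before and eight digits after the decimal point.
  scientific = format("%.#{digits - 1}e", number)
  Float(scientific).to_i
end

puts through_scientific(1206712028) # last digit is rounded up
puts through_scientific(1206712024) # last digit is rounded down

# A value that does not look like a number is left alone as a string.
puts "a#{1206712028}"
```

Depending on the last digit, the timestamp moves up or down, which fits the observation that the number sometimes decreased instead of increased.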

We fixed it by putting an ‘a’ in front of the timestamp, preventing the automatic conversion to a number. Not very elegant, but it works!

11 March Putting HTTP status codes to use with Rails

We are currently implementing an API for Floorplanner, so other sites can use the service Floorplanner offers for their own needs. The Floorplanner website is developed with Rails, so we are trying to be a good Rails citizen and create an API based on REST.

REST embraces the HTTP protocol by coupling URIs to resources and HTTP methods (GET, PUT, POST and DELETE) to actions for manipulating them. Today, I tried to embrace the HTTP protocol even more by using its various status codes for Floorplanner’s particular needs.

Most people will know the HTTP 404 status code, which a server returns if you request a page that does not exist. But there are many more interesting codes that can be put to good use as well. The Forbidden status code (403) can for instance be returned if you try to access another user’s floorplan. Moreover, Floorplanner offers paid subscriptions that include additional functionality. If you try to access this functionality with an account without the necessary privileges, an Upgrade Required status code (426) can be returned. And if you forgot to pay your subscription fee, a Payment Required code (402) can be returned to indicate this.

However, in the end, users want to see a nice and informative page that tells them what is wrong, without cryptic error codes. This is easily possible by leveraging some new features in Rails 2. First of all, we define some custom exceptions that can be raised if one of these conditions occurs:

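# Inherit from StandardError rather than Exception, as is customary for
# application-level errors.
class PermissionDenied < StandardError
  # no further implementation necessary
end
 
class AccountExpired < StandardError
  # no further implementation necessary
end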

Now, we can raise these exceptions when needed in our controllers:

def show
  @plan = Floorplan.find(params[:id])
  raise PermissionDenied unless @plan.user == current_user
  # ...
end
 
def login
  # ...
  raise AccountExpired if Time.now > current_user.account_paid_until
  # ...
end

Normally, raised exceptions will trigger an Internal server error (HTTP status code 500). The new exceptions will be handled manually to return the intended status code and serve a pretty page explaining what is wrong. Exceptions can be caught using the rescue_from method. If we put the code in the Application controller, it will automagically work for all our controllers. DRY.

class ApplicationController < ActionController::Base
 
  # The parentheses are needed: without them Ruby attaches the block to the constant
  rescue_from(PermissionDenied) { |e| http_status_code(:forbidden, e) }
  rescue_from(AccountExpired)   { |e| http_status_code(:payment_required, e) }
 
  # Returns a HTTP status code, with a nice error page
  def http_status_code(status, exception)
    # store the exception so its message can be used in the view
    @exception = exception
 
    # Only add the error page to the status code if the request format was HTML
    respond_to do |format|
      format.html { render :template => "shared/status_#{status.to_s}", :status => status }
      format.any  { head status } # only return the status code
    end
  end
 
end

Now, it is easy to create super fancy pages by editing view files like app/views/shared/status_forbidden.html.erb.
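Stripped of everything Rails-specific, the idea behind this setup is a lookup from exception class to status code. The plain Ruby sketch below shows that idea in isolation. The STATUS_FOR table and the status_for and handle helpers are hypothetical names, and this is not how rescue_from is implemented internally.

```ruby
class PermissionDenied < StandardError; end
class AccountExpired < StandardError; end

# Hypothetical lookup table from exception class to HTTP status code.
STATUS_FOR = {
  PermissionDenied => 403, # Forbidden
  AccountExpired   => 402  # Payment Required
}

# Returns the status code registered for the exception, or 500 if none is.
def status_for(exception)
  STATUS_FOR.each do |klass, code|
    return code if exception.is_a?(klass)
  end
  500
end

# Runs the block and translates a raised exception into a status code.
def handle
  yield
  200
rescue StandardError => e
  status_for(e)
end

puts handle { raise PermissionDenied }
puts handle { raise AccountExpired }
puts handle { :everything_is_fine }
```

Any exception that is not in the table still ends up as a 500, which matches the default behavior described earlier.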

Note that you cannot and should not use this method for handling internal server errors with status code 500. These should not occur because we have thought of every possible way our code will be used and misused… in theory. In practice, we installed the exception notifier plugin, so we receive a message if one of these occurs and get our asses back to work.