Techblog

Tech Blog

Our latest geek adventures!

Posts Tagged ‘active_olap’

29 July Active OLAP released

Remember my post about easy OLAP queries in Rails? I rewrote it almost completely and published is as a Rails plugin for anyone to use on github! It is now called: Active OLAP.

Although it is a complete rewrite, the API I demoed in my previous post should still work with some small changes. The most important: you have to enable it for every class you want to use it on with the enable_active_olap method. You can provide a block to this method with dimension definitions, but is not mandatory:

class User < ActiveRecord::Base
 
  enable_active_olap do |olap|
 
    # create a simple dimension on the account_type field
    olap.dimension :account_type
 
    # create a dimension with custom categories
    # the order of the categories will be kept in the results 
    # if you use an array to define the categories.
    olap.dimension :nationality, :categories => [
      [:usa, { :country => 'US' }],
      [:china, { :country => 'CN' }]
      # other is automatically added
    ]
 
    # Easily create a trend dimension
    olap.dimension :created_daily, :trend => {
      :timestamp_field => :created_at,
      :period_length => 1.day, 
      :period_count => 20
    }
  end
end

Now, we can use these dimensions for our OLAP queries. Multiple dimensions are supported too!

# simple query
@result = User.olap_query(:nationality)
# @result[:usa] == 123, @result[:china] == 456, @result[:other] = 789
 
# do drilldown using will_paginate to paginate the results
# olap_drilldown is implemented as a named_scope
@users = User.olap_drilldown(:nationality => :china).paginate(:page => 1)
 
# multiple dimensions!
@result = User.olap_query(:nationality, :created_daily)
@users = User.olap_drilldown(:nationality => :china, 
                        :created_daily => :period_19)

I am working on a generic controller that can easily be added to your Rails project. Just define dimensions for your models and the controller will let you execute OLAP queries and display the results as a table or a graph.

Keep an eye on this weblog or the github project if you want to stay up-to-date! Or, contact me if you have questions, suggestions or want to help out.

3 Comments - Tags: , , ,

14 July Easy OLAP queries in ActiveRecord

Because I love statistics so much, I decided to add some neat statistics functionality to the Floorplanner administration interface, so we can get better insight in what is going on. Instead of writing complete OLAP SQL queries myself and adding a custom interface for each one of them so our management can use them (yes Jeroen, that means you!), I built an ActiveRecord extension to ease the work. Right now, I only have to define some categories, and it automagically generates the right SQL query to generate charts and tables with the number of records that fall in each category. Moreover, by clicking on these numbers, I can drill down to the individual records.

I can define the categories like this:

olap_definition = { :categories => {
  :project_is_private   => { :public => false, :publishd_at => nil },
  :project_is_public    => { :public => true,  :publishd_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}}

Not too hard, was it? Now, I can easily feed this to Project.olap_query:

@query_result = Project.olap_query(olap_definition) 
# @query_result == {
#   :project_is_private   => 123,
#   :project_is_public    => 456,
#   :project_is_published => 3,
#   :other                => 2
# }

Note that the category other is added automatically, but can be omitted if you wish. (I found that the other-category is nice to spot data integrity problems in your dataset you didn’t think of beforehand). The result can be used to create a table with the results, plot a pie chart with the Google Charts API. Because this setup is completely generic, this functionality only has to be written once. DRY!

The SQL for other-category is “calculated” by OR-ing all the categories and checking whether the result is false, or NULL. The check for NULL is necessary if you have NULL-values in your table: this is a weird characteristic of SQL that defines that TRUE AND NULL equals NULL (see Wikipedia).

The actual SQL query for this example would be:

SELECT 
  SUM(projects.public = 0 AND projects.published_at IS NULL) AS project_is_private,
  SUM(projects.public = 1 AND projects.published_at IS NULL) AS project_is_public,
  SUM(projects.published_at IS NOT NULL) AS project_is_published,
  SUM( NOT (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) OR (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL) IS NULL)) AS other
FROM projects

Some notes about this query:

  • It is complety built using the fragments from the categories. The fragment for the other-cagegory is a little verbose, but what do I care? It works and is generated automatically! :-)
  • Note that a record can be in multiple categories, depending on the category definitions. The other category only contains records that conform to none of the provided categories.
  • SUM is used in stead of COUNT. This way, I can query all the categories at once and it solves the problems with NULL-values, while keeping my WHERE and GROUP BY clause nice and clean :-)
  • The query is built completely using ActiveRecords find method by using anonymous scopes. Therefore, Rails 2.1 is required, but this makes some neat tricks possible as well.

I also have a Project.olap_drilldown method that I can use to find the individual projects in a category:

@projects = Project.olap_drilldown(olap_definition, :project_is_public)
# SELECT projects.* FROM projects 
# WHERE (projects.public = 1 AND projects.published_at IS NULL)
 
@projects.each do |project|
  puts project.name
end

Because this functionality is built on anonymous scopes, it offers some interesting additional functionality. You can use your own scopes to limit the input dataset

class Project < ActiveRecord::Base
  named_scope :recent, lambda { { :conditions => 
              ['created_at > ?', Time.now - 7.days]} }
  ...
end
# This will add a WHERE-clause to the OLAP query
results = Project.recent.olap_query(olap_definition)
 
# Or, use :conditions for the same effect
results = Project.olap_query(olap_definition.merge(
            :conditions => ['created_at > ?', Time.now - 7.days]))

As I noted before, the GROUP BY-clause is not used. I already built an extension to use the GROUP BY clause to group the results in periods of a given timestamp field of the model (e.g. created_by). When I pass the result of such a query to the Google Chart API, I can generate trend graphs to see how my dataset is evolving.

If I have time and there is any interest, I may release this extension as a gem or Rails plugin.

UPDATE: I rewrote it and released this project on github.

1 Comment - Tags: , , , ,