Techblog

Tech Blog

Our latest geek adventures!

Posts Tagged ‘ActiveRecord’

17 November Case-insensitive validates_uniqueness_of slowness

Watch out when using validates_uniqueness_of :field, :case_sensitive => false. Rails transforms this in a query that cannot be supported by an index, which will really slow validation down if the underlying table grows larger.

For example, we use validates_uniqueness_of to check for duplicate e-mail addresses. Because email addresses are case-insensitive, adding :case_sensitive => false seems like a natural choice. However, this results in the following queries:

# For a new User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com'
 
# For an existing User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com' 
   AND users.id <> 42

This query cannot be optimized by a (unique) index on the email field and thus has to scan the full table. As our users table grew larger, these queries started to show up in our slow query log.

However, MySQL uses case-insensitive comparison by default. (To be exact, case-sensitiveness depends on the current collation, which can vary. Rails generates the weird query to make sure the check works, regardless of the current collation.) The conversion to lowercase therefore is not necessary for a uniqueness check (as long as the field has a case-insensitive collation like utf8_general_ci). I decided to write my own validation method that issues a query that can be optimized by a query.

  # Alternative for validates_uniqueness_of :email, :case_sensitive => false
  validate do |user|
    conditions = "users.email = :email"
    conditions << " AND users.id != :id" unless user.new_record?
    conditions = [conditions, { :email => user.email, :id => user.id }]
    if User.find(:first, :select => :id, :conditions => conditions)
      user.errors.add(:email, 'Already in use')
    end
  end

There is a ticket for this issue in Rails’s Lighthouse, but as of yet this issue is unresolved. For now, this solution works to keep our slow query log nice and quiet!

5 Comments - Tags: , , , , , ,

31 August New version of Scoped search

After an almost complete rewrite, I am proud to present version 2.0 of scoped_search, the ActiveRecord plugin that makes it easy to find records using a simple query language. This new version support a new query language that supports more complex constructs, and can therefore be used to conduct more fine-grained queries on your models.

New query language

  • Logical operators: AND (&, &&), OR (|, ||) and NOT (!, -) operators, and parentheses to structure the boolean logic: police AND (car || uniform), -"village people". By default, the AND operator is used to combine different segments of your query.
  • Comparison operators: the most common comparison operators are supported, and to what you expect on integer and date field.
  • Explicit field support: only search in the specified field instead of all fields: age >= 21, created < 2009-01-01, username != "root".
  • Check for NULL fields: null? parent, set? error_message
  • Commas are supported to separate the different parts of the query.

More information about the query language can be found in the project wiki on GitHub.

New definition syntax

The new version supports a new syntax to define what fields of your model can be searched and in what cases. An example:

class User < ActiveRecord::Base   
  belongs_to :account_type
 
  scoped_search :on => [:first_name, :last_name]
  scoped_search :on => :created_at, :alias => :created, :only_explicit => true
  scoped_search :in => :account_type, :on => [:name, :description]
end

After the fields have been defined, the search_for method can be used to search your models using a named scope, just like it was before. The project wiki has more information about this new syntax. The search syntax itself hasn’t changed:

@users = User.search_for(params[:q]).paginate(page => params[:page])

Installation or upgrade

Include the gem in your environment.rb configuration and run rake gems:install to install it:

config.gem 'scoped_search', :source => 'http://gemcutter.org'

Backwards compatibility

The new version has a new syntax to define the fields that can be searched with a query. This new syntax gives you more fine-grained control over the queries that will be generated, so I urge you to adopt this new syntax. However, the old searchable_on syntax is still available for backwards compatibility.

Please contact me if you have any issues with the new version.

(Updated with new gemcutter installation instructions.)

2 Comments - Tags: , , , , , ,

29 July Active OLAP released

Remember my post about easy OLAP queries in Rails? I rewrote it almost completely and published is as a Rails plugin for anyone to use on github! It is now called: Active OLAP.

Although it is a complete rewrite, the API I demoed in my previous post should still work with some small changes. The most important: you have to enable it for every class you want to use it on with the enable_active_olap method. You can provide a block to this method with dimension definitions, but is not mandatory:

class User < ActiveRecord::Base
 
  enable_active_olap do |olap|
 
    # create a simple dimension on the account_type field
    olap.dimension :account_type
 
    # create a dimension with custom categories
    # the order of the categories will be kept in the results 
    # if you use an array to define the categories.
    olap.dimension :nationality, :categories => [
      [:usa, { :country => 'US' }],
      [:china, { :country => 'CN' }]
      # other is automatically added
    ]
 
    # Easily create a trend dimension
    olap.dimension :created_daily, :trend => {
      :timestamp_field => :created_at,
      :period_length => 1.day, 
      :period_count => 20
    }
  end
end

Now, we can use these dimensions for our OLAP queries. Multiple dimensions are supported too!

# simple query
@result = User.olap_query(:nationality)
# @result[:usa] == 123, @result[:china] == 456, @result[:other] = 789
 
# do drilldown using will_paginate to paginate the results
# olap_drilldown is implemented as a named_scope
@users = User.olap_drilldown(:nationality => :china).paginate(:page => 1)
 
# multiple dimensions!
@result = User.olap_query(:nationality, :created_daily)
@users = User.olap_drilldown(:nationality => :china, 
                        :created_daily => :period_19)

I am working on a generic controller that can easily be added to your Rails project. Just define dimensions for your models and the controller will let you execute OLAP queries and display the results as a table or a graph.

Keep an eye on this weblog or the github project if you want to stay up-to-date! Or, contact me if you have questions, suggestions or want to help out.

5 Comments - Tags: , , ,

26 July Easy search with ActiveRecord

A couple of minutes ago I released scoped_search, a Rails/ActiveRecord plugin that makes it easy to search your models. It is very easy to use:

  1. Install the plugin in your vendor/plugins directory fromĀ http://github.com/wvanbergen/scoped_search
    Add the gem to your rails environment.rb:

    config.gem 'wvanbergen-scoped_search', :lib => 'scoped_search', 
        :source => 'http://gems.github.com'

    Call rake gems:install afterwards to ensure the gem is installed.

  2. Define in what fields your model should be searched by calling
    searchable_on :some, :field, :names
  3. Find your records by calling search_for("query keywords")

That’s all! A short example:

class Project < ActiveRecord::Base
  searchable_on :name, :description
end
 
Project.search_for("search keywords").each do |project| 
  puts project.name
end
 
# SELECT * FROM projects WHERE 
#      (name LIKE '%search%' OR description LIKE '%search%') 
#  AND (name LIKE '%keywords%' OR description LIKE '%keywords%')

This functionality is completely build upon named_scope. The search_for method is actually a named scope that was created by the call to searchable_on. Because these scopes can be chained, this offers some great possibilities.

For example, in Floorplanner, we only want you to search on the projects you have access to. We have implemented this access logic in another named scope. The calls can simply be chained:

class Project < ActiveRecord::Base
  searchable_on :name, :description
 
  named_scope :accessible_by, lambda { |user| ... }
  named_scope :published, :conditions => 'published_at IS NOT NULL'
end
 
@projects = Project.accessible_by(current_user).published.search_for('query')
@projects.each { |project| ... }

This plugin is released under the MIT license, so please use it for any purpose you see fit. There are some TODOs: you currently can not search on fields in other tables, and splitting the search string into keywords is very basic. Please contact me if you have implemented any of these features and you are willing to share them! Do not hesitate to contact me in case or problems either.

Update: I added support for quotes and the minus sign to the query language:
Project.search_for('willem -"van bergen"').count

Update #2: Wes Hays implemented the OR keyword:
Project.search_for('wes OR hays').count.
A big thanks to Wes for helping out on this project!!

10 Comments - Tags: , , , ,

14 July Easy OLAP queries in ActiveRecord

Because I love statistics so much, I decided to add some neat statistics functionality to the Floorplanner administration interface, so we can get better insight in what is going on. Instead of writing complete OLAP SQL queries myself and adding a custom interface for each one of them so our management can use them (yes Jeroen, that means you!), I built an ActiveRecord extension to ease the work. Right now, I only have to define some categories, and it automagically generates the right SQL query to generate charts and tables with the number of records that fall in each category. Moreover, by clicking on these numbers, I can drill down to the individual records.

I can define the categories like this:

olap_definition = { :categories => {
  :project_is_private   => { :public => false, :publishd_at => nil },
  :project_is_public    => { :public => true,  :publishd_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}}

Not too hard, was it? Now, I can easily feed this to Project.olap_query:

@query_result = Project.olap_query(olap_definition) 
# @query_result == {
#   :project_is_private   => 123,
#   :project_is_public    => 456,
#   :project_is_published => 3,
#   :other                => 2
# }

Note that the category other is added automatically, but can be omitted if you wish. (I found that the other-category is nice to spot data integrity problems in your dataset you didn’t think of beforehand). The result can be used to create a table with the results, plot a pie chart with the Google Charts API. Because this setup is completely generic, this functionality only has to be written once. DRY!

The SQL for other-category is “calculated” by OR-ing all the categories and checking whether the result is false, or NULL. The check for NULL is necessary if you have NULL-values in your table: this is a weird characteristic of SQL that defines that TRUE AND NULL equals NULL (see Wikipedia).

The actual SQL query for this example would be:

SELECT 
  SUM(projects.public = 0 AND projects.published_at IS NULL) AS project_is_private,
  SUM(projects.public = 1 AND projects.published_at IS NULL) AS project_is_public,
  SUM(projects.published_at IS NOT NULL) AS project_is_published,
  SUM( NOT (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) OR (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL) IS NULL)) AS other
FROM projects

Some notes about this query:

  • It is complety built using the fragments from the categories. The fragment for the other-cagegory is a little verbose, but what do I care? It works and is generated automatically! :-)
  • Note that a record can be in multiple categories, depending on the category definitions. The other category only contains records that conform to none of the provided categories.
  • SUM is used in stead of COUNT. This way, I can query all the categories at once and it solves the problems with NULL-values, while keeping my WHERE and GROUP BY clause nice and clean :-)
  • The query is built completely using ActiveRecords find method by using anonymous scopes. Therefore, Rails 2.1 is required, but this makes some neat tricks possible as well.

I also have a Project.olap_drilldown method that I can use to find the individual projects in a category:

@projects = Project.olap_drilldown(olap_definition, :project_is_public)
# SELECT projects.* FROM projects 
# WHERE (projects.public = 1 AND projects.published_at IS NULL)
 
@projects.each do |project|
  puts project.name
end

Because this functionality is built on anonymous scopes, it offers some interesting additional functionality. You can use your own scopes to limit the input dataset

class Project < ActiveRecord::Base
  named_scope :recent, lambda { { :conditions => 
              ['created_at > ?', Time.now - 7.days]} }
  ...
end
# This will add a WHERE-clause to the OLAP query
results = Project.recent.olap_query(olap_definition)
 
# Or, use :conditions for the same effect
results = Project.olap_query(olap_definition.merge(
            :conditions => ['created_at > ?', Time.now - 7.days]))

As I noted before, the GROUP BY-clause is not used. I already built an extension to use the GROUP BY clause to group the results in periods of a given timestamp field of the model (e.g. created_by). When I pass the result of such a query to the Google Chart API, I can generate trend graphs to see how my dataset is evolving.

If I have time and there is any interest, I may release this extension as a gem or Rails plugin.

UPDATE: I rewrote it and released this project on github.

1 Comment - Tags: , , , ,