Case-insensitive validates_uniqueness_of slowness

Watch out when using validates_uniqueness_of :field, :case_sensitive => false. Rails transforms this in a query that cannot be supported by an index, which will really slow validation down if the underlying table grows larger.

For example, we use validates_uniqueness_of to check for duplicate e-mail addresses. Because email addresses are case-insensitive, adding :case_sensitive => false seems like a natural choice. However, this results in the following queries:

# For a new User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com'
 
# For an existing User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com' 
   AND users.id <> 42

This query cannot be optimized by a (unique) index on the email field and thus has to scan the full table. As our users table grew larger, these queries started to show up in our slow query log.

However, MySQL uses case-insensitive comparison by default. (To be exact, case-sensitiveness depends on the current collation, which can vary. Rails generates the weird query to make sure the check works, regardless of the current collation.) The conversion to lowercase therefore is not necessary for a uniqueness check (as long as the field has a case-insensitive collation like utf8_general_ci). I decided to write my own validation method that issues a query that can be optimized by a query.

  # Alternative for validates_uniqueness_of :email, :case_sensitive => false
  validate do |user|
    conditions = "users.email = :email"
    conditions << " AND users.id != :id" unless user.new_record?
    conditions = [conditions, { :email => user.email, :id => user.id }]
    if User.find(:first, :select => :id, :conditions => conditions)
      user.errors.add(:email, 'Already in use')
    end
  end

There is a ticket for this issue in Rails’s Lighthouse, but as of yet this issue is unresolved. For now, this solution works to keep our slow query log nice and quiet!

Tags: , , , , , ,

6 Responses to “Case-insensitive validates_uniqueness_of slowness”

  1. Scott Johnson Says:

    Umm… why not just leave off the :case_sensitive => true?

  2. Scott Johnson Says:

    Or is that :case_sensitive => false. I think your post mixes them up just as I have.

  3. Willem van Bergen Says:

    I just tested what queries are issued when using different values for the :case_sensitive option of validates_uniqueness_of:

    # :case_sensitive => false
    SELECT `users`.id FROM `users` 
     WHERE (LOWER(`users`.`email`) = BINARY 'test@example.com') LIMIT 1
     
    # :case_sensitive => true
    SELECT `users`.id FROM `users` 
     WHERE (`users`.`email` = BINARY 'test@example.com') LIMIT 1
     
    # without :case_sensitive option
    SELECT `users`.id FROM `users` 
     WHERE (`users`.`email` = BINARY 'test@example.com') LIMIT 1

    Rails always uses the BINARY keyword in the queries it issues to make the check case sensitive, regardless of the collation. That’s why I really need :case_sensitive => false for a case insensitive check, which uses the horribly inefficient query.

    So unfortunately, this custom implementation really is needed to make sure the index on the field is used.

    BTW: I changed a :case_sensitive => true into :case_sensitive => false. Thanks for spotting the mistake!

  4. Scott Johnson Says:

    Thanks, I understand better now. In my case I want :case_sensitive => true, and the default Rails-generated query seems to work fine (EXPLAIN shows it using the index).

  5. Willem van Bergen Says:

    Yes, the LOWER function makes that MySQL cannot use the index on the column anymore. Unfortunately, MySQL does not support indices on expressions like PostgreSQL does.

  6. Ennuyer.net » Blog Archive » Rails Reading - November 24, 2009 Says:

    [...] Case-insensitive validates_uniqueness_of slowness | Floorplanner Tech Blog [...]

Leave a Reply