Techblog

Tech Blog

Our latest geek adventures!

Posts Tagged ‘sql’

17 November Case-insensitive validates_uniqueness_of slowness

Watch out when using validates_uniqueness_of :field, :case_sensitive => false. Rails transforms this in a query that cannot be supported by an index, which will really slow validation down if the underlying table grows larger.

For example, we use validates_uniqueness_of to check for duplicate e-mail addresses. Because email addresses are case-insensitive, adding :case_sensitive => false seems like a natural choice. However, this results in the following queries:

# For a new User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com'
 
# For an existing User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com' 
   AND users.id <> 42

This query cannot be optimized by a (unique) index on the email field and thus has to scan the full table. As our users table grew larger, these queries started to show up in our slow query log.

However, MySQL uses case-insensitive comparison by default. (To be exact, case-sensitiveness depends on the current collation, which can vary. Rails generates the weird query to make sure the check works, regardless of the current collation.) The conversion to lowercase therefore is not necessary for a uniqueness check (as long as the field has a case-insensitive collation like utf8_general_ci). I decided to write my own validation method that issues a query that can be optimized by a query.

  # Alternative for validates_uniqueness_of :email, :case_sensitive => false
  validate do |user|
    conditions = "users.email = :email"
    conditions << " AND users.id != :id" unless user.new_record?
    conditions = [conditions, { :email => user.email, :id => user.id }]
    if User.find(:first, :select => :id, :conditions => conditions)
      user.errors.add(:email, 'Already in use')
    end
  end

There is a ticket for this issue in Rails’s Lighthouse, but as of yet this issue is unresolved. For now, this solution works to keep our slow query log nice and quiet!

5 Comments - Tags: , , , , , ,

17 December How to remove hidden tab characters

Posted by Gert-Jan in Databases

At this moment, all the language translations of the Floorplanner 2D app are stored in a database table. Today we discovered that a couple of these translations didn’t align properly in the interface. After some investigation we discovered that they all contained a hidden tab character at the end of  each string. This was probably caused by importing a malformed CSV file.

I thought a simple REPLACE query would fix this problem, but (as usual) it was a little more complicated than that. First I had to find the fields with the tab character… Willem pointed me to the right direction with his favorite weapon of choice REGEXP. According to the MySQL docs I could find tab characters with something like this:

SELECT * FROM table WHERE field REGEXP '[[.LF.]]'

The next step was to remove the tab characters. My first thought was to do this by replacing them with an empty string. It turns out you can’t combine a REPLACE with a REGEXP in a query. So I used good ol’ PHP for the job. A nice advantage was that I didn’t have to do any replacing, I could just use the trim() function.

$res = mysql_query("SELECT id, field FROM table WHERE field REGEXP '[[.LF.]]'");
if($res) {
	while($row = mysql_fetch_assoc($res)) {
		$id = $row['id'];
		$field = trim($row['field']);
		mysql_query("UPDATE table SET field = '$field' WHERE id = $id");
	}
}

Rather simple, when you know what to do… Another bug bites the dust!

3 Comments - Tags: , , , ,