Techblog

Tech Blog

Our latest geek adventures!

Archive for the ‘Databases’ Category

18 November Request-log-analyzer 1.5.0

Bart and I just released request-log-analyzer version 1.5.0. New features include:

  • MySQL slow query log format support to analyze what queries are slowing down your database.
  • Format autodetection: with all those supported file formats, remembering the right --format parameter gets tricky. With format autodetection, this usually is not needed anymore!

As always, run the following command to install or upgrade to the latest version:

$ gem install request-log-analyzer

No Comments - Tags: , , , ,

17 November Case-insensitive validates_uniqueness_of slowness

Watch out when using validates_uniqueness_of :field, :case_sensitive => false. Rails transforms this in a query that cannot be supported by an index, which will really slow validation down if the underlying table grows larger.

For example, we use validates_uniqueness_of to check for duplicate e-mail addresses. Because email addresses are case-insensitive, adding :case_sensitive => false seems like a natural choice. However, this results in the following queries:

# For a new User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com'
 
# For an existing User instance:
SELECT id FROM users 
 WHERE LOWER(users.email) = BINARY 'user@example.com' 
   AND users.id <> 42

This query cannot be optimized by a (unique) index on the email field and thus has to scan the full table. As our users table grew larger, these queries started to show up in our slow query log.

However, MySQL uses case-insensitive comparison by default. (To be exact, case-sensitiveness depends on the current collation, which can vary. Rails generates the weird query to make sure the check works, regardless of the current collation.) The conversion to lowercase therefore is not necessary for a uniqueness check (as long as the field has a case-insensitive collation like utf8_general_ci). I decided to write my own validation method that issues a query that can be optimized by a query.

  # Alternative for validates_uniqueness_of :email, :case_sensitive => false
  validate do |user|
    conditions = "users.email = :email"
    conditions << " AND users.id != :id" unless user.new_record?
    conditions = [conditions, { :email => user.email, :id => user.id }]
    if User.find(:first, :select => :id, :conditions => conditions)
      user.errors.add(:email, 'Already in use')
    end
  end

There is a ticket for this issue in Rails’s Lighthouse, but as of yet this issue is unresolved. For now, this solution works to keep our slow query log nice and quiet!

5 Comments - Tags: , , , , , ,

9 November Database replication in 5 easy steps

Running an online service like Floorplanner is not without risks. There are a lot of things that can go wrong. One of them is downtime of your service due to a crashing system. It doesn’t have to happen of course, but when it does, it can have some nasty consequences. A good way to reduce the risk of a crashing system is to set up a redundant system.

In short, a redundant system is an exact copy of your main system. When your main system goes down (for whatever reason), the redundant system can take over. One of the most important parts of any online service is the database. A way to maintain an exact copy of your database is setting up database replication. This post gives you the basics on how to set up database replication using a MySQL database.

Database replication isn’t probably something you’d setup right from the start. This means that you have to work with a running system when you decide to do so. From past experience we know that it’s not a good idea to replicate your database to the same machine, let alone to the same disk. To make this work, you need a separate machine with MySQL installed. This is your slave machine and database.

Let’s get down to business. These are the steps for setting up database replication:

  1. Enable binary logging on master database
  2. Create a backup of master database
  3. Transfer backup to slave machine
  4. Import backup to slave database
  5. Start slave database

1. Enable binary logging on master database

The binary log contains all statements that update data [..]. Statements are stored in the form of “events” that describe the modifications. The binary log also contains information about how long each statement took that updated data.

Every write action that’s performed on your database, like an insert or an update, is stored in these binary logs (bin logs). The master dumps all actions in these logs and the slave database can read them and perform the same actions. This way your master and slave databases will always have the same data.

To enable binary logging you have to create a config file, for example master.cnf. An important thing here is the server-id, which is needed for replication to work. Then there is log-bin which specifies base name of binary logs, and finally binlog-do-db, which specifies which databases should be bin logged.

[mysqld]
datadir=/home/yourname/mysql/myisam/data
basedir=/usr/local/mysql
port=3306
server-id=1
log-bin=mysql-bin
binlog-do-db=my_fat_db1
binlog-do-db=ma_other_fat_db

Then start (or restart) the master database using the config file:

~ $ mysqld --defaults-file=master.cnf&amp;

The slave database needs a userid/password in order to access the master machine. It’s best practice create a dedicated user with just the required privileges. This can be done with the following SQL statement.

GRANT replication slave ON *.* TO 'repl'@'slave-ip-here'

2. Create a backup of master database

The next step it to create a backup of your master database. The type of your database is very important for this action. If you use MyISAM, you need to schedule some downtime to dump your database. Otherwise the dumped data isn’t consistent and therefor useless. With InnoDB it’s possible to dump a consistent backup of your database while keeping it up and running. We stumbled on this feature by accident, but we haven’t been able to in the official documentation.

MyISAM

~ $ mysqldump --lock-all-tables

InnoDB

~ $ mysqldump --single-transaction

3. Transfer backup to slave machine

Now that the backup is created on the master machine, it needs to be transferred to the slave machine.

4. Import backup to slave database

Once the backup is present on the slave machine, it can be imported into the slave database:

~ $ mysql &lt; filename

5. Start slave database

The last step is to tell the database that it’s a slave database. First we have to give it an server id. We use a config file for this, slave.cnf:

[mysqld]
datadir=/home/yourname/mysql/myisam/data
basedir=/usr/local/mysql
port=3306
server-id=2
replicate-do-db=my_fat_db1
replicate-do-db=my_other_fat_db

Then we start the database with this config file:

~ $ mysqld --defaults-file=slave.cnf&amp;

and we issue the following SQL statement:

CHANGE MASTER TO
-&gt; MASTER_HOST='master-ip-here',
-&gt; MASTER_PORT=3306,
-&gt; MASTER_USER='repl',
-&gt; MASTER_PASSWORD='repl',
-&gt; MASTER_LOG_FILE='bin-log-filename-here',
-&gt; MASTER_LOG_POS=4;

It’s hard to predict the MASTER_LOG_POS. You can find this number by issuing the following SQL statement on the master database:

SHOW MASTER STATUS\G;

Once the slave is configured correctly it will get the bin logs from the master machine, parse them and execute each action. It takes some time to have an exact copy of the master database, depending on the size and number of bin logs. To get an idea about the delay, issue the following query:

SHOW SLAVE STATUS\G;

With this post we hope to contribute to a less riskier world and more peaceful state of mind for the sys admins out there :-D

2 Comments - Tags:

17 December How to remove hidden tab characters

Posted by Gert-Jan in Databases

At this moment, all the language translations of the Floorplanner 2D app are stored in a database table. Today we discovered that a couple of these translations didn’t align properly in the interface. After some investigation we discovered that they all contained a hidden tab character at the end of  each string. This was probably caused by importing a malformed CSV file.

I thought a simple REPLACE query would fix this problem, but (as usual) it was a little more complicated than that. First I had to find the fields with the tab character… Willem pointed me to the right direction with his favorite weapon of choice REGEXP. According to the MySQL docs I could find tab characters with something like this:

SELECT * FROM table WHERE field REGEXP '[[.LF.]]'

The next step was to remove the tab characters. My first thought was to do this by replacing them with an empty string. It turns out you can’t combine a REPLACE with a REGEXP in a query. So I used good ol’ PHP for the job. A nice advantage was that I didn’t have to do any replacing, I could just use the trim() function.

$res = mysql_query("SELECT id, field FROM table WHERE field REGEXP '[[.LF.]]'");
if($res) {
	while($row = mysql_fetch_assoc($res)) {
		$id = $row['id'];
		$field = trim($row['field']);
		mysql_query("UPDATE table SET field = '$field' WHERE id = $id");
	}
}

Rather simple, when you know what to do… Another bug bites the dust!

3 Comments - Tags: , , , ,

29 July Active OLAP released

Remember my post about easy OLAP queries in Rails? I rewrote it almost completely and published is as a Rails plugin for anyone to use on github! It is now called: Active OLAP.

Although it is a complete rewrite, the API I demoed in my previous post should still work with some small changes. The most important: you have to enable it for every class you want to use it on with the enable_active_olap method. You can provide a block to this method with dimension definitions, but is not mandatory:

class User < ActiveRecord::Base
 
  enable_active_olap do |olap|
 
    # create a simple dimension on the account_type field
    olap.dimension :account_type
 
    # create a dimension with custom categories
    # the order of the categories will be kept in the results 
    # if you use an array to define the categories.
    olap.dimension :nationality, :categories => [
      [:usa, { :country => 'US' }],
      [:china, { :country => 'CN' }]
      # other is automatically added
    ]
 
    # Easily create a trend dimension
    olap.dimension :created_daily, :trend => {
      :timestamp_field => :created_at,
      :period_length => 1.day, 
      :period_count => 20
    }
  end
end

Now, we can use these dimensions for our OLAP queries. Multiple dimensions are supported too!

# simple query
@result = User.olap_query(:nationality)
# @result[:usa] == 123, @result[:china] == 456, @result[:other] = 789
 
# do drilldown using will_paginate to paginate the results
# olap_drilldown is implemented as a named_scope
@users = User.olap_drilldown(:nationality => :china).paginate(:page => 1)
 
# multiple dimensions!
@result = User.olap_query(:nationality, :created_daily)
@users = User.olap_drilldown(:nationality => :china, 
                        :created_daily => :period_19)

I am working on a generic controller that can easily be added to your Rails project. Just define dimensions for your models and the controller will let you execute OLAP queries and display the results as a table or a graph.

Keep an eye on this weblog or the github project if you want to stay up-to-date! Or, contact me if you have questions, suggestions or want to help out.

5 Comments - Tags: , , ,

14 July Easy OLAP queries in ActiveRecord

Because I love statistics so much, I decided to add some neat statistics functionality to the Floorplanner administration interface, so we can get better insight in what is going on. Instead of writing complete OLAP SQL queries myself and adding a custom interface for each one of them so our management can use them (yes Jeroen, that means you!), I built an ActiveRecord extension to ease the work. Right now, I only have to define some categories, and it automagically generates the right SQL query to generate charts and tables with the number of records that fall in each category. Moreover, by clicking on these numbers, I can drill down to the individual records.

I can define the categories like this:

olap_definition = { :categories => {
  :project_is_private   => { :public => false, :publishd_at => nil },
  :project_is_public    => { :public => true,  :publishd_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}}

Not too hard, was it? Now, I can easily feed this to Project.olap_query:

@query_result = Project.olap_query(olap_definition) 
# @query_result == {
#   :project_is_private   => 123,
#   :project_is_public    => 456,
#   :project_is_published => 3,
#   :other                => 2
# }

Note that the category other is added automatically, but can be omitted if you wish. (I found that the other-category is nice to spot data integrity problems in your dataset you didn’t think of beforehand). The result can be used to create a table with the results, plot a pie chart with the Google Charts API. Because this setup is completely generic, this functionality only has to be written once. DRY!

The SQL for other-category is “calculated” by OR-ing all the categories and checking whether the result is false, or NULL. The check for NULL is necessary if you have NULL-values in your table: this is a weird characteristic of SQL that defines that TRUE AND NULL equals NULL (see Wikipedia).

The actual SQL query for this example would be:

SELECT 
  SUM(projects.public = 0 AND projects.published_at IS NULL) AS project_is_private,
  SUM(projects.public = 1 AND projects.published_at IS NULL) AS project_is_public,
  SUM(projects.published_at IS NOT NULL) AS project_is_published,
  SUM( NOT (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) OR (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL) IS NULL)) AS other
FROM projects

Some notes about this query:

  • It is complety built using the fragments from the categories. The fragment for the other-cagegory is a little verbose, but what do I care? It works and is generated automatically! :-)
  • Note that a record can be in multiple categories, depending on the category definitions. The other category only contains records that conform to none of the provided categories.
  • SUM is used in stead of COUNT. This way, I can query all the categories at once and it solves the problems with NULL-values, while keeping my WHERE and GROUP BY clause nice and clean :-)
  • The query is built completely using ActiveRecords find method by using anonymous scopes. Therefore, Rails 2.1 is required, but this makes some neat tricks possible as well.

I also have a Project.olap_drilldown method that I can use to find the individual projects in a category:

@projects = Project.olap_drilldown(olap_definition, :project_is_public)
# SELECT projects.* FROM projects 
# WHERE (projects.public = 1 AND projects.published_at IS NULL)
 
@projects.each do |project|
  puts project.name
end

Because this functionality is built on anonymous scopes, it offers some interesting additional functionality. You can use your own scopes to limit the input dataset

class Project < ActiveRecord::Base
  named_scope :recent, lambda { { :conditions => 
              ['created_at > ?', Time.now - 7.days]} }
  ...
end
# This will add a WHERE-clause to the OLAP query
results = Project.recent.olap_query(olap_definition)
 
# Or, use :conditions for the same effect
results = Project.olap_query(olap_definition.merge(
            :conditions => ['created_at > ?', Time.now - 7.days]))

As I noted before, the GROUP BY-clause is not used. I already built an extension to use the GROUP BY clause to group the results in periods of a given timestamp field of the model (e.g. created_by). When I pass the result of such a query to the Google Chart API, I can generate trend graphs to see how my dataset is evolving.

If I have time and there is any interest, I may release this extension as a gem or Rails plugin.

UPDATE: I rewrote it and released this project on github.

1 Comment - Tags: , , , ,

17 October Raise by one SQL statement

Posted by Gert-Jan in Databases

Today I discovered that it’s very easy to raise a value by one (or any other number) when you’re using a MySQL database. This is all:

UPDATE tablename SET value = value + 1 WHERE ...

No Comments - Tags: ,

11 July Multiple relations between 2 tables in Rails

Just a small post about something I figured out recently…

If 2 database tables have a relation, in Rails you model it using the belongs_to and the has_one (has_many) attributes. Rails guesses the foreign key based on the names of the tables and its field names. In my setup I had 4 relations between 2 tables, so Rails couldn’t guess the foreign key’s by itself. All I had to do to make it work, was to specify the foreign keys and the related class:

  belongs_to :var_1, :foreign_key => "id_1", :class_name => "RelatedClass"
  belongs_to :var_2, :foreign_key => "id_2", :class_name => "RelatedClass"
  belongs_to :var_3, :foreign_key => "id_3", :class_name => "RelatedClass"
  belongs_to :var_4, :foreign_key => "id_4", :class_name => "RelatedClass

1 Comment -