Techblog

Our latest geek adventures!

Posts Tagged ‘statistics’

12 January Request-log-analyzer 1.0

After a complete rewrite, Bart and I are proud to present request-log-analyzer version 1.0! Request-log-analyzer is an open-source command-line tool to analyze production log files from your Rails application to produce a performance report.

What’s new?

  • More robust log parser. It parses more lines and now combines all lines that belong to the same request, which greatly increases the amount of information available.
  • It produces more detailed and more beautiful reports.
  • A database builder is included, which will create an SQLite 3 database with all parsed request information, so you can roll your own queries.
  • Request filtering options, so you can exclude irrelevant data. An example of how this can be applied in practice can be found in the wiki.
  • Better, more modularized design under the hood. The parser is now fully log file format-agnostic. Developing extensions and modifications, or adding support for other log file formats should be much easier now. See the development-page for some pointers.
  • Documentation in the project’s wiki. Hopefully, this helps people get up to speed with the new version and answers most questions about using the tool. If you still have questions, please contact us so we can keep improving it!

Installation

Install or upgrade to the new version with the following command:

$ sudo gem install wvanbergen-request-log-analyzer \
    --source http://gems.github.com

To get the best results out of request-log-analyzer, it is important to configure logging correctly for your application. Some pointers on how to set things up correctly can be found in the wiki.

29 August Rails-log-analyzer matures

Since I announced rails-log-analyzer some weeks ago, quite a lot has happened! Apparently there is some interest in such a tool: this blog gets a lot of traffic from people looking for more info, the GitHub project already has 22 watchers, and it has even been forked!

In the meantime, Bart and I worked hard to add new functionality and refactor the internal design. As a result, I have released request-log-analyzer 0.1.0 today!

Changes: 

  • The project has been renamed to request-log-analyzer, because we plan to support log files from other frameworks as well; Merb support is planned for the near future.
  • The tool is distributed as a gem, making it much easier to install and update.
  • More reports, colorized output, parsing progress bars, command line arguments, etc…
  • Added a tool to create a SQLite database with all the parsed info from the log file, so you can do your own analysis.

Installation:

gem sources -a http://gems.github.com
sudo gem install wvanbergen-request-log-analyzer

Usage:

request-log-analyzer  [LOG FILES*]
request-log-analyzer -c 20 -z log/production.log

Please let me know what you think! If you have any problems using the tool, do not hesitate to contact me!

15 August Rails log analyzer

My friend Bart from movesonrails.com just blogged about Rails log analyzer, a command line tool to get performance statistics for your Rails application by parsing its log file.

What started as an exercise for me in writing a command-line Ruby program has been extended and improved by Bart to be actually useful! We decided to release it under an MIT license. You can find the source on GitHub. The project’s wiki contains usage information and an example of the output it will produce.

14 July Easy OLAP queries in ActiveRecord

Because I love statistics so much, I decided to add some neat statistics functionality to the Floorplanner administration interface, so we can get better insight into what is going on. Instead of writing complete OLAP SQL queries myself and adding a custom interface for each one of them so our management can use them (yes Jeroen, that means you!), I built an ActiveRecord extension to ease the work. Now I only have to define some categories, and it automagically generates the right SQL query to produce charts and tables with the number of records that fall into each category. Moreover, by clicking on these numbers, I can drill down to the individual records.

I can define the categories like this:

olap_definition = { :categories => {
  :project_is_private   => { :public => false, :published_at => nil },
  :project_is_public    => { :public => true,  :published_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}}

Not too hard, was it? Now, I can easily feed this to Project.olap_query:

@query_result = Project.olap_query(olap_definition) 
# @query_result == {
#   :project_is_private   => 123,
#   :project_is_public    => 456,
#   :project_is_published => 3,
#   :other                => 2
# }

Note that the category other is added automatically, but can be omitted if you wish. (I found that the other category is a nice way to spot data integrity problems in your dataset that you didn’t think of beforehand.) The result can be used to create a table with the results or to plot a pie chart with the Google Charts API. Because this setup is completely generic, this functionality only has to be written once. DRY!
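As a rough illustration of the table idea, here is a minimal sketch in plain Ruby that turns a result hash like the one above into text rows with counts and percentages. The helper name olap_table is made up for this example and is not part of the extension.

```ruby
# Hypothetical helper (not part of the extension): formats each category
# as a text row with its count and its share of the total.
def olap_table(result)
  total = result.values.inject(0) { |sum, n| sum + n }
  width = result.keys.map { |k| k.to_s.length }.max

  result.map do |category, count|
    share = total.zero? ? 0.0 : 100.0 * count / total
    format("%-#{width}s  %5d  %5.1f%%", category, count, share)
  end
end

# Same numbers as the example result above.
query_result = {
  :project_is_private   => 123,
  :project_is_public    => 456,
  :project_is_published => 3,
  :other                => 2
}

puts olap_table(query_result)
```

Because the helper only looks at the hash, it works for any category definition, which is the point of keeping the setup generic.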

The SQL for the other category is “calculated” by OR-ing all the categories and checking whether the result is false or NULL. The check for NULL is necessary if you have NULL values in your table: SQL uses three-valued logic, in which an expression involving NULL can itself evaluate to NULL, e.g. TRUE AND NULL equals NULL (see Wikipedia).
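To make the NULL behaviour concrete, here is a small sketch that emulates SQL's three-valued logic in plain Ruby, with nil standing in for NULL. The helpers sql_or, sql_not and other? are hypothetical names for this illustration only; the real work happens in the database.

```ruby
# Emulation of SQL's three-valued logic, using nil for NULL.
# These helpers are hypothetical and exist only for illustration.

# SQL OR: TRUE if any operand is TRUE; otherwise NULL if any operand
# is NULL; otherwise FALSE.
def sql_or(*values)
  return true if values.include?(true)
  return nil if values.include?(nil)
  false
end

# SQL NOT: NOT NULL is still NULL.
def sql_not(value)
  value.nil? ? nil : !value
end

# A record belongs in "other" when the OR of all category conditions
# is FALSE or NULL.
def other?(*category_results)
  combined = sql_or(*category_results)
  sql_not(combined) == true || combined.nil?
end

puts other?(false, false, false).inspect # no category matched
puts other?(false, nil, false).inspect   # one condition evaluated to NULL
puts other?(true, nil, false).inspect    # at least one category matched
```

Without the explicit NULL check, the second case would be lost: NOT NULL is NULL, and SUM skips NULL values, so the record would not be counted anywhere.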

The actual SQL query for this example would be:

SELECT 
  SUM(projects.public = 0 AND projects.published_at IS NULL) AS project_is_private,
  SUM(projects.public = 1 AND projects.published_at IS NULL) AS project_is_public,
  SUM(projects.published_at IS NOT NULL) AS project_is_published,
  SUM( NOT (
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)
  ) OR ((
    (projects.public = 0 AND projects.published_at IS NULL) OR
    (projects.public = 1 AND projects.published_at IS NULL) OR
    (projects.published_at IS NOT NULL)) IS NULL)) AS other
FROM projects

Some notes about this query:

  • It is built completely from the fragments of the categories. The fragment for the other category is a little verbose, but what do I care? It works and is generated automatically! :-)
  • Note that a record can be in multiple categories, depending on the category definitions. The other category only contains records that conform to none of the provided categories.
  • SUM is used instead of COUNT. This way, I can query all the categories at once and it solves the problems with NULL values, while keeping my WHERE and GROUP BY clauses nice and clean :-)
  • The query is built completely using ActiveRecord’s find method by using anonymous scopes. Therefore, Rails 2.1 is required, but this makes some neat tricks possible as well.
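The way such a query can be assembled from the category fragments can be sketched in plain Ruby. This is a simplified illustration and not the extension's actual code: category_sql and olap_sql are hypothetical helpers, and only nil, true and false hash values are handled.

```ruby
# Hypothetical helpers for illustration; not the extension's real API.

# Turn one category definition into a SQL fragment. Hash definitions
# become AND-ed column comparisons; strings are used as-is.
def category_sql(table, definition)
  return definition if definition.is_a?(String)

  definition.map do |column, value|
    case value
    when nil   then "#{table}.#{column} IS NULL"
    when true  then "#{table}.#{column} = 1"
    when false then "#{table}.#{column} = 0"
    else raise ArgumentError, "unsupported value: #{value.inspect}"
    end
  end.join(' AND ')
end

# Build one SELECT with a SUM(...) column per category, plus "other".
def olap_sql(table, categories)
  fragments = categories.map { |name, defn| [name, category_sql(table, defn)] }
  any_match = fragments.map { |_, sql| "(#{sql})" }.join(' OR ')

  columns = fragments.map { |name, sql| "SUM(#{sql}) AS #{name}" }
  columns << "SUM(NOT (#{any_match}) OR ((#{any_match}) IS NULL)) AS other"

  "SELECT #{columns.join(', ')} FROM #{table}"
end

categories = {
  :project_is_private   => { :public => false, :published_at => nil },
  :project_is_public    => { :public => true,  :published_at => nil },
  :project_is_published => 'projects.published_at IS NOT NULL'
}

puts olap_sql('projects', categories)
```

Note the extra pair of parentheses around the OR-ed fragments before IS NULL: IS binds more tightly than OR, so without them the NULL check would only apply to the last fragment.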

I also have a Project.olap_drilldown method that I can use to find the individual projects in a category:

@projects = Project.olap_drilldown(olap_definition, :project_is_public)
# SELECT projects.* FROM projects 
# WHERE (projects.public = 1 AND projects.published_at IS NULL)
 
@projects.each do |project|
  puts project.name
end

Because this functionality is built on anonymous scopes, it offers some interesting additional functionality. You can use your own scopes to limit the input dataset:

class Project < ActiveRecord::Base
  named_scope :recent, lambda { { :conditions => 
              ['created_at > ?', Time.now - 7.days]} }
  ...
end
# This will add a WHERE-clause to the OLAP query
results = Project.recent.olap_query(olap_definition)
 
# Or, use :conditions for the same effect
results = Project.olap_query(olap_definition.merge(
            :conditions => ['created_at > ?', Time.now - 7.days]))

As I noted before, the GROUP BY clause is not used. I already built an extension that uses the GROUP BY clause to group the results into periods based on a given timestamp field of the model (e.g. created_at). When I pass the result of such a query to the Google Chart API, I can generate trend graphs to see how my dataset is evolving.
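The grouping idea could look roughly like the following sketch, which builds a query with one row per day and one SUM column per category. Everything here is an assumption for illustration: the helper olap_trend_sql is hypothetical, created_at is an assumed column name, and DATE() is assumed to be available (it is in MySQL and SQLite). The actual extension may work differently.

```ruby
# Hypothetical sketch: one result row per day, one SUM column per category.
# Assumes the database provides a DATE() function (MySQL and SQLite do).
def olap_trend_sql(table, timestamp_column, fragments)
  period  = "DATE(#{table}.#{timestamp_column})"
  columns = fragments.map { |name, sql| "SUM(#{sql}) AS #{name}" }

  "SELECT #{period} AS period, #{columns.join(', ')} " \
  "FROM #{table} GROUP BY #{period} ORDER BY #{period}"
end

# Category fragments as plain SQL strings, taken from the query above.
fragments = {
  :project_is_public    => 'projects.public = 1 AND projects.published_at IS NULL',
  :project_is_published => 'projects.published_at IS NOT NULL'
}

puts olap_trend_sql('projects', 'created_at', fragments)
```

Each row of the result then holds the category counts for one period, which is the shape a trend graph needs.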

If I have time and there is any interest, I may release this extension as a gem or Rails plugin.

UPDATE: I rewrote it and released this project on GitHub.
