Techblog

Tech Blog

Our latest geek adventures!

Posts Tagged ‘performance’

17 January Ode to Array#pack and String#unpack

Remember my last post, where I representing a pixel with a Fixnum, storing the R, G, B and A value in its 4 bytes of memory? Well, I have been working some more on my PNG library and I am now trying loading and saving an image.

Using the PNG specification, building a PNG encoder/decoder isn’t that hard, but the required algorithmic calculations make sure that performance in Ruby is less than stellar. I have rewritten all calculations to only use fast integer math (plus, minus, multiply and bitwise operators), but simply the amount of code that is getting executed is slowing Ruby down. What more can I do to improve the performance?

Encoding RGBA images

Optimizing loading images is very hard, because PNG images can have many variations, and taking shortcuts means that some images are no longer supported. Not so with saving images: as long an image is saved using one of the valid variations, every PNG decoder will be able to read the file. Let’s see if it is possible to optimize one of these encoding variations.

During encoding, the image get splits up into scanlines (rows) of pixels, which in turn get converted into bytes. These bytes can be filtered for optimal compression. For a 3×3 8-bit RGBA image, the result looks like this:

F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af
F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af
F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af

Every line starts with a byte F indicating the filter method, followed by the filtered R, G and B value for every pixel on that line. Now, if we choose filter method 0, which means no filtering, the result looks like this:

0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao
0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao
0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao

Now, the original R, G, B and A byte from the original pixel’s Fixnum, occur in big-endian or network byte order, starting with the top left pixel, moving left to right and then top to bottom. Exactly like the pixels are stored in our image’s pixel array! This means that we can use the Array#pack method to encode into this format. The Array#pack-notation for this is "xN3" in which x get translated into a null byte, and every N as 4-byte integer in network byte order. For optimal performance, it is best to not split the original array in lines, but to pack the complete pixel array at once. So, we can encode all pixels with this command:

pixeldata = pixels.pack("xN#{width}" * height)

This way, the splitting the image into lines, splitting the pixels into bytes, and filtering the bytes can be skipped. In Ruby 1.8.7, this means a speedup of over 1500% (no typo)! Of course, because no filtering applied, the subsequent compression is not optimal, but that is a tradeoff that I am willing to make.

Encoding RGB images

What about RGB images without alpha channel? We can simply choose to encode these using the RGBA method, but that increases the file size with roughly 25%. Can we fix this somehow?

The unfiltered pixel data should look something like this:

0 Ro Go Bo Ro Go Bo Ro Go Bo
0 Ro Go Bo Ro Go Bo Ro Go Bo
0 Ro Go Bo Ro Go Bo Ro Go Bo

This means that for every pixel that is encoded as a 4-byte integer, the last byte should be ditched. Luckily, the Array#pack method offers a modifier that does just that: X. Packing a 3 pixel line can be done with "xNXNXNX". Again we would like to pack the whole pixel array at once:

pixeldata = pixels.pack(("x" + ('NX' * width)) * height)

Because all the encoding steps can get skipped once again, the speed improvement is again 1500%! And the result is 25% smaller than the RGBA method. This method is actually so speedy, that saving an image using Ruby 1.9.1 is only a little bit slower (< 10%) than saving a PNG image using RMagick! See my performance comparison.

Loading images

Given the promising results of the Array#pack method, using its counterpart String#unpack looks promising for speedy image loading, if you know the image’s size and the encoding format beforehand.

An RGBA formatted stream can be loaded quickly with this command:

pixels = rgba_pixeldata.unpack("N#{width * height}")
image = Image.new(width, height, pixels)

For an RGB formatted stream, we can use the X modifier again, but we have to make sure to set the alpha value for every pixel to 255:

pixels = rgb_pixeldata.unpack("NX" * (width * height))
pixels.map! { |pixel| pixel | 0x000000ff }
image = Image.new(width, height, pixels)

You can even use little-endian integers to load streams in ABGR format!

pixels = abgr_pixeldata.unpack("V#{width * height}")
image = Image.new(width, height, pixels)

Loading pixel data for an image like this is again over 1500% faster than decoding the same PNG image. However, this can only be applied if you have control over the input format of the image.

To conclude

Array#pack and String#unpack really have increased the performance for my code. If you can apply them for project, don’t hesitate and spread the love! For all other cases, use as little code as possible, and upgrade to Ruby 1.9 for improved algorithmic performance. :-)

No Comments - Tags: , , , , , , , ,

14 January Memory efficiency when using Ruby

I have been spending some time creating a pure Ruby PNG library. For this library, I need to have some representation of the image, which is composed of RGB pixels, supporting an alpha channel. Because images can be composed of a lot of pixels, I want the implementation to be as memory efficient as possible. I also would like decent performance.

A very naive Ruby implementation for an image represents the red, green, blue and alpha channel using a floating point number between 0.0 and 1.0, and might look something like this:

class Pixel
  attr_reader :r, :g, :b, :a
 
  def initialize(r, g, b, a = 1.0)
    @r, @g, @b, @a = r, g, b, a
  end
end
 
class Image
  attr_reader :width, :height
 
  def initialize(width, height)
    @width, @height = width, height
    @pixels = Array.new(width * height)
  end
 
  def [](x,y)
    @pixels[y * width + x]
  end
 
  def []=(x,y, pixel)
    @pixels[y * width + x] = pixel
  end
end

For a 10×10 image, this representation requires 4 times 100 floating point numbers, which require 8 bytes each. That’s already over 3kB for such a small image just for the floating point numbers! Ouch.

A simple improvement is to decide that 8-bit color depth is enough in the case, in which case each channel can be represented by an integer between 0 and 255. Storing such a number only costs one byte of memory. Ruby’s Fixnum class typically uses 4-byte integers. If only the 4 channels of one byte each could be combined into a single Fixnum instance… Behold!

class Pixel
  attr_reader :value
  alias :to_i :value
 
  def initialize(value)
    @value = value
  end
 
  def self.rgba(r, g, b, a = 255)
    self.new(r << 24 | g << 16 | b << 8 | a)
  end
 
  def r; (@value & 0xff000000) >> 24; end
  def g; (@value & 0x00ff0000) >> 16; end
  def b; (@value & 0x0000ff00) >>  8; end
  def a; (@value & 0x000000ff); end
end

Notice the bit operations, which are extremely fast. This only requires 100 times 4 bytes = 400 bytes for storing the RGBA values for a 10×10 image, an 8 times improvement!

This implementation wraps every pixel inside an object. This is nice, because I want to access the separate channels of every pixel easily using the r, g, b, and a methods, and every other method that is defined for every pixel. However, a Ruby object instance has an overhead of at least 20 bytes. That’s 20 times 100 is about 2kB for our 10×10 image!

To get rid of the object overhead, it is possible to simply store the Fixnum value for every pixel, and only wrapping it inside a Pixel object when it is accessed. This can be done by modifying the Image class:

class Image
  # ...
 
  def [](x,y)
    Pixel.new(@pixels[y * width + x]) # wrap
  end
 
  def []=(x,y, pixel)
    @pixels[y * width + x] = pixel.to_i # unwrap
  end
end

As you can see, some simply changes in the representation can really make a difference in the memory usage. Can this representation be improved further?

Integer math calculations

Because we are now using integers to represent a pixel, this can cause problems when the math requires you to use floating point numbers. For example, the formula for alpha composition of two pixels is as follows:

Alpha compositing formula

in which Ca is the color component of the foreground pixel, αa the alpha channel of the foreground pixel, Cb and αb the same values for the background pixel, all of which should be values between 0 and 1.

A naive implementation could convert the integer numbers to their floating point equivalents:

def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255
 
  fg_alpha = fg.a / 255.0
  bg_alpha = fg.a / 255.0
  alpha_complement = (1.0 - fg_alpha) * bg_alpha
 
  new_r = (fg_alpha * fg.r + alpha_complement * bg.r).round
  new_g = (fg_alpha * fg.g + alpha_complement * bg.g).round
  new_b = (fg_alpha * fg.b + alpha_complement * bg.b).round
  new_a = ((fg_alpha + alpha_complement) * 255).round
 
  Pixel.rgba(new_r, new_g, new_b, new_a)
end

This implementation is already a little bit optimized: no unnecessary conversions and calculations are being performed. However, this composition can be done a lot quicker after realizing that 255 is almost a power of two, in which computers excel because it can use bitwise operators and shifting for some calculations.

My new approach uses a quicker implementation of multiplication of 8-bit integers that represent floating numbers between 0 and 1:

def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255
 
  alpha_complement = multiply(255 - fg.a, bg.a)
  new_r = multiply(fg.a, fg.r) + multiply(alpha_complement, bg.r)
  new_g = multiply(fg.a, fg.g) + multiply(alpha_complement, bg.g)
  new_b = multiply(fg.a, fg.b) + multiply(alpha_complement, bg.b)
  new_a = fg.a + alpha_complement
 
  Pixel.rgba(new_r, new_g, new_b, new_a)
end
 
# Quicker alternative for (a * b / 255.0).round
def multiply(a, b)
  t = a * b + 0x80
  ((t >> 8) + t) >> 8
end

Not that the new implementation is less precise in theory, but this precision is lost anyway because we have to convert the values back to 8 bit RGBA values. Your thoughts?

5 Comments - Tags: , , , , , , , , , ,

8 January Request-log-analyzer 1.6.0

Bart & I just released request-log-analyzer 1.6.0. New features since the 1.5.0 release:

  • PostgreSQL query log support;
  • Delayed::Job log support;
  • Small fixes in the Rails file format;
  • Various other small fixes and improvements.

As always, run the following command to install or upgrade to the latest version:

sudo gem install request-log-analyzer

No Comments - Tags: , , , , , , ,

3 January Faster RESTful XML processing in Rails

The Floorplanner API uses XML-formatted requests and responses, so our servers process a lot of XML. In Rails, most XML parsing is done using the Hash.from_xml method. This method allows for different backends, but the current backends are either slow or buggy. I decided to fix this situation myself.

I fixed bugs in the current libxml and nokogiri backends, and I added some new SAX-based backends for additional performance. My patches are already accepted by the Rails team, so everybody will enjoy fast and bug-free XML parsing in Rails 2.3.6 and eventually in Rails 3!

Performance

I have benchmarked the new backends using an 1.8 MB XML file. The REXML, LibXML and Nokogiri backends currently ship with Rails, but are horribly slow or are buggy. The ++ variants are my improved versions of these backends, and the SAX variants are completely written from scratch using a SAX-based parser.

                  user     system      total        real
REXML        17.170000   0.060000  17.230000 ( 17.297263)
LibXML        2.100000   0.100000   2.200000 (  2.217380)
LibXML++      0.530000   0.000000   0.530000 (  0.531034)
LibXMLSAX     0.630000   0.010000   0.640000 (  0.632472)
Nokogiri      5.280000   0.020000   5.300000 (  5.322575)
Nokogiri++    1.840000   0.020000   1.860000 (  1.872055)
NokogiriSAX   0.770000   0.000000   0.770000 (  0.778777)

As you can see, LibXML++ is the fastest backend, but NokogiriSAX comes close if you want to stick to Nokogiri.

No patience?

Don’t want to wait for the next Rails version for this speed up? You don’t have to: just put the appropriate backend file in the /lib/active_support/xml_mini/ folder of your Rails project, and set your backend accordingly in your environment:

ActiveSupport::XmlMini.backend = 'NokogiriSAX'

Happy coding in 2010!

3 Comments - Tags: , , , , , ,

27 September Performance tweaking of Ruby algorithms

I have been working on request-log-analyzer quite a lot recently. One of the things I focused on was improving the parsing performance: because it parses log files that often are very big, processing times tend to be long. So all savings are very welcome.

Improving the performance of a command line application that does a lot of processing is very different from optimizing the performance of a web application. Request-log-analyzer basically reads a file line by line and processes it, so small performance improvements in the line processing algorithm can really add up when the file has a lot of lines. I used ruby-prof to get information about the performance of our algorithms split out by method to focus my tweaking efforts. I have written down some of my findings below; hopefully they can be helpful.

Parallelization

My first thought for improving performance was parallelization: parse multiple lines at the same time. Unfortunately, this did not yield the results I was hoping for: it instead become slower. Probably, this is because Ruby implements its own in-process threading and thus only uses one core of my processor.

Block overhead

The overhead of using a block should not be underestimated. Consider the following simple change, which improved the performance of request-log-analyzer by 1.5-2% on large log files:

# with block
io.each_line { |line| process_line(line) }
 
# without block
process_line(line) while line = io.gets

Regular expressions

If you’re using complex regular expressions, and you do not expect that every string will match it successfully, it can be beneficial to test the string with a simpler regexp first. For example, request-log-analyzer uses a complex regexp to see if a line in a log file is a “Completed in…” line with the request duration in it:

# Check every line to see if it is a "completed" line and capture the values
if line =~ /Completed in (\d+)ms \((?:View: (\d+), )?DB: (\d+)\) \| (\d\d\d).+\[(http.+)\]/
  # do something with the captured values
end

I improved this by first checking for a superficial regexp that tells me with 99% certainty that the complex regexp will match the line successfully as well:

if line =~ /Completed in/
  if line =~ /Completed in (\d+)ms \((?:View: (\d+), )?DB: (\d+)\) \| (\d\d\d).+\[(http.+)\]/
    # do something with the captured values
  else
    $stderr.puts "#{line.inspect} expected to match 'Completed' regexp!"
  end
end

Depending on the log file, this can increase performance by ~3%. Another benefit of this approach is that it will give feedback on lines that matched the simple regexp, but not the complex one. This information can be used to correct the regular expressions.

Calculate things that do not change only once

Calculate things that do not change only once is easier said than done. For instance, request-log analyzer can aggregate durations in a category. A category can be based on a request field that is parsed from the log, or a Proc that calculates it:

# during parsing:
if categorizer.respond_to?(:call)
  category = categorizer.call(request)
else
  category = request[categorizer]
end
categories[category].aggregate_request(request)

With this implementation, categorizer will be checked to be a Proc for every request, but as the value of categorizer will not change, the result of the check will be constant as well. I solved this my making sure that it is always a Proc beforehand, so the check is no longer necessary during parsing:

# before parsing:
if categorizer.respond_to?(:call)
  categorizer_proc = categorizer
else
  categorizer_proc = lambda { |request| request[categorizer] }
end
 
# during parsing:
category = categorizer_proc.call(request)
categories[category].aggregate_request(request)

Performance gain: ~1%! And because on several instances a similar technique could be applied, the performance got improved by about 4% in total.

Check the most common case first

Consider the following example, which converts a number in any traffic unit (kilobytes, MB, etc) to bytes:

def convert_traffic(value, unit)
  case unit
  when :GB, :G, :gigabyte      then (value.to_f * 1000_000_000).round
  when :MB, :M, :megabyte      then (value.to_f * 1000_000).round
  when :KB, :K, :kilobyte, :kB then (value.to_f * 1000).round
  # ... even more units here
  else value.to_i
  end
end

In most cases, the value will simply given in bytes, which will be returned by the final else after all possibilities have been checked. This can be improved by checking for this possibility first:

# Converts traffic in any unit to bytes.
def convert_traffic(value, unit)
  case unit
  when nil, :B, :byte          then value.to_i
  when :GB, :G, :gigabyte      then (value.to_f * 1000_000_000).round
  when :MB, :M, :megabyte      then (value.to_f * 1000_000).round
  when :KB, :K, :kilobyte, :kB then (value.to_f * 1000).round
  # ... even more units here
  else raise "Unknown unit: #{unit.inspect}!"
  end
end

Again this change adds up to a ~1% performance increase if this method is called very often.

These kinds of changes really improved the performance of request-log-analyzer by quite a bit, so upgrade to the latest version to get some of your valuable time back! :-)

6 Comments - Tags: , , , , , , , , ,

11 December Rails 2.2 support for request-log-analyzer

I just released version 0.2.0 of request-log-analyzer, our tool to analyze request log files that are generated by Rails and Merb for performance tweaking. This new version supports the new log format of Rails 2.2, which has changed slightly.

An updated gem should be available any minute now. Run sudo gem update to upgrade the newest version.

No Comments - Tags: , , , ,

16 July Slow Rails in production mode due to Gems

Posted by jaap in Ruby on Rails

Last week I decided to tune the performance of our Rails applications, because I had suspicions the app performed not at it’s best.
I installed a fresh Rails application and started the simple script/server -e production. I did the command: ab -c 5 -n 1000 http://localhost:3000/ and found out that we had only 396 requests per second. On my own, slower laptop, I had 500 requests per second for a test app of the same Rails version. Why o why?

Decided to take a look at the Gems. I saw that Gem installed a new version on top of an old version, when I updated. These old versions were still used in our production application, while these new versions were installed! You can specify your GEM version in the Rails app, but if you haven’t done this, Rails will apparently take the oldest version. After removing these old versions I benchmarked the server again, and now 560 request per second, ow yeah!

So an advice to all of you, figuring out why rails doesn’t perform at it’s best: update your gems AND remove the old versions. Now we updated our real applications and they are running lots faster.

No Comments - Tags: , ,