I have been spending some time creating a pure Ruby PNG library. For this library, I need to have some representation of the image, which is composed of RGB pixels, supporting an alpha channel. Because images can be composed of a lot of pixels, I want the implementation to be as memory efficient as possible. I also would like decent performance.
A very naive Ruby implementation for an image represents the red, green, blue and alpha channel using a floating point number between 0.0 and 1.0, and might look something like this:
class Pixel attr_reader :r, :g, :b, :a def initialize(r, g, b, a = 1.0) @r, @g, @b, @a = r, g, b, a end end class Image attr_reader :width, :height def initialize(width, height) @width, @height = width, height @pixels = Array.new(width * height) end def [](x,y) @pixels[y * width + x] end def []=(x,y, pixel) @pixels[y * width + x] = pixel end end
For a 10×10 image, this representation requires 4 times 100 floating point numbers, which require 8 bytes each. That’s already over 3kB for such a small image just for the floating point numbers! Ouch.
A simple improvement is to decide that 8-bit color depth is enough in the case, in which case each channel can be represented by an integer between 0 and 255. Storing such a number only costs one byte of memory. Ruby’s Fixnum class typically uses 4-byte integers. If only the 4 channels of one byte each could be combined into a single Fixnum instance… Behold!
class Pixel attr_reader :value alias :to_i :value def initialize(value) @value = value end def self.rgba(r, g, b, a = 255) self.new(r << 24 | g << 16 | b << 8 | a) end def r; (@value & 0xff000000) >> 24; end def g; (@value & 0x00ff0000) >> 16; end def b; (@value & 0x0000ff00) >> 8; end def a; (@value & 0x000000ff); end end
Notice the bit operations, which are extremely fast. This only requires 100 times 4 bytes = 400 bytes for storing the RGBA values for a 10×10 image, an 8 times improvement!
This implementation wraps every pixel inside an object. This is nice, because I want to access the separate channels of every pixel easily using the r, g, b, and a methods, and every other method that is defined for every pixel. However, a Ruby object instance has an overhead of at least 20 bytes. That’s 20 times 100 is about 2kB for our 10×10 image!
To get rid of the object overhead, it is possible to simply store the Fixnum value for every pixel, and only wrapping it inside a Pixel object when it is accessed. This can be done by modifying the Image class:
class Image # ... def [](x,y) Pixel.new(@pixels[y * width + x]) # wrap end def []=(x,y, pixel) @pixels[y * width + x] = pixel.to_i # unwrap end end
As you can see, some simply changes in the representation can really make a difference in the memory usage. Can this representation be improved further?
Integer math calculations
Because we are now using integers to represent a pixel, this can cause problems when the math requires you to use floating point numbers. For example, the formula for alpha composition of two pixels is as follows:

in which Ca is the color component of the foreground pixel, αa the alpha channel of the foreground pixel, Cb and αb the same values for the background pixel, all of which should be values between 0 and 1.
A naive implementation could convert the integer numbers to their floating point equivalents:
def compose(fg, bg) return bg if fg.a == 0 return fg if fg.a == 255 fg_alpha = fg.a / 255.0 bg_alpha = fg.a / 255.0 alpha_complement = (1.0 - fg_alpha) * bg_alpha new_r = (fg_alpha * fg.r + alpha_complement * bg.r).round new_g = (fg_alpha * fg.g + alpha_complement * bg.g).round new_b = (fg_alpha * fg.b + alpha_complement * bg.b).round new_a = ((fg_alpha + alpha_complement) * 255).round Pixel.rgba(new_r, new_g, new_b, new_a) end
This implementation is already a little bit optimized: no unnecessary conversions and calculations are being performed. However, this composition can be done a lot quicker after realizing that 255 is almost a power of two, in which computers excel because it can use bitwise operators and shifting for some calculations.
My new approach uses a quicker implementation of multiplication of 8-bit integers that represent floating numbers between 0 and 1:
def compose(fg, bg) return bg if fg.a == 0 return fg if fg.a == 255 alpha_complement = multiply(255 - fg.a, bg.a) new_r = multiply(fg.a, fg.r) + multiply(alpha_complement, bg.r) new_g = multiply(fg.a, fg.g) + multiply(alpha_complement, bg.g) new_b = multiply(fg.a, fg.b) + multiply(alpha_complement, bg.b) new_a = fg.a + alpha_complement Pixel.rgba(new_r, new_g, new_b, new_a) end # Quicker alternative for (a * b / 255.0).round def multiply(a, b) t = a * b + 0x80 ((t >> 8) + t) >> 8 end
Not that the new implementation is less precise in theory, but this precision is lost anyway because we have to convert the values back to 8 bit RGBA values. Your thoughts?
Tags: alpha composition, bitwise operations, efficiency, image, integer math, memory, performance, pixel, rgb, rgba, ruby








January 17th, 2010 at 2:15 pm
[...] Ode to Array#pack and String#unpack Posted by Willem van Bergen on in Ruby on Rails Remember my last post, where I representing a pixel with a Fixnum, storing the R, G, B and A value in its 4 bytes of [...]
February 24th, 2010 at 11:49 am
“Ruby’s Fixnum class typically uses 4-byte integers.”
Actually this is not entirely true. Ruby stores “regular” objects on the heap and references are represented by C pointers to those objects. Some primitive values like true, false, nil and integers in a certain range – also called “immediate values” – are stored directly inside the C pointer. The first (or last, don’t remember) bit of the pointer is used to identify whether a pointer contains an immediate value. This means that on x86, Fixnums are actually 31-bits, not 32-bits. In the worst case scenario a single pixel in your implementation would have to be represented by a Bignum object, which I believe is 5 words large on x86 (20 bytes).
On x86_64 a Fixnum can represent any 32-bit number so no problem there, but all Fixnums are double the size compared to x86.
February 24th, 2010 at 4:52 pm
dbussink already pointed me to this issue. I did not notice it as I only have 64-bit systems to my disposal
He also suggested reversing the order of the bytes to ABGR, and make A==0 mean fully opaque instead of fully transparent. This would make most pixels fit in a 32-bit Ruby Fixnum. This will however increase the complexity of the saving routine, making it much slower than the highly optimized routine I am using right now.
February 24th, 2010 at 5:08 pm
you should check out http://github.com/aberant/spittle . we made a pure ruby library for parsing, combining, and saving PNG images for use in CSS sprites.
February 24th, 2010 at 5:28 pm
You should really consider using ChunkyPNG for reading the input PNGs for your project, because ChunkyPNG also supports interlaced images, grayscale images and palette-based images (very common). Moreover, using ChunkyPNG to encode the sprite image will probably a lot faster than your current encoder (an estimated guess after looking at your source code).
Combining our efforts may eliminate writing some unnecessary duplicate code for both of us
If you are interested, please send me a message so I can help out.