<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Floorplanner Tech Blog &#187; ruby-prof</title>
	<atom:link href="http://techblog.floorplanner.com/tag/ruby-prof/feed/" rel="self" type="application/rss+xml" />
	<link>http://techblog.floorplanner.com</link>
	<description>Our latest geek adventures!</description>
	<lastBuildDate>Tue, 16 Mar 2010 18:45:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Performance tweaking of Ruby algorithms</title>
		<link>http://techblog.floorplanner.com/2009/09/27/performance-tweaking-of-ruby-algorithms/</link>
		<comments>http://techblog.floorplanner.com/2009/09/27/performance-tweaking-of-ruby-algorithms/#comments</comments>
		<pubDate>Sun, 27 Sep 2009 06:02:42 +0000</pubDate>
		<dc:creator>Willem van Bergen</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[blocks]]></category>
		<category><![CDATA[parallelization]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[request-log-analyzer]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby-prof]]></category>

		<guid isPermaLink="false">http://techblog.floorplanner.com/?p=716</guid>
		<description><![CDATA[I have been working on request-log-analyzer quite a lot recently. One of the things I focused on was improving the parsing performance: because it parses log files that often are very big, processing times tend to be long. So all savings are very welcome.
Improving the performance of a command line application that does a lot [...]]]></description>
			<content:encoded><![CDATA[<p>I have been working on <a href="http://github.com/wvanbergen/request-log-analyzer">request-log-analyzer</a> quite a lot recently. One of the things I focused on was improving the parsing performance: because it parses log files that often are very big, processing times tend to be long. So all savings are very welcome.</p>
<p>Improving the performance of a command line application that does a lot of processing is very different from optimizing the performance of a web application. Request-log-analyzer basically reads a file line by line and processes it, so small performance improvements in the line processing algorithm can really add up when the file has a lot of lines. I used <a href="http://ruby-prof.rubyforge.org/">ruby-prof</a> to get information about the performance of our algorithms split out by method to focus my tweaking efforts. I have written down some of my findings below; hopefully they can be helpful.</p>
<h3>Parallelization</h3>
<p>My first thought for improving performance was parallelization: parse multiple lines at the same time. Unfortunately, this did not yield the results I was hoping for: it instead become slower. Probably, this is because Ruby implements its own in-process threading and thus only uses one core of my processor.</p>
<h3>Block overhead</h3>
<p>The overhead of using a block should not be underestimated. Consider the following simple change, which improved the performance of request-log-analyzer by 1.5-2% on large log files:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#008000; font-style:italic;"># with block</span>
io.<span style="color:#9900CC;">each_line</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span> process_line<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># without block</span>
process_line<span style="color:#006600; font-weight:bold;">&#40;</span>line<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">while</span> line = io.<span style="color:#CC0066; font-weight:bold;">gets</span></pre></div></div>

<h3>Regular expressions</h3>
<p>If you&#8217;re using complex regular expressions, and you do not expect that every string will match it successfully, it can be beneficial to test the string with a simpler regexp first. For example, request-log-analyzer uses a complex regexp to see if a line in a log file is a &#8220;Completed in&#8230;&#8221; line with the request duration in it:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#008000; font-style:italic;"># Check every line to see if it is a &quot;completed&quot; line and capture the values</span>
<span style="color:#9966CC; font-weight:bold;">if</span> line =~ <span style="color:#006600; font-weight:bold;">/</span>Completed <span style="color:#9966CC; font-weight:bold;">in</span> <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>ms \<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#40;</span>?:View: <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#006600; font-weight:bold;">&#41;</span>?DB: <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>\<span style="color:#006600; font-weight:bold;">&#41;</span> \<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#006600; font-weight:bold;">&#40;</span>\d\d\d<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#006600; font-weight:bold;">+</span>\<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#40;</span>http.<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>\<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">/</span>
  <span style="color:#008000; font-style:italic;"># do something with the captured values</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>I improved this by first checking for a superficial regexp that tells me with 99% certainty that the complex regexp will match the line successfully as well:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#9966CC; font-weight:bold;">if</span> line =~ <span style="color:#006600; font-weight:bold;">/</span>Completed <span style="color:#9966CC; font-weight:bold;">in</span><span style="color:#006600; font-weight:bold;">/</span>
  <span style="color:#9966CC; font-weight:bold;">if</span> line =~ <span style="color:#006600; font-weight:bold;">/</span>Completed <span style="color:#9966CC; font-weight:bold;">in</span> <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>ms \<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#40;</span>?:View: <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#006600; font-weight:bold;">&#41;</span>?DB: <span style="color:#006600; font-weight:bold;">&#40;</span>\d<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>\<span style="color:#006600; font-weight:bold;">&#41;</span> \<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#006600; font-weight:bold;">&#40;</span>\d\d\d<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#006600; font-weight:bold;">+</span>\<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006600; font-weight:bold;">&#40;</span>http.<span style="color:#006600; font-weight:bold;">+</span><span style="color:#006600; font-weight:bold;">&#41;</span>\<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">/</span>
    <span style="color:#008000; font-style:italic;"># do something with the captured values</span>
  <span style="color:#9966CC; font-weight:bold;">else</span>
    <span style="color:#ff6633; font-weight:bold;">$stderr</span>.<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;#{line.inspect} expected to match 'Completed' regexp!&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>Depending on the log file, this can increase performance by ~3%. Another benefit of this approach is that it will give feedback on lines that matched the simple regexp, but not the complex one. This information can be used to correct the regular expressions.</p>
<h3>Calculate things that do not change only once</h3>
<p>Calculate things that do not change only once is easier said than done. For instance, request-log analyzer can aggregate durations in a category. A category can be based on a request field that is parsed from the log, or a <code>Proc</code> that calculates it:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#008000; font-style:italic;"># during parsing:</span>
<span style="color:#9966CC; font-weight:bold;">if</span> categorizer.<span style="color:#9900CC;">respond_to</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:call</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  category = categorizer.<span style="color:#9900CC;">call</span><span style="color:#006600; font-weight:bold;">&#40;</span>request<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#9966CC; font-weight:bold;">else</span>
  category = request<span style="color:#006600; font-weight:bold;">&#91;</span>categorizer<span style="color:#006600; font-weight:bold;">&#93;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
categories<span style="color:#006600; font-weight:bold;">&#91;</span>category<span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">aggregate_request</span><span style="color:#006600; font-weight:bold;">&#40;</span>request<span style="color:#006600; font-weight:bold;">&#41;</span></pre></div></div>

<p>With this implementation, <code>categorizer</code> will be checked to be a <code>Proc</code> for every request, but as the value of <code>categorizer</code> will not change, the result of the check will be constant as well. I solved this my making sure that it is always a <code>Proc</code> beforehand, so the check is no longer necessary during parsing:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#008000; font-style:italic;"># before parsing:</span>
<span style="color:#9966CC; font-weight:bold;">if</span> categorizer.<span style="color:#9900CC;">respond_to</span>?<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:call</span><span style="color:#006600; font-weight:bold;">&#41;</span>
  categorizer_proc = categorizer
<span style="color:#9966CC; font-weight:bold;">else</span>
  categorizer_proc = <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>request<span style="color:#006600; font-weight:bold;">|</span> request<span style="color:#006600; font-weight:bold;">&#91;</span>categorizer<span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># during parsing:</span>
category = categorizer_proc.<span style="color:#9900CC;">call</span><span style="color:#006600; font-weight:bold;">&#40;</span>request<span style="color:#006600; font-weight:bold;">&#41;</span>
categories<span style="color:#006600; font-weight:bold;">&#91;</span>category<span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">aggregate_request</span><span style="color:#006600; font-weight:bold;">&#40;</span>request<span style="color:#006600; font-weight:bold;">&#41;</span></pre></div></div>

<p>Performance gain: ~1%! And because on several instances a similar technique could be applied, the performance got improved by about 4% in total.</p>
<h3>Check the most common case first</h3>
<p>Consider the following example, which converts a number in any traffic unit (kilobytes, MB, etc) to bytes:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#9966CC; font-weight:bold;">def</span> convert_traffic<span style="color:#006600; font-weight:bold;">&#40;</span>value, unit<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">case</span> unit
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:GB</span>, <span style="color:#ff3333; font-weight:bold;">:G</span>, <span style="color:#ff3333; font-weight:bold;">:gigabyte</span>      <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span>_000_000<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:MB</span>, <span style="color:#ff3333; font-weight:bold;">:M</span>, <span style="color:#ff3333; font-weight:bold;">:megabyte</span>      <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span>_000<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:KB</span>, <span style="color:#ff3333; font-weight:bold;">:K</span>, <span style="color:#ff3333; font-weight:bold;">:kilobyte</span>, <span style="color:#ff3333; font-weight:bold;">:kB</span> <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#008000; font-style:italic;"># ... even more units here</span>
  <span style="color:#9966CC; font-weight:bold;">else</span> value.<span style="color:#9900CC;">to_i</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>In most cases, the value will simply given in bytes, which will be returned by the final <code>else</code> after all possibilities have been checked. This can be improved by checking for this possibility first:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family: Monaco,monospace;"><span style="color:#008000; font-style:italic;"># Converts traffic in any unit to bytes.</span>
<span style="color:#9966CC; font-weight:bold;">def</span> convert_traffic<span style="color:#006600; font-weight:bold;">&#40;</span>value, unit<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">case</span> unit
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#0000FF; font-weight:bold;">nil</span>, <span style="color:#ff3333; font-weight:bold;">:B</span>, <span style="color:#ff3333; font-weight:bold;">:byte</span>          <span style="color:#9966CC; font-weight:bold;">then</span> value.<span style="color:#9900CC;">to_i</span>
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:GB</span>, <span style="color:#ff3333; font-weight:bold;">:G</span>, <span style="color:#ff3333; font-weight:bold;">:gigabyte</span>      <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span>_000_000<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:MB</span>, <span style="color:#ff3333; font-weight:bold;">:M</span>, <span style="color:#ff3333; font-weight:bold;">:megabyte</span>      <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span>_000<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#9966CC; font-weight:bold;">when</span> <span style="color:#ff3333; font-weight:bold;">:KB</span>, <span style="color:#ff3333; font-weight:bold;">:K</span>, <span style="color:#ff3333; font-weight:bold;">:kilobyte</span>, <span style="color:#ff3333; font-weight:bold;">:kB</span> <span style="color:#9966CC; font-weight:bold;">then</span> <span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_f</span> <span style="color:#006600; font-weight:bold;">*</span> <span style="color:#006666;">1000</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">round</span>
  <span style="color:#008000; font-style:italic;"># ... even more units here</span>
  <span style="color:#9966CC; font-weight:bold;">else</span> <span style="color:#CC0066; font-weight:bold;">raise</span> <span style="color:#996600;">&quot;Unknown unit: #{unit.inspect}!&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>Again this change adds up to a ~1% performance increase if this method is called very often.</p>
<p>These kinds of changes really improved the performance of request-log-analyzer by quite a bit, so upgrade to the latest version to get some of your valuable time back! <img src='http://techblog.floorplanner.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.floorplanner.com/2009/09/27/performance-tweaking-of-ruby-algorithms/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
