<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hack MySQL Blog</title>
	<atom:link href="http://hackmysql.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://hackmysql.com/blog</link>
	<description>Dolphins and camels, oh my!</description>
	<lastBuildDate>Wed, 06 Jan 2010 17:52:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Book review: Optimizing Oracle Performance</title>
		<link>http://hackmysql.com/blog/2010/01/06/book-review-optimizing-oracle-peformance/</link>
		<comments>http://hackmysql.com/blog/2010/01/06/book-review-optimizing-oracle-peformance/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 17:52:44 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Book Review]]></category>
		<category><![CDATA[Cary Millsap]]></category>
		<category><![CDATA[Jeff Holt]]></category>
		<category><![CDATA[Optimizing Oracle Performance]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=98</guid>
		<description><![CDATA[Optimizing Oracle Performance by Cary Millsap and Jeff Holt uses Oracle to make its points, but these points also apply to MySQL.  The primary lesson I took away from this book is: all else aside, optimize/fix the user-action that provides the most economic benefit to the company; do this by [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://oreilly.com/catalog/9780596005276">Optimizing Oracle Performance</a> by Cary Millsap and Jeff Holt uses Oracle to make its points, but these points also apply to MySQL.  The primary lesson I took away from this book is: all else aside, optimize/fix the user-action that provides the most economic benefit to the company; do this by profiling just that action and optimizing/fixing the most time-consuming events even if they are &#8220;idle&#8221; or &#8220;wait&#8221; events.</p>
<p>The authors call the aforementioned approach to performance optimization &#8220;Method R&#8221;.  It&#8217;s meant to be deterministic and teachable unlike &#8220;Method C&#8221;&#8211;the conventional method&#8211;whereby one uses their best judgment and experience to find the cause(s) of problems and fix them.  I agree, and Method R is fundamentally, imho, just the scientific method in practice.  Therefore, I like Method R because it puts &#8220;science&#8221; back into &#8220;computer science.&#8221;</p>
<p>The book also discusses queueing theory. It&#8217;s a whirlwind tour (~60 pages) but the authors provide everything you need to get started, including helper scripts and Excel worksheets.  I&#8217;m pretty sure that I&#8217;ll be working with this theory more in my job; when I do, I&#8217;ll begin with what the authors have given me (&#8220;<a href="http://en.wikiquote.org/wiki/Isaac_Newton">stand on the shoulders of giants</a>&#8221;).</p>
<p>One criticism/clarification comes to mind: Method R is reactive.  Let&#8217;s say your MySQL configuration is terrible so you&#8217;re not getting as much from your server as you could.  Method R may indirectly expose this only if the configuration is the root cause of a slow user-action.  So the configuration is only examined and fixed if the user deems their action unacceptably slow.  However, users don&#8217;t always complain; sometimes they just &#8220;live with it&#8221; because they don&#8217;t care or they don&#8217;t think it can be fixed or they&#8217;re afraid to complain or it&#8217;s always been that way so they&#8217;re not even aware that things could be better.  Thus, I think a more holistic view of performance optimization requires both a proactive method and a reactive method.  Method R is a great reactive method, but someone should be checking stuff even when there doesn&#8217;t seem to be a problem.  The authors don&#8217;t say &#8220;Method R is all you ever need to do&#8221;&#8211;I&#8217;m just making a clarification here.</p>
<p>Oracle extended SQL traces are used throughout the book to investigate performance issues.  Does MySQL have anything similar?  Nothing as cohesive comes to my mind (correct me if I&#8217;m wrong).  I think we can achieve the same thing via microslow logs, the community PROFILE feature, session status values, and scripts to glue it all together.  That&#8217;s a lot of disparate pieces.  I&#8217;d rather have MySQL extended SQL traces (in a format more easily parsable than Oracle&#8217;s).</p>
<p>In summary, Optimizing Oracle Performance is a must-read for any database professional.  I think its emphases on providing the business the most &#8220;bang for its buck&#8221; and the deterministic nature of Method R are timeless and timely lessons for those of us who earn our livings by engaging in the science of computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2010/01/06/book-review-optimizing-oracle-peformance/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Debugging and ripple effects</title>
		<link>http://hackmysql.com/blog/2009/11/18/debugging-and-ripple-effects/</link>
		<comments>http://hackmysql.com/blog/2009/11/18/debugging-and-ripple-effects/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 22:58:01 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[mk-query-digest]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=92</guid>
		<description><![CDATA[Like I said earlier, every tiny change that the test suite reveals after code changes is significant.  I caught a very subtle &#8220;bug&#8221; today in recent changes to mk-query-digest (a.k.a. mqd).  If you like to read about subtle bugs, read on.
An mqd test on sample file slow023.txt began to differ after some pretty [...]]]></description>
			<content:encoded><![CDATA[<p>Like I <a href="http://hackmysql.com/blog/2009/10/30/zero-is-a-big-number/">said earlier</a>, every tiny change that the test suite reveals after code changes is significant.  I caught a very subtle &#8220;bug&#8221; today in recent changes to mk-query-digest (a.k.a. mqd).  If you like to read about subtle bugs, read on.</p>
<p>An mqd test on sample file slow023.txt began to differ after some pretty extensive code changes of late:<br />
<code><br />
< # Query 1: 0 QPS, 0x concurrency, ID 0x8E38374648788E52 at byte 0 ________<br />
---<br />
> # Query 1: 0 QPS, 0x concurrency, ID 0x2CFD93750B99C734 at byte 0 ________<br />
</code><br />
The ID, which depends on the query&#8217;s fingerprint, has changed.  It&#8217;s very important that we don&#8217;t suddenly change these on users because these IDs are pivotal in trend analyses with mqd&#8217;s <code>--review-history</code> option. First some background info on the recent code changes and then the little story about how I tracked down the source of this change.</p>
<p>mqd internals used to run like this: call parser module (like SlowLogParser) and pass it an array of callbacks which it ran events through.  Now this has changed so there&#8217;s a single, unified pipeline of &#8220;callbacks&#8221; (they&#8217;re technically no longer callbacks).  The first process in the pipeline is usually a parser module which returns each event and then mqd keeps pumping the events through the pipeline (in contrast to before where the parser module did the pumping).  So &#8220;obviously&#8221; this has nothing to do with query fingerprinting or ID making which is done in code that has not changed.  Thus, this &#8220;bug&#8221; was very perplexing at first.</p>
<p>First step: see what value <code>make_checksum()</code>, which makes the query IDs, used to get and gets now by using the Perl debugger:<br />
<code><br />
DB<3> x $item<br />
0  'select count(*) as a from x '<br />
</code><br />
<code><br />
DB<12> x $item<br />
0  'select count(*) as a from x'<br />
</code><br />
The difference is that single trailing space. But why has this space suddenly disappeared in the new (later) rev? Something must have changed in <code>fingerprint()</code>, the sub that produces that string.  Use the debugger again to step through <code>fingerprint()</code> while a watch is set on the var:<br />
<code><br />
1574:	   $query =~ s/\A\s+//;                  # Chop off leading whitespace<br />
  DB<6><br />
Watchpoint 0:	$query changed:<br />
    old value:	' select count(*) as A from X<br />
'<br />
    new value:	'select count(*) as A from X<br />
'<br />
QueryRewriter::fingerprint(bin/mk-query-digest:1575):<br />
1575:	   chomp $query;                         # Kill trailing whitespace<br />
  DB<6><br />
QueryRewriter::fingerprint(bin/mk-query-digest:1576):<br />
1576:	   $query =~ tr[ \n\t\r\f][ ]s;          # Collapse whitespace<br />
  DB<6><br />
Watchpoint 0:	$query changed:<br />
    old value:	'select count(*) as A from X<br />
'<br />
    new value:	'select count(*) as A from X '<br />
</code><br />
Notice that the var did not change after the line &#8220;# Kill trailing whitespace&#8221; was executed.  Instead, the trailing newline was collapsed into a single trailing space when &#8220;# Collapse whitespace&#8221; was executed.  The new rev:<br />
<code><br />
1585:	   $query =~ s/\A\s+//;                  # Chop off leading whitespace<br />
  DB<4><br />
Watchpoint 0:	$query changed:<br />
    old value:	' select count(*) as A from X<br />
'<br />
    new value:	'select count(*) as A from X<br />
'<br />
QueryRewriter::fingerprint(../mk-query-digest:1586):<br />
1586:	   chomp $query;                         # Kill trailing whitespace<br />
  DB<4><br />
Watchpoint 0:	$query changed:<br />
    old value:	'select count(*) as A from X<br />
'<br />
    new value:	'select count(*) as A from X'<br />
</code><br />
Notice how <code>chomp</code> in the new rev removed the trailing newline, so no trailing space is left behind; the result of <code>chomp</code> has changed, but why?  In case you didn&#8217;t know, <code>chomp</code> actually chomps any trailing <code>$INPUT_RECORD_SEPARATOR</code>, not just newlines.  It just so happens that many of the parser modules change <code>$INPUT_RECORD_SEPARATOR</code>.</p>
<p>The root cause of this subtle but very important change is that the parser modules no longer call the pipeline callbacks.  When they did, their changes to <code>$INPUT_RECORD_SEPARATOR</code> were visible to the callbacks, and operations like <code>fingerprint()</code> are part of the callbacks.  Now that they do not, their changes to <code>$INPUT_RECORD_SEPARATOR</code> are &#8220;invisible&#8221; and operations like <code>fingerprint()</code> see a different (i.e. the default) <code>$INPUT_RECORD_SEPARATOR</code>.</p>
<p>Conclusion in brief: an issue of scope at the beginning of mk-query-digest affects <code>chomp</code> causing <code>fingerprint()</code> and <code>make_checksum()</code> to generate different query IDs at the end of the script.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/11/18/debugging-and-ripple-effects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting the MySQL Sandbox prompt</title>
		<link>http://hackmysql.com/blog/2009/11/15/setting-the-mysql-sandbox-prompt/</link>
		<comments>http://hackmysql.com/blog/2009/11/15/setting-the-mysql-sandbox-prompt/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 20:49:50 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Random]]></category>
		<category><![CDATA[MySQL Sandbox]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=74</guid>
		<description><![CDATA[This is far from deeply technical but little things that should be simple but aren&#8217;t annoy me.  I found that MySQL Sandbox --prompt_prefix and --prompt_body don&#8217;t &#8220;just work.&#8221;  I wanted the prompt to be mysql \v> .  So I tried:

make_sandbox_from_source /mysql/src/mysql-4.0.30 single --prompt_body=' \v> '
sh: -c: line 0: syntax error near unexpected [...]]]></description>
			<content:encoded><![CDATA[<p>This is far from deeply technical but little things that should be simple but aren&#8217;t annoy me.  I found that <a href="https://launchpad.net/mysql-sandbox">MySQL Sandbox</a> <code>--prompt_prefix</code> and <code>--prompt_body</code> don&#8217;t &#8220;just work.&#8221;  I wanted the prompt to be <code>mysql \v> </code>.  So I tried:<br />
<code><br />
make_sandbox_from_source /mysql/src/mysql-4.0.30 single --prompt_body=' \v> '<br />
sh: -c: line 0: syntax error near unexpected token `newline'<br />
sh: -c: line 0: `make_sandbox /mysql/src/mysql-4.0.30/4.0.30   --prompt_body= \v> '<br />
</code><br />
Maybe my shell knowledge is worse than I realize, so I verified that &#8216;<code> \v> </code>&#8217; does not need special escaping:<br />
<code><br />
echo ' \v> '<br />
 \v><br />
</code><br />
Ok, so clearly it&#8217;s the fault of <code>make_sandbox_from_source</code>.  I tried and tried to do this on the command line but failed.  I decided to try <code>~/.msandboxrc</code>, specifying:<br />
<code><br />
prompt_prefix=mysql<br />
prompt_body= \v><br />
</code><br />
That didn&#8217;t work either; it created the prompt <code>'mysql4.0.30&gt;</code>.  The leading space for <code>prompt_body</code> is ignored and there&#8217;s an errant apostrophe at the start.  The final working version is:<br />
<code><br />
prompt_prefix=mysql<br />
prompt_body=\v> '<br />
</code><br />
You can&#8217;t see it but there&#8217;s a trailing space after &#8220;mysql&#8221; for the <code>prompt_prefix</code> which is not ignored (unlike leading space for the <code>prompt_body</code>).</p>
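<p>The error output above suggests that <code>make_sandbox_from_source</code> rebuilds a command line and hands it to <code>sh -c</code> without re-quoting the value, so the inner shell sees a bare <code>&gt;</code> redirect with nothing after it.  That is an assumption drawn from the error message, not from the MySQL Sandbox source.  The sketch below only reproduces the same class of failure, with <code>printf</code> standing in for <code>make_sandbox</code>:</p>
<pre>
```shell
#!/bin/sh
value=' \v> '

# Naive: splice the value into a command string. The inner shell parses
# "--prompt_body= \v> " and sees a redirect with no target: syntax error.
if sh -c "printf '%s\n' --prompt_body=$value" 2>/dev/null; then
    naive_status=ok
else
    naive_status=failed
fi

# Safer: pass the value as a positional parameter so it is never re-parsed.
safe_output=$(sh -c 'printf "%s\n" "--prompt_body=$1"' sh "$value")

printf 'naive: %s\n' "$naive_status"
printf 'safe:  [%s]\n' "$safe_output"
```
</pre>
<p>The naive call fails before <code>printf</code> ever runs, while the positional-parameter version keeps the leading and trailing spaces intact.</p>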
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/11/15/setting-the-mysql-sandbox-prompt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zero is a big number</title>
		<link>http://hackmysql.com/blog/2009/10/30/zero-is-a-big-number/</link>
		<comments>http://hackmysql.com/blog/2009/10/30/zero-is-a-big-number/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 15:20:30 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Maatkit]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[mk-query-digest]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=70</guid>
		<description><![CDATA[I made changes to mk-query-digest yesterday that I didn&#8217;t expect to cause any adverse effects.  On the contrary, several tests began to fail because a single new but harmless line began to appear in the expected output: &#8220;Databases 0&#8221;.  Perhaps I&#8217;m preaching to the choir, as you are all fantastic, thorough and flawless [...]]]></description>
			<content:encoded><![CDATA[<p>I made changes to mk-query-digest yesterday that I didn&#8217;t expect to cause any adverse effects.  On the contrary, several tests began to fail because a single new but harmless line began to appear in the expected output: &#8220;Databases 0&#8221;.  Perhaps I&#8217;m preaching to the choir, as you are all fantastic, thorough and flawless programmers, but as for myself I&#8217;ve learned to never take a single failed test for granted.</p>
<p>One time a test failed because some values differed by a millisecond or two.  Being curious I investigated and found that our standard deviation equation was just shy of perfect.  I fixed it and spent hours cross-checking the myriad tiny values with my TI calculator. Probably no one cared about 0.023 vs. 0.022 but it&#8217;s the cultivation of a disposition towards perfection that matters.</p>
<p>My innocuous changes yesterday introduced a case of Perl auto-vivification.  Doing:</p>
<p><code>my ($db_for_show) = $sample->{db} ? $sample->{db} : keys %{$stats->{db}->{unq}};</code></p>
<p>can auto-vivify <code>$stats->{db}</code>.  Before yesterday, this was done after the report for those stats was printed; changes yesterday made this happen before the report.  Thus the report did its job and reported <code>db</code> or &#8220;Databases 0&#8221;.  It&#8217;s been fixed, and just in time since I&#8217;m doing Maatkit&#8217;s October release today.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/10/30/zero-is-a-big-number/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL features timeline</title>
		<link>http://hackmysql.com/blog/2009/10/26/mysql-features-timeline/</link>
		<comments>http://hackmysql.com/blog/2009/10/26/mysql-features-timeline/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 00:28:47 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[features]]></category>
		<category><![CDATA[timeline]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=68</guid>
		<description><![CDATA[I&#8217;ve begun a MySQL features timeline which is a quick reference showing as of what version MySQL features were added, changed or removed.  The manual tells us this, of course, but I wanted a quicker reference.  The list is far from complete as there&#8217;s a huge number of features to cover.  I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve begun a <a href="http://hackmysql.com/as-of">MySQL features timeline</a> which is a quick reference showing as of what version MySQL features were added, changed or removed.  The manual tells us this, of course, but I wanted a quicker reference.  The list is far from complete as there&#8217;s a huge number of features to cover.  I&#8217;ll continue to improve it and help is appreciated.  Send me a quick email saying &#8220;feature x added/removed/changed as of version y&#8221; and I&#8217;ll do the rest. &#8212; If someone has already done this, please give me the url so I don&#8217;t reinvent the wheel.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/10/26/mysql-features-timeline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>mk-table-sync and small tables</title>
		<link>http://hackmysql.com/blog/2009/10/23/mk-table-sync-and-small-tables/</link>
		<comments>http://hackmysql.com/blog/2009/10/23/mk-table-sync-and-small-tables/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 20:32:24 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Maatkit]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[mk-table-sync]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=56</guid>
		<description><![CDATA[Issue 634 made me wonder how the various mk-table-sync algorithms (Chunk, Nibble, GroupBy and Stream) perform when faced with a small number of rows.  So I ran some quick, basic benchmarks.
I used three tables, each with integer primary keys, having 109, 600 and 16k+ rows.  I did two runs for each of the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.google.com/p/maatkit/issues/detail?id=634">Issue 634</a> made me wonder how the various <a href="http://www.maatkit.org/doc/mk-table-sync.html">mk-table-sync</a> algorithms (Chunk, Nibble, GroupBy and Stream) perform when faced with a small number of rows.  So I ran some quick, basic benchmarks.</p>
<p>I used three tables, each with integer primary keys, having 109, 600 and 16k+ rows.  I did two runs for each of the four algorithms: the first run used an empty destination table so all rows from the source had to be synced; the second run used an already synced destination table so all rows had to be checked but none were synced.  I ran Perl with <a href="http://search.cpan.org/~ilyaz/DProf-19990108/dprofpp.PL">DProf</a> to get simple wallclock and user time measurements.</p>
<p>Here are the results for the first run:<br />
<img src="http://hackmysql.com/blog/wp-content/uploads/2009/10/mk-table-sync-small-tables-run-1.png" alt="mk-table-sync-small-tables-run-1" title="mk-table-sync-small-tables-run-1" width="668" height="870" class="alignnone size-full wp-image-57" /></p>
<p>When the table is really small (109 rows), there&#8217;s hardly any difference between the algorithms. As the table becomes larger, the GroupBy and Stream algorithms are much faster than the Chunk and Nibble algorithms.  This is actually expected, even though Chunk and Nibble are considered the best and fastest algorithms&#8211;see point 3 in the conclusion.</p>
<p>Now here&#8217;s the second run:<br />
<img src="http://hackmysql.com/blog/wp-content/uploads/2009/10/mk-table-sync-small-tables-run-2.png" alt="mk-table-sync-small-tables-run-2" title="mk-table-sync-small-tables-run-2" width="672" height="874" class="alignnone size-full wp-image-60" /></p>
<p>The small table is again roughly the same for all algorithms.  Stream is clearly the fastest but what&#8217;s more notable is that GroupBy and Nibble are nearly identical even though Nibble is tremendously more complex than GroupBy.  As the table becomes bigger (16k+ rows), mk-table-sync &#8220;conventional wisdom&#8221; is more clearly illustrated: Chunk and Nibble are much faster than GroupBy and Stream.</p>
<p>This was a very quick benchmarking job but from it we can draw some conclusions:</p>
<ol>
<li>There&#8217;s little difference between the algorithms when syncing small tables.</li>
<li>The GroupBy algorithm might be the best choice for small tables since it&#8217;s comparable to Chunk and Nibble but internally less complex (it doesn&#8217;t use checksums or crypto hashes for example).</li>
<li>If syncing to a destination that is missing a lot of rows, GroupBy and Stream can be much faster because Chunk and Nibble will waste a lot of time checksumming chunks which GroupBy and Stream do not do.</li>
<li>When syncing large tables or tables with few differences, conventional wisdom still holds: Chunk and Nibble are the best choices.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/10/23/mk-table-sync-and-small-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Filtering and analyzing queries by year, month, hour and day with mk-query-digest</title>
		<link>http://hackmysql.com/blog/2009/10/22/filtering-and-analyzing-queries-by-year-month-hour-and-day-with-mk-query-digest/</link>
		<comments>http://hackmysql.com/blog/2009/10/22/filtering-and-analyzing-queries-by-year-month-hour-and-day-with-mk-query-digest/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 21:55:39 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Maatkit]]></category>
		<category><![CDATA[mk-query-digest]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=45</guid>
		<description><![CDATA[I originally posted this on the Maatkit discussion list:
A little while ago a user asked in http://groups.google.com/group/maatkit-discuss/browse_thread/thread/256b6c780bdb066d if it was possible to use mk-query-digest to analyze queries per hour.  I responded with a skeleton script for use with --filter, but I didn&#8217;t actually test this.  Today, I filled out the script and tested [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://groups.google.com/group/maatkit-discuss/browse_thread/thread/8c8a5d65252efb4">originally posted</a> this on the <a href="http://groups.google.com/group/maatkit-discuss">Maatkit discussion list</a>:</p>
<p>A little while ago a user asked in http://groups.google.com/group/maatkit-discuss/browse_thread/thread/256b6c780bdb066d if it was possible to use mk-query-digest to analyze queries per hour.  I responded with a skeleton script for use with <code>--filter</code>, but I didn&#8217;t actually test this.  Today, I filled out the script and tested it and found that it works.  The script is available from trunk at:</p>
<p><a href="http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/filter-add-ymdh-attribs.txt">http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/filter-add-ymdh-attribs.txt</a></p>
<p>The test file I&#8217;m using is available at:</p>
<p><a href="http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt">http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt</a></p>
<p>The filter code does two things: it adds attributes called year, month, day and hour to each event, and it uses environment variables called YEAR, MONTH, DAY and HOUR to filter those newly added attributes.  I&#8217;ll show how this works later.</p>
<p>The filter works best with binary logs because binlogs reliably timestamp events.  If an event does not have a timestamp (as is often the case in a slowlog), then it gets values 0, 0, 0, 24 for year, month, day and hour respectively.  Since 0 is a valid hour, 24 is used to indicate that the event had no hour.</p>
<p>The basic usage is to group queries by hour.  Let&#8217;s say you want to see query stats for each hour.  The command line is:</p>
<p><code>mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour</code></p>
<p>Notice &#8220;--group-by hour&#8221;.  And the result is (truncated for brevity):</p>
<pre>
# ########################################################################
# Report grouped by hour
# ########################################################################

# Item 1: 1.50 QPS, 31.01kx concurrency, ID 0x0DB5E4B97FC2AF39 at byte 450
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     30  62029s  20661s  20704s  20676s  19861s       0  19861s
# Time range 2007-12-07 13:02:08 to 2007-12-07 13:02:10
# bytes         23      81      27      27      27      27       0      27
# day           25      21       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         27      36      12      12      12      12       0      12
# year          27      21       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
13
...
# Item 2: 0.00 QPS, 0.71x concurrency, ID 0xAA27A0C99BFF6710 at byte 301 _
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     29  62000s  20661s  20675s  20667s  19861s       0  19861s
# Time range 2007-12-07 12:02:50 to 2007-12-08 12:12:12
# bytes         46     163      22      87   40.75   84.10   25.86   26.08
# day           37      30       7       8    7.50    7.70    0.36    7.70
# error cod      0       0       0       0       0       0       0       0
# month         36      48      12      12      12      12       0      12
# year          36      28       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12
...
# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x                 62029.0000 30.0%       3 20676.3333 13
#    2 0x                 62000.0000 30.0%       3 20666.6667 12
#    3 0x                 20661.0000 10.0%       1 20661.0000 23
#    4 0x                 20661.0000 10.0%       1 20661.0000 10
#    5 0x                 20661.0000 10.0%       1 20661.0000 08
#    6 0x                 20661.0000 10.0%       1 20661.0000 18
</pre>
<p>Each item corresponds to the queries for that hour.  Shown above are hours 13 (1pm) and 12 (noon).  Then the profile gives you summarized information about each hour.  From this fake binlog we see that 30% of queries occurred in the noon hour.  (binlog005.txt is highly contrived; the values are just for demonstration.)</p>
<p>Unless your logs are rotated daily, chances are there will be noon-hour queries for multiple days.  If you want to see per-hour stats for one specific day, the filter can do this, too, by using environment variables.  Filter scripts were not originally meant to accept user input, and having to modify values in the actual code isn&#8217;t flexible, so the solution is to use environment variables.  Here&#8217;s how:</p>
<p><code>DAY=7 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour</code></p>
<p>The leading &#8220;DAY=7&#8221; temporarily sets the environment variable DAY only during the execution of mk-query-digest.  This way you don&#8217;t pollute your normal environment variables.  The result is now (truncated again):</p>
<pre>
# Item 2: 0.00 QPS, 12.24x concurrency, ID 0xAA27A0C99BFF6710 at byte 301
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         28       2
# Exec time     28  41339s  20664s  20675s  20670s  20675s      8s  20670s
# Time range 2007-12-07 12:02:50 to 2007-12-07 12:59:07
# bytes         28      54      27      27      27      27       0      27
# day           28      14       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         28      24      12      12      12      12       0      12
# year          28      14       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12
</pre>
<p>Notice that there are now only 2 queries in the noon hour and that the time range is only in 2007-12-07.  Previously, there was a noon-hour query in 2007-12-08.  Thus we know that the DAY filter worked.</p>
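<p>That a leading <code>DAY=7</code> only affects the one command is standard shell behavior and easy to check on its own.  A minimal sketch, with a child <code>sh</code> standing in for mk-query-digest (the stand-in and the variable names <code>inside</code> and <code>after</code> are just for illustration):</p>
<pre>
```shell
#!/bin/sh
unset DAY

# The assignment prefix is exported to this one child process only.
inside=$(DAY=7 sh -c 'printf "%s" "${DAY:-unset}"')

# Back in the calling shell, DAY was never set.
after=${DAY:-unset}

printf 'inside the command: %s\n' "$inside"
printf 'after the command:  %s\n' "$after"
```
</pre>
<p>The child process sees <code>DAY</code>, and the calling shell does not, so nothing lingers in your environment between runs.</p>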
<p>In this fashion, you can group and filter your log as you please.  You can combine multiple filters like:</p>
<p><code>DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour</code></p>
<p>That will group and analyze only queries from the noon hour of the 7th (December 7, 2007 in this log).  mk-query-digest is so flexible you can even do this:</p>
<p><code>DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour --no-report --print</code></p>
<p>That will suppress the query analysis and report and simply print all the queries from the noon hour of the 7th in pseudo-slowlog format.</p>
<p>There are, of course, other ways to do this kind of per-time-unit query aggregation, analysis and reporting (e.g. <code>--since</code> and <code>--until</code>), but if all you have are pre-existing logs and mk-query-digest, then <code>--filter</code> can be used to accomplish this task, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/10/22/filtering-and-analyzing-queries-by-year-month-hour-and-day-with-mk-query-digest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenSQL Camp 2009 in Portland, Oregon</title>
		<link>http://hackmysql.com/blog/2009/10/21/opensql-camp-2009-in-portland-oregon/</link>
		<comments>http://hackmysql.com/blog/2009/10/21/opensql-camp-2009-in-portland-oregon/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 17:01:41 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Random]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=43</guid>
		<description><![CDATA[There are very few spots left for attending OpenSQL Camp 2009.  I&#8217;m going, along with several other Percona employees.  It&#8217;s fun stuff; you should go.  I&#8217;m not really the social type so I&#8217;ll probably be sitting off to the side somewhere coding.
]]></description>
			<content:encoded><![CDATA[<p>There are very few spots left for attending <a href="http://opensqlcamp.org/Events/Portland2009/">OpenSQL Camp 2009</a>.  I&#8217;m going, along with several other Percona employees.  It&#8217;s fun stuff; you should go.  I&#8217;m not really the social type so I&#8217;ll probably be sitting off to the side somewhere coding.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/10/21/opensql-camp-2009-in-portland-oregon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Work continues</title>
		<link>http://hackmysql.com/blog/2009/09/01/work-continues/</link>
		<comments>http://hackmysql.com/blog/2009/09/01/work-continues/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 02:28:24 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Site Maintenance]]></category>

		<guid isPermaLink="false">http://hackmysql.com/blog/?p=40</guid>
		<description><![CDATA[Believe it or not, I have not abandoned Hack MySQL.  I&#8217;ve just been totally consumed with Maatkit or life.  Slowly but surely, I&#8217;m re-structuring Hack MySQL.  I have &#8220;plans&#8221;; it&#8217;s just a matter of finding time to see them through to fruition.  Until then, the site is in disarray for a [...]]]></description>
			<content:encoded><![CDATA[<p>Believe it or not, I have not abandoned Hack MySQL.  I&#8217;ve just been totally consumed with <a href="http://www.maatkit.org">Maatkit</a> or life.  Slowly but surely, I&#8217;m re-structuring Hack MySQL.  I have &#8220;plans&#8221;; it&#8217;s just a matter of finding time to see them through to fruition.  Until then, the site is in disarray for a little while longer.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/09/01/work-continues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>hackmysql.com undergoing changes</title>
		<link>http://hackmysql.com/blog/2009/01/24/hackmysqlcom-undergoing-changes/</link>
		<comments>http://hackmysql.com/blog/2009/01/24/hackmysqlcom-undergoing-changes/#comments</comments>
		<pubDate>Sat, 24 Jan 2009 20:30:40 +0000</pubDate>
		<dc:creator>Daniel Nichter</dc:creator>
				<category><![CDATA[Site Maintenance]]></category>

		<guid isPermaLink="false">http://hackmysql.com/?p=33</guid>
		<description><![CDATA[The hackmysql.com website is undergoing changes. Stuff may disappear and/or be relocated at random.
]]></description>
			<content:encoded><![CDATA[<p>The hackmysql.com website is undergoing changes. Stuff may disappear and/or be relocated at random.</p>
]]></content:encoded>
			<wfw:commentRss>http://hackmysql.com/blog/2009/01/24/hackmysqlcom-undergoing-changes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
