Hack MySQL

Filtering and analyzing queries by year, month, hour and day with mk-query-digest

without comments

I originally posted this on the Maatkit discussion list:

A little while ago a user asked in http://groups.google.com/group/maatkit-discuss/browse_thread/thread/256b6c780bdb066d if it was possible to use mk-query-digest to analyze queries per hour. I responded with a skeleton script for use with –filter, but I didn’t actually test this. Today, I filled out the script and tested it and found that it works. The script is available from trunk at:

http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/filter-add-ymdh-attribs.txt

The test file I’m using is available at:

http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt

The filter code does two things: it adds attributes called year, month, day and hour to each event, and it uses environment variables called YEAR, MONTH, DAY and HOUR to filter those newly added attributes. I’ll show how this works later.

The filter works best with binary logs because binlogs reliably timestamp events. If an event does not have a timestamp (as is often the case in a slowlog), then it gets values 0, 0, 0, 24 for year, month, day and hour respectively. Since 0 is a valid hour, 24 is used to indicate that the event had no hour.

The basic usage is to group queries by hour. Let’s say you want to see query stats for each hour. The command line is:

mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

Notice “–group-by hour”. And the result is (truncated for brevity):

# ########################################################################
# Report grouped by hour
# ########################################################################

# Item 1: 1.50 QPS, 31.01kx concurrency, ID 0x0DB5E4B97FC2AF39 at byte 450
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     30  62029s  20661s  20704s  20676s  19861s       0  19861s
# Time range 2007-12-07 13:02:08 to 2007-12-07 13:02:10
# bytes         23      81      27      27      27      27       0      27
# day           25      21       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         27      36      12      12      12      12       0      12
# year          27      21       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
13
...
# Item 2: 0.00 QPS, 0.71x concurrency, ID 0xAA27A0C99BFF6710 at byte 301 _
#              pct   total     min     max     avg     95%  stddev  median
# Count         30       3
# Exec time     29  62000s  20661s  20675s  20667s  19861s       0  19861s
# Time range 2007-12-07 12:02:50 to 2007-12-08 12:12:12
# bytes         46     163      22      87   40.75   84.10   25.86   26.08
# day           37      30       7       8    7.50    7.70    0.36    7.70
# error cod      0       0       0       0       0       0       0       0
# month         36      48      12      12      12      12       0      12
# year          36      28       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12
...
# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x                 62029.0000 30.0%       3 20676.3333 13
#    2 0x                 62000.0000 30.0%       3 20666.6667 12
#    3 0x                 20661.0000 10.0%       1 20661.0000 23
#    4 0x                 20661.0000 10.0%       1 20661.0000 10
#    5 0x                 20661.0000 10.0%       1 20661.0000 08
#    6 0x                 20661.0000 10.0%       1 20661.0000 18

 
Each item corresponds to the queries for that hour. Shown above are hours 13 (1pm) and 12 (noon). Then the profile gives you summarized information about each hour. From this fake binlog we see that 30% of queries occurred in the noon hour. (binlog005.txt is highly contrived; the values are just for demonstration.)

Unless your logs are rotated daily, chances are there will be noon-hour queries for multiple days. If you want to see per-hour stats for one specific day, the filter can do this, too, by using environment variables. Filter scripts were not originally meant to accept user input, and having to modify values in the actual code isn’t flexible, so the solution is to use environment variables. Here’s how:

DAY=7 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

The leading “DAY=7″ temporarily sets the environment variable DAY only during the execution of mk-query-digest. This way you don’t pollute your normal environment variables. The result is now (truncated again):

# Item 2: 0.00 QPS, 12.24x concurrency, ID 0xAA27A0C99BFF6710 at byte 301
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev  median
# Count         28       2
# Exec time     28  41339s  20664s  20675s  20670s  20675s      8s  20670s
# Time range 2007-12-07 12:02:50 to 2007-12-07 12:59:07
# bytes         28      54      27      27      27      27       0      27
# day           28      14       7       7       7       7       0       7
# error cod      0       0       0       0       0       0       0       0
# month         28      24      12      12      12      12       0      12
# year          28      14       7       7       7       7       0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+  ################################################################
12

 
Notice that there are now only 2 queries in the noon hour and that the time range is only in 2007-12-07. Previously, there was a noon-hour query in 2007-12-08. Thus we know that the DAY filter worked.

In this fashion, you can group and filter your log as you please. You can combine multiple filters like:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour

That will group and analyze only queries from the noon hour of the 7th (December 7, 2007 in this log). mk-query-digest is so flexible you can even do this:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour --no-report --print

That will suppress the query analysis and report and simply print all the queries from the noon hour of the 7th in pseudo-slowlog format.

There are, of course, other ways to do this kind of per-time-unit query aggregation, analysis and reporting (e.g. –since and –until), but if all you have are pre-existing logs and mk-query-digest, then –filter can be used to accomplish this task, too.

Written by Daniel Nichter

October 22nd, 2009 at 3:55 pm

Posted in Maatkit

Tagged with

OpenSQL Camp 2009 in Portland, Oregon

without comments

There’s very few spots left for attending OpenSQL Camp 2009. I’m going, along with several other Percona employees. It’s fun stuff; you should go. I’m not really the social type so I’ll probably be sitting off to the side somewhere coding.

Written by Daniel Nichter

October 21st, 2009 at 11:01 am

Posted in Random

Work continues

without comments

Believe it or not, I have not abandoned Hack MySQL. I’ve just been totally consumed with Maatkit or life. Slowly and surely, I’m re-structuring Hack MySQL. I have “plans”; it’s just a matter of finding time to see them through to fruition. Until then, the site is in disarray for a little while longer.

Written by Daniel Nichter

September 1st, 2009 at 8:28 pm

Posted in Site Maintenance

hackmysql.com undergoing changes

without comments

The hackmysql.com website is undergoing changes. Stuff may disappear and/or be relocated at random.

Written by Daniel Nichter

January 24th, 2009 at 1:30 pm

Posted in Site Maintenance

Hack MySQL status update

without comments

I haven’t posted any update in 2 months not because I haven’t been doing anything but rather the opposite: I have been exceptionally busy. As Baron has noted, I work with Percona doing the maintenance and development for Maatkit, of which release 2325 was just published. Consequently, I haven’t been working on my own scripts, though hopefully that will change, and in some ways it must. Here’s what I have in mind for my own scripts:

mysqlsla: I’m going to create a hybrid between this and mk-log-parser. Whereas mysqlsla does some things that mk-log-parser doesn’t, mk-log-parser has a better, more maintainable code base. Furthermore, I’m currently working on other log handling code which will affect some future plans I had for mysqlsla such as grouping results and filtering by timestamp.

mysqlreport: Apparently, it’s “bug-free” because no one has reported a single problem with it in so long that I couldn’t even tell you when it was last released. Regardless, I have plans for this, too, such as better interaction with other scripts by decoupling the code, data and presentation further.

mysqlidxchk: This has needed serious work since its release which wasn’t possible for various “technical” reasons. But I can fix it now and make it better, more helpful.

In general, I’m also moving my entire code base to Google Code, too, since I’ll be re-using some Maatkit common modules in my own scripts. Plus, that will make it easier to track issues and changes for me and you.

Written by Daniel Nichter

September 19th, 2008 at 1:03 pm