Archive for the ‘mk-query-digest’ tag
Debugging and ripple effects
Like I said earlier, every tiny change that the test suite reveals after code changes is significant. I caught a very subtle “bug” today in recent changes to mk-query-digest (a.k.a. mqd). If you like to read about subtle bugs, read on.
An mqd test on sample file slow023.txt began to differ after some pretty extensive code changes of late:
< # Query 1: 0 QPS, 0x concurrency, ID 0x8E38374648788E52 at byte 0 ________
---
> # Query 1: 0 QPS, 0x concurrency, ID 0x2CFD93750B99C734 at byte 0 ________
The ID which depends on the query’s fingerprint has changed. It’s very important that we don’t suddenly change these on users because these IDs are pivotal in trend analyses with mqd’s --review-history option. First some background info on the recent code changes and then the little story about how I tracked down the source of this change.
mqd internals used to run like this: call parser module (like SlowLogParser) and pass it an array of callbacks which it ran events through. Now this has changed so there’s a single, unified pipeline of “callbacks” (they’re technically no longer callbacks). The first process in the pipeline is usually a parser module which returns each event and then mqd keeps pumping the events through the pipeline (in contrast to before where the parser module did the pumping). So “obviously” this has nothing to do with query fingerprinting or ID making which is done in code that has not changed. Thus, this “bug” was very perplexing at first.
First step: see what value make_checksum(), which makes the query IDs, used to get and gets now by using the Perl debugger:
DB<3> x $item
0 'select count(*) as a from x '
DB<12> x $item
0 'select count(*) as a from x'
The difference is that single trailing space. But why has this space suddenly disappeared in the new (later) rev? Something in fingerprint() must have changed, which is the sub that makes that query. Use the debugger again to step through fingerprint() while a watch is set on the var:
1574: $query =~ s/\A\s+//; # Chop off leading whitespace
DB<6>
Watchpoint 0: $query changed:
old value: ' select count(*) as A from X
'
new value: 'select count(*) as A from X
'
QueryRewriter::fingerprint(bin/mk-query-digest:1575):
1575: chomp $query; # Kill trailing whitespace
DB<6>
QueryRewriter::fingerprint(bin/mk-query-digest:1576):
1576: $query =~ tr[ \n\t\r\f][ ]s; # Collapse whitespace
DB<6>
Watchpoint 0: $query changed:
old value: 'select count(*) as A from X
'
new value: 'select count(*) as A from X '
Notice that the var did not change after the line “# Kill trailing whitespace” was executed. The trailing newline was removed and reduced to a single trailing space when “# Collapse whitespace” was executed. The new rev:
1585: $query =~ s/\A\s+//; # Chop off leading whitespace
DB<4>
Watchpoint 0: $query changed:
old value: ' select count(*) as A from X
'
new value: 'select count(*) as A from X
'
QueryRewriter::fingerprint(../mk-query-digest:1586):
1586: chomp $query; # Kill trailing whitespace
DB<4>
Watchpoint 0: $query changed:
old value: 'select count(*) as A from X
'
new value: 'select count(*) as A from X'
Notice how chomp in the new rev removed all trailing whitespace; the result of chomp has changed, but why? In case you didn’t know, chomp actually chomps any trailing $INPUT_RECORD_SEPARATOR, not just newlines. It just so happens that many of the parser modules change $INPUT_RECORD_SEPARATOR.
The root of this subtle but very important change is due to the fact that the parser modules no longer call the pipeline callbacks. When they did, their changes to $INPUT_RECORD_SEPARATOR were visible to the callbacks, and operations like fingerprint() are part of the callbacks. Now that they do not, their changes to $INPUT_RECORD_SEPARATOR are “invisible” and operations like fingerprint() see a different (i.e. the default) $INPUT_RECORD_SEPARATOR.
Conclusion in brief: an issue of scope at the beginning of mk-query-digest affects chomp causing fingerprint() and make_checksum() to generate different query IDs at the end of the script.
Zero is a big number
I made changes to mk-query-digest yesterday that I didn’t expect to cause any adverse affects. On the contrary, several tests began to fail because a single new but harmless line began to appear in the expected output: “Databases 0″. Perhaps I’m preaching to the choir, as you are all fantastic, thorough and flawless programmers, but as for myself I’ve learned to never take a single failed test for granted.
One time a test failed because some values differed by a millisecond or two. Being curious I investigated and found that our standard deviation equation was just shy of perfect. I fixed it and spent hours cross-checking the myriad tiny values with my TI calculator. Probably no one cared about 0.023 vs. 0.022 but it’s the cultivation of a disposition towards perfection that matters.
My innocuous changes yesterday introduced a case of Perl auto-vivification. Doing:
my ($db_for_show) = $sample->{db} ? $sample->{db} : keys %{$stats->{db}->{unq}};
can auto-vivify $stats->{db}. Before yesterday, this was done before the report for those stats were printed; changes yesterday made this happen after the report. Thus the report did it’s job and reported db or “Databases 0″. It’s been fixed, and just in time since I’m doing Maatkit’s October release today.
Filtering and analyzing queries by year, month, hour and day with mk-query-digest
I originally posted this on the Maatkit discussion list:
A little while ago a user asked in http://groups.google.com/group/maatkit-discuss/browse_thread/thread/256b6c780bdb066d if it was possible to use mk-query-digest to analyze queries per hour. I responded with a skeleton script for use with –filter, but I didn’t actually test this. Today, I filled out the script and tested it and found that it works. The script is available from trunk at:
http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/filter-add-ymdh-attribs.txt
The test file I’m using is available at:
http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt
The filter code does two things: it adds attributes called year, month, day and hour to each event, and it uses environment variables called YEAR, MONTH, DAY and HOUR to filter those newly added attributes. I’ll show how this works later.
The filter works best with binary logs because binlogs reliably timestamp events. If an event does not have a timestamp (as is often the case in a slowlog), then it gets values 0, 0, 0, 24 for year, month, day and hour respectively. Since 0 is a valid hour, 24 is used to indicate that the event had no hour.
The basic usage is to group queries by hour. Let’s say you want to see query stats for each hour. The command line is:
mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour
Notice “–group-by hour”. And the result is (truncated for brevity):
# ######################################################################## # Report grouped by hour # ######################################################################## # Item 1: 1.50 QPS, 31.01kx concurrency, ID 0x0DB5E4B97FC2AF39 at byte 450 # pct total min max avg 95% stddev median # Count 30 3 # Exec time 30 62029s 20661s 20704s 20676s 19861s 0 19861s # Time range 2007-12-07 13:02:08 to 2007-12-07 13:02:10 # bytes 23 81 27 27 27 27 0 27 # day 25 21 7 7 7 7 0 7 # error cod 0 0 0 0 0 0 0 0 # month 27 36 12 12 12 12 0 12 # year 27 21 7 7 7 7 0 7 # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms # 100ms # 1s # 10s+ ################################################################ 13 ... # Item 2: 0.00 QPS, 0.71x concurrency, ID 0xAA27A0C99BFF6710 at byte 301 _ # pct total min max avg 95% stddev median # Count 30 3 # Exec time 29 62000s 20661s 20675s 20667s 19861s 0 19861s # Time range 2007-12-07 12:02:50 to 2007-12-08 12:12:12 # bytes 46 163 22 87 40.75 84.10 25.86 26.08 # day 37 30 7 8 7.50 7.70 0.36 7.70 # error cod 0 0 0 0 0 0 0 0 # month 36 48 12 12 12 12 0 12 # year 36 28 7 7 7 7 0 7 # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms # 100ms # 1s # 10s+ ################################################################ 12 ... # Rank Query ID Response time Calls R/Call Item # ==== ================== ================ ======= ========== ==== # 1 0x 62029.0000 30.0% 3 20676.3333 13 # 2 0x 62000.0000 30.0% 3 20666.6667 12 # 3 0x 20661.0000 10.0% 1 20661.0000 23 # 4 0x 20661.0000 10.0% 1 20661.0000 10 # 5 0x 20661.0000 10.0% 1 20661.0000 08 # 6 0x 20661.0000 10.0% 1 20661.0000 18
Each item corresponds to the queries for that hour. Shown above are hours 13 (1pm) and 12 (noon). Then the profile gives you summarized information about each hour. From this fake binlog we see that 30% of queries occurred in the noon hour. (binlog005.txt is highly contrived; the values are just for demonstration.)
Unless your logs are rotated daily, chances are there will be noon-hour queries for multiple days. If you want to see per-hour stats for one specific day, the filter can do this, too, by using environment variables. Filter scripts were not originally meant to accept user input, and having to modify values in the actual code isn’t flexible, so the solution is to use environment variables. Here’s how:
DAY=7 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour
The leading “DAY=7″ temporarily sets the environment variable DAY only during the execution of mk-query-digest. This way you don’t pollute your normal environment variables. The result is now (truncated again):
# Item 2: 0.00 QPS, 12.24x concurrency, ID 0xAA27A0C99BFF6710 at byte 301 # This item is included in the report because it matches --limit. # pct total min max avg 95% stddev median # Count 28 2 # Exec time 28 41339s 20664s 20675s 20670s 20675s 8s 20670s # Time range 2007-12-07 12:02:50 to 2007-12-07 12:59:07 # bytes 28 54 27 27 27 27 0 27 # day 28 14 7 7 7 7 0 7 # error cod 0 0 0 0 0 0 0 0 # month 28 24 12 12 12 12 0 12 # year 28 14 7 7 7 7 0 7 # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms # 100ms # 1s # 10s+ ################################################################ 12
Notice that there are now only 2 queries in the noon hour and that the time range is only in 2007-12-07. Previously, there was a noon-hour query in 2007-12-08. Thus we know that the DAY filter worked.
In this fashion, you can group and filter your log as you please. You can combine multiple filters like:
DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour
That will group and analyze only queries from the noon hour of the 7th (December 7, 2007 in this log). mk-query-digest is so flexible you can even do this:
DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-attribs.txt --group-by hour --no-report --print
That will suppress the query analysis and report and simply print all the queries from the noon hour of the 7th in pseudo-slowlog format.
There are, of course, other ways to do this kind of per-time-unit query aggregation, analysis and reporting (e.g. –since and –until), but if all you have are pre-existing logs and mk-query-digest, then –filter can be used to accomplish this task, too.