PT BOF at PLMCE 2012
PT BOF at PLMCE 2012, or: I submitted a session for Percona Toolkit Birds of a Feather at Percona Live MySQL Conference & Expo 2012. You should BoF too; hard deadline is Monday, March 12th: submit a BoF session, or submit a Lightning Talk.
Speaking at Percona Live MySQL Conference & Expo 2012
I’m speaking at Percona Live MySQL Conference & Expo 2012. My two talk are: Getting Started with Drizzle 7.1 and Verifying MySQL Replication Safely With pt-table-checksum 2.0. No, there’s no relationship between those topics; they’re just things I know well.
I’ve been stalking Drizzle for many years. When it went GA last year, I began hacking Drizzle, focusing on plugins which give it nearly all its functionality. Recently, I helped overhaul the configuration, administration, and plugin sections of the Drizzle docs. I’m also frequently poking around the plugins’ source code. Consequently, I know a lot about making Drizzle work. My talk with transfer the bulk of the best of that knowledge to you so that you can return to your place of work/hobby/world domination and start using Drizzle 7.1 yourself without having to resolve some of the mysteries I had to resolve by reading the source code.
As for pt-table-checksum 2.0, part of Percona Toolkit, it’s a complete re-write of the venerable pt-table-checksum 1.0 which worked very well for years but required some fine-tuning. Well, in certain cases it didn’t work as well, which required more fine-tuning. So Baron redesigned the entire tool (with help and feedback from a lot of people at Percona), and I programmed it. It works wonderfully, and most of the time you can “just run it” and it will Just Do The Right Thing, but you will nonetheless benefit if you come to this talk and poke and prod its internal with me.
If you’ve never been to this conference before (formerly it was just the “MySQL Conference & Expo”, but it’s always been, afaik, at this time and place), you should really come because it’s very enlightening. It’s the Super Bowl/World Cup/Wimbledon/etc. of MySQL conferences in my humble opinion.
Databases and documentation
The MySQL documentation impresses me. I can’t recall a time when I didn’t find what I was looking for or that its details were lacking. Documenting a database server is difficult; the amount of information to organize and communicate clearly is staggering. I tasted that challenge when I helped update the Drizzle documentation, which was recently updated online.
I think Drizzle will become be a serious alternative or compliment to MySQL. Imho, Drizzle was until now held back by its documentation, or lack thereof, which created an insurmountable stumbling block for even clever DBAs. For example, Drizzle is largely comprised of plugins, but none of those plugins were documented until now.
I use the MySQL documentation as a goal, that to which the Drizzle documentation strives, not necessarily in style but in completeness. Although Drizzle began life as a fork of MySQL, it has evolved to the point where its own success will be, imho, largely dependent upon the completeness of its own documentation.
New algorithm for calculating 95 percentile
The 95 percentile for query response times is and old concept; Peter and Roland blogged about it in 2008. Since then, MySQL tools have calculated the 95 percentile by collecting all values, either exactly or approximately, and returning all_values[int(number_of_values * 0.95)] (that’s an extreme simplification). But recently I asked myself*: must we save all values? The answer is no. I created a new algorithm** for calculating the 95 percentile that is faster, more accurate, and saves only 100 values.***
Firstly, my basis of comparison is the 95 percentile algo used by mk-query-digest. That algo is fast, memory-stable, and very proven in the real world. It works well for any number of values, even hundreds of thousands of values. It saves all values by using base 1.05 buckets and counting the number of values that fall within the range of each bucket. The results are not exact, but the differences are negligible because a 10ms and 13ms response time are indiscernible to a human. Any algo that hopes to handle very large numbers of values must approximate because not even C can store and sort hundreds of thousands of floats (times N many attributes times N many query classes) quickly enough.
So when I finished the new algo, I compared it to the mk-query-digest algo and obtained the following results:
FILE REAL_95 OLD_95 NEW_95 OLD_DIFF NEW_DIFF OLD_TIME NEW_TIME FILE_SIZE OLD_RSS NEW_RSS nums/500k-1-or-2 1.751 1.697 1.784 -0.054 +0.033 12.12 9.37 4500000 3.88M 2.63M nums/100k-1-or-2 1.749 1.697 1.794 -0.052 +0.045 2.42 1.88 900000 3.88M 2.63M nums/50k-trend-1-to-9 6.931 6.652 6.995 -0.279 +0.064 1.24 0.90 450000 3.88M 2.63M nums/25k-trend-1-to-5 3.888 3.704 3.988 -0.184 +0.100 0.64 0.47 225000 3.88M 2.63M nums/21k-1-spike5-1 0.997 0.992 2.002 -0.005 +1.005 0.55 0.42 189000 3.88M 2.63M nums/10k-rand-0-to-20 19.048 18.532 19.054 -0.516 +0.006 0.29 0.21 95079 3.86M 2.62M nums/10k-rand-0-to-10 9.511 9.360 9.525 -0.151 +0.014 0.29 0.21 90000 3.86M 2.62M nums/4k-trend-1-to-7 5.594 5.473 6.213 -0.121 +0.619 0.14 0.09 36000 3.86M 2.63M nums/1k-sub-sec 0.941 0.900 0.951 -0.041 +0.010 0.07 0.04 9000 3.80M 2.62M nums/400-half-10 10.271 9.828 10.273 -0.443 +0.002 0.05 0.03 3800 3.79M 2.62M nums/400-high-low 10.446 10.319 10.446 -0.127 0 0.05 0.03 3800 3.79M 2.62M nums/400-low-high 10.445 10.319 10.475 -0.126 +0.030 0.05 0.03 3800 3.79M 2.63M nums/400-quarter-10 10.254 9.828 10.254 -0.426 0 0.06 0.03 3700 3.79M 2.62M nums/153-bias-50 88.523 88.305 88.523 -0.218 0 0.05 0.03 1500 3.79M 2.62M nums/100-rand-0-to-100 90.491 88.305 90.491 -2.186 0 0.05 0.03 991 3.79M 2.62M nums/105-ats 42.000 42.000 42.000 0 0 0.05 0.03 315 3.75M 2.61M nums/20 19.000 18.532 19.000 -0.468 0 0.04 0.03 51 3.79M 2.62M nums/1 42.000 42.000 42.000 0 0 0.04 0.03 3 3.75M 2.61M
I generated random microsecond values in various files. The first number of the filename indicates the number of values. So the first file has 500k values. The remaining part of the filename hints at the distribution of the values. For example, “50k-trend-1-to-9″ mean 50k values that increase from about 1 second to 9 seconds. Number and distribution of values affects 95 percentile algorithms, so I wanted to simulate several possible combinations.
“REAL_95″ is the real, exact 95 percentile; this is the control by which the “old” (i.e. the mk-query-digest) and new algos are compared. The diffs are comparisons to this control.
Each algo was timed and its memory (rss) measured, too. The time and memory comparisons are a little bias because the mk-query-digest module that implements its 95 percentile algo does more than my test script for the new algo.
The results show that the new algo is about 20% faster in all cases and more accurate in all but one case (“21k-1-spike5-1″). Also, the new algo uses less memory, but again this is a little bias; the important point is that it doesn’t use more memory to get its speed or accuracy increase.
The gains of the new algo are small in these comparisons, but I suspect they’ll be much larger given that the algo is used at least twice for each query. So saving 1 second in the algo can save minutes in data processing when there’s tens of thousands of queries.
Instead of explaining the algorithm exhaustively, I have upload all my code and data so you can reproduce the results on your machine: new-95-percentile-algo.tar.gz. You’ll need to checkout Maatkit, tweak the “require” lines in the Perl files, and tweak the Bash script (cmp-algos.sh), but otherwise I think the experiment should be straight forward. The new algo is in new-algo.pl. (new-algo.py is for another blog post.)
My ulterior motive for this blog post is to get feedback. Is the algorithm sane? Is there a critical flaw that I overlooked? Do you have a real-world example that doesn’t work well? If you’re intrepid or just curious and actually study the algo and have questions, feel free to contact me.
* By “recently asked myself” I mean that some time ago Baron and I wondered if it was possible to calculate 95 percentile without saving all values. At that time, I didn’t think it was feasible, but lately I thought and coded more about the problem.
** By “a new algorithm” I doubt that this has never been attempted or coded before, but I can’t find any examples of a similar algorithm.
*** By “saves only 100 values” I mean ultimately. At certain times, 150 values may be saved, but eventually the extra 50 should be integrated back into the base 100 values.
MySQL tools article
Although I no longer maintain hackmysql.com, I still actively develop MySQL tools. An article I wrote, 10 essential MySQL tools for admin, was published today. I hope no one’s feelings are hurt if their tool isn’t in the list, but it was rather difficult to compile the list given that so many tools are either not actively developed, not tested, or not well documented. Given lag time between writing and publishing, I was not able to write about Yoshinori’s MHA or newer tools. And given length constraints, I was not able to write about more tools. In any case, the world of MySQL tools is alive and well.