Hack MySQL

New algorithm for calculating 95 percentile

with 4 comments

The 95 percentile for query response times is and old concept; Peter and Roland blogged about it in 2008. Since then, MySQL tools have calculated the 95 percentile by collecting all values, either exactly or approximately, and returning all_values[int(number_of_values * 0.95)] (that’s an extreme simplification). But recently I asked myself*: must we save all values? The answer is no. I created a new algorithm** for calculating the 95 percentile that is faster, more accurate, and saves only 100 values.***

Firstly, my basis of comparison is the 95 percentile algo used by mk-query-digest. That algo is fast, memory-stable, and very proven in the real world. It works well for any number of values, even hundreds of thousands of values. It saves all values by using base 1.05 buckets and counting the number of values that fall within the range of each bucket. The results are not exact, but the differences are negligible because a 10ms and 13ms response time are indiscernible to a human. Any algo that hopes to handle very large numbers of values must approximate because not even C can store and sort hundreds of thousands of floats (times N many attributes times N many query classes) quickly enough.

So when I finished the new algo, I compared it to the mk-query-digest algo and obtained the following results:

FILE                         REAL_95     OLD_95     NEW_95  OLD_DIFF NEW_DIFF  OLD_TIME NEW_TIME   FILE_SIZE  OLD_RSS  NEW_RSS
nums/500k-1-or-2               1.751      1.697      1.784    -0.054   +0.033     12.12     9.37     4500000    3.88M    2.63M
nums/100k-1-or-2               1.749      1.697      1.794    -0.052   +0.045      2.42     1.88      900000    3.88M    2.63M
nums/50k-trend-1-to-9          6.931      6.652      6.995    -0.279   +0.064      1.24     0.90      450000    3.88M    2.63M
nums/25k-trend-1-to-5          3.888      3.704      3.988    -0.184   +0.100      0.64     0.47      225000    3.88M    2.63M
nums/21k-1-spike5-1            0.997      0.992      2.002    -0.005   +1.005      0.55     0.42      189000    3.88M    2.63M
nums/10k-rand-0-to-20         19.048     18.532     19.054    -0.516   +0.006      0.29     0.21       95079    3.86M    2.62M
nums/10k-rand-0-to-10          9.511      9.360      9.525    -0.151   +0.014      0.29     0.21       90000    3.86M    2.62M
nums/4k-trend-1-to-7           5.594      5.473      6.213    -0.121   +0.619      0.14     0.09       36000    3.86M    2.63M
nums/1k-sub-sec                0.941      0.900      0.951    -0.041   +0.010      0.07     0.04        9000    3.80M    2.62M
nums/400-half-10              10.271      9.828     10.273    -0.443   +0.002      0.05     0.03        3800    3.79M    2.62M
nums/400-high-low             10.446     10.319     10.446    -0.127        0      0.05     0.03        3800    3.79M    2.62M
nums/400-low-high             10.445     10.319     10.475    -0.126   +0.030      0.05     0.03        3800    3.79M    2.63M
nums/400-quarter-10           10.254      9.828     10.254    -0.426        0      0.06     0.03        3700    3.79M    2.62M
nums/153-bias-50              88.523     88.305     88.523    -0.218        0      0.05     0.03        1500    3.79M    2.62M
nums/100-rand-0-to-100        90.491     88.305     90.491    -2.186        0      0.05     0.03         991    3.79M    2.62M
nums/105-ats                  42.000     42.000     42.000         0        0      0.05     0.03         315    3.75M    2.61M
nums/20                       19.000     18.532     19.000    -0.468        0      0.04     0.03          51    3.79M    2.62M
nums/1                        42.000     42.000     42.000         0        0      0.04     0.03           3    3.75M    2.61M

I generated random microsecond values in various files. The first number of the filename indicates the number of values. So the first file has 500k values. The remaining part of the filename hints at the distribution of the values. For example, “50k-trend-1-to-9″ mean 50k values that increase from about 1 second to 9 seconds. Number and distribution of values affects 95 percentile algorithms, so I wanted to simulate several possible combinations.

“REAL_95″ is the real, exact 95 percentile; this is the control by which the “old” (i.e. the mk-query-digest) and new algos are compared. The diffs are comparisons to this control.

Each algo was timed and its memory (rss) measured, too. The time and memory comparisons are a little bias because the mk-query-digest module that implements its 95 percentile algo does more than my test script for the new algo.

The results show that the new algo is about 20% faster in all cases and more accurate in all but one case (“21k-1-spike5-1″). Also, the new algo uses less memory, but again this is a little bias; the important point is that it doesn’t use more memory to get its speed or accuracy increase.

The gains of the new algo are small in these comparisons, but I suspect they’ll be much larger given that the algo is used at least twice for each query. So saving 1 second in the algo can save minutes in data processing when there’s tens of thousands of queries.

Instead of explaining the algorithm exhaustively, I have upload all my code and data so you can reproduce the results on your machine: new-95-percentile-algo.tar.gz. You’ll need to checkout Maatkit, tweak the “require” lines in the Perl files, and tweak the Bash script (cmp-algos.sh), but otherwise I think the experiment should be straight forward. The new algo is in new-algo.pl. (new-algo.py is for another blog post.)

My ulterior motive for this blog post is to get feedback. Is the algorithm sane? Is there a critical flaw that I overlooked? Do you have a real-world example that doesn’t work well? If you’re intrepid or just curious and actually study the algo and have questions, feel free to contact me.

* By “recently asked myself” I mean that some time ago Baron and I wondered if it was possible to calculate 95 percentile without saving all values. At that time, I didn’t think it was feasible, but lately I thought and coded more about the problem.

** By “a new algorithm” I doubt that this has never been attempted or coded before, but I can’t find any examples of a similar algorithm.

*** By “saves only 100 values” I mean ultimately. At certain times, 150 values may be saved, but eventually the extra 50 should be integrated back into the base 100 values.

Written by Daniel Nichter

August 29th, 2011 at 9:42 pm

Posted in Programming

Tagged with , ,

4 Responses to 'New algorithm for calculating 95 percentile'

Subscribe to comments with RSS or TrackBack to 'New algorithm for calculating 95 percentile'.

  1. Hi Daniel!

    wow – this is great, very interesting stuff :)

    I’m now downloading your code to check out the algorithm – I’m really curious how it works.

    Oh, to get a quick grasp of the result, I calculated the correlation between (REAL_95,OLD_95) and (REAL_95,NEW_95), and the results are 0,9998926355 and 0,9999572588 respectively – I think it’s safe to say both are accurate enough, and it seels the new one is even a bit more accurate than the old one (for this dataset).

    Exciting :)

    Roland Bouman

    30 Aug 11 at 6:37 AM

  2. Is there a reason or conjecture to why the 21k-1-spike5-1 test was so far off?

    Justin Rovang

    30 Aug 11 at 7:46 AM

  3. Thanks for taking a look Roland. :-)

    Daniel Nichter

    30 Aug 11 at 8:06 AM

  4. Justin, I think 21k-1-spike5-1 is off because it’s a perfect edge case. As the filename kind of suggests, the values are:

    10k @ 1s
    1k @ 5s
    10k @ 1s

    So the 5s spike values are exactly 5% of total values (1k/20k). That make the 95%/5% edge something like 1s/5s. But the 5s values aren’t all 5s, some are 2, 3, or 4. So the edge is more like 1s/2s. Then (I should have probably made this explicit in the post), 95% here is actually “95% of values less than or equal to X” whereas technically a percentile should be just “less than X”. Using <= is inherited from mk-query-digest, but perhaps I should re-examine if that's correct.

    Daniel Nichter

    30 Aug 11 at 8:21 AM

Leave a Reply