Hack MySQL


mk-table-sync and small tables


Issue 634 made me wonder how the various mk-table-sync algorithms (Chunk, Nibble, GroupBy and Stream) perform when faced with a small number of rows. So I ran some quick, basic benchmarks.

I used three tables, each with an integer primary key, containing 109, 600 and 16k+ rows. I did two runs for each of the four algorithms: the first run used an empty destination table, so every row from the source had to be synced; the second run used an already-synced destination table, so every row had to be checked but none were synced. I ran Perl with DProf to get simple wallclock and user time measurements.
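For anyone who wants to repeat this kind of measurement without a profiler, here is a minimal sketch of a timing harness in Python. It is not what I used for the numbers in this post (those came from DProf), and the `time_command` helper is a hypothetical name introduced only for illustration. It runs an arbitrary command once and reports wallclock time plus the user CPU time consumed by the child process.

```python
import os
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return (wallclock_seconds, child_user_cpu_seconds, returncode).

    Hypothetical helper for illustration only; it is not part of Maatkit.
    """
    before = os.times()
    start = time.perf_counter()
    # Discard the command's output so printing does not skew the timing.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user accumulates user CPU time of child processes that have been waited for.
    user = after.children_user - before.children_user
    return wall, user, result.returncode


if __name__ == "__main__":
    # Placeholder command: substitute the sync invocation you want to measure.
    wall, user, rc = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wallclock={wall:.3f}s user={user:.3f}s exit={rc}")
```

To benchmark a real sync, replace the placeholder command with your own mk-table-sync command line (check the tool's documentation for the options that select an algorithm) and repeat each run several times, since a single run on a small table is noisy.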

Here are the results for the first run:

When the table is really small (109 rows), there’s hardly any difference between the algorithms. As the table becomes larger, the GroupBy and Stream algorithms are much faster than the Chunk and Nibble algorithms. This is actually expected, even though Chunk and Nibble are considered the best and fastest algorithms; see point 3 in the conclusions.

Now here’s the second run:

The small table is again roughly the same for all algorithms. Stream is clearly the fastest, but what’s more notable is that GroupBy and Nibble are nearly identical even though Nibble is far more complex than GroupBy. As the table becomes bigger (16k+ rows), mk-table-sync “conventional wisdom” is more clearly illustrated: Chunk and Nibble are much faster than GroupBy and Stream.

This was a very quick benchmarking job but from it we can draw some conclusions:

  1. There’s little difference between the algorithms when syncing small tables.
  2. The GroupBy algorithm might be the best choice for small tables since it’s comparable to Chunk and Nibble but internally less complex (it doesn’t use checksums or crypto hashes, for example).
  3. If syncing to a destination that is missing a lot of rows, GroupBy and Stream can be much faster because Chunk and Nibble spend a lot of time checksumming chunks, which GroupBy and Stream do not do.
  4. When syncing large tables or tables with few differences, conventional wisdom still holds: Chunk and Nibble are the best choices.
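The trade-off in points 3 and 4 can be illustrated with a toy model. This is a simplified sketch, not mk-table-sync's actual implementation: tables are plain dictionaries, `stream_diff` and `chunk_diff` are hypothetical names, MD5 stands in for whatever checksum the server would compute, and only rows missing or changed in the destination are detected. It counts how many rows each approach has to fetch and how many chunk checksums it computes.

```python
import hashlib


def chunk_checksum(rows):
    """Checksum a list of (pk, value) rows; a stand-in for a server-side chunk checksum."""
    digest = hashlib.md5()
    for pk, value in rows:
        digest.update(f"{pk}:{value};".encode())
    return digest.hexdigest()


def stream_diff(source, dest):
    """Compare every row directly. Returns (differing_pks, rows_fetched)."""
    rows_fetched = len(source) + len(dest)
    diffs = [pk for pk, value in source.items() if dest.get(pk) != value]
    return diffs, rows_fetched


def chunk_diff(source, dest, chunk_size=100):
    """Checksum fixed-size chunks and fetch rows only for chunks that differ.

    Returns (differing_pks, rows_fetched, checksums_computed).
    """
    pks = sorted(source)
    diffs, rows_fetched, checksums = [], 0, 0
    for i in range(0, len(pks), chunk_size):
        chunk = pks[i:i + chunk_size]
        src_rows = [(pk, source[pk]) for pk in chunk]
        dst_rows = [(pk, dest[pk]) for pk in chunk if pk in dest]
        checksums += 2  # one checksum on each side
        if chunk_checksum(src_rows) != chunk_checksum(dst_rows):
            # Checksums differ, so fall back to row-level work for this chunk.
            rows_fetched += len(src_rows) + len(dst_rows)
            diffs.extend(pk for pk, value in src_rows if dest.get(pk) != value)
    return diffs, rows_fetched, checksums


if __name__ == "__main__":
    source = {pk: f"row{pk}" for pk in range(1, 601)}
    for label, dest in (("empty destination", {}), ("synced destination", dict(source))):
        print(label, "stream:", stream_diff(source, dest)[1:], "chunk:", chunk_diff(source, dest)[1:])
```

With an empty destination, the chunked approach computes every checksum and then still does all the row-level work, so the checksums are pure overhead. With an already-synced destination, it computes the same checksums but fetches no rows at all, while the streaming approach fetches every row from both sides.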

Written by Daniel Nichter

October 23rd, 2009 at 2:32 pm

Posted in Maatkit
