Hack MySQL

Databases and documentation

without comments

The MySQL documentation impresses me. I can’t recall a time when I didn’t find what I was looking for or that its details were lacking. Documenting a database server is difficult; the amount of information to organize and communicate clearly is staggering. I tasted that challenge when I helped update the Drizzle documentation, which was recently updated online.

I think Drizzle will become be a serious alternative or compliment to MySQL. Imho, Drizzle was until now held back by its documentation, or lack thereof, which created an insurmountable stumbling block for even clever DBAs. For example, Drizzle is largely comprised of plugins, but none of those plugins were documented until now.

I use the MySQL documentation as a goal, that to which the Drizzle documentation strives, not necessarily in style but in completeness. Although Drizzle began life as a fork of MySQL, it has evolved to the point where its own success will be, imho, largely dependent upon the completeness of its own documentation.

Written by Daniel Nichter

November 12th, 2011 at 5:26 pm

Posted in Drizzle

Tagged with ,

New algorithm for calculating 95 percentile

with 4 comments

The 95 percentile for query response times is and old concept; Peter and Roland blogged about it in 2008. Since then, MySQL tools have calculated the 95 percentile by collecting all values, either exactly or approximately, and returning all_values[int(number_of_values * 0.95)] (that’s an extreme simplification). But recently I asked myself*: must we save all values? The answer is no. I created a new algorithm** for calculating the 95 percentile that is faster, more accurate, and saves only 100 values.***

Firstly, my basis of comparison is the 95 percentile algo used by mk-query-digest. That algo is fast, memory-stable, and very proven in the real world. It works well for any number of values, even hundreds of thousands of values. It saves all values by using base 1.05 buckets and counting the number of values that fall within the range of each bucket. The results are not exact, but the differences are negligible because a 10ms and 13ms response time are indiscernible to a human. Any algo that hopes to handle very large numbers of values must approximate because not even C can store and sort hundreds of thousands of floats (times N many attributes times N many query classes) quickly enough.

So when I finished the new algo, I compared it to the mk-query-digest algo and obtained the following results:

FILE                         REAL_95     OLD_95     NEW_95  OLD_DIFF NEW_DIFF  OLD_TIME NEW_TIME   FILE_SIZE  OLD_RSS  NEW_RSS
nums/500k-1-or-2               1.751      1.697      1.784    -0.054   +0.033     12.12     9.37     4500000    3.88M    2.63M
nums/100k-1-or-2               1.749      1.697      1.794    -0.052   +0.045      2.42     1.88      900000    3.88M    2.63M
nums/50k-trend-1-to-9          6.931      6.652      6.995    -0.279   +0.064      1.24     0.90      450000    3.88M    2.63M
nums/25k-trend-1-to-5          3.888      3.704      3.988    -0.184   +0.100      0.64     0.47      225000    3.88M    2.63M
nums/21k-1-spike5-1            0.997      0.992      2.002    -0.005   +1.005      0.55     0.42      189000    3.88M    2.63M
nums/10k-rand-0-to-20         19.048     18.532     19.054    -0.516   +0.006      0.29     0.21       95079    3.86M    2.62M
nums/10k-rand-0-to-10          9.511      9.360      9.525    -0.151   +0.014      0.29     0.21       90000    3.86M    2.62M
nums/4k-trend-1-to-7           5.594      5.473      6.213    -0.121   +0.619      0.14     0.09       36000    3.86M    2.63M
nums/1k-sub-sec                0.941      0.900      0.951    -0.041   +0.010      0.07     0.04        9000    3.80M    2.62M
nums/400-half-10              10.271      9.828     10.273    -0.443   +0.002      0.05     0.03        3800    3.79M    2.62M
nums/400-high-low             10.446     10.319     10.446    -0.127        0      0.05     0.03        3800    3.79M    2.62M
nums/400-low-high             10.445     10.319     10.475    -0.126   +0.030      0.05     0.03        3800    3.79M    2.63M
nums/400-quarter-10           10.254      9.828     10.254    -0.426        0      0.06     0.03        3700    3.79M    2.62M
nums/153-bias-50              88.523     88.305     88.523    -0.218        0      0.05     0.03        1500    3.79M    2.62M
nums/100-rand-0-to-100        90.491     88.305     90.491    -2.186        0      0.05     0.03         991    3.79M    2.62M
nums/105-ats                  42.000     42.000     42.000         0        0      0.05     0.03         315    3.75M    2.61M
nums/20                       19.000     18.532     19.000    -0.468        0      0.04     0.03          51    3.79M    2.62M
nums/1                        42.000     42.000     42.000         0        0      0.04     0.03           3    3.75M    2.61M

I generated random microsecond values in various files. The first number of the filename indicates the number of values. So the first file has 500k values. The remaining part of the filename hints at the distribution of the values. For example, “50k-trend-1-to-9″ mean 50k values that increase from about 1 second to 9 seconds. Number and distribution of values affects 95 percentile algorithms, so I wanted to simulate several possible combinations.

“REAL_95″ is the real, exact 95 percentile; this is the control by which the “old” (i.e. the mk-query-digest) and new algos are compared. The diffs are comparisons to this control.

Each algo was timed and its memory (rss) measured, too. The time and memory comparisons are a little bias because the mk-query-digest module that implements its 95 percentile algo does more than my test script for the new algo.

The results show that the new algo is about 20% faster in all cases and more accurate in all but one case (“21k-1-spike5-1″). Also, the new algo uses less memory, but again this is a little bias; the important point is that it doesn’t use more memory to get its speed or accuracy increase.

The gains of the new algo are small in these comparisons, but I suspect they’ll be much larger given that the algo is used at least twice for each query. So saving 1 second in the algo can save minutes in data processing when there’s tens of thousands of queries.

Instead of explaining the algorithm exhaustively, I have upload all my code and data so you can reproduce the results on your machine: new-95-percentile-algo.tar.gz. You’ll need to checkout Maatkit, tweak the “require” lines in the Perl files, and tweak the Bash script (cmp-algos.sh), but otherwise I think the experiment should be straight forward. The new algo is in new-algo.pl. (new-algo.py is for another blog post.)

My ulterior motive for this blog post is to get feedback. Is the algorithm sane? Is there a critical flaw that I overlooked? Do you have a real-world example that doesn’t work well? If you’re intrepid or just curious and actually study the algo and have questions, feel free to contact me.

* By “recently asked myself” I mean that some time ago Baron and I wondered if it was possible to calculate 95 percentile without saving all values. At that time, I didn’t think it was feasible, but lately I thought and coded more about the problem.

** By “a new algorithm” I doubt that this has never been attempted or coded before, but I can’t find any examples of a similar algorithm.

*** By “saves only 100 values” I mean ultimately. At certain times, 150 values may be saved, but eventually the extra 50 should be integrated back into the base 100 values.

Written by Daniel Nichter

August 29th, 2011 at 9:42 pm

Posted in Programming

Tagged with , ,

MySQL tools article

with 7 comments

Although I no longer maintain hackmysql.com, I still actively develop MySQL tools. An article I wrote, 10 essential MySQL tools for admin, was published today. I hope no one’s feelings are hurt if their tool isn’t in the list, but it was rather difficult to compile the list given that so many tools are either not actively developed, not tested, or not well documented. Given lag time between writing and publishing, I was not able to write about Yoshinori’s MHA or newer tools. And given length constraints, I was not able to write about more tools. In any case, the world of MySQL tools is alive and well.

Written by Daniel Nichter

August 1st, 2011 at 9:44 am

Posted in MySQL

Tagged with , , , ,

Drizzle 7 plugins

with one comment

Last week I wrote about my experience compiling Drizzle 7 on Mac OS X 10.6. Then David Shrewsbury informed me of his nearly identical blog post: Installing Drizzle from source on OS X. Once Drizzle 7 was running on my box, I immediately looked to see what plugins where available because Drizzle uses a lot of plugins and they are one of its notable differences from MySQL. In my humble opinion, Drizzle’s plugins will primarily influence how database professionals evaluate and decide whether or not to use Drizzle because so many of Drizzle’s features are plugins. Therefore, let’s look briefly at some the plugins included with Drizzle 7.

The plugin directory of the Drizzle 7 tarball lists 79 plugins. Each plugin has a plugin.ini file that tells Drizzle its name (title) and description (among other things):

Plugin Title Description
ascii UDF Plugin for ASCII UDF Plugin for returning ASCII value of a character
auth_all Allow All Authentication Default authentication module.
auth_file File-based Authentication Authenticate against a list of username:password entries in a plain text file.
auth_http HTTP Authentication Plugin HTTP based authentications
auth_ldap LDAP Authentication Authenticate against an LDAP server.
auth_pam PAM Authenication Plugin PAM based authenication.
auth_test Authentication Test A hard-coded authentication test plugin
benchmark UDF Plugin for Benchmark UDF Plugin for Benchmark
catalog Catalog related DDL functions. Provides functions for DDL operations on catalogs.
charlength UDF Plugin for CharLength UDF Plugin for returning number of characters in a string
coercibility_function COERCIBILITY() Function Gets coercibility for an expression
collation_dictionary Character And Collation Dictionary Dictionary plugins for character set and collation system.
compression compression UDFs UDF Plugin for compression
connection_id Connection ID UDF UDF Plugin for connection_id function
console Console Client Plugin that provides a command console in the drizzled process.
crc32 crc32 UDF UDF Plugin for crc32
debug Kernel debugger utilities. crash(), assert(), and additional developer style tools.
default_replicator Default Replicator A simple default replicator which replicates all write events to all appliers
drizzle_protocol Drizzle Protocol Module Drizzle protocol module.
errmsg_stderr Error Message Plugin Errmsg Plugin that sends messages to stderr.
error_dictionary Error Code data dictionary tables. List the error codes and messages genearted by Drizzle.
filtered_replicator Filtered Replicator A simple filtered replicator which allows a user to filter out events based on a schema or table name
function_dictionary Function data dictionary tables. List the functions available to SQL developers.
function_engine Data Engine Function Engine used for Table Functions.
gearman_udf Gearman Client UDFs UDF plugin that provides a Gearman client interface
haildb HailDB storage engine Uses the HailDB library to implement a storage engine for Drizzle.
hello_events An example event observer Plugin An example event observer Plugin for Drizzle
hello_world Example Plugin UDF Plugin for Hello World.
hex_functions HEX() and UNHEX() functions converting strings to and from hexadecimal
information_schema_dictionary Information Schema Dictionary Creates dictionary tables for the ANSI information schema tables
innobase InnoDB Storage Engine Transactional Tables using InnoDB
length UDF Plugin for Length UDF Plugin for returning byte length of a string
logging_gearman Gearman Logging Plugin Logging Plugin that logs to Gearman.
logging_query Query Logging Plugin Logging Plugin that logs all queries.
logging_stats Statistics Plugin for Drizzle Statistics Plugin for Drizzle
math_functions Math Functions Math Functions
md5 UDF Plugin for MD5 UDF Plugin for MD5
memcached_functions Memcached Client UDFs UDF plugin for libmemcached
memcached_query_cache Query Cache Query Cache for Memcached
memcached_stats Memcached Stats in DATA_DICTIONARY tables Some DATA_DICTIONARY tables that provide Memcached stats
memory Memory Storage Engine Volatile memory based tables
multi_thread Multi-Thread Scheduler One Thread Per Connection Scheduler
myisam MyISAM Storage Engine Legacy non-transactional MySQL tables
mysql_protocol MySQL Protocol Module MySQL protocol module that is based on the old libmysql code.
mysql_unix_socket_protocol MySQL Unix Socket Protocol Module MySQL Unix Socket Protocol Module
pbms PrimeBase Media Stream Daemon Provides BLOB streaming service for storage engines
performance_dictionary Performance Dictionary Table functions related to performance issues.
protocol_dictionary Protocol Dictionary Provides dictionary tables for protocol counters
rabbitmq RabbitMQ Replicator Replicator plugin that publishes transactions to RabbitMQ
rand_function RAND() function Function for pseudo-random number generation
regex_policy Regex Policy Authorization plugin which implements a regex to object mapping
registry_dictionary Registry Dictionary Provides dictionary tables for plugin system.
replication_dictionary Registry Dictionary Provides dictionary tables for replication system
reverse_function REVERSE function function that reverses a string
rot13 ROT 13 A wonderful ROT 13 implementation
schema_dictionary Schema Dictionary Creates dictionary tables for federated schema data.
schema_engine Schema Engine This implements the default file based Schema engine.
session_dictionary Session Dictionary Dictionary for session information, includes processlist system.
show_dictionary Show Dictionary Creates dictionary tables for SHOW commands
show_schema_proto SHOW_SCHEMA_PROTO(db) Function Shows a text representation of the schema (database) proto
shutdown_function Shutdown function Use this function to shutdown the database.
signal_handler Default Signal Handler Default Signal Handler
simple_user_policy Simple User Policy Authorization plugin which implements a simple policy-based strategy
slave Replication Slave Provides slave replication services
sleep Sleep Function Function Plugin for sleep().
status_dictionary Status Dictionary Dictionary system for status, variables, commands.
storage_engine_api_tester Storage Engine API Tester Test engine that enforces SE API calling conventions very strictly
string_functions String Functions String Functions
substr_functions SUBSTRING() and SUBSTRING_INDEX() functions SUBSTRING and SUBSTRING_INDEX functions
syslog Syslog Syslog interface for query log, error messages, and functions
table_cache_dictionary Table Cached and Table Definition Dictionary A look into the fabulous world of what is in your caches.
tableprototester Table Proto Message Tester Used only to test other bits of the server.
transaction_log Transaction Log Log of Transaction Messages
trigger_dictionary Trigger Dictionary Table for describing what triggers have been loaded.
user_locks User Locks. Provides lock functions to users.
utility_dictionary Utility data dictionary tables. Generate data to be used for tests, etc.
utility_functions Utility Functions Utility Functions
uuid_function UUID() plugin UUID() function as generated by libuuid
version version UDF UDF Plugin for version function

This list demonstrates to what extent features of Drizzle are outside its kernel and provided instead by plugins. Even SLEEP() is a plugin! By comparison, many of these features are built-in with MySQL, like authentication and replication. It is not my intent in this blog post to discuss the pros and cons of this “(almost) everything is a plugin” design decision (I like it actually), so let’s instead play with a plugin to see how it treats us; let’s enable auth_file because I recently read Eric Day’s blog post Writing Authentication Plugins for Drizzle.

Enabling a Plugin: auth_file

For my first attempt, I simply start drizzled with the option --plugin-add=auth_file. That does not work:

$ cat var/drizzled.err 
Could not load auth file: Could not open users file: /opt/drizzle7/etc/drizzle.users

Plugin 'auth_file' init function returned error.


That is, at least, a nice error message: I know what failed and why. I could probably just make that file, but since I don’t know what format it expects my inclination is to consult the Drizzle reference manual because I’m habituated to consulting the MySQL reference manual whenever I have questions. Sadly, however, the Drizzle plugin documentation is severely limited and does not list the plugin, nor does the Drizzle authentication documentation page. Eric’s blog post mentioned earlier describes the file format, but I’m going to pretend to be a regular user and consult the regular sources of information. If the database’s docu doesn’t say how to enable/configure the plugin, my next guess is drizzled --help, but that doesn’t list auth_file. Then I try drizzled --plugin-add=auth_file --help and that does list:

Options used by auth_file:
  --auth-file.users arg (=/opt/drizzle7/etc/drizzle.users)
                                        File to load for usernames and 

It’s nice that --help is dynamic, but it still doesn’t tell me what file format auth_file expects. If I could start drizzled with the plugin, then perhaps I could do SELECT * FROM data_dictionary.plugins WHERE plugin_name='auth_file' but I can’t do that either. So as a naïve user, I’m out of luck unless I guess.

For the sake of making this example work, I’ll stop being the naïve user and instead consult the list of plugins above that says “Authenticate against a list of username:password entries in a plain text file.” and Eric’s blog post mentioned earlier which says the same things. So I create /opt/drizzle7/etc/drizzle.users with one line foo:bar and then attempt to start drizzled with the plugin again:

$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 1
Connection protocol: mysql
Server version: 2011.03.13 Source distribution (drizzle)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> SELECT * FROM data_dictionary.plugins WHERE plugin_name='auth_file';
| auth_file   | Authentication | YES       | auth_file   | 

Success! But wait… I still connected with my account username (daniel) and no password. Why? Because another authentication module is enabled:

drizzle> SELECT * FROM data_dictionary.plugins WHERE plugin_name LIKE 'auth%';
| auth_all    | Authentication | YES       | Allow-All-Authentication | 
| auth_file   | Authentication | YES       | auth_file                | 

It seems I need to also start drizzled with --plugin-remove=auth_all. This produces the expected result:

$ drizzle
ERROR 1045 (28000): Access denied for user 'daniel' (using password: NO)

$ drizzle --user foo --password=bar
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 5
Connection protocol: mysql
Server version: 2011.03.13 Source distribution (drizzle)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.


Of course not all plugins have to be configured; some are self-evident, like sleep… or are they?

drizzle> SELECT SLEEP(7);
| SLEEP(7) |
|        0 | 
1 row in set (0.000624 sec)

Notice the last line: the query did not sleep for 7 seconds; it returned immediately. It seems that the sleep plugin is broken or I’m not using it correctly. SLEEP(1.0) sleeps for 1.0 seconds, but SLEEP(7.0) returns immediately. SLEEP() is probably not a mission-critical feature of the database, but its failure in Drizzle highlights the need for more testing and documentation of plugins.

A final word on this topic before we move on: without documentation I don’t know how to configure real authentication and security. How does one, for example, grants certain users certain access to certain databases or tables? Are there column-level grants? What privileges are needed to enable or disable dynamic features? In brief: Drizzle authentication/security is either under-developped or under-documented. I hope it’s just the latter because anyone can write docs, fewer can write good authentication modules.

Active Plugins and Their Types

Let’s continue to looking at what information about plugins is exposed inside Drizzle by executing SELECT * FROM data_dictionary.plugins ORDER BY plugin_name:

| PLUGIN_NAME          | PLUGIN_TYPE           | IS_ACTIVE | MODULE_NAME                  |
| abs                  | Function              | YES       | Math Functions               | 
| acos                 | Function              | YES       | Math Functions               | 
| ascii                | Function              | YES       | ascii                        | 
| asin                 | Function              | YES       | Math Functions               | 
| auth_file            | Authentication        | YES       | auth_file                    | 
| console              | Listen                | YES       | console                      | 
| default_replicator   | TransactionReplicator | YES       | default_replicator           | 
| Error_message_stderr | ErrorMessage          | YES       | errmsg_stderr                | 

That is an abbreviated list; the complete list is 169 plugins of various types. Two questions result from this: why are there more plugins here than listed above, and what are the various plugin types? I don’t know the answer to the first question, but I’m sure a Drizzle developer does? My semi-educated guess is that Drizzle also has modules which provide several functions/plugins. Don’t quote me on that though.

The second question can be answered easily, and it gives us an insight into how plugins are organized and what aspects of Drizzle they implement:

drizzle> SELECT plugin_type, COUNT(plugin_name) AS n_plugins
      -> FROM data_dictionary.plugins
      -> GROUP BY plugin_type ORDER BY n_plugins DESC;
| plugin_type           | n_plugins |
| TableFunction         |        79 | 
| Function              |        74 | 
| StorageEngine         |         5 | 
| Listen                |         4 | 
| Authentication        |         1 | 
| Catalog               |         1 | 
| Daemon                |         1 | 
| ErrorMessage          |         1 | 
| Logging               |         1 | 
| Scheduler             |         1 | 
| TransactionReplicator |         1 | 

All the TableFunction plugins are related to the DATA_DICTIONARY and the INFORMATION_SCHEMA, so they’re probably something that end users will just expect to work and wouldn’t seek to replace or tweak. The Function plugins cover a range of functions from HEX() and VERSION() to KILL(). These are probably also aspects of the database that users will just expect to work, but I like that we have the ability to plug in a new RAND(), for example. Similarly, plugin types Catalog, Daemon, ErrorMessage and Scheduler are core aspects of the database that few users will want to tweak. On the other hand, the remaining types allow for a lot of innovation and development.

The Listen plugins relate to protocols that Drizzle can speak. My build includes the console “protocol”, MySQL and Drizzle TCP/IP protocols, and the standard Unix socket protocol. In theory, Drizzle could speak directly to other database technologies, like memcached, with a plugin to enable/translate the foreign “language”. This is nice because several database technologies are often used in conjunction.

The Authentication plugins will probably also become very important and popular because authentication/security is a central aspect of a database. Many big corporations and governments have complex security requirements and there a lot of security/authentication methods. From time to time I hear people criticize MySQL’s authentication as too rudimentary. This is no longer a problem in Drizzle.

StorageEngine plugins will no doubt play a central role in Drizzle’s future just as it does in MySQL’s present. I won’t say any more about this because, for one, it’s already well understood (i.e. the benefits of pluggable storage engines), and, two, Drizzle is new and there’s a lot of storage engines so it remains to be seen which storage engines become the most influential (like how MyISAM was influential in MySQL’s early history, then gave way to InnoDB).

Another important, pluggable aspect of Drizzle that is not well represented by this list but hinted to by TransactionReplicator is replication. Unlike MySQL, replication in Drizzle is pluggable. Like storage engines, pluggable replication opens the door new to innovations like RabbitReplication, but also like storage engines, Drizzle is new so it remains to be seen which system of replication will become the most influential. Or, perhaps, no one system will dominate and users will need to choose the best replication plugin for their particular environment.

Lastly, the plugin type that interests me since I’ve written a couple of log parsers and I’m the primary coder for mk-query-digest: Logging. I’ll be the one to say it: original/order versions of the MySQL statement logs (i.e. the general log and the slow log) sucked. The general log is a nightmare to parse and it’s really nothing more than queries and timestamps–no “actionable” information. The slow log is better since it has information like query execution time, etc., but it wasn’t really until Percona extensively extended the slow log (both in configurability and output) that the slow log allowed us to do a lot of valuable analyses of query events inside MySQL. Therefore I’m very happy that logging in Drizzle is pluggable so that logging can be done well from the start. Let’s see what logging awesomeness Drizzle has for us.

Query Logging with logging_query

Sadly the Drizzle plugin documentation doesn’t yet mention any logging plugins. The Drizzle logging documentation lists Unix Syslog and File logging, but at the time of writing, both of those pages are blank. From the list of plugins at the start of this blog post, we see logging_query which, by its description, seems to act like a general log since it logs “all” queries, not just slow queries. It’s our only option so we’ll try to enable and use it.

I first tried to simply enable the plugin by starting drizzled with --plugin-add. The server started but there was no logging to be found anywhere, and the plugin did not even show as enabled in the DATA_DICTIONARY. A little confused, I consulted:

$ drizzled --plugin-add=logging_query --help
Options used by logging-query:
  --logging-query.enable                Enable logging to CSV file
  --logging-query.filename arg          File to log to
  --logging-query.pcre arg              PCRE to match the query against
  --logging-query.threshold-slow arg (=0)
                                        Threshold for logging slow queries, in 
  --logging-query.threshold-big-resultset arg (=0)
                                        Threshold for logging big queries, for 
                                        rows returned
  --logging-query.threshold-big-examined arg (=0)
                                        Threshold for logging big queries, for 
                                        rows examined

Perhaps I must “doubly” enable the plugin: add it with --plugin-add and turn it on with --logging-query.enable. I try that but still no query log can be found anywhere and Drizzle throws no errors. So, based on reading Ronald’d old blog post about this very topic, Drizzle query logging, I add a third option: --logging-query.filename=$BASEDIR/var/query-log.csv. That makes the plugin come alive: the plugin is listed as enabled in the DATA_DICTIONARY and var/query-log.csv exists:

$ cat var/query-log.csv 
1303070901062988,1,1,"","select @@version_comment limit 1","Query",1077,243,243,1,0,0,0,1,"beatrice.local"
1303070902100576,1,2,"","select * from data_dictionary.plugins where plugin_name='logging_query'","Query",1038665,2498,575,2,170,0,0,1,"beatrice.local"

I won’t repeat what Ronald already said in his aforementioned blog post, but he makes the important note: “Unfortunately the log does not yet provide a header.” Yes, that’s very unfortunate because there is no official documentation so the current query log format is essentially useless to anyone but hackers like Ronald and people like me who care enough to search out the answers that I need.


The Drizzle 7 GA tarball comes with roughly 70 usable plugins in its plugin/ directory. More plugins, over 100, are shown in the DATA_DICTIONARY.plugins table. These plugins represent the majority of features provided by Drizzle 7, from query logging, to user authentication, to storage engines, to replication. At present, many of the provided plugins are either bare-bones implementations of features or not sufficiently documented to be usable by a normal user.

It’s easy to criticize, but I make these observations to help bring attention to areas that need work because I want Drizzle to develop and mature quickly. The pluggable storage engine feature of MySQL helped spur the development of storage engines resulting in innovation and competition that still, today, provides users great, free choices. The pluggable design has been taken to the extreme in Drizzle, extending far beyond just storage engines, to nearly everything the database server can do. I hope this spurs similar advancements in features that are lacking in Drizzle, as well as cool new features no one has imagined yet.

From this brief look at the plugins provided with Drizzle 7, I suggest the follow two things to the Drizzle developers:

  1. Document everything about Drizzle at docs.drizzle.org. Now that Drizzle has a GA release, I think good, complete documentation is the single most important task that will allow average users to actually use Drizzle, thereby giving Drizzle traction in the market, thereby creating the need for further development based on user feedback and bugs.
  2. Develop a standard for how plugins should behave. For example: auth_file has a nice error message, but logging_query does nothing silently until you enable it “just right”.

Thanks to those who did the hard work of making Drizzle pluggable and writing the first generation of plugins.

Written by Daniel Nichter

April 17th, 2011 at 3:05 pm

Compiling Drizzle 7 on Mac OS X 10.6

with 3 comments

Drizzle 7 GA has been released, so I wanted to compile and test it on my Mac running OS X 10.6.7.  Since Drizzle 7 is new, Mac binaries are not available yet.  I’ve compiled MySQL from source more times than I can remember, and Drizzle was forked from MySQL, so I expected the build process to be similar and pain-free, and for the most part it was.  I did not use MacPorts or Homebrew for various reasons, mainly because I know that I will compile, tweak and recompile Drizzle often while hacking on it.  Also, the blog post  Drizzle in the Snow is about building Drizzle on Mac OS X, but it’s out of date (published September 1, 2009).  Thus the need for this blog post.

After describing how I compiled Drizzle 7 on my Mac, I list several suggestions for the Drizzle developers/maintainers based on my experience.

I began, of course, by downloading and extracting the tarball from drizzle.org.  Then I proceeded to read the fine manual, even though the build was surely the standard trio: configure, make, make install.  There are three docs about compiling Drizzle: the README in the tarball, the online Drizzle documentation, and the DrizzleWiki.  The README lists that standard trio and refers to the online docu.  The online docu also lists the standard trio but, of primary importance when compiling from source, it also lists dependencies.  The wiki page on compiling also refers to the online docu, and if you Google for “compile drizzle mac” you’ll find other wiki pages but they are all out of date (e.g. one still says you need to compile libdrizzle first; this is no longer true: libdrizzle comes with Drizzle).  So, the online docu is the best source because we need to know Drizzle’s dependencies.

The online docu does not have specific instructions for compiling Drizzle from source on Mac OS X.  On this platform, the number one pre-requisite is Xcode: Apple’s development package which installs all the basic programs for compiling code (e.g. g++, make, etc.)  I’m using Xcode 3.2 because it’s free (whereas Xcode 4 cost a few dollars).  So, Xcode must be installed first before any other packages can be compiled (but if you already develop on a Mac, you probably already knew this).

From the list of dependencies, I had to download and install boost, gettext, intltool, and protobuf.  These last three are really simple: configure, make, make check, make install.  I did make check because although Mac OS X is a Unix platform, it has its quirks. Those three packages compiled, tested and installed effortlessly, all using the prefix /usr/local/. boost, however, required more attention.

To start, I just followed boost’s docu page Getting Started on Unix Variants, but it didn’t work as easily as the other packages. First, since Drizzle lists only a few boost libs as dependencies, I only compiled those:

./bootstrap.sh \

When that finishes, it tells me:

Bootstrapping is done. To build, run:


That is different from the docu which says to run bjam install. Running just bjam does compile the libs, but it doesn’t install them. Running with the install argument makes bjam hang. So I tried bjam --install which compiles the libs but still doesn’t install the libs. Furthermore, the compiled libs did not have -mt suffixes to indicate that they where multi-threading. I noticed that Drizzle configure checks first for libs with the -mt suffix, so I really wanted to make my boost libs multi-threading. After asking around then filing a boost bug about this, I was informed that the docu was out of date and that I needed to use --layout=tagged. So, finally, to compile and manually install the boost libs for Drizzle I did (all commands):

bootstrap.sh \
bjam --layout=tagged
sudo cp stage/lib/* /usr/local/lib/
sudo mv boost /usr/local/include/

After that, /usr/local/include/boost/ has the boost header files and /usr/local/lib/ has boost libs like:

/usr/local/lib$ ls libboost*
libboost_date_time-mt.a                libboost_regex-mt.a
libboost_date_time-mt.dylib*           libboost_regex-mt.dylib*
libboost_filesystem-mt.a               libboost_system-mt.a
libboost_filesystem-mt.dylib*          libboost_system-mt.dylib*
libboost_iostreams-mt.a                libboost_test_exec_monitor-mt.a
libboost_iostreams-mt.dylib*           libboost_thread-mt.a
libboost_prg_exec_monitor-mt.a         libboost_thread-mt.dylib*
libboost_prg_exec_monitor-mt.dylib*    libboost_unit_test_framework-mt.a
libboost_program_options-mt.a          libboost_unit_test_framework-mt.dylib*

Now Drizzle configure finishes, it doesn’t die saying it can’t find a library it needs. Furthermore, I see that it finds the -mt boost libs, so I hope that makes it happy (because, after all, one selling point for Drizzle is massive concurrency, so I don’t want to build it without being sure it’s using multi-threading libs). But there’s a problem: configure finishes by saying:

Configuration summary for drizzle7 version 7 drizzle

   * Installation prefix:       /usr/local
   * System type:               apple-darwin10.7.0
   * pandora-build version:     0.175
   * Host CPU:                  i386
   * C Compiler:                i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)
   * C++ Compiler:              i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)
   * Assertions enabled:        yes
   * Debug enabled:             no
   * Profiling enabled:         no
   * Coverage enabled:          no
   * Warnings as failure:       no

My MacBook Pro has an Intel Core 2 Duo, so the CPU is definitely not i386; it should be x86_64. I’m not a build expert, but I think this information is detected by config.guess by looking at uname values, which on my system returns:

$ uname -a
Darwin MacBook-Pro.local 10.7.0 Darwin Kernel Version 10.7.0:
Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386

That probably explains why configure detects the CPU has i386 (and the OS version as 10.7). So to correct this, I ran: configure --build=x86_64-apple-darwin10.6 --prefix=/opt/drizzle7. (I added --prefix because I want to install it elsewhere.) That gives me the results I expect:

Configuration summary for drizzle7 version 7 drizzle

   * Installation prefix:       /opt/drizzle7
   * System type:               apple-darwin10.6
   * pandora-build version:     0.175
   * Host CPU:                  x86_64
   * C Compiler:                i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)
   * C++ Compiler:              i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)
   * Assertions enabled:        yes
   * Debug enabled:             no
   * Profiling enabled:         no
   * Coverage enabled:          no
   * Warnings as failure:       no

Again, I’m no build expert, but I hope this has an affect (i.e. the code is compiled with modern optimizations). If nothing else, it gives me piece of mind. So I run make and after some time Drizzle compiles without any problems. More importantly, I run make test and the result:

All 557 tests were successful.
The servers were restarted 54 times
Spent 242.179 of 386 seconds executing testcases

Congratulations to the Drizzle developers! All tests passing straight out of the box on a completely foreign system (i.e. mine) is an accomplishment.

Finally, make install puts everything in the prefix I chose (/opt/drizzle7):

/opt/drizzle7$ ls *
drizzle*       drizzleadmin*  drizzledump*   drizzleimport* drizzleslap*

drizzle7/       libdrizzle-1.0/

drizzle7/                       libdrizzledmessage.0.dylib*
libdrizzle.1.1.0.dylib@         libdrizzledmessage.dylib@
libdrizzle.1.dylib*             libdrizzledmessage.la*
libdrizzle.dylib@               locale/
libdrizzle.la*                  pkgconfig/

drizzled@  drizzled7*


Now to run Drizzle for the “first” time (actually I compiled and ran Drizzle last year on my PC/Ubuntu machine, so this is only my first time on my Mac). I use the command line specified in the README:

/opt/drizzle7$ ./sbin/drizzled \
   --no-defaults \
   --port=3306 \
   --basedir=/opt/drizzle7 \
   --datadir=/tmp/drizzle7/data >> /tmp/drizzle7/drizzled.err 2>&1 &
[1] 72197

[1]+  Exit 1  ./sbin/drizzled [snip] >> /tmp/drizzle7/drizzle.err 2>&1

/opt/drizzle7$ cat /tmp/drizzle7/drizzled.err
unknown option port
Use --help to get a list of available options


Oh no! drizzled failed to start because the --port option doesn’t actually exist. So the README is a little out of day. No problem: just re-run it without --port and it starts without problems:

$ mysql -h 127.1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2011.03.13 Source distribution (drizzle)

All in all, the build process was painless. (boost gave me more problems than Drizzle.) Now I can start hacking on Drizzle and comparing it to MySQL (whether or not that’s a fair comparison, people are going to do it often). After a few minutes of looking around Drizzle, I can already see that its “out of the box experience” is quite different from MySQL’s. For example, because I didn’t build Drizzle with extra plugins, the default authentication plugin is auth_all, i.e. no authentication, and the logging_query plugin isn’t available so no slow query logging, either. These are topics and considerations for other blog posts.

Based on my experience compiling Drizzle 7 GA from source on Mac OS X 10.6, here are my recommendations to the Drizzle developers/maintainers:

  1. Remove DrizzleWiki and centralize all documentation at docs.drizzle.org.
  2. Update the tarball README and/or add an INSTALL.
  3. Update docs.drizzle.org (i.e. make everything refer only to Drizzle 7).
  4. Consolidate information on docs.drizzle.org (e.g. “Software Requirements” lists some required libs, then “Minimal Requirements” lists more).
  5. When compilation instructions specific to Mac OS X are written, mention the need for Xcode.
  6. Harass the boost developers to update their documentation.
  7. Provide Mac OS X binaries.

As a developer of a much smaller project, I know this is all easier said than done.  Overall though, good job on Drizzle 7.

Written by Daniel Nichter

April 10th, 2011 at 12:37 pm

Posted in Drizzle

Tagged with , , , ,