mysqlsla Documentation

mysqlsla is a MySQL Statement Log Analyzer. mysqlsla reads one or more logs produced by MySQL and reports helpful information about the queries in those logs. That helpful information is used to optimizing MySQL by determining and fixing bad or slow queries.

Most often mysqlsla is used by consultants and administrators to parse MySQL slow and general log files because these files become too big to comprehend manually. The slow log is especially useful in diagnosing a slow MySQL server. The general log is useful in analyzing and optimizing other more general problems.

mysqlsla also parses "raw" logs. Raw logs are simply flat files with semi-colon new-line (;\n) terminated MySQL statements. These kinds of logs are useful for developers and clever hackers. In a sense, raw logs are a "catch-all": if you need mysqlsla to parse something it doesn't intrinsically support, you can hack your data into a raw log format and then analyze it with mysqlsla.

Limitations

A Note About Options

Technically, command line options are in the form --option, but -option works too.

All options can be abbreviated if the abbreviation is unique. For example, option --top can be abbreviated --to but not --t because --t is ambiguous: it could mean --top, --time-each-query, or --time-all-queries.

Some options also have explicit aliases, such as --databases being aliased --db. Where explicit aliases exist they are noted like "--databases (--db)."

Options that accept multiple values (--general, --slow, --raw, --databases) expect the values to be comma separated with no spaces.

General Options

--user USER
--password
--host ADDRESS
--port PORT
--socket SOCKET
--no-mycnf
--help
--debug
These options mimic most standard applications. --no-mycnf makes mysqlsla not read ~/.my.cnf which it does by default otherwise.
--general LOG
--slow LOG
--raw LOG
One of these options is required. Each option refers to its respective type of log. All options are exclusive (meaning, you can only use one of the three). To specify multiple logs, comma separate the list of log files. For example, to have mysqlsla read two general logs: --general file1.log,file2.log. If mysqlsla can't read a log it prints a warning and continues reading the other logs.
--flat When mysqlsla processes SQL statements, it first "flattens" them, which means they are set to all lowercase. This helps improve query abstraction because otherwise "SELECT * FROM foo;" and "select * from foo;" appear to be different statements in Perl (because Perl hash keys are case-sensitive). Later, mysqlsla re-capitalizes most important SQL keywords like SELECT, FROM, WHERE, ORDER BY, etc. This option prevents mysqlsla from re-capitalizes SQL keywords.
--examples For all analyses except --print-queries and --time-each-query, mysqlsla prints the abstracted form of queries. This options causes mysqlsla to print random, not-abstracted, examples of the queries instead.
--milliseconds Print time values in milliseconds if less than 1 second. For example: 0.014539 s is printed as 14.539 ms. Normally, MySQL logs all time values in seconds, but with the MySQL slow query log millisecond patch time values are logged in milliseconds. With that patch it is necessary to use this option, otherwise many time values may show only as 0 s.

Extra Analyses

--print-queries (--pq) Print ALL valid queries from all logs. This is useful if you want to see what mysqlsla actually read from the logs. This option is not affected by --top.
--explain (--ex) For each unique query, mysqlsla will print the output from EXPLAIN and the number of rows produced and read. This only works for SELECT statements. This option is also subject to whether or not mysqlsla knows which database the query uses. For general and raw logs, this is usually never a problem. For slow logs, however, it is often a problem. See also --databases.
--time-each-query (--te) Time how long it takes each query to execute. No status indicator is given. Therefore, if a query takes a really long time to execute, mysqlsla might appear to be frozen, but it is really only waiting for the query to finish. See also --avg and --filter.
--time-all-queries (--ta) Time how long it takes ALL queries in a general or raw log to execute. This option only works with general or raw logs. The analysis has only one result: the average total execution time of all queries. The current time run (or percentage complete of all time runs) is printed while the queries are being timed. The resulting average is an average of all the time runs. See also --avg, --percent, and --filter.

Sorting

The --sort command line options is perhaps the most important option. It causes mysqlsla to sort its results by a given value in descending order. Since many values are derived from MySQL logs, it is important to be able to sort the results by the value you are most interested in. Also, since different logs provide different kinds of information, there are two classes of sort values: one for all types of logs, and one only for slow logs.
Log TypeSort Value
All (slow, general, and raw) Value: Description
  • c: Count (number of times query occurs) (DEFAULT)
  • rp: Rows produced (calculated from EXPLAIN)
  • rr: Rows read (calculated from EXPLAIN)
  • e: Query execution time
  • ce: Approximate total query execution time (c * e)
Slow Only
  • t: Total query execution time (DEFAULT)
  • at: Average query execution time
  • l: Total query lock time
  • al: Average query lock time
  • rs: Average row sent
  • rs_max: Maximum rows sent
  • re: Average rows examined
  • re_max: Maximum rows examined

Analysis Options

--avg N (--n) For --time-each-query and --time-all-queries mysqlsla by default makes 1 time run. This option causes mysqlsla to make N time runs and average the results. If N is 20 or more, --percent is automatically invoked.
--databases D (--db) In order to EXPLAIN a query, mysqlsla must know which database the query uses. For general and raw logs, mysqlsla should always know this (because general logs always log which database and you can put USE statements in raw logs). For slow logs, however, the database is not always logged. Therefore, you can gives mysqlsla hints as to which databases queries with unknown databases belong to with the --databases command. mysqlsla will try to EXPLAIN a query with an unknown database against each given database. The first database to make EXPLAIN for the query work will be accepted. In the rare case that two databases can be used to EXPLAIN the query, only the first given and successful will be used.
--distribution (--pd) Technically this might be considered an extra analysis. For slow logs only, this option causes mysqlsla to list the distribution of query execution times under the line for Nth percent averages. The distribution of query execution times means how often (percentage) the query took n seconds to execute. Each distribution is listed like "%: n". For example, a listing of "40%: 15" means that 40% of all the times that the query was logged it took 15 seconds to execute. The last line of the distributions says what percentage of unique execution times the above listed distributions account for. For example, "79% of total" means that the above listed distributions account for 79% of all the various, unique execution times. See also --mp and --np.
--filter S By default mysqlsla allows the following SQL statements: DELETE, DO, INSERT, REPLACE, SELECT, TRUNCATE, UPDATE, USE, CALL, SET, START, SHOW, ROLLBACK, COMMIT. Any SQL statement not recognized (whether filtered or not) is discarded. This is to prevent reading junk statements. The default filter can be changed with this option. The parameter S is a comma-separated list of above SQL statements or * (for ALL statements) preceded by + to allow the statement or - to discard the statement. For example, to allow only INSERT and UPDATE: --filter -*,+INSERT,+UPDATE. To discard everything: --filter -*. To discard only CALL: --filter -CALL.

NOTE: A filter = -*,+SELECT,+USE is automatically set when either the --time-all-queries or --time-each-query options are used! This is meant as a safety feature to prevent people from accidentally damaging a real database by timing queries against it because timing queries actually runs the queries against the database. To override this safety, specify the filter that you want. REMEMBER: If, for example, a DELETE statement is timed, it is also actually executed on the database!
--flush-qc This option causes mysqlsla to FLUSH QUERY CACHE; before starting the analyses.
--grep P This option causes mysqlsla to keep only statements that match the Perl regular expression pattern P (case insensitive). It is applied after statement filtering. Since P is put directly into a pattern match (m/P/io), you may need to escape some special characters in the pattern like parenthesis.
--mp N Do not show time distributions (from --distribution) with a percentage less than N. Default is 5.
--no-report (--nr) If you wish to only --print-queries and/or --time-all-queries, this option causes the normal report to be suppressed.
--np N Show at most N time distributions (from --distribution). This option applies before --mp. Default is 10.
--nth-percent N (--nthp) For slow logs only, the third line in the report is the values of the slow query times re-calculated to include only the best N percent of values. The default for N is 95, therefore the line by default looks like:
95% of Time : 8127 total, 75.25 avg, 3 to 162 max

This line says that 95% percent of all the times that this slow query was logged, it took 75.25 seconds on average to execute, ranging between 3 and 162 seconds, for a total time cost of 8127 seconds.

These "nth percent" values may give a more realistic representation of the data because they exclude the worst times (the high-end 5%) which may only be rare, one-time occurrences.

The --nth-percent option sets the percentage of the best times to calculate for this line.
--only-databases X Analyze only queries using databases X, where X is a comma-separated list of database names (example: db1,db2,db3). This option overrides the --databases option. This option only works with general and raw logs (slow logs usually do not log the database).
--only-users X Analyze only queries belonging to users X, where X is a comma-separated list of user names (example: root,user123,bob). This option only works with general and slow logs.
--only-hosts X Analyze only queries belonging to hosts X, where X is a comma-separated list of host addresses, either by name or IP (example: localhost,192.168.0.1,box.domain.com). This option only works with general and slow logs.
--only-ids X Analyze only queries belonging to connection IDs X, where X is a comma-separated list of connection IDs (example: 23,45,67). This option only works with general logs.
--percent For --time-each-query and --time-all-queries mysqlsla by default makes 1 time run. If more time runs are specified with --avg, these time runs can be counted as a percentage complete with this option. This only affects --time-all-queries (--time-each-query doesn't indicate which time run its currently on). The percentage complete is listed as %25, 50%, 75%. This option is implicitly invoked if 20 or more time runs are specified with --avg.
--top N By default mysqlsla lists only the top 10 queries. This option limits or increases the output to the top N queries.

What To Do About Bug and Errors

If mysqlsla breaks, send me a message with a copy of its output including any error messages. I will probably ask for your log file because it is easiest to debug mysqlsla with the log file that is causing mysqlsla to break.
(Doc rev: Oct 27 2007)