Hack MySQL

Book review: Optimizing Oracle Performance


Optimizing Oracle Performance by Cary Millsap and Jeff Holt uses Oracle to make its points, but these points also apply to MySQL. The primary lesson I took away from this book is: all else aside, optimize or fix the user action that provides the most economic benefit to the company; do this by profiling just that action and optimizing or fixing the most time-consuming events, even if they are “idle” or “wait” events.

The authors call the aforementioned approach to performance optimization “Method R”. It’s meant to be deterministic and teachable, unlike “Method C” (the conventional method), whereby one uses one’s best judgment and experience to find the cause(s) of problems and fix them. I agree, and Method R is fundamentally, imho, just the scientific method in practice. Therefore, I like Method R because it puts “science” back into “computer science.”

The book also discusses queueing theory. It’s a whirlwind tour (~60 pages) but the authors provide everything you need to get started, including helper scripts and Excel worksheets. I’m pretty sure that I’ll be working with this theory more in my job; when I do, I’ll begin with what the authors have given me (“stand on the shoulders of giants”).
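To give a flavor of the kind of formula the book’s queueing material builds on, here is the classic M/M/1 response-time equation in Python. This is my own minimal sketch for illustration, not the authors’ scripts or worksheets:

```python
# M/M/1 queueing model: a single server, Poisson arrivals, exponential
# service times. Mean response time R = S / (1 - rho), where S is the
# mean service time and rho = (arrival rate) * S is the utilization.

def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time (queueing delay + service) for an M/M/1 queue."""
    rho = arrival_rate * service_time  # utilization; must be < 1
    if rho >= 1:
        raise ValueError("system is unstable: utilization >= 1")
    return service_time / (1 - rho)

# A query averaging 10 ms of service, arriving 50 times per second:
# utilization is 0.5, so mean response time doubles to 20 ms.
print(mm1_response_time(50, 0.010))
```

The takeaway the book drives home is visible right in the formula: as utilization approaches 1, response time grows without bound, which is why a server at 90% utilization feels so much worse than one at 50%.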

One criticism/clarification comes to mind: Method R is reactive. Let’s say your MySQL configuration is terrible, so you’re not getting as much from your server as you could. Method R may indirectly expose this only if the configuration is the root cause of a slow user action. So the configuration is examined and fixed only if the user deems their action unacceptably slow. However, users don’t always complain; sometimes they just “live with it” because they don’t care, or they don’t think it can be fixed, or they’re afraid to complain, or it’s always been that way so they’re not even aware that things could be better. Thus, I think a more holistic view of performance optimization requires both a proactive method and a reactive method. Method R is a great reactive method, but someone should be checking things even when there doesn’t seem to be a problem. The authors don’t say “Method R is all you ever need to do”; I’m just making a clarification here.

Oracle extended SQL traces are used throughout the book to investigate performance issues. Does MySQL have anything similar? Nothing as cohesive comes to mind (correct me if I’m wrong). I think we can achieve the same thing via microslow logs, the community PROFILE feature, session status values, and scripts to glue it all together. That’s a lot of disparate pieces. I’d rather have MySQL extended SQL traces (in a format more easily parsable than Oracle’s).
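To illustrate what such a glue script might look like, here is a toy sketch that aggregates total query time from a slow-log excerpt and ranks queries by time consumed, Method R style. The log format here is deliberately simplified (real microslow logs carry more header fields than just Query_time), and the sample queries are made up:

```python
import re
from collections import defaultdict

# Simplified slow-log excerpt: each entry is a "# Query_time:" header
# followed by the SQL statement that took that long.
LOG = """\
# Query_time: 0.450000
SELECT * FROM orders WHERE customer_id = 7;
# Query_time: 1.200000
SELECT * FROM orders WHERE customer_id = 9;
# Query_time: 0.030000
SELECT COUNT(*) FROM users;
"""

def normalize(sql: str) -> str:
    # Replace numeric literals so identical query shapes group together.
    return re.sub(r"\b\d+\b", "?", sql.strip())

def profile(log: str):
    """Total time per normalized query, highest first."""
    totals = defaultdict(float)
    query_time = None
    for line in log.splitlines():
        m = re.match(r"# Query_time: ([\d.]+)", line)
        if m:
            query_time = float(m.group(1))
        elif query_time is not None and line.strip():
            totals[normalize(line)] += query_time
            query_time = None
    # Sort by total time descending: that's where the payoff is.
    return sorted(totals.items(), key=lambda kv: -kv[1])

for query, seconds in profile(LOG):
    print(f"{seconds:8.3f}s  {query}")
```

The point of the exercise is the ranking step: like Method R’s profile of a single user action, it directs attention to whatever consumed the most time, rather than to whatever looks suspicious.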

In summary, Optimizing Oracle Performance is a must-read for any database professional. I think its emphases on providing the business the most “bang for its buck” and the deterministic nature of Method R are timeless and timely lessons for those of us who earn our livings by engaging in the science of computing.

Written by Daniel Nichter

January 6th, 2010 at 11:52 am

8 Responses to 'Book review: Optimizing Oracle Performance'


  1. > Does MySQL have anything similar?

    We’re on the way. The next version of MySQL
    has instrumentation, including timing and
    counting, for some mutex and disk input/output.
    That’s comparable to certain Oracle Wait Interface
    tables.

    Blog postings in order of appearance:
    http://blogs.mysql.com/peterg/2009/02/05/mysql-performance-schema/
    http://blogs.mysql.com/peterg/2009/02/06/mysql-performance-schema-2/
    http://blogs.mysql.com/peterg/2009/02/09/mysql-performance-schema-3/
    http://blogs.mysql.com/peterg/2009/02/10/mysql-performance-schema-4/
    http://blogs.mysql.com/peterg/2009/02/11/mysql-performance-schema-5/
    http://blogs.mysql.com/peterg/2009/02/12/mysql-performance-schema-6/
    http://blogs.mysql.com/peterg/2009/02/13/todo-mysql-performance-schema-7/

    Specifications:
    WL#2360: Performance Schema
    http://forge.mysql.com/worklog/task.php?id=2360
    WL#2515: Performance statements
    http://forge.mysql.com/worklog/task.php?id=2515
    WL#3249: SHOW PROCESSLIST should show memory
    http://forge.mysql.com/worklog/task.php?id=3249
    WL#4674: PERFORMANCE_SCHEMA Setup For Actors
    http://forge.mysql.com/worklog/task.php?id=4674
    WL#4678: PERFORMANCE_SCHEMA Instrumenting File IO
    http://forge.mysql.com/worklog/task.php?id=4678
    WL#4813 PERFORMANCE_SCHEMA Instrumenting Stages
    http://forge.mysql.com/worklog/task.php?id=4813
    WL#4816: PERFORMANCE_SCHEMA Summaries
    http://forge.mysql.com/worklog/task.php?id=4816
    WL#4895: PERFORMANCE_SCHEMA Instrumenting Table IO
    http://forge.mysql.com/worklog/task.php?id=4895
    WL#4878: PERFORMANCE_SCHEMA Trace
    http://forge.mysql.com/worklog/task.php?id=4878
    WL#4896: PERFORMANCE_SCHEMA Instrumenting Net IO
    http://forge.mysql.com/worklog/task.php?id=4896
    WL#2333: SHOW ENGINE … LOCK STATUS
    http://forge.mysql.com/worklog/task.php?id=2333

    Peter Gulutzan

    6 Jan 10 at 1:34 PM

  2. Peter,

    That’s excellent! I’ve not been following MySQL 6.0 so this is pleasant news. Keep up the good work. This is going to open a lot of doors for MySQL performance optimization.

    Daniel Nichter

    6 Jan 10 at 2:20 PM

  3. MySQL will have something equivalent when the performance schema data described by PeterG can generate a trace of events in timestamp order, so we can see where a connection ran and waited while executing a query. And the trace must cover both code above the storage engine and code within the storage engine.

    InnoDB/Oracle has yet to comment on whether they will instrument their code.

    Mark Callaghan

    6 Jan 10 at 3:07 PM

  4. Hi Daniel,

    I’m also now reading the book. I like it, but I want to further support your observation that it is reactive.
    I’ve also left notes on Baron’s post: http://www.xaprb.com/blog/2009/12/09/response-time-optimization-in-systems-that-are-queued/, where I try to convey a point discussing with Fernando.

    If I understand correctly from my reading so far, Method R will not require me to do *anything* if I’m observing a relaxed MySQL machine a customer has asked me to review.
    Perhaps I can see that 50% of the queries are using full table scans, and queries for which EXPLAIN says “using join buffer” or other “atrocities”. According to Method R, dealing with these is working by “hunch” and is inappropriate.
    I disagree. I do not consider this a hunch. It is a fundamental check-up.

    When you build a building, you must have the foundations and pillars set up straight. If they’re not, and you only have 2 floors, all may be well. But counting on adding another 20 floors is a different matter.
    So even while you have only 2 floors, and all is well, it is proper and important to do whatever basic fixes would make the foundations stronger.

    In a sense, working by Method R only (and again, I’m still reading through the book, I hope I’m not commenting prematurely) is a “recipe for disaster”. The author tells us companies only call upon his company, Hotsos, when all hell breaks loose and they’re in an emergency. Method R probably works best then.
    But how (if!) will Method R work when a customer is intelligent and careful enough to call upon a consultant’s advice while still developing the application? Before it’s even online? To have the consultant share his opinion: “Is this a good design?” So, sure, load tests and all; but this still leaves territory untouched by Method R.

    My 2 cents

    Regards

    Shlomi Noach

    6 Jan 10 at 9:50 PM

  5. We’ve been working a lot over the past five years or so on the “reactive vs. preventative” issue. Granted, the book doesn’t really say anything about the subject of problem prevention, so it’s left to the imagination of the reader how to tackle problems in that domain.

    But in reality, the only reliable way I know to prevent performance problems is to measure your code prior to going production with it. Doing it any other way—like trying to guess where your code is probably slow—just doesn’t work. It’s the same today as Knuth observed several decades ago.

    So, the only step that changes in Method R when you need a preventive context is step 1, the targeting step. We actually list it at our Method R web site as “Identify the task that’s the most important to you.” Note the subtle change here from what you see on p20 of the book: “Select the user action for which the business needs improved performance.”

    This subtle change makes it easier to see the relevance of Method R in the union of reactive and preventive contexts. For example, the definition of important to you might be…

    …a task that’s giving your business a performance problem today. That’s what the book talks about.

    …a business-critical task that’s not executing slowly right now, but if it were to slow down, it would damage your business. You need to measure tasks like this periodically and engage your problem repair skills any time you notice a negative trend.

    …a task that’s not business-critical, but that has a long run-time. This kind of task is important, no matter what its business priority is, because it has the most potential for getting in the way of everything else.

    …a task that consumes a lot of a resource that you’re trying to conserve. Most often in my experience, this resource is CPU, because that’s the most expensive resource in the machine. (A lot of my customers have the biggest, fastest CPUs that humans can make; to upgrade from there, they have to await new inventions, which is an expensive kind of wait, both in time and, ultimately, money.)

    …a task that you’re writing today for production use sometime in the future. In this context, Method R is simply measuring to find your “critical code” in the Knuth sense.

    —Cary Millsap

    Cary Millsap

    7 Jan 10 at 11:38 AM

  6. Hi Cary,
    Many thanks.
    Though, even as you suggest, it may be impossible to actually measure the performance of the task that is most important to the customer, since real-life load tests may not be achievable; it is therefore back to estimations, is it not?
    But I’m not talking about estimation per se. I’m referring to basic, reasonable table/query design. Of course, that query which only runs twice per day is not that important. And of course, that query which is used extensively throughout the day should be taken care of. And I see this as the very basic question a consultant should ask the customer.
    If I understand your comment correctly, it’s basically the same thing.
    Only in proactive mode it cannot be truly measured, so it’s not technically deterministic and correct.

    Again, I enjoy reading through your book!

    Regards

    Shlomi Noach

    7 Jan 10 at 12:49 PM

  7. Shlomi,

    I agree with your idea that it’s difficult to create an environment in which you’ll foresee all your production problems in advance of actually encountering those problems in production. But I think the following points are important:

    …You’ll catch a lot more problems if you try to catch them prior to production than if you don’t even try. I think a lot of people don’t even try because they feel it would be futile to execute any plan that’s provably imperfect from the outset!

    …You’ll never catch all your problems in pre-production testing. That’s why you need a reliable and fast method for solving the problems that leak through your pre-production testing processes.

    Somewhere in the middle between “no testing” and “full production emulation” is the *right* amount of testing.

    My main point about Method R, though, is that *profiling* is an absolutely essential part of the software life cycle: in production, yes, but also long before then, early in the development of the software too. That, in fact, (and this was Knuth’s point) is where profiling is the *most* important.

    —Cary

    Cary Millsap

    7 Jan 10 at 1:16 PM

  8. Hi Cary,

    100% agree with all of the above

    Shlomi

    Shlomi Noach

    7 Jan 10 at 10:37 PM
