Hack MySQL

Archive for the ‘time’ tag

Detecting invalid and zero temporal values

without comments

I’ve been thinking a lot about invalid and zero temporal values and how to detect them with MySQL date and time functions because mk-table-checksum has to handle “everything” correctly and efficiently. The requirements are complex because we have to take into account what MySQL allows to be stored verses what it allows to be used in certain operations and functions, how it sorts a mix of real and invalid temporal values for MIN() and MAX(), how to detect a temporal value as equivalent to zero, and how different MySQL versions might affect any of the aforementioned.

At base, the four guiding requirements are:

  1. Detect and discard invalid time, date, and datetime values
  2. Detect zero-equivalent temporal values
  3. Do #1 and #2 using only MySQL functions
  4. Work in MySQL 4.0 and newer

My tests cases for invalid temporal values are:

  • 00:00:60
  • 00:60:00
  • 999-00-00
  • 999-01-01
  • 0000-00-00
  • 2009-00-00
  • 2009-13-00
  • 999-00-00 00:00:00
  • 999-01-01 00:00:00
  • 0000-00-00 00:00:00
  • 1000-00-00 00:00:00
  • 2009-00-00 00:00:00
  • 2009-13-00 00:00:00
  • 2009-05-26 00:00:60
  • 2009-05-26 00:60:00
  • 2009-05-26 24:00:00

And my test cases for first real temporal values are:

  • 00:00:00
  • 00:00:01
  • 1000-01-01
  • 2009-01-01
  • 1000-01-01 00:00:00
  • 2009-01-01 00:00:00

And there is only one real zero-equivalent temporal value: 00:00:00.

So the first requirement is to find a MySQL function that returns NULL for all those invalid values, and that function is TO_DAYS with one exception:

mysql> SELECT TO_DAYS('999-01-01 00:00:00');
+-------------------------------+
| TO_DAYS('999-01-01 00:00:00') |
+-------------------------------+
|                        364878 | 
+-------------------------------+

That date is only valid if years before 1000 are handled but the MySQL manual says that,

TO_DAYS() is not intended for use with values that precede the advent of the Gregorian calendar (1582)

so we’re already way past the limit of its intended use and, moreover, the supported lower limit of a date or datetime is 1000-01-01, so says the manual. It’s reasonable to not bother with pre-year 1000 dates so I’ll overlook this.

Excepting pre-year 1000 dates, TO_DAYS() returns NULL for all the invalid values. By contrast, UNIX_TIMESTAMP() returns zero for all the invalid values and TIME_TO_SEC() returns a mix of NULL, zero, and values. So the apparent winner for requirement #1 is TO_DAYS(), but…

Requirement #2 complicates the issue because the time 00:00:00 is valid and zero-equivalent but TO_DAYS() returns NULL for it. We need a hack that handles all the cases, and here it is:

SELECT IF(TIME_FORMAT(?,'%H:%i:%s')=?, TIME_TO_SEC(?), TO_DAYS(?))

That says, basically: if the value is a time then evaluate it with TIME_TO_SEC(), else evaluate it with TO_DAYS(). It works so well in fact that it satisfies all four requirements. 00:00:00 evaluates to zero, all the invalid values evaluate to NULL, and all the valid values evaluate to various non-null values. I have to use TIME_FORMAT() instead of just TIME() because TIME() wasn’t introduced until MySQL v4.1 (fourth requirement).

The hack works because of this (substituting TIME() for TIME_FORMAT()):

mysql> SELECT TIME('00:00:00');
+------------------+
| TIME('00:00:00') |
+------------------+
| 00:00:00         | 
+------------------+

mysql> SELECT TIME('00-00-00');
+------------------+
| TIME('00-00-00') |
+------------------+
| 00:00:00         | 
+------------------+

mysql> SELECT TIME('2010-05-26');
+--------------------+
| TIME('2010-05-26') |
+--------------------+
| 00:20:10           | 
+--------------------+

mysql> SELECT TIME('2010-05-26 10:10:10');
+-----------------------------+
| TIME('2010-05-26 10:10:10') |
+-----------------------------+
| 10:10:10                    | 
+-----------------------------+

As you can see, TIME() (or TIME_FORMAT()) returns the exact same value if the given value is a time, otherwise it interprets the value–which is a date or datetime–as a time causing it to return a different value than the given value. Thus we discern time values from date and datetime values and evaluate them separately with TIME_TO_SEC().

I tested on MySQL v4.0, 4.1, 5.0 and 5.1 and all pass. The only difference is 4.0 verses the others for the pre-year 1000 dates, but I’m ignoring these anyway.

Of course all the preceding could have been accomplished in code by looking at the column type and choosing the correct MySQL function to evaluate the value and check if it’s zero-equivalent, but I was curious to see if it could be done using only MySQL since, after all, it is MySQL that permits these silly, invalid temporals values.

If you know a simpler, more elegant solution that meets the four requirements and passes all the tests, please share!

Written by Daniel Nichter

May 26th, 2010 at 5:48 pm