How should we store future time in non UTC in a database and still be performant?

An example of a practical approach on how to store data in both local time, and a normalised time (UTC), and still be performant, but counter the timezone shift.

How should we store future time in non UTC in a database and still be performant?

This is right on the heels of my previous article about the UTC thing:

Saying “well, store the time in local timezone, then!” is all good and well, until you get to additional, and fun, questions, such as:

  • but how will I query for all time events within a one hour slot across all timezones?
  • how do I figure out from the database, which event is earlier if I don’t have the same frame of reference of UTC time?
  • how can I get all events for a single day?
  • how will I update datetime values if a DST rule changes?

And they are ALL super valid questions, and for the purposes of “killing performance”, I’m going to assume the database has a few hundred million entries.

Of course, you could store all things in a TIMESTAMP field, and call it a day, however that still has a lot of issues. See https://javorszky.co.uk/2016/06/06/today-i-learned-about-mysql-and-timezones/ for that. Plus the added problem of “let’s not store future events in UTC”.

So here’s what I would do

Store three pieces of information:

Timezone table structure as at 12th November, 2018. Last column, ”Notes“, omitted. Columns from left to right: CC, Coordinates, TZ, Comments, Format, UTC offset, UTC DST offset.
  • and the UTC equivalent of those two. This is a helper column, and will need to be tended to.

Having all three pieces of data means I can query by the UTC data, which I know will be approximate. To counter that, if I want anything for a specific day, I can add the previous and next days to the UTC restriction, and then refine the data once I got them back from the database. It’s a lot easier to individually check a tiny subset of data than all of them.

For example if the data looks like this

id | local_time          | tz_string       | utc_time
---|---------------------|-----------------|--------------------
1  | 2018-11-12 14:12:02 | Europe/London   | 2018-11-12 14:12:02
2  | 2018-11-12 15:12:02 | Europe/Budapest | 2018-11-12 14:12:02

I can use utc_time like this:

SELECT * FROM table WHERE utc_time > "2018-11-12 00:00:00" AND utc_time <= "2018-11-12 23:59:59"

Then, if I suspect that the UTC and the local times might have shifted, I can use local_time and tz_string to create a new DateTime object, and get the correct UTC back, and compare it with what I have.

$datetime = new DateTime( $local_time, new DateTimeZone( $tz_string );
$datetime->setTimeZone( 'UTC' );
if ( $utc_time === $datetime->format( 'Y-m-d H:i:s' ) ) {
    // utc time was correct
} else {
    // $utc_time = $datetime->format( 'Y-m-d H:i:s' );
}

Another approach

If you have a queue driver on your web application, you can schedule a recurring job to fetch all entries by local time that are scheduled for a day in the future (or 2 days, or whatever), and then do the above, and refresh the UTC times for them.

It’s a lot less likely that the UTC / local times will point at two different times in the future a day or two out as opposed to half a year out.

That way you can make the assumption that the UTC data is correct, and can architect your application in that case, because you’ve made reasonable steps that timezone shifts are accounted for.

Of course, your use case might permit deviations, or you might be comfortable with time being off.

As with most things in programming the answer is “it depends”.

Photo by Davide Cantelli on Unsplash