This might sound like a strong opinion, but I’m just going to put it out there: Python should make
tzinfo mandatory on all
To be fair, that’s just an overzealous suggestion prompted by my frustration after spending two full days debugging timestamp misbehaviors. There are plenty of practical reasons to keep timezone-agnostic
datetimes around. Some projects will never need timestamp localization, and requiring them to use
tzinfo everywhere will only needlessly complicate things. However, if you think you might ever need to deal with timezones in your application, then you must plan to deal with them from the start. My real proposition is that a team should assess its needs and set internal standards regarding the use of timestamps before beginning a project. That’s more reasonable, I think.
If you’re handling timestamps in Python, chances are you are using its standard
datetime class. The
datetime honestly has a pretty great feature set: it lets you do arithmetic with dates, stringify dates, etc.; pretty much anything you need to do with a date,
datetime will do for you. However, lots of problems arise when you use “naive”
datetime objects, i.e.,
datetimes without any timezone awareness.
Python 2.x had a similar problem differentiating between different types of strings. It’s a long story, but essentially whether a string contained binary or text, it was still a string. People who knew what they were doing with strings didn’t have a problem, but it was far from idiot-proof. In fact, you didn’t really need to be an idiot to fall into the trap – just naive. This caused lots of problems, so eventually Python 3.x decided to make
bytes into totally different things.
Naivety is also detrimental in the use of the
datetime. The only place where it works as intended, without a hassle, is in an application where you never have to do any kind of localization or timezone conversion. Once you start trying to convert naive
datetimes between timezones, you’ll find that you’ve been shot in the foot. My personal opinion is that footguns should not exist, or at least not in the standard libraries of high-level languages like Python.
I wish I could seriously propose that Python eliminate the naive
datetime, but this would only cause problems. Naive
datetimes are great, since they don’t ever require you to look at a timezone database (
tzdb). Once you start dealing with timezones, you have to worry about the
tzdb being up to date. If you don’t have complete control over the environment your code is running in, then you can expect inconsistent behavior between users. Whether this is a problem depends on the nature of your project, and I’m not about to enumerate all the possibilities — you can weigh the consequences yourself.
In short, I propose that anyone starting a new project should decide – at its very beginning – what to do with timestamps. In most cases, I think that naive
datetimes should be avoided altogether — explicit timezone information (
tzinfo) should be included absolutely anywhere
datetimes are used. You should use naive
datetimes only if you will never need to convert between timezones, you can’t trust users to have an up-to-date
tzdb, and having inconsistent tzdbs between users would likely create other problems.
Unfortunately, I didn’t have the foresight to disallow naive
datetimes in my project at its inception; therefore, I ran into a problem two years down the road at which point I had to do a lot of refactoring. The remainder of this article details the problems I encountered and the subsequent process of eliminating all naive
datetimes from my codebase.
When I first started using
datetimes I didn’t know any better. I simply called
datetime.now() whenever I needed a timestamp. At that time (no pun intended), my app was only displaying times for a single timezone. Eventually, I realized that I should be converting timestamps to users’ local timezones, and my naivety came back to bite me in the ass.
If you didn’t know already,
datetime.now() gives you the current time in your local timezone. However, it does not have this timezone information attached by default: it gives you a naive
I tried to convert one of these naive
datetimes using the
pytz library (which handles timezone magic):
>>> import pytz >>> from datetime import datetime >>> now = datetime.now() >>> now datetime.datetime(2017, 1, 14, 15, 15, 11, 475618) >>> pytz.timezone("America/New_York").localize(now) datetime.datetime(2017, 1, 14, 15, 15, 11, 475618, tzinfo=<DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>) >>> pytz.timezone("Australia/Sydney").localize(now) datetime.datetime(2017, 1, 14, 15, 15, 11, 475618, tzinfo=<DstTzInfo 'Australia/Sydney' AEDT+11:00:00 DST>)
Note that my local timezone is MST; however, the
datetime has no idea about this and therefore doesn’t actually do any conversion when I ask for another timezone. All of the
datetimes it returned are the same, except for their attached
My first idea was to inject my local timezone into all the naive
>>> now = datetime.now(pytz.timezone("America/Denver")) >>> now datetime.datetime(2017, 1, 14, 15, 20, 8, 410761, tzinfo=<DstTzInfo 'America/Denver' MST-1 day, 17:00:00 STD>) >>> pytz.timezone("America/New_York").normalize(now) datetime.datetime(2017, 1, 14, 17, 20, 8, 410761, tzinfo=<DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>) >>> pytz.timezone("Australia/Sydney").normalize(now) datetime.datetime(2017, 1, 15, 9, 20, 8, 410761, tzinfo=<DstTzInfo 'Australia/Sydney' AEDT+11:00:00 DST>)
Now the conversion works. However, there are also lots of places in my codebase where I’m accepting or returning Unix timestamps. If you don’t know, Unix timestamps are always UTC. If you don’t ask otherwise,
datetime will convert them into local time for you, again without a
>>> import time >>> unixtime = time.time() >>> unixtime 1484432537.234377 >>> datetime.fromtimestamp(unixtime) datetime.datetime(2017, 1, 14, 15, 22, 17, 234377)
This isn’t so bad, we can fix it mostly the same way as the
>>> datetime.fromtimestamp(unixtime, pytz.timezone("America/Denver")) datetime.datetime(2017, 1, 14, 15, 22, 17, 234377, tzinfo=<DstTzInfo 'America/Denver' MST-1 day, 17:00:00 STD>)
You can even convert it to another timezone:
>>> datetime.fromtimestamp(unixtime, pytz.timezone("America/New_York")) datetime.datetime(2017, 1, 14, 17, 22, 17, 234377, tzinfo=<DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>)
But what if we want to convert a localized
datetime into a Unix timestamp? If you’re familiar with the C
strftime API, you’ll be tempted to use
>>> datetime.now().strftime("%s") '1484432862'
That time, we got the correct result. But watch this:
>>> t = time.time() >>> datetime.fromtimestamp(time.time(), pytz.timezone("America/Denver")).strftime("%s") '1484433034' >>> datetime.fromtimestamp(time.time(), pytz.timezone("America/New_York")).strftime("%s") '1484440239'
What’s going on here? We created a single Unix timestamp (
t), and converted it to two separate
datetimes in two different timezones. We already know that conversion from Unix time into any timezone works correctly. We should have gotten the same result when we converted back. However, it turns out that you can only convert a
datetime to a Unix timestamp if it is in your local timezone.
strftime("%s") is unsupported in Python. It ends up just stripping the
tzinfo, thereby creating a naive timestamp in an arbitrary timezone, and calling the C
strftime which assumes it’s being given a local timestamp. Obviously this doesn’t work.
Now how do you create a Unix timestamp the correct way? It’s ugly:
>>> t = datetime.now(pytz.timezone("America/Denver")) >>> (t - datetime(1970, 1, 1, tzinfo=pytz.UTC)).total_seconds() 1484433334.448718
In short, you need to take a timezone-aware
datetime, subtract the Unix epoch from it (thereby obtaining a
timedelta), and convert it to seconds. Luckily for us, any arithmetic done with timezone-aware
datetimes is automatically converted to UTC.
Fortunately, it fails if you pass it a naive
>>> t = datetime.now() >>> (t - datetime(1970, 1, 1, tzinfo=pytz.UTC)).total_seconds() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: can't subtract offset-naive and offset-aware datetimes
Unfortunately, I’m sure a lot of beginners are still going to get screwed, since the most popular StackOverflow answers for this situation give you incorrect solutions like the following:
>>> t = datetime.now() >>> (t - datetime(1970, 1, 1)).total_seconds() 1484408399.491469
It doesn’t fail, since both timestamps are naive. However, the result is wrong: since I used my local time, the result is the number of seconds since 1970-1-1 in my timezone, rather than in UTC.
Upon discovering how difficult it is to do anything nontrivial with timestamps correctly, I decided to eliminate naive
datetimes from my codebase altogether and standardize an API for doing common tasks with timezone-aware
datetimes. This would help prevent other contributors to my project from shooting themselves in the foot (and by extension, shooting me).
timehelper class I created is meant to be used any time you want to:
- Get the current time,
- Localize and format a timestamp,
- Parse a Unix timestamp, or
- Create a Unix timestamp.
Any use of the builtin
datetime functions to do these things will now result in a failed code review, because they’re all nearly impossible to get right.
timehelper itself is very simple:
import pytz, psycopg2 from datetime import datetime class timehelper(object): @staticmethod def localize_and_format(tz, fmt, dt): # disallow naive datetimes if dt.tzinfo is None: raise ValueError("Passed datetime object has no tzinfo") # workaround for psycopg2 tzinfo if isinstance(dt.tzinfo, psycopg2.tz.FixedOffsetTimezone): dt.tzinfo._utcoffset = dt.tzinfo._offset return pytz.timezone(tz).normalize(dt).strftime(fmt) @staticmethod def now(): return datetime.utcnow().replace(tzinfo=pytz.UTC) @staticmethod def to_posix(dt): return (dt - datetime(1970, 1, 1, tzinfo=pytz.UTC)).total_seconds() @staticmethod def from_posix(p): return datetime.fromtimestamp(p, pytz.UTC)
Its usage is simple, too:
- Instead of calling
datetime.now(), just call
timehelper.now(). You’ll automatically be given a timezone-aware UTC
datetime. The goal of this is to use UTC everywhere within the codebase.
- To convert from a Unix timestamp to a UTC
- To convert from a
datetimeto a Unix timestamp, use
- To localize a timestamp to a timezone and format it at the same time, use
timehelper.localize_and_format(). I decided to always localize and format together in order to help enforce the goal of using UTC everywhere.
You might notice that there’s some special magic in the
localize_and_format() method for dealing with
tzinfo objects created by
psycopg2. For some reason, its API has a slight mismatch against that of
pytz. If you aren’t using
psycopg2, you can strip out that
if statement. But if you are, make sure all the timestamp-containing columns in PostgreSQL are declared as
timestamp with time zone, rather than simply
timestamp. This is another footgun; traditionally, Postgres used timezones implicitly, but this was reverted in order to comply with SQL standards.
It took me several hours of research to figure out how to properly deal with timestamps in Python. Its
datetime API is full of gotchas, and a naive developer can easily succumb to its apathy. It turns out that I had many subtle bugs in my codebase before I revisited all code pertaining to timestamps.
As it’s unlikely that naive
datetimes will ever actually be removed from Python, I recommend that everyone create standards for
datetime manipulation within their projects. Doing so may prevent tricky bugs and large rewrites later on.
If you happen to stumble upon this article in your own search for
datetime incantations, feel free to use my above
timehelper class. Consider it public domain.