I hope the days of hating on Python 3 are gone and we all agree that it is the way forward. We should switch to it sooner or later, and I hope that happens sooner. That is why I try to use Python 3 on all my projects. Unfortunately, some code bases are stuck with Python 2 because of third-party library dependencies. The situation has improved a lot over the past couple of years -- even Django is almost there with a Python 3 port now in beta. Yet there's the occasional library that hasn't been brought over to Python 3 and it holds the whole project back. What should we do then? Shrug it off and keep pounding at our keyboards creating more unportable Python 2 code and thus making the situation even worse? Fortunately, there is a better way -- craft your Python 2 code so that it looks a lot like Python 3 and is easy to port.

A Glimpse of Hope

So, I was reading the latest Python 2 vs. Python 3 debate on Reddit last week... Usually those discussions quickly deteriorate into flame wars, but, surprisingly, that one didn't. A user brought up the idea of coding Python 2 as if it were Python 3 by enabling as many of the __future__ imports as possible. It turned out many people did exactly that, and the group reached consensus about the most productive set of future imports. Here, bask in its full glory:

from __future__ import print_function, division, absolute_import, unicode_literals

Simple, and yet a pretty significant change. I've been using it for all my new code for the past two weeks or so and I am pretty happy with it. I even turned it into a Vim snippet, so that I can easily insert it in new files. Let's go over each option and see what it does.

print_function

This is probably the difference between the two versions of the language that will ruin your life most often. A lot of code is sprinkled with print statements, and unfortunately those have different syntax in the two Pythons. In Python 2, print is a statement:

print "Hello"

Whereas in Python 3 it is a function:

print("Hello")

Adding the print_function future import will make the latter work in Python 2. There, problem solved.
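Here is a small sketch of what the function form buys you beyond the parentheses. The ListWriter class and the report helper are made-up names for illustration, not part of any library; the sep, end and file keyword arguments are the standard ones of the print function.

```python
from __future__ import print_function

import sys


class ListWriter(object):
    """Hypothetical minimal file-like object: collects whatever is written."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def report(items, stream=sys.stdout):
    """Print all items on one line, separated by commas."""
    # sep, end and file are plain keyword arguments of the function form.
    print(*items, sep=", ", end="\n", file=stream)


# Goes to standard output, as usual.
report(["spam", "eggs", 42])

# The same call can target any object that has a write() method.
writer = ListWriter()
report(["spam", "eggs", 42], stream=writer)
captured = "".join(writer.chunks)  # "spam, eggs, 42\n"
```

Because print is now an ordinary function, you can also pass it around or wrap it like any other callable, which the old statement never allowed.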

division

Python 2 is a very traditional language when it comes to integer division. And I mean it in the most annoying way that will trip every novice. Here's how:

>>> 3 / 2
1

While that makes a lot of sense if you know how the machine deals with integer arithmetic, it isn't too convenient. That is why Python 3 has made it easier:

>>> 3 / 2
1.5

You can get the same division behavior in Python 2 by using the division future import. Oh, and to get the old integer division behavior, use the // operator, which rounds the result down.
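To make the two operators concrete, here is a short sketch that should behave the same on Python 2 (with the future import) and on Python 3. The variable names are only there for illustration.

```python
from __future__ import division

# "/" is now true division, even when both operands are integers.
true_quotient = 3 / 2        # 1.5

# "//" is floor division: it rounds down to the nearest whole number.
floor_quotient = 3 // 2      # 1

# Rounding down means toward negative infinity, not toward zero.
negative_floor = -3 // 2     # -2

# With a float operand the result is still floored, but stays a float.
float_floor = 3.0 // 2       # 1.0
```

The negative case is the one worth remembering when you replace / with // in old code: the result is -2, not -1.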

absolute_import

This one fixes an annoying problem with Python 2 imports. Imagine that you have a module in the foo.bar.baz package and it uses this import to get to the top-level module:

import foo

That works fine until somebody sticks a foo.py file into the baz package, right next to your module. Python 2 tries implicit relative imports first, so it will now quietly import the inner foo instead of the top-level one. Python 3 fixes that problem by making imports absolute by default and requiring that you explicitly specify relative imports with the from . import foo syntax. And that is what the absolute_import future option does in Python 2.
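If you want to see the difference for yourself, the following sketch builds a throwaway package in a temporary directory and imports from it. All the module names (demo_top, demo_pkg, user) are invented for this demo, and the layout is a simplified stand-in for the foo.bar.baz case.

```python
from __future__ import absolute_import, print_function

import importlib
import os
import shutil
import sys
import tempfile

# Hypothetical layout created below:
#   demo_top.py             top-level module
#   demo_pkg/__init__.py
#   demo_pkg/demo_top.py    sibling module with the same name
#   demo_pkg/user.py        performs both kinds of import
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)


def write(path, source):
    with open(path, "w") as handle:
        handle.write(source)


write(os.path.join(root, "demo_top.py"), "WHO = 'top-level'\n")
write(os.path.join(pkg_dir, "__init__.py"), "")
write(os.path.join(pkg_dir, "demo_top.py"), "WHO = 'sibling'\n")
write(os.path.join(pkg_dir, "user.py"),
      "from __future__ import absolute_import\n"
      "import demo_top                    # absolute: the top-level module\n"
      "from . import demo_top as sibling  # explicit relative import\n"
      "ABSOLUTE = demo_top.WHO\n"
      "RELATIVE = sibling.WHO\n")

sys.path.insert(0, root)
try:
    user = importlib.import_module("demo_pkg.user")
    print(user.ABSOLUTE)  # top-level
    print(user.RELATIVE)  # sibling
finally:
    # Clean up the temporary files and the path entry.
    sys.path.remove(root)
    shutil.rmtree(root)
```

On Python 2, removing the future import line from user.py should make the plain import demo_top pick up the sibling instead, which is exactly the surprise described above.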

unicode_literals

This one is superficially very easy to explain and understand. And it also leads to tons of small obstacles that will break your code. C'est la vie! By enabling unicode_literals in Python 2, you get the Python 3 behavior of turning all "normal" single- or double-quote strings into Unicode strings. That is:

"Hello, world!"

Is now equivalent to:

u"Hello, world!"

If you want your old-school 8-bit "strings" back, you need to use the binary prefix: b"Hello, world!". And that's all about the unicode_literals option. Oh, and, by the way, it opens Pandora's box:

  1. You will most certainly get 8-bit data from an API while assuming it's a string. BOOM!
  2. And you will try to pass Unicode string data to code that expects 8-bit binary. BANG!

What is the way out of that mess? What has been working for me is to stop thinking about binary data as having any string-like features at all. In fact, many other languages have it just like that, for example the byte arrays on the JVM or the CLR. I even made it a point to stop using the "string" term at all when meaning 8-bit binary data to avoid any confusion.

Now I am taking extra care when reading from files, sockets and the outside world, and I treat all my binary data accordingly. Usually, transforming it into a string is a matter of decoding it with the right encoding. Say I'm reading UTF-8 data off a socket. To turn it into a string, I need a line like:

binary_blob.decode("utf-8")

And writing a Unicode string back as binary requires that I encode it first:

unicode_text.encode("utf-8")

That's all it usually takes. I still get caught by surprise when working with some unknown API, but I've gotten better at guessing where we need strings and where we need binary.
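One way to keep that discipline is to decode at the point where data enters the program and encode at the point where it leaves. The sketch below assumes UTF-8 throughout; to_text and to_bytes are hypothetical helper names of my own, not functions from any library.

```python
from __future__ import unicode_literals


def to_text(data, encoding="utf-8"):
    """Decode binary data from the outside world into a Unicode string."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data  # already text, leave it alone


def to_bytes(text, encoding="utf-8"):
    """Encode a Unicode string into binary data before it leaves the program."""
    if isinstance(text, bytes):
        return text  # already binary, leave it alone
    return text.encode(encoding)


# Five bytes as they might arrive from a socket or a file opened in "rb" mode.
incoming = b"caf\xc3\xa9"

text = to_text(incoming)    # four characters, ending in "e" with an acute accent
outgoing = to_bytes(text)   # the same five bytes again
```

Note that the lengths differ: the binary blob is five bytes long, while the decoded string has four characters. That mismatch is a handy reminder that the two are different kinds of data.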

Exception Handling Blocks

Here's a bonus one that doesn't need a special __future__ import -- handling exceptions. The old-school Python syntax:

try:
    int("ooops!")
except ValueError, e:
    print("Bad value: %s" % e)

is invalid in Python 3. There we need an as keyword instead of the comma:

try:
    int("ooops!")
except ValueError as e:
    print("Bad value: %s" % e)

Luckily, the Python 3 syntax also works on Python 2.6 and later, so there is no reason to use the older one anymore.

Is That All?

So, if I use the above options and change my coding style, my code will magically get forward compatible with Python 3? Well, I won't lie to you and tell you that everything will just work when you bring it to Python 3. You'll likely hit some other problems, but they should certainly be solvable. The most important thing is that the bulk of your code should be Python 3 ready and moving to the new version of the language will not be such an unthinkable task.

We also didn't talk about some important tools and resources that will help you when porting your code to Python 3. In my opinion the most important ones are 2to3 and six. 2to3 is a tool that will try to convert incompatible Python 2 idioms into valid Python 3. six is a compatibility layer that is helpful if you need to maintain code that runs on both Python 2 and Python 3 and needs to use packages that have been renamed or changed. Both 2to3 and six need their own blog posts, so we'll cover them separately. Stay tuned.