Python 2 to 3 Transition

Sean Kelly edited this page Apr 17, 2019 · 1 revision

By now, if you've run any recently patched Python 2.7 interpreter or its associated utilities, you may have seen this admonition:

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date.

January 1st, 2020 is coming sooner than we think. It's important to move our Python-based applications to Python 3 sooner rather than later, both to use the (no longer new) features of Python 3 and to keep receiving the latest security updates and patches from the Python ecosystem.

Note: Python 3 has been available since December 3rd, 2008, so we are really far behind.

What's in Python 3

Aside from gaining access to continued support, bug fixes, and security patches, we actually do gain a number of interesting new features by migrating to Python 3. This section briefly highlights these features. (Of course, calling these "new" is a misnomer.)

Keyword-Only Arguments

Python 2 makes it too easy to pass an unintended positional argument into a parameter that was meant to be set only by keyword:

>>> def unintended(x, y, doNotPass=False):
...     if doNotPass:
...         raise NotImplementedError("Don't pass true!")
...     else:
...         return x * y
>>> unintended(2, 3)
6
>>> unintended(2, 3, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Don't pass true!

In Python 3, a bare "*" in the signature makes every parameter after it keyword-only, which avoids the problem:

>>> def unintended(x, y, *, doNotPass=False):
...     if doNotPass:
...         raise NotImplementedError("Don't pass true!")
...     else:
...         return x * y
>>> unintended(2, 3)
6
>>> unintended(2, 3, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unintended() takes 2 positional arguments but 3 were given
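
The flag can still be set, but only by name. As a minimal sketch (the function "scaled_product" and its "double" parameter are hypothetical, made up for this illustration):

```python
def scaled_product(x, y, *, double=False):
    """Multiply x by y; 'double' can only be passed by keyword."""
    result = x * y
    return result * 2 if double else result


print(scaled_product(2, 3))               # positional arguments only
print(scaled_product(2, 3, double=True))  # the flag must be named

try:
    scaled_product(2, 3, True)            # a third positional argument is rejected
except TypeError as ex:
    print("TypeError:", ex)
```

The caller now has to spell out what the third value means, which makes accidental misuse much harder.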

Chained Exceptions

Before, in Python 2, we would do this:

import os

def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except OSError:
        raise NotImplementedError('cannot rename')

The problem is that the original traceback from the OSError is lost:

>>> jpl_rename('123', '/etc/456')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in jpl_rename
NotImplementedError: cannot rename

But in Python 3 we get far more useful output:

>>> jpl_rename('123', '/etc/456')
Traceback (most recent call last):
File "<stdin>", line 3, in jpl_rename
File "/usr/lib/python3.5/os.py", line 250, in rename
PermissionError: [Errno 13] Permission denied: '/etc/456_jpl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in jpl_rename
NotImplementedError: cannot rename
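
Python 3 also lets you make the chain explicit with "raise ... from". The sketch below is one way to write it, not necessarily how our code should look; the paths are placeholders that are assumed not to exist, so that the rename fails:

```python
import os


def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except OSError as ex:
        # "from ex" records the OSError as the direct cause of the new exception
        raise NotImplementedError('cannot rename') from ex


try:
    # Placeholder paths, assumed not to exist on this machine
    jpl_rename('/nonexistent/source-file', '/nonexistent/target')
except NotImplementedError as ex:
    # The original error stays reachable for logging or inspection
    print(type(ex.__cause__).__name__)
```

With explicit chaining the traceback says "The above exception was the direct cause of the following exception" instead of "During handling of the above exception, another exception occurred".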

Fuller Exception Hierarchy

Note that in the example above we caught OSError, but in Python 3 a PermissionError was raised. PermissionError is a new subclass of OSError that preserves the errno attribute. In Python 2, we would have to write this ugly code:

import errno
import os

def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except OSError as ex:
        if ex.errno in (errno.EPERM, errno.EACCES):
            raise NotImplementedError('cannot rename: permission denied')
        else:
            raise

In Python 3 it's a lot more obvious:

def jpl_rename(original, newname):  
    try:  
        os.rename(original, newname + '_jpl')  
    except PermissionError:  
        raise NotImplementedError('cannot rename: permission denied')

See all the exceptions, new and old.
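
As a quick check of how the new classes relate to OSError, here is a small sketch. It relies on the fact that Python 3 picks the matching subclass when an OSError is constructed with an errno value:

```python
import errno

# The new exceptions are ordinary subclasses of OSError
print(issubclass(PermissionError, OSError))
print(issubclass(FileNotFoundError, OSError))

# Constructing OSError with an errno yields the matching subclass
ex = OSError(errno.EACCES, 'Permission denied')
print(type(ex).__name__, ex.errno == errno.EACCES)
```

Existing "except OSError" handlers therefore keep working, and code that still inspects "errno" is not broken by the new hierarchy.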

Flexible Unpacking

Python 2 supports this:

>>> x, y = range(2)  
>>> x  
0  
>>> y  
1

Python 3 supports this:

>>> x, y, *remaining = range(5)  
>>> x  
0  
>>> y  
1  
>>> remaining  
[2, 3, 4]  
>>> x, *remaining, y = range(5)  
>>> x  
0  
>>> remaining  
[1, 2, 3]  
>>> y  
4

This works with any iterable, including generators.
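
For instance, here is a sketch with a hypothetical "countdown" generator and a plain string; the starred name always receives a list:

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


first, *middle, last = countdown(5)
print(first, middle, last)  # 5 [4, 3, 2] 1

head, *tail = 'abc'         # strings are iterable too
print(head, tail)           # a ['b', 'c']
```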

Iterators

Speaking of generators, let's look at their close cousin: iterators. In Python 2, we naively do this:

def naive(n):
    total = 0  # avoid shadowing the built-in sum()
    for i in range(n + 1):
        total += i
    return total

Then we time it with the timeit module (here the function lives in a module named "n"):

% python2.7 -m timeit 'from n import naive; naive(1000000)'  
10 loops, best of 3: 33.9 msec per loop  
% python2.7 -m timeit 'from n import naive; naive(10000000)'  
10 loops, best of 3: 353 msec per loop  
% python2.7 -m timeit 'from n import naive; naive(100000000)'  
10 loops, best of 3: 3.88 sec per loop  
% python2.7 -m timeit 'from n import naive; naive(1000000000)'  
Traceback (most recent call last):  
…  
OSError: [Errno 28] No space left on device

Python 2 has some ugly workarounds, including (but not limited to):

  • dict.itervalues
  • itertools.imap
  • itertools.izip
  • xrange

In Python 3, the workarounds are no longer needed. The built-in dict.values, map, zip, and range return lazy iterators or views instead of building full lists, so they use memory efficiently. (Thanks to Aaron Meurer for pointing this out.)
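
A rough way to see the difference is to build a billion-element range and inspect it. This is a sketch; the exact byte count printed depends on the interpreter:

```python
import sys

big = range(10 ** 9)
print(sys.getsizeof(big))  # small, and independent of the range's length
print(len(big), big[-1])   # length and indexing work without building a list

# zip and map produce items only when asked
pairs = zip(big, big)
first_pair = next(pairs)
print(first_pair)          # (0, 0)

# dict.values() is a live view, not a copy
ages = {'ada': 36, 'alan': 41}
values = ages.values()
ages['grace'] = 85
print(sorted(values))      # [36, 41, 85]
```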

The "yield from" Statement

Speaking of generators, you might have written code like this in Python 2 to turn a parser into a generator:

for (subject, predicate, object) in rdf_parser.parse_statements():  
    yield (subject, predicate, object)

Now you can simply do this in Python 3:

yield from rdf_parser.parse_statements()

and get the same tuples from a single line. "yield from" also makes coroutines easier to write.
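
To check that the two forms are equivalent, here is a self-contained sketch. The "parse_statements" function below is a made-up stand-in for the parser above and simply yields two fixed triples:

```python
def parse_statements():
    # Hypothetical stand-in for rdf_parser.parse_statements()
    yield ('s1', 'p1', 'o1')
    yield ('s2', 'p2', 'o2')


def statements_loop():
    # Python 2 style: re-yield every item by hand
    for triple in parse_statements():
        yield triple


def statements_delegating():
    # Python 3 style: delegate to the inner generator
    yield from parse_statements()


print(list(statements_loop()) == list(statements_delegating()))  # True
```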

Asynchronous I/O

Speaking of coroutines, "yield from" lets a generator pause, hand control to another generator, and later resume with its state intact, which is what enables coroutines and other "magic" in Python 3.

For more details, see the asyncio module which is part of the Python 3 standard library.

Chances are you won't even need this, but it's there for some advanced I/O including: coroutines and tasks, streams, synchronization, subprocesses, and queues.
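
For a taste of what it looks like, here is a minimal sketch. It assumes Python 3.7 or later (for "asyncio.run") and uses the "async"/"await" syntax that later Python 3 releases added on top of "yield from"; the "fetch" coroutine is hypothetical and only sleeps:

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a real network or disk wait
    await asyncio.sleep(delay)
    return name


async def main():
    # Both waits overlap; gather returns results in argument order
    return await asyncio.gather(fetch('first', 0.02), fetch('second', 0.01))


results = asyncio.run(main())
print(results)  # ['first', 'second']
```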

New Features in the Standard Library

Speaking of the standard library, Python 3 adds the following modules and functions:

  • enum. Enumerated types, at last.
  • faulthandler. By doing "import faulthandler; faulthandler.enable()", you can get tracebacks even when the Python interpreter crashes on a fatal signal such as SIGSEGV or SIGABRT (though not on "kill -9", because SIGKILL cannot be caught). And if you can't modify the code that's causing your interpreter to crash, just run it with "python3.7 -X faulthandler…".
  • functools.lru_cache. A decorator that adds a least-recently-used cache to any function with hashable arguments. Its "cache_info()" method reports hits and misses, so you can check afterward how well the cache worked.
  • ipaddress. Construct IPv4 and even complex IPv6 addresses from simple strings.
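
A short tour of three of these, as a sketch (the "Color" enum and "fib" function are made up for illustration, and the address comes from the IPv6 documentation range):

```python
import enum
import functools
import ipaddress


class Color(enum.Enum):
    RED = 1
    GREEN = 2


@functools.lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once, then served from the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(Color.RED, Color(2))        # look up members by name or by value
print(fib(30), fib.cache_info())  # the result plus hit/miss statistics

addr = ipaddress.ip_address('2001:db8::1')
print(addr.version, addr in ipaddress.ip_network('2001:db8::/32'))
```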

Types That Shouldn't Be Sorted

In Python 2, you can get away with strange logic like this:

>>> 'python' > 456789  
True  
>>> None > max  
False  
>>> sorted(('4', 5, '6'))  
[5, '4', '6']

Whereas in Python 3 we get:

>>> 'python' > 456789  
Traceback (most recent call last):  
File "<stdin>", line 1, in <module>  
TypeError: '>' not supported between instances of 'str' and 'int'  
>>> None > max  
Traceback (most recent call last):  
File "<stdin>", line 1, in <module>  
TypeError: '>' not supported between instances of 'NoneType' and 'builtin_function_or_method'  
>>> sorted(('4', 5, '6'))  
Traceback (most recent call last):  
File "<stdin>", line 1, in <module>  
TypeError: '<' not supported between instances of 'int' and 'str'

Nice. 
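
When mixed values really do need sorting in Python 3, one option is to say how they should be compared by passing a "key" function. A sketch with made-up data:

```python
mixed = ('10', 9, '8')

# Compare everything as integers
print(sorted(mixed, key=int))  # ['8', 9, '10']

# Or compare everything as text
print(sorted(mixed, key=str))  # ['10', '8', 9]

# Push None to the end instead of comparing it with numbers
readings = [3, None, 1]
print(sorted(readings, key=lambda v: (v is None, v or 0)))  # [1, 3, None]
```

The original values come back unchanged; only the ordering rule is made explicit.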

Miscellaneous Features

Here are a few more of the "fun" features in Python 3.

Function Annotations

Function annotations let you identify what something takes and returns:

def max_of_ints(a: int, b: int) -> int:  
    return a if a > b else b

Currently these are just documentation: Python stores them in the function's "__annotations__" dict but does not enforce them.
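
A quick sketch shows both halves of that: the annotations are recorded on the function, and nothing stops a caller from ignoring them:

```python
def max_of_ints(a: int, b: int) -> int:
    return a if a > b else b


# The annotations are stored on the function object
print(max_of_ints.__annotations__)

# Nothing is checked at call time, so strings are accepted too
print(max_of_ints('pear', 'apple'))  # pear
```

Checking annotations is left to external tools such as static type checkers.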

Matrix Multiplication

Python 3.5 adds the "@" operator for matrix multiplication (yes, the same symbol used for decorators). Libraries such as NumPy support it, though you can still call "dot"; you can support "@" in your own class by defining "__matmul__".
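
As a sketch of the hook itself, here is a hypothetical "Vector" class that treats "@" as a dot product. It is not a real matrix type, just enough to show the special method being called:

```python
class Vector:
    def __init__(self, *components):
        self.components = components

    def __matmul__(self, other):
        # Called for "self @ other"; here it computes a dot product
        return sum(a * b for a, b in zip(self.components, other.components))


print(Vector(1, 2, 3) @ Vector(4, 5, 6))  # 32
```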

Filesystem Paths

Python 2 makes handling paths overly wordy with the "os.path" module. Python 3 provides the "pathlib" module and its "Path" object, which overloads the "/" operator to join path segments and offers tests like "exists" and "is_dir" and operations like "absolute" and "chmod" as methods on "Path" objects:

>>> from pathlib import Path  
>>> etc = Path('/etc')  
>>> passwd = etc / 'passwd'  
>>> if passwd.exists():  
...    hack(passwd) 
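
Here is a slightly longer sketch that is safe to run anywhere, since it works inside a temporary directory. The file and directory names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / 'docs' / 'notes.txt'  # "/" joins path segments
    notes.parent.mkdir(parents=True)          # create the "docs" directory
    notes.write_text('hello')
    print(notes.exists(), notes.parent.is_dir())
    print(notes.name, notes.suffix)           # notes.txt .txt
    content = notes.read_text()

print(content)  # hello
```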

Unicode Cleanup

Unicode and byte-strings were bolted onto Python 2. In Python 3 things are cleaner:

  • In Python 2:
    • "str" is often used for character strings but acts like a byte string.
    • "unicode" is strictly a character string.
  • In Python 3:
    • "str" is a Unicode character string. The "unicode" type is gone.
    • "bytes" are byte strings.
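
In practice the split means converting explicitly at the boundaries. A small sketch (the sample text is arbitrary, written with escapes so it survives any source encoding):

```python
text = 'na\u00efve caf\u00e9'      # a str holding 10 characters
data = text.encode('utf-8')        # bytes: the two accented letters take 2 bytes each

print(type(text).__name__, len(text))  # str 10
print(type(data).__name__, len(data))  # bytes 12
print(data.decode('utf-8') == text)    # True

try:
    text + data                        # mixing the two types is an error
except TypeError as ex:
    print('TypeError:', ex)
```

Python 2 would often let such a mix through and fail later; Python 3 refuses up front.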

Unicode Identifiers

Python 3 gives you Unicode support for identifiers:

>>> seven = shtatë = 七 = 7

That's English, Albanian, and Chinese Han for 7, by the way. But you still can't use emoji  ☹️.

Print Function

Finally, the print statement is no more. Use the print function instead:

>>> import sys
>>> print('Hello, world.')
Hello, world.
>>> print('Hello, stderr.', file=sys.stderr)
Hello, stderr.

The "nonlocal" Statement

We lose a statement, we gain a statement. Saying "nonlocal x" inside a nested function lets you assign to an "x" that lives in an enclosing (but not global) scope.
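
The classic use is a closure that keeps its own counter. A minimal sketch, with made-up names:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable in make_counter's scope
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```

Without the "nonlocal" line, "count += 1" would raise UnboundLocalError, because the assignment would make "count" local to "increment".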

Dictionary Comprehensions

We've loved list comprehensions since Python 2.0. Now we can create dicts on the fly:

d = {k: v for k, v in stuff}

Set Literals

Before:

s = set((1, 2, 3))

Now:

s = {1, 2, 3}

Just like real mathematical sets. Note that "{}" is not an empty set but an empty dictionary. You still have to say "set()" for the empty set.

"But what about set comprehensions?" Yes, we get:

s = {i for i in stuff}

complete with transformations and conditions you can attach to the syntax just as with list (and dict) comprehensions.
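
For example, here is a sketch over a made-up word list, showing a condition in a dict comprehension and a transformation in a set comprehension:

```python
words = ['apple', 'Avocado', 'banana', 'blueberry', 'cherry']

# Dict comprehension with a condition: keep only the longer words
lengths = {w: len(w) for w in words if len(w) > 5}
print(lengths)

# Set comprehension with a transformation: lower-cased first letters
initials = {w[0].lower() for w in words}
print(sorted(initials))  # ['a', 'b', 'c']
```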

The "<>" is Gone

Sorry, Chris Mattmann. You have to use "!=" exclusively now :)