Python 2 to 3 Transition
By now, if you've run any recently patched Python 2.7 interpreter or associated utility, you may have seen this admonition:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date.
January 1st, 2020 is coming sooner than we think, and it's important to move our Python-based applications to Python 3 sooner rather than later, both to leverage the (no longer new) features of Python 3 and to take advantage of the latest security updates and patches to the Python ecosystem.
Note: Python 3 has been available since December 3rd, 2008, so we are really far behind.
Aside from gaining access to continued support, bug fixes, and security patches, we actually do gain a number of interesting new features by migrating to Python 3. This section briefly highlights these features. (Of course, calling these "new" is a misnomer.)
Python 2 makes it too easy to pass an unintended positional argument to a function that defines keyword arguments:
>>> def unintended(x, y, doNotPass=False):
...     if doNotPass:
...         raise NotImplementedError("Don't pass true!")
...     else:
...         return x * y
...
>>> unintended(2, 3)
6
>>> unintended(2, 3, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Don't pass true!
Now we can avoid the problem with keyword-only arguments. Any parameter that follows a bare "*" can only be passed by name:
>>> def unintended(x, y, *, doNotPass=False):
...     if doNotPass:
...         raise NotImplementedError("Don't pass true!")
...     else:
...         return x * y
...
>>> unintended(2, 3)
6
>>> unintended(2, 3, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unintended() takes 2 positional arguments but 3 were given
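As a small follow-up sketch, the flag is still reachable in Python 3, but only when the caller names it explicitly. The variable names "product" and "flag_triggered" are made up for illustration.

```python
def unintended(x, y, *, doNotPass=False):
    # Everything after the bare "*" is keyword-only.
    if doNotPass:
        raise NotImplementedError("Don't pass true!")
    return x * y


# Passing the flag by name is fine.
product = unintended(2, 3, doNotPass=False)

# Setting it to True is now always a deliberate, visible choice.
try:
    unintended(2, 3, doNotPass=True)
    flag_triggered = False
except NotImplementedError:
    flag_triggered = True

print(product, flag_triggered)
```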
Before, in Python 2, we would do this:
import os

def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except OSError:
        raise NotImplementedError('cannot rename')
The problem is that the original traceback from the OSError is lost:
>>> jpl_rename('123', '/etc/456')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in jpl_rename
NotImplementedError: cannot rename
But in Python 3 we get far more useful output:
>>> jpl_rename('123', '/etc/456')
Traceback (most recent call last):
  File "<stdin>", line 3, in jpl_rename
  File "/usr/lib/python3.5/os.py", line 250, in rename
PermissionError: [Errno 13] Permission denied: '/etc/456_jpl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in jpl_rename
NotImplementedError: cannot rename
Note that in the above we caught OSError, but in Python 3 a PermissionError was raised. PermissionError is a new subclass of OSError that preserves the errno attribute. In Python 2, we would have to write this ugly code:
import errno
import os

def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except OSError as ex:
        # Only translate permission problems; re-raise everything else.
        if ex.errno in (errno.EPERM, errno.EACCES):
            raise NotImplementedError('cannot rename: permission denied')
        else:
            raise
In Python 3 it's a lot more obvious:
def jpl_rename(original, newname):
    try:
        os.rename(original, newname + '_jpl')
    except PermissionError:
        raise NotImplementedError('cannot rename: permission denied')
See all the exceptions, new and old.
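Python 3 also lets you make the chaining explicit with "raise ... from ...", which stores the original exception on the new one. The sketch below is not the jpl_rename function from above: "jpl_rename_sim" is a hypothetical stand-in that simulates the permission failure so that it runs anywhere.

```python
import errno


def jpl_rename_sim(newname):
    """Hypothetical stand-in for jpl_rename that always fails."""
    try:
        # Simulate what os.rename would raise on a protected directory.
        raise PermissionError(errno.EACCES, "Permission denied", newname + "_jpl")
    except OSError as ex:
        # "from ex" records the original exception as __cause__.
        raise NotImplementedError("cannot rename") from ex


try:
    jpl_rename_sim("/etc/456")
except NotImplementedError as err:
    cause = err.__cause__

# The original error survives, errno included, and is still an OSError.
print(type(cause).__name__, cause.errno == errno.EACCES, isinstance(cause, OSError))
```

With an explicit cause, the traceback separator reads "The above exception was the direct cause of the following exception" instead of "During handling of the above exception, another exception occurred".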
Python 2 supports this:
>>> x, y = range(2)
>>> x
0
>>> y
1
Python 3 supports this:
>>> x, y, *remaining = range(5)
>>> x
0
>>> y
1
>>> remaining
[2, 3, 4]
>>> x, *remaining, y = range(5)
>>> x
0
>>> remaining
[1, 2, 3]
>>> y
4
This works with any iterable, including generators.
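As a quick sketch of star-unpacking applied to a generator (the "countdown" function is made up for illustration):

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


# The starred name always receives a list, even if the source is a generator.
first, *middle, last = countdown(5)
print(first, middle, last)
```

Note that the starred target consumes the whole generator, so this is not suitable for infinite ones.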
Speaking of generators, let's look at their close cousin: iterators. In Python 2, we naively do this:
def naive(n):
    sum = 0
    for i in range(n + 1):
        sum += i
    return sum
Then profile the code using the timeit module:
% python2.7 -m timeit 'from n import naive; naive(1000000)'
10 loops, best of 3: 33.9 msec per loop
% python2.7 -m timeit 'from n import naive; naive(10000000)'
10 loops, best of 3: 353 msec per loop
% python2.7 -m timeit 'from n import naive; naive(100000000)'
10 loops, best of 3: 3.88 sec per loop
% python2.7 -m timeit 'from n import naive; naive(1000000000)'
Traceback (most recent call last):
  …
OSError: [Errno 28] No space left on device
Python 2 has some ugly workarounds, including (but not limited to):
- dict.itervalues
- itertools.imap
- itertools.izip
- xrange
In Python 3, the workarounds are no longer needed. dict.values returns a view, map and zip return iterators, and range is a lazy sequence, so none of them builds a full list in memory. (Thanks to Aaron Meurer for pointing this out.)
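A rough way to see the difference in Python 3 is to build a billion-element range and check that it stays tiny. This is only a sketch; the variable names are illustrative, and the exact byte count reported by sys.getsizeof depends on the interpreter build.

```python
import sys

# A billion-element range is just a small object holding start, stop, and step.
big = range(10**9)
size_in_bytes = sys.getsizeof(big)

# It still behaves like a sequence: length and indexing need no iteration.
length = len(big)
last = big[-1]

# map and zip are lazy too: nothing is computed until a value is requested.
pairs = zip(big, map(lambda i: i * i, big))
first_pair = next(pairs)

print(size_in_bytes, length, last, first_pair)
```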
Speaking of generators, you might have written code like this in Python 2 to turn a parser into a generator:
for (subject, predicate, object) in rdf_parser.parse_statements():
    yield (subject, predicate, object)
Now you can simply do this in Python 3:
yield from rdf_parser.parse_statements()
and get tuples generated from a single line. This also lets you do coroutines easily.
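Speaking of coroutines, "yield from" does more than loop: it delegates to the inner generator, passing sent values and exceptions through to it and preserving its state between steps. That is what enables coroutines and other "magic" in Python 3.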
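Here is a minimal sketch of that delegation. The "parse_statements" function below is a hypothetical stand-in for rdf_parser.parse_statements(), and the triples are made-up sample data. It also shows one thing a plain for loop cannot do: pick up the inner generator's return value.

```python
def parse_statements():
    """Hypothetical stand-in for rdf_parser.parse_statements()."""
    yield ("urn:a", "urn:knows", "urn:b")
    yield ("urn:b", "urn:knows", "urn:c")
    return 2  # becomes the value of the "yield from" expression


def statements():
    # Delegate to the inner generator, then report how many triples it produced.
    count = yield from parse_statements()
    yield ("parsed", count)


triples = list(statements())
print(triples)
```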
For more details, see the asyncio module which is part of the Python 3 standard library.
Chances are you won't even need this, but it's there for some advanced I/O including: coroutines and tasks, streams, synchronization, subprocesses, and queues.
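For a taste of what that looks like, here is a minimal sketch that runs two coroutines concurrently. It assumes Python 3.7 or later (for asyncio.run); the "fetch" coroutine and its delays are invented for illustration and only sleep instead of doing real I/O.

```python
import asyncio


async def fetch(label, delay):
    # Stand-in for real I/O: sleeping yields control to the event loop.
    await asyncio.sleep(delay)
    return label


async def main():
    # Both coroutines run concurrently; gather returns results in call order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```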
Speaking of the standard library, Python 3 includes the following modules:
- enum. Enumerated types, at last.
- faulthandler. By doing "import faulthandler; faulthandler.enable()", you can get tracebacks even when the Python interpreter crashes on a fatal signal such as a segmentation fault. (It cannot help with "kill -9", because SIGKILL can never be caught.) And if you can't modify the code that's causing your interpreter to crash, just run it with "python3.7 -X faulthandler…".
- functools.lru_cache. A decorator that adds a least-recently-used cache to any function whose arguments are hashable. It even reports hit and miss statistics through "cache_info()".
- ipaddress. Construct IPv4 and (more complex) IPv6 addresses from simple strings.
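The sketch below touches three of the modules listed above. The "Color" enum, the "fib" function, and the sample addresses are all made up for illustration.

```python
import enum
import functools
import ipaddress


class Color(enum.Enum):
    RED = 1
    GREEN = 2


# Look members up by value or by name.
by_value = Color(1)
by_name = Color["GREEN"]


@functools.lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion would be exponential.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


fib_30 = fib(30)
stats = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize

# Parse addresses and networks from plain strings.
v6 = ipaddress.ip_address("2001:db8::1")
in_subnet = ipaddress.ip_address("192.168.0.1") in ipaddress.ip_network("192.168.0.0/24")

print(by_value, by_name, fib_30, stats.misses, v6.version, in_subnet)
```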
In Python 2, you can get away with strange logic like this:
>>> 'python' > 456789
True
>>> None > max
False
>>> sorted(('4', 5, '6'))
[5, '4', '6']
Whereas in Python 3 we get:
>>> 'python' > 456789
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'int'
>>> None > max
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'NoneType' and 'builtin_function_or_method'
>>> sorted(('4', 5, '6'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'str'
Nice.
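If you genuinely need to sort mixed values, Python 3 asks you to say how they compare. One possible approach, sketched here, is to pass a key function that maps every item to a common type; the variable names are illustrative.

```python
mixed = ('4', 5, '6')

# Compare everything as integers; the original items are returned unchanged.
as_numbers = sorted(mixed, key=int)

# Or compare everything as text.
as_text = sorted(mixed, key=str)

print(as_numbers, as_text)
```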
Here are a few more of the "fun" features in Python 3.
Function annotations let you identify what something takes and returns:
def max_of_ints(a: int, b: int) -> int:
    return a if a > b else b
Currently these are purely documentary: Python stores them in the function's "__annotations__" dict but does not enforce them.
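A short sketch of what that means in practice: the annotations can be inspected at run time, but nothing stops a caller from ignoring them.

```python
def max_of_ints(a: int, b: int) -> int:
    return a if a > b else b


# The annotations are stored on the function object.
hints = max_of_ints.__annotations__

# They are not checked: strings are accepted and compared as strings.
unchecked = max_of_ints("apple", "banana")

print(hints, unchecked)
```

External tools such as type checkers can read the same annotations, but that is outside the interpreter itself.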
Python 3.5 adds a matrix multiplication operator, "@" (yes, the same character used for decorators). You can still call "dot" if you prefer, and you can support "@" in your own class by defining "__matmul__".
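To illustrate the hook, here is a toy sketch in which "@" computes a dot product. The "Vector" class is invented for this example and is not a substitute for a real numerical library.

```python
class Vector:
    """Toy vector that supports the @ operator as a dot product."""

    def __init__(self, *components):
        self.components = components

    def __matmul__(self, other):
        if len(self.components) != len(other.components):
            raise ValueError("vectors must have the same length")
        return sum(a * b for a, b in zip(self.components, other.components))


# 1*4 + 2*5 + 3*6
dot = Vector(1, 2, 3) @ Vector(4, 5, 6)
print(dot)
```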
Filesystem Paths
Python 2 makes handling paths overly wordy with the "os.path" module. Python 3 provides the "pathlib" module and its "Path" object, which overloads the "/" operator and builds tests like "exists" and "is_dir" and operations like "absolute" and "chmod" into "Path" objects themselves:
>>> from pathlib import Path
>>> etc = Path('/etc')
>>> passwd = etc / 'passwd'
>>> if passwd.exists():
...     hack(passwd)
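For something safer to try at home, the sketch below works inside a temporary directory. The directory and file names ("notes", "todo.txt") are made up, and write_text assumes Python 3.5 or later.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Build nested paths with "/" instead of os.path.join.
    note = base / "notes" / "todo.txt"
    note.parent.mkdir(parents=True)
    note.write_text("migrate to Python 3\n")

    # Tests and searches are methods on the Path object itself.
    found = note.exists() and note.parent.is_dir()
    names = sorted(p.name for p in base.rglob("*.txt"))
    suffix = note.suffix

print(found, names, suffix)
```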
Unicode Cleanup
Unicode and byte-strings were bolted onto Python 2. In Python 3 things are cleaner:
- In Python 2:
  - "str" is often used for character strings but acts like a byte string.
  - "unicode" is strictly a character string.
- In Python 3:
  - "str" is always a Unicode character string. The "unicode" type is gone.
  - "bytes" are byte strings.
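A brief sketch of how the two Python 3 types relate: you convert between them explicitly with encode and decode, and mixing them is an error rather than a silent guess. The sample word and variable names are chosen for illustration.

```python
text = "shtatë"               # str: six Unicode characters
data = text.encode("utf-8")   # bytes: "ë" takes two bytes in UTF-8

char_count = len(text)
byte_count = len(data)
round_trip = data.decode("utf-8")

# Python 3 refuses to combine text and bytes implicitly.
mixing_rejected = False
try:
    text + data
except TypeError:
    mixing_rejected = True

print(char_count, byte_count, round_trip == text, mixing_rejected)
```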
Python 3 gives you Unicode support for identifiers:
>>> seven = shtatë = 七 = 7
That's English, Albanian, and Chinese Han for 7, by the way. But you still can't use emoji.
Finally, the print statement is no more. Use the print function instead:
>>> import sys
>>> print('Hello, world.')
Hello, world.
>>> print('Hello, stderr.', file=sys.stderr)
Hello, stderr.
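Because print is now an ordinary function, its behavior is controlled with keyword arguments rather than special syntax. The sketch below writes into an in-memory buffer so the result can be inspected; the buffer and sample values are illustrative.

```python
import io

buffer = io.StringIO()

# sep joins the arguments, end replaces the trailing newline,
# and file redirects the output to any object with a write() method.
print("x", "y", "z", sep=", ", end="!\n", file=buffer)

output = buffer.getvalue()
print(repr(output))
```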
We lose a statement, we gain a statement. Saying "nonlocal x" lets you assign to "x" which is in an outer (but not global) scope.
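A classic use is a closure that keeps a running count. This is a minimal sketch with made-up names; without the nonlocal line, the assignment would create a new local variable and raise UnboundLocalError.

```python
def make_counter():
    count = 0

    def increment():
        # Rebind the variable in the enclosing function's scope.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()

# Each counter has its own independent state.
other = make_counter()
other_first = other()

print(first, second, other_first)
```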
We've loved list comprehensions since Python 2.0. Now we can create dicts on the fly:
d = {k: v for k, v in stuff}
Before:
s = set((1, 2, 3))
Now:
s = {1, 2, 3}
Just like real mathematical sets. Note that "{}" is not an empty set but an empty dictionary. You still have to say "set()" for the empty set.
"But what about set comprehensions?" Yes, we get:
s = {i for i in stuff}
complete with transformations and conditions you can attach to the syntax just as with list (and dict) comprehensions.
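For instance, here is a sketch that applies both a transformation and a condition in a dict comprehension and in a set comprehension. The contents of "stuff" are sample data invented for this example.

```python
stuff = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

# Dict comprehension: keep even values and scale them by ten.
evens = {k: v * 10 for k, v in stuff if v % 2 == 0}

# Set comprehension: upper-case the keys whose value is greater than one.
initials = {k.upper() for k, v in stuff if v > 1}

print(evens, sorted(initials))
```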
The "<>" inequality operator is gone. Sorry, Chris Mattmann. You have to use "!=" exclusively now :)
Copyright © 2019–2021 California Institute of Technology.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.