API: rewrite fetching functions in script download
"API" change for module `download`. The module
`download` is not considered part of the API of
the package `dd`.


- API: rewrite `download.fetch()`:

  - API: rename parameter to `filename` (was named `fname`)

  - API: do not return any value (was returning the filename)

  - DOC: add docstring to `fetch()`

  - UI: print more detailed messages

  - BUG: catch a `None` that can be returned by the
    function `urllib.request.urlopen()` in rare circumstances.
    Quoting the [documentation][urlopen]:

    > Note that `None` may be returned if no handler handles the
    > request (though the default installed global `OpenerDirector`
    > uses `UnknownHandler` to ensure this never happens).

  - BUG: call the `close()` method of the
    [`http.client.HTTPResponse`][http_response] instance that
    is returned by the function [`urllib.request.urlopen()`][urlopen].
    Do so using a `with` statement,
    which [is supported by `HTTPResponse` objects][http_with];
    see the [examples][howto_urllib2].

    In order to handle `URLError` exceptions separately from
    local-file-related exceptions, `urllib.request.urlopen()` is called
    within a `try` statement, and the response is later used in
    a `with` statement, within which the method
    [`HTTPResponse.read()`][http_read] is called.

    The `HTTPResponse` and opened file are used as
    two context managers within a single `with` statement,
    by writing two [`with_item`s][with_item].

  - UI: catch `urllib.error.URLError` and chain it with
    a `RuntimeError` that points to relevant documentation.
    [PEP 3134](https://www.python.org/dev/peps/pep-3134/)
    introduced exception chaining.

    Exception chaining [happens automatically within `except` sections](
        https://docs.python.org/3/tutorial/errors.html#exception-chaining),
    but the message differs from explicit exception chaining
    (i.e., `raise RuntimeError('...') from url_error`).
    This is why explicit exception chaining has been used;
    a sketch of the difference follows this list.

  - API: check whether the CUDD tarball has already been
    downloaded and has the expected hash. If so, do not
    re-download.

    NOTE: if the hash differs from the expected value,
    raise an error instead of re-downloading.

- REF: extract part of the function `download.fetch()` as the
  new function `download._assert_sha()` (which checks
  the SHA-256 hash, and raises an exception with a more
  detailed message)
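
The difference in traceback wording between implicit and explicit
exception chaining, mentioned above, can be seen in a minimal
sketch (the exception types and messages here are illustrative,
not the ones used in `download.py`):

```python
def implicit_chaining() -> None:
    # Raising inside `except` without `from` produces:
    # "During handling of the above exception,
    #  another exception occurred:"
    try:
        raise ValueError('original error')
    except ValueError:
        raise RuntimeError('implicitly chained')


def explicit_chaining() -> None:
    # Raising with `from` produces:
    # "The above exception was the direct cause
    #  of the following exception:"
    try:
        raise ValueError('original error')
    except ValueError as value_error:
        raise RuntimeError('explicitly chained') from value_error
```

Calling either function shows the corresponding traceback.
In both cases the `ValueError` is stored as `__context__` of the
`RuntimeError`; explicit chaining additionally sets `__cause__`,
which changes the wording of the traceback.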


## Writing to a file before checking the hash

Writing the downloaded data to a file first, and then reading the
file back into a `bytes` object to check the hash, could be avoided:
the `bytes` object returned by the method `HTTPResponse.read()`
could be hashed directly, and written to a file only afterwards.

Nonetheless, first writing to a file, and then reading from the file
to check the hash, facilitates diagnosing the causes of errors.
For example, if the hash does not match, or any other exception is
raised in Python code, the downloaded data has already been
written to disk.
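
A minimal sketch of that alternative (hashing the in-memory
`bytes` before writing; the function name and parameters are
illustrative, and this is not the approach taken in `download.py`):

```python
import hashlib
import urllib.request


def _fetch_hash_then_write(url, sha256, filename):
    # Hash the downloaded bytes in memory,
    # then write them to disk.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != sha256:
        raise AssertionError(
            f'computed {digest}, expected {sha256}')
    # The data reaches disk only after the hash check passes,
    # so a failed check leaves nothing behind to inspect.
    with open(filename, 'wb') as f:
        f.write(data)
```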


## `ConnectionError` upon reading

Note that the method `HTTPResponse.read()` can raise
a [`ConnectionError`][connection_error]. This is not expected to
happen in the script `download`, because `read()` is called almost
immediately after the `HTTPResponse` is created. More details
are described next.

From experimenting outside the script `download`,
a [`ConnectionResetError`][connection_reset_error] is observed
when a relatively long time interval elapses between the call to
the function `urllib.request.urlopen()`, and the call to
`HTTPResponse.read()` of the `HTTPResponse` object that has been
returned by `urlopen()`.
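
A minimal sketch of that experiment (the URL and the sleep
duration are illustrative; whether the error actually occurs
depends on the server and on the network):

```python
import time
import urllib.request

response = urllib.request.urlopen('https://www.python.org/')
# Let the connection sit idle long enough for the server
# or an intermediary to drop it.
time.sleep(600)
# Reading from the stale connection can then raise
# `ConnectionResetError` (a subclass of `ConnectionError`).
data = response.read()
response.close()
```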


## About importing `urllib.request`

Note that `import urllib` does not import `urllib.request`.
Within `ipython`, `urllib.request` *is* imported upon startup.
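
A minimal sketch of the difference, in a plain `python`
interpreter (the URL is illustrative):

```python
import urllib

# `urllib.request` has not been imported, so this attribute
# access raises `AttributeError`, unless some other module has
# already imported `urllib.request` in the same process:
#     urllib.request.urlopen('https://www.python.org/')

import urllib.request  # import the submodule explicitly

with urllib.request.urlopen('https://www.python.org/') as response:
    data = response.read()
```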

[http_response]: https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse
[http_with]: https://docs.python.org/3/library/http.client.html#httpresponse-objects
[http_read]: https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse.read
[urlopen]: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen
[howto_urllib2]: https://docs.python.org/3/howto/urllib2.html
[connection_error]: https://docs.python.org/3/library/exceptions.html#ConnectionError
[connection_reset_error]: https://docs.python.org/3/library/exceptions.html#ConnectionResetError
[with_item]: https://docs.python.org/3/reference/compound_stmts.html#the-with-statement
johnyf committed Nov 27, 2023
1 parent 12aeec7 commit 8fc3193
download.py (179 additions, 29 deletions)
@@ -1,11 +1,15 @@
"""Retrieve and build dependencies of C extensions."""
import collections.abc as _abc
import ctypes
import hashlib
import os
import shutil
import subprocess
import sys
import tarfile
import textwrap as _tw
import typing as _ty
import urllib.error
import urllib.request


@@ -176,40 +180,184 @@ def _copy_extern_licenses(args):
os.remove(included)


def _join(paths):
"""Return `list` of paths, after joining each.
def _join(
paths:
_abc.Iterable[
_abc.Iterable[str]]
) -> list[str]:
"""Return paths, after joining each.
@param paths: container of pieces of paths,
each path is obtained by joining its pieces
using `os.path.join`
@type paths: `list` of `list` of `str`
@return: `list` of paths
@rtype: `list` of `str`
Flattens a list-of-lists to a list.
"""
return [os.path.join(*x) for x in paths]


def fetch(url, sha256, fname=None):
print(f'++ download: {url}')
u = urllib.request.urlopen(url)
if fname is None:
fname = CUDD_TARBALL
with open(fname, 'wb') as f:
f.write(u.read())
with open(fname, 'rb') as f:
s = f.read()
h = hashlib.sha256(s)
def fetch(
url:
str,
sha256:
str,
filename:
str
) -> None:
"""Download file from `url`, and check its hashes.
@param sha256:
SHA-256 hash value of file that
will be downloaded
"""
if os.path.isfile(filename):
print(
f'File `{filename}` already present, '
'checking hash.')
_check_file_hash(filename, sha256)
return
print(f'Attempting to download file from URL: {url}')
try:
response = urllib.request.urlopen(url)
if response is None:
raise urllib.error.URLError(
'`urllib.request.urlopen` returned `None` '
'when attempting to open the URL: '
f'{url}')
except urllib.error.URLError as url_error:
raise RuntimeError(_tw.dedent(f'''
An exception was raised when attempting
to open the URL:
{url}
In case the error message from `urllib` is
about SSL certificates, please confirm that
your installation of Python has the required
SSL certificates. How to ensure this can differ,
depending on how Python is installed
(building from source or using an installer).
CPython's `--with-openssl` (of `configure`)
is relevant when building CPython from source.
When using an installer of CPython, a separate
post-installation step may be needed,
as described in CPython's documentation.
Relevant information:
<https://www.python.org/downloads/>
For downloading CUDD, an alternative is to
download by other means the file at the URL:
{url}
unpack it, and then run:
```python
import download
download.make_cudd()
```
Once CUDD compilation has completed, run:
```
export DD_CUDD=1 DD_CUDD_ZDD=1;
pip install .
```
i.e., without the option `DD_FETCH`.
''')) from url_error
with response, open(filename, 'wb') as f:
f.write(response.read())
print(
'Completed downloading from URL '
'(may have resulted from redirection): '
f'{response.url}\n'
'Wrote the downloaded data to file: '
f'`{filename}`\n'
'Will now check the hash value (SHA-256) of '
f'the file: `{filename}`')
_check_file_hash(filename, sha256)


def _check_file_hash(
filename:
str,
sha256:
str
) -> None:
"""Assert `filename` has given hash."""
with open(filename, 'rb') as f:
data = f.read()
_assert_sha(data, sha256, 256, filename)
print(
'Checked hash value (SHA-256) of '
f'file `{filename}`, and is as expected.')


def _assert_sha(
data:
bytes,
expected_sha_value:
str,
algo:
_ty.Literal[
256,
512],
filename:
str |
None=None
) -> None:
"""Assert `data` hash is `expected_sha_value`.
If the hash of `data`, as computed using the algorithm
specified in `algo`, is not `expected_sha_value`,
then raise an `AssertionError`.
The hash value is computed using the functions:
- `hashlib.sha256()` if `algo == 256`
- `hashlib.sha512()` if `algo == 512`
@param data:
bytes, to compute the hash of them
(as accepted by `hashlib.sha512()`)
@param expected_sha_value:
hash value (SHA-256 or SHA-512),
must correspond to `algo`
@param algo:
hashing algorithm
@param filename:
name of file whose hash
is being checked, optional argument,
if present then it will be used
in message of the `AssertionError`
"""
match algo:
case 256:
h = hashlib.sha256(data)
case 512:
h = hashlib.sha512(data)
case _:
raise ValueError(
f'unknown algorithm: {algo = }')
x = h.hexdigest()
if x != sha256:
raise AssertionError((x, sha256))
print('-- done downloading.')
return fname
if x == expected_sha_value:
return
if filename is None:
fs = ''
else:
fs = f'`{filename}` '
raise AssertionError(
f'The computed SHA-{algo} hash value '
f'of the downloaded file {fs}does not match '
'the expected hash value.'
f'\nComputed SHA-{algo}: {x}'
f'\nExpected SHA-{algo}: {expected_sha_value}')


def untar(fname):
"""Extract contents of tar file `fname`."""
print(f'++ unpack: {fname}')
with tarfile.open(fname) as tar:
def untar(
filename:
str
) -> None:
"""Extract contents of tar file `filename`."""
print(f'++ unpack: {filename}')
with tarfile.open(filename) as tar:
tar.extractall()
print('-- done unpacking.')

@@ -222,10 +370,12 @@ def make_cudd():
subprocess.call(['make', '-j4'], cwd=path)


def fetch_cudd():
def fetch_cudd(
) -> None:
"""Retrieve, unpack, patch, and compile CUDD."""
fname = fetch(CUDD_URL, CUDD_SHA256)
untar(fname)
filename = CUDD_TARBALL
fetch(CUDD_URL, CUDD_SHA256, filename)
untar(filename)
make_cudd()


