API: rewrite fetching functions in script download
"API" change for module `download`. The module
`download` is not considered part of the API of
the package `dd`.


- API: rewrite `download.fetch()`:

  - API: rename parameter to `filename` (was named `fname`)

  - API: do not return any value (was returning the filename)

  - DOC: add docstring to `fetch()`

  - UI: print more detailed messages

  - BUG: catch a `None` that can be returned by the
    function `urllib.request.urlopen()` in rare circumstances.
    Quoting the [documentation][urlopen]:

    > Note that `None` may be returned if no handler handles the
    > request (though the default installed global `OpenerDirector`
    > uses `UnknownHandler` to ensure this never happens).

  - BUG: call the `close()` method of the
    [`http.client.HTTPResponse`][http_response] instance that
    is returned by the function [`urllib.request.urlopen()`][urlopen].
    Do so using a `with` statement,
    which [is supported by `HTTPResponse` objects][http_with];
    see the [examples][howto_urllib2].

    In order to handle `URLError` exceptions separately from
    local-file-related exceptions, `urllib.request.urlopen()` is called
    within a `try` statement, and the response is later used in
    a `with` statement, within which the method
    [`HTTPResponse.read()`][http_read] is called.

    The `HTTPResponse` and opened file are used as
    two context managers within a single `with` statement,
    by writing two [`with_item`s][with_item].

  - UI: catch `urllib.error.URLError` and chain it with
    a `RuntimeError` that points to relevant documentation.
    [PEP 3134](https://www.python.org/dev/peps/pep-3134/)
    introduced exception chaining.

    Exception chaining [happens automatically within `except` sections](
        https://docs.python.org/3/tutorial/errors.html#exception-chaining),
    but the message differs from explicit exception chaining
    (i.e., `raise RuntimeError('...') from url_error`).
    This is why explicit exception chaining has been used;
    a sketch of the difference follows this list.

  - API: check whether the CUDD tarball has already been
    downloaded and has the expected hash. If so, do not
    re-download.

    NOTE: if the hash differs from the expected value,
    raise an error instead of re-downloading.

- REF: extract part of the function `download.fetch()` as the
  new function `download._assert_sha()` (which checks
  the SHA-256 hash, and raises an exception with a more
  detailed message)
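
The difference in traceback wording between implicit and explicit
exception chaining, mentioned above, can be seen in a minimal
sketch (the exception types and messages here are illustrative,
not the ones used in `download.py`):

```python
def implicit_chaining() -> None:
    # Raising inside `except` without `from` produces:
    # "During handling of the above exception,
    #  another exception occurred:"
    try:
        raise ValueError('original error')
    except ValueError:
        raise RuntimeError('implicitly chained')


def explicit_chaining() -> None:
    # Raising with `from` produces:
    # "The above exception was the direct cause
    #  of the following exception:"
    try:
        raise ValueError('original error')
    except ValueError as value_error:
        raise RuntimeError('explicitly chained') from value_error
```

Calling either function shows the corresponding traceback.
In both cases the `ValueError` is stored as `__context__` of the
`RuntimeError`; explicit chaining additionally sets `__cause__`,
which changes the wording of the traceback.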


## Writing to a file before checking the hash

Writing the downloaded data to a file first, and then reading the
file back into a `bytes` object to check the hash, could be avoided:
the `bytes` object returned by the method `HTTPResponse.read()`
could be hashed directly, and written to a file only afterwards.

Nonetheless, first writing to a file, and then reading from the file
to check the hash, facilitates diagnosing the causes of errors.
For example, if the hash does not match, or any other exception is
raised in Python code, the downloaded data has already been
written to disk.
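
A minimal sketch of that alternative (hashing the in-memory
`bytes` before writing; the function name and parameters are
illustrative, and this is not the approach taken in `download.py`):

```python
import hashlib
import urllib.request


def _fetch_hash_then_write(url, sha256, filename):
    # Hash the downloaded bytes in memory,
    # then write them to disk.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != sha256:
        raise AssertionError(
            f'computed {digest}, expected {sha256}')
    # The data reaches disk only after the hash check passes,
    # so a failed check leaves nothing behind to inspect.
    with open(filename, 'wb') as f:
        f.write(data)
```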


## `ConnectionError` upon reading

Note that the method `HTTPResponse.read()` can raise
a [`ConnectionError`][connection_error]. This is not expected to
happen in the script `download`, because `read()` is called almost
immediately after the `HTTPResponse` is created. More details
are described next.

From experimenting outside the script `download`,
a [`ConnectionResetError`][connection_reset_error] is observed
when a relatively long time interval elapses between the call to
the function `urllib.request.urlopen()`, and the call to
`HTTPResponse.read()` of the `HTTPResponse` object that has been
returned by `urlopen()`.
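
A minimal sketch of that experiment (the URL and the sleep
duration are illustrative; whether the error actually occurs
depends on the server and on the network):

```python
import time
import urllib.request

response = urllib.request.urlopen('https://www.python.org/')
# Let the connection sit idle long enough for the server
# or an intermediary to drop it.
time.sleep(600)
# Reading from the stale connection can then raise
# `ConnectionResetError` (a subclass of `ConnectionError`).
data = response.read()
response.close()
```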


## About importing `urllib.request`

Note that `import urllib` does not import `urllib.request`.
Within `ipython`, `urllib.request` *is* imported upon startup.
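
A minimal sketch of the difference, in a plain `python`
interpreter (the URL is illustrative):

```python
import urllib

# `urllib.request` has not been imported, so this attribute
# access raises `AttributeError`, unless some other module has
# already imported `urllib.request` in the same process:
#     urllib.request.urlopen('https://www.python.org/')

import urllib.request  # import the submodule explicitly

with urllib.request.urlopen('https://www.python.org/') as response:
    data = response.read()
```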

[http_response]: https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse
[http_with]: https://docs.python.org/3/library/http.client.html#httpresponse-objects
[http_read]: https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse.read
[urlopen]: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen
[howto_urllib2]: https://docs.python.org/3/howto/urllib2.html
[connection_error]: https://docs.python.org/3/library/exceptions.html#ConnectionError
[connection_reset_error]: https://docs.python.org/3/library/exceptions.html#ConnectionResetError
[with_item]: https://docs.python.org/3/reference/compound_stmts.html#the-with-statement
johnyf committed Nov 27, 2023
1 parent 12aeec7 commit 8fc3193
download.py (179 additions, 29 deletions)
@@ -1,11 +1,15 @@
"""Retrieve and build dependencies of C extensions."""
import collections.abc as _abc
import ctypes
import hashlib
import os
import shutil
import subprocess
import sys
import tarfile
import textwrap as _tw
import typing as _ty
import urllib.error
import urllib.request


@@ -176,40 +180,184 @@ def _copy_extern_licenses(args):
os.remove(included)


def _join(paths):
"""Return `list` of paths, after joining each.
def _join(
paths:
_abc.Iterable[
_abc.Iterable[str]]
) -> list[str]:
"""Return paths, after joining each.
@param paths: container of pieces of paths,
each path is obtained by joining its pieces
using `os.path.join`
@type paths: `list` of `list` of `str`
@return: `list` of paths
@rtype: `list` of `str`
Flattens a list-of-lists to a list.
"""
return [os.path.join(*x) for x in paths]


def fetch(url, sha256, fname=None):
print(f'++ download: {url}')
u = urllib.request.urlopen(url)
if fname is None:
fname = CUDD_TARBALL
with open(fname, 'wb') as f:
f.write(u.read())
with open(fname, 'rb') as f:
s = f.read()
h = hashlib.sha256(s)
def fetch(
url:
str,
sha256:
str,
filename:
str
) -> None:
"""Download file from `url`, and check its hashes.
@param sha256:
SHA-256 hash value of file that
will be downloaded
"""
if os.path.isfile(filename):
print(
f'File `{filename}` already present, '
'checking hash.')
_check_file_hash(filename, sha256)
return
print(f'Attempting to download file from URL: {url}')
try:
response = urllib.request.urlopen(url)
if response is None:
raise urllib.error.URLError(
'`urllib.request.urlopen` returned `None` '
'when attempting to open the URL: '
f'{url}')
except urllib.error.URLError as url_error:
raise RuntimeError(_tw.dedent(f'''
An exception was raised when attempting
to open the URL:
{url}
In case the error message from `urllib` is
about SSL certificates, please confirm that
your installation of Python has the required
SSL certificates. How to ensure this can differ,
depending on how Python is installed
(building from source or using an installer).
CPython's `--with-openssl` (of `configure`)
is relevant when building CPython from source.
When using an installer of CPython, a separate
post-installation step may be needed,
as described in CPython's documentation.
Relevant information:
<https://www.python.org/downloads/>
For downloading CUDD, an alternative is to
download by other means the file at the URL:
{url}
unpack it, and then run:
```python
import download
download.make_cudd()
```
Once CUDD compilation has completed, run:
```
export DD_CUDD=1 DD_CUDD_ZDD=1;
pip install .
```
i.e., without the option `DD_FETCH`.
''')) from url_error
with response, open(filename, 'wb') as f:
f.write(response.read())
print(
'Completed downloading from URL '
'(may have resulted from redirection): '
f'{response.url}\n'
'Wrote the downloaded data to file: '
f'`{filename}`\n'
'Will now check the hash value (SHA-256) of '
f'the file: `{filename}`')
_check_file_hash(filename, sha256)


def _check_file_hash(
filename:
str,
sha256:
str
) -> None:
"""Assert `filename` has given hash."""
with open(filename, 'rb') as f:
data = f.read()
_assert_sha(data, sha256, 256, filename)
print(
'Checked hash value (SHA-256) of '
f'file `{filename}`, and is as expected.')


def _assert_sha(
data:
bytes,
expected_sha_value:
str,
algo:
_ty.Literal[
256,
512],
filename:
str |
None=None
) -> None:
"""Assert `data` hash is `expected_sha_value`.
If the hash of `data`, as computed using the algorithm
specified in `algo`, is not `expected_sha_value`,
then raise an `AssertionError`.
The hash value is computed using the functions:
- `hashlib.sha256()` if `algo == 256`
- `hashlib.sha512()` if `algo == 512`
@param data:
bytes, to compute the hash of them
(as accepted by `hashlib.sha512()`)
@param expected_sha_value:
hash value (SHA-256 or SHA-512),
must correspond to `algo`
@param algo:
hashing algorithm
@param filename:
name of file whose hash
is being checked, optional argument,
if present then it will be used
in message of the `AssertionError`
"""
match algo:
case 256:
h = hashlib.sha256(data)
case 512:
h = hashlib.sha512(data)
case _:
raise ValueError(
f'unknown algorithm: {algo = }')
x = h.hexdigest()
if x != sha256:
raise AssertionError((x, sha256))
print('-- done downloading.')
return fname
if x == expected_sha_value:
return
if filename is None:
fs = ''
else:
fs = f'`{filename}` '
raise AssertionError(
f'The computed SHA-{algo} hash value '
f'of the downloaded file {fs}does not match '
'the expected hash value.'
f'\nComputed SHA-{algo}: {x}'
f'\nExpected SHA-{algo}: {expected_sha_value}')


def untar(fname):
"""Extract contents of tar file `fname`."""
print(f'++ unpack: {fname}')
with tarfile.open(fname) as tar:
def untar(
filename:
str
) -> None:
"""Extract contents of tar file `filename`."""
print(f'++ unpack: {filename}')
with tarfile.open(filename) as tar:
tar.extractall()
print('-- done unpacking.')

@@ -222,10 +370,12 @@ def make_cudd():
subprocess.call(['make', '-j4'], cwd=path)


def fetch_cudd():
def fetch_cudd(
) -> None:
"""Retrieve, unpack, patch, and compile CUDD."""
fname = fetch(CUDD_URL, CUDD_SHA256)
untar(fname)
filename = CUDD_TARBALL
fetch(CUDD_URL, CUDD_SHA256, filename)
untar(filename)
make_cudd()


