Check download #7

Open
wants to merge 5 commits into
base: main
MclTTI
Contributor

@MclTTI MclTTI commented Oct 2, 2024

Added completeness check for downloaded files in the year_retrieve function of the CDS_retrieve module.
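
A minimal sketch of what a size-based completeness check could look like. The PR text does not show how the check in year_retrieve is implemented, so the helper name `is_complete` and the idea of comparing against a known expected byte count are assumptions made for illustration only.

```python
import os
import tempfile


def is_complete(path, expected_bytes):
    """Hypothetical helper: True only if the file exists and has the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes


# Demonstration with a temporary 10-byte file standing in for a download.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partial.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * 10)

    truncated_ok = is_complete(path, 25)   # fewer bytes on disk than expected
    full_ok = is_complete(path, 10)        # size matches
    missing_ok = is_complete(os.path.join(tmp, "missing.bin"), 10)  # no file at all
```

A check of this kind only runs if the download call returns normally; if the download raises, the exception has to be handled first.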

@MclTTI
Contributor Author

MclTTI commented Oct 2, 2024

Pending tests...

Review comment on CDS_retriever.py (outdated, resolved)
@MclTTI
Contributor Author

MclTTI commented Oct 13, 2024

This solution does not seem to be working as expected.

According to the log files, the condition at line 132 is not met even when a file download fails (see the error log below), so the file completeness check does not catch the error.

Error log

Process Process-30:
Traceback (most recent call last):
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 737, in _error_catcher
    yield
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 883, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(594358144 bytes read, 930845456 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 1043, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 963, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 861, in _raw_read
    with self._error_catcher():
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/urllib3/response.py", line 761, in _error_catcher
    raise ProtocolError(arg, e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(594358144 bytes read, 930845456 more expected)', IncompleteRead(594358144 bytes read, 930845456 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/iotti/projects/CDS-retriever/CDS_retriever.py", line 135, in year_retrieve
    else:
      ^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/cads_api_client/legacy_api_client.py", line 178, in retrieve
    return submitted if target is None else submitted.download(target)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/cads_api_client/legacy_api_client.py", line 147, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/cads_api_client/processing.py", line 641, in download
    multiurl.download(
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/multiurl/downloader.py", line 111, in download
    return Downloader(url, **kwargs).download(target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/multiurl/base.py", line 129, in download
    total = self.transfer(f, pbar)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/multiurl/http.py", line 125, in transfer
    for chunk in stream(chunk_size=self.chunk_size):
  File "/home/iotti/opt/miniconda3/envs/CDS_retrieve/lib/python3.12/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(594358144 bytes read, 930845456 more expected)', IncompleteRead(594358144 bytes read, 930845456 more expected))
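
One possible reading of the traceback above: the exception that actually reaches year_retrieve is requests.exceptions.ChunkedEncodingError, with urllib3's ProtocolError and IncompleteRead further down the chain. An `except` clause that names only IncompleteRead would therefore not match, and any check placed after the download call would not run. The sketch below reproduces that chain with stand-in classes (plain stdlib exceptions that only borrow the names from the log, not the real library classes) and walks it via `__cause__` and `__context__`.

```python
# Stand-ins named after the classes in the log; they are NOT the real library classes.
class IncompleteRead(Exception):
    pass


class ProtocolError(Exception):
    pass


class ChunkedEncodingError(Exception):
    pass


def fake_download():
    """Raise the same three-level chain that appears in the error log."""
    try:
        try:
            raise IncompleteRead("594358144 bytes read, 930845456 more expected")
        except IncompleteRead as e:
            # "The above exception was the direct cause ..." -> sets __cause__
            raise ProtocolError("Connection broken", e) from e
    except ProtocolError as e:
        # "During handling of the above exception ..." -> sets __context__
        raise ChunkedEncodingError(e)


def exception_chain(exc):
    """Return the class names from the outermost exception down to the root."""
    names, seen = [], set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        names.append(type(exc).__name__)
        exc = exc.__cause__ or exc.__context__
    return names


try:
    fake_download()
except IncompleteRead:
    caught_by = "IncompleteRead"          # not reached: the outer type differs
except ChunkedEncodingError as exc:
    caught_by = "ChunkedEncodingError"
    chain = exception_chain(exc)
```

Under this reading, catching the outermost exception type (or a common base class) and inspecting the chain would be one way to recognise an interrupted download.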

@MclTTI
Contributor Author

MclTTI commented Oct 20, 2024

I have modified the block that checks downloaded files, adding specific handling for the IncompleteRead exception that may be raised during the download (commit 0346d0f). This exception should trigger another download attempt, up to a maximum number of attempts. If the maximum is reached, the exception should be propagated to the main function to stop its execution (see #12).
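
A minimal sketch of the retry pattern described above, not the code from commit 0346d0f. The names `DownloadError`, `retrieve_with_retries`, `max_attempts`, and `wait_seconds`, as well as the target file name, are hypothetical; the `download` argument stands in for whatever call performs the actual retrieval.

```python
import time


class DownloadError(Exception):
    """Hypothetical exception raised once every attempt has failed."""


def retrieve_with_retries(download, target, max_attempts=3, wait_seconds=0.0):
    """Call download(target) until it succeeds or max_attempts is reached.

    Returns the number of attempts used. Raises DownloadError, chained to the
    last underlying error, so the caller can stop the main program.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            download(target)
            return attempt
        except Exception as exc:  # broad on purpose; narrow it once the type is known
            last_error = exc
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise DownloadError(
        f"{target}: download failed after {max_attempts} attempts"
    ) from last_error


# Usage: a fake download that fails twice with a simulated error, then succeeds.
calls = {"n": 0}


def flaky(target):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated broken connection")


attempts_used = retrieve_with_retries(flaky, "example_2000.nc")
```

Chaining with `from last_error` keeps the original failure visible in the traceback that the calling function receives.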

@MclTTI
Contributor Author

MclTTI commented Oct 21, 2024

The last solution (along with #12) seems to be working well. However, I was unable to capture the specific exception raised by urllib3.

[Image: Screenshot from 2024-10-21 11-02-47]

@oloapinivad
Owner

Testing this one extensively is very hard. If you have had some positive feedback, I think we can merge.

@MclTTI
Contributor Author

MclTTI commented Oct 24, 2024

On my side, this implementation works as expected (see the last log above).
If you’d like, you can introduce an error in the try block within the download section of 'year_retrieve'.
This should trigger the general except clause.
Unfortunately, I couldn’t capture the exact exception that occurred when the download failed. I tried to catch 'urllib3.exceptions.IncompleteRead', which appears in the first log above when a download through cdsapi fails.

Nevertheless, the code handles a general exception, restarting the download a fixed number of times, after which another exception is raised (see the class definition at line 14).

The key point is that this last exception raised by 'year_retrieve' should be reported to the calling function (see #12) and should block the execution of the main program.

@oloapinivad
Owner

Sorry for the delay! I can merge this if you agree.

2 participants