Handle timeouts #171

Open
mathieu-clement opened this issue Feb 11, 2015 · 6 comments

Comments

@mathieu-clement

For some reason most of the time my uploads fail due to timeouts:

boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

My workaround so far is to do this:

  1. Perform a normal upload, without --resume and --uploadid.
  2. Get the upload ID from glacier-cmd listmultipart
  3. Put the command in a loop:
while true
do
    glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments
    sleep 120
done

This way, when the upload times out, it resumes automatically after 2 minutes.
(You probably want to use large part sizes with that kind of setup.)
  4. At some point glacier-cmd prints "str: Can not resume upload of this data as no existing job with this uploadid could be found.", meaning the upload is finished.
  5. Press (and hold) Ctrl-C to get out of the loop (or rewrite the script to detect that "success" message or check the return status of the command; see the sketch after this list).
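
For reference, a rough Python version of the wrapper suggested in step 5: keep re-running the resume command until it either exits cleanly or prints the "no existing job with this uploadid" message from step 4. This is only a sketch; it assumes glacier-cmd exits non-zero when a part upload dies on a timeout, and the vault name, file name and upload ID are placeholders standing in for "the_other_arguments" above.

import subprocess
import time

# Placeholders for the real vault / file / upload ID arguments.
CMD = ["glacier-cmd", "upload", "myvault", "backup.tar",
       "--resume", "--uploadid", "D9651-5d4f..."]
DONE_MARKER = "no existing job with this uploadid could be found"

while True:
    # Output is captured here, so glacier-cmd's usual progress display is hidden.
    proc = subprocess.Popen(CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    if proc.returncode == 0 or DONE_MARKER in output.decode("utf-8", "replace"):
        break          # finished, or the upload was already completed earlier
    time.sleep(120)    # wait 2 minutes before resuming, as in the loop above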

It would be nice if glacier-cmd handled timeouts itself. Glacier-cmd is useless to me without this workaround.
Otherwise, apart from it trying to print hundreds of columns of output, it works pretty well from what I have seen.

@wvmarle
Contributor

wvmarle commented Feb 12, 2015

This has always been an issue; the problem appears to be on Amazon's side. When I was actively coding on this project I made several attempts at automatic retries within the code, so that you don't see these errors (unless it got something like five timeouts in a row, which indicates another issue). It happens time and again, and I have never been able to find a pattern in these timeout errors.
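
The retry logic was roughly shaped like the sketch below (a simplified illustration, not the exact code from the project): retry a single request when Glacier answers 408, and only give up after several consecutive timeouts. Here do_request is a placeholder for whichever boto call is being wrapped.

from boto.glacier.exceptions import UnexpectedHTTPResponseError

MAX_CONSECUTIVE_TIMEOUTS = 5  # several 408s in a row points at a different problem

def call_with_retries(do_request, *args, **kwargs):
    # do_request stands in for a single boto request, e.g. one upload_part call.
    timeouts = 0
    while True:
        try:
            return do_request(*args, **kwargs)
        except UnexpectedHTTPResponseError as e:
            # boto 2.x keeps the HTTP status on the exception; use getattr to be safe.
            if getattr(e, "status", None) != 408:
                raise
            timeouts += 1
            if timeouts >= MAX_CONSECUTIVE_TIMEOUTS:
                raise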

@gburca
Contributor

gburca commented May 13, 2015

It seems the code to retry on HTTP 408 is commented out in the main branch. I've enabled a tweaked version of it in gburca/amazon-glacier-cmd-interface@85ef4aa, but I haven't run into the 408s recently, so I can't say for sure whether it fixes the issue. @tiktaktok, if you want to try the patch, please enable logging. I'd be curious to know what values of "retry" and "total retries" you're seeing.
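
(For anyone driving boto directly from their own script rather than through glacier-cmd, boto's built-in stream logger is a quick way to see the HTTP-level request activity; it simply routes boto's debug logging to stderr:)

import boto

# Stock boto 2.x helper: send the "boto" logger's debug output to stderr.
# This is separate from glacier-cmd's own log file configuration.
boto.set_stream_logger("boto")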

@hagleyj

hagleyj commented May 14, 2015

I just updated to gburca/amazon-glacier-cmd-interface@85ef4aa and I am still seeing the same errors.

This is what I see in the console

Traceback (most recent call last):
  File "/usr/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 929, in main
    args.func(args)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 156, in wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 309, in upload
    args.name, args.partsize, args.uploadid, args.resume)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 232, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 253, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 1157, in upload
    writer.write(part)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 129, in write
    data)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 1278, in upload_part
    response_headers=response_headers)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 118, in make_request
    raise UnexpectedHTTPResponseError(ok_responses, response)
boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

@pchug

pchug commented May 19, 2015

I have encountered this issue quite repeatedly lately. I was able to identify a simple little "fix" for it last night, and I have been uploading gigs of backlog since, with no 408 Request timed out messages.

The fix is really a configuration change for boto. Just define a [Boto] section in your environment's configuration file and set the num_retries to some small number. The default value happens to be None, as in, no retries will be performed. See http://docs.pythonboto.org/en/latest/boto_config_tut.html#boto for more information about the configuration file.

I happen to have my own code written against Layer1 of Boto, and this configuration tweak works like a charm.
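
Concretely, it is just a couple of lines in the boto config file (typically /etc/boto.cfg or ~/.boto; see the link above). The value 10 below is only an example:

[Boto]
# Number of times boto retries a failed request (including the 408s) before
# raising the error; any small number will do.
num_retries = 10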

@AkshivBaluja

@pchug - I have tried setting the number of retries to 10, 15, and 5, and I still get the same error. Could you tell us what changes you made that got it working?

I am not using the amazon-glacier-cmd-interface, but instead a custom script that recursively handles any exception raised while uploading parts and resumes from the last uploaded part. With that exception handling it does resume, only to hit the 408 Request timed out error again. Once in a blue moon it starts again, only to be interrupted after a short, error-free dream run.

I have used the script to upload TBs of data, and we rarely got this error when operating in Tokyo and eu-west, but it is quite frequent in the Frankfurt (eu-central) region.
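
For what it's worth, the retry/resume part of the script has roughly the shape sketched below (heavily simplified; upload_part is a placeholder for our own helper that pushes one chunk to Glacier, and the backoff values are only an example):

import time

def upload_all_parts(parts, upload_part, delay=30):
    # parts is a list of already-prepared chunks; upload_part is a placeholder.
    i = 0
    while i < len(parts):
        try:
            upload_part(parts[i])
            i += 1                        # advance only after a successful part
            delay = 30                    # reset the backoff after a good part
        except Exception:                 # the 408s surface here as boto exceptions
            time.sleep(delay)             # back off, then resume from part i
            delay = min(delay * 2, 600)   # cap the exponential backoff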

@williamoverton

Any workarounds for this?
