Handle timeouts #171

Open
mathieu-clement opened this issue Feb 11, 2015 · 6 comments

Comments

@mathieu-clement

For some reason most of the time my uploads fail due to timeouts:

boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

My workaround so far is to do this:

  1. Perform a normal upload, without --resume and --uploadid.
  2. Get the upload ID from glacier-cmd listmultipart
  3. Put the command in a loop:
while true
do
    glacier-cmd --resume --uploadid "D9651-5d4f..." the_other_arguments
    sleep 120
done

This way, when the upload times out, it resumes automatically after 2 minutes.
(You probably want to use large part sizes with that kind of setup.)
  4. At some point glacier-cmd prints "str: Can not resume upload of this data as no existing job with this uploadid could be found.", meaning the upload is finished.
  5. Press (and hold) Ctrl-C to get out of the loop (or rewrite the script to detect that "success" message or check the return status of the command; see the sketch after this list).
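
For reference, a rough Python version of the wrapper suggested in step 5: keep re-running the resume command until it either exits cleanly or prints the "no existing job with this uploadid" message from step 4. This is only a sketch; it assumes glacier-cmd exits non-zero when a part upload dies on a timeout, and the vault name, file name and upload ID are placeholders standing in for "the_other_arguments" above.

import subprocess
import time

# Placeholders for the real vault / file / upload ID arguments.
CMD = ["glacier-cmd", "upload", "myvault", "backup.tar",
       "--resume", "--uploadid", "D9651-5d4f..."]
DONE_MARKER = "no existing job with this uploadid could be found"

while True:
    # Output is captured here, so glacier-cmd's usual progress display is hidden.
    proc = subprocess.Popen(CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    if proc.returncode == 0 or DONE_MARKER in output.decode("utf-8", "replace"):
        break          # finished, or the upload was already completed earlier
    time.sleep(120)    # wait 2 minutes before resuming, as in the loop above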

It would be nice if glacier-cmd handled timeouts itself. Glacier-cmd is useless to me without this workaround.
Otherwise, apart from it trying to print hundreds of columns of output, it works pretty well from what I have seen.

@wvmarle
Contributor

wvmarle commented Feb 12, 2015

This has always been an issue; the problem appears to be on Amazon's side. When I was actively coding on this project I made several attempts at automatic retries within the code, so that you don't see these errors (unless it got something like five timeouts in a row, which indicates another issue). It happens time and again, and I have never been able to find a pattern in these timeout errors.
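
The retry logic was roughly shaped like the sketch below (a simplified illustration, not the exact code from the project): retry a single request when Glacier answers 408, and only give up after several consecutive timeouts. Here do_request is a placeholder for whichever boto call is being wrapped.

from boto.glacier.exceptions import UnexpectedHTTPResponseError

MAX_CONSECUTIVE_TIMEOUTS = 5  # several 408s in a row points at a different problem

def call_with_retries(do_request, *args, **kwargs):
    # do_request stands in for a single boto request, e.g. one upload_part call.
    timeouts = 0
    while True:
        try:
            return do_request(*args, **kwargs)
        except UnexpectedHTTPResponseError as e:
            # boto 2.x keeps the HTTP status on the exception; use getattr to be safe.
            if getattr(e, "status", None) != 408:
                raise
            timeouts += 1
            if timeouts >= MAX_CONSECUTIVE_TIMEOUTS:
                raise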

@gburca
Contributor

gburca commented May 13, 2015

It seems the code to retry on HTTP 408 is commented out in the main branch. I've enabled a tweaked version of it in gburca/amazon-glacier-cmd-interface@85ef4aa, but I haven't run into the 408s recently, so I can't say for sure whether it fixes the issue. @tiktaktok, if you want to try the patch, please enable logging. I'd be curious to know what values of "retry" and "total retries" you're seeing.
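
(For anyone driving boto directly from their own script rather than through glacier-cmd, boto's built-in stream logger is a quick way to see the HTTP-level request activity; it simply routes boto's debug logging to stderr:)

import boto

# Stock boto 2.x helper: send the "boto" logger's debug output to stderr.
# This is separate from glacier-cmd's own log file configuration.
boto.set_stream_logger("boto")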

@hagleyj

hagleyj commented May 14, 2015

I just updated to gburca/amazon-glacier-cmd-interface@85ef4aa and I am still seeing the same errors.

This is what I see in the console

Traceback (most recent call last):
  File "/usr/bin/glacier-cmd", line 9, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 929, in main
    args.func(args)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 156, in wrapper
    return fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glacier.py", line 309, in upload
    args.name, args.partsize, args.uploadid, args.resume)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 232, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 253, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 65, in wrapper
    ret = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/GlacierWrapper.py", line 1157, in upload
    writer.write(part)
  File "/usr/lib/python2.6/site-packages/glacier-0.2dev-py2.6.egg/glacier/glaciercorecalls.py", line 129, in write
    data)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 1278, in upload_part
    response_headers=response_headers)
  File "/usr/lib/python2.6/site-packages/boto-2.29.1-py2.6.egg/boto/glacier/layer1.py", line 118, in make_request
    raise UnexpectedHTTPResponseError(ok_responses, response)
boto.glacier.exceptions.UnexpectedHTTPResponseError: Expected 204, got (408, code=RequestTimeoutException, message=Request timed out.)

@pchug

pchug commented May 19, 2015

I have encountered this issue quite repeatedly lately. I was able to identify a simple little "fix" for it last night, and I have been uploading gigs of backlog since, with no 408 Request timed out messages.

The fix is really a configuration change for boto. Just define a [Boto] section in your environment's configuration file and set the num_retries to some small number. The default value happens to be None, as in, no retries will be performed. See http://docs.pythonboto.org/en/latest/boto_config_tut.html#boto for more information about the configuration file.

I happen to have my own code written against Layer1 of Boto, and this configuration tweak works like a charm.
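
Concretely, it is just a couple of lines in the boto config file (typically /etc/boto.cfg or ~/.boto; see the link above). The value 10 below is only an example:

[Boto]
# Number of times boto retries a failed request (including the 408s) before
# raising the error; any small number will do.
num_retries = 10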

@AkshivBaluja

@pchug - I have tried setting the number of retries to 10, 15, and 5, and I still get the same error. Could you tell us what changes you made that got it working?

I am not using the amazon-glacier-cmd-interface, but instead a custom script that recursively handles any exception raised while uploading parts and resumes from the last uploaded part. With that exception handling it does resume, only to hit the 408 Request timed out error again. Once in a blue moon it starts again, only to be interrupted after a short, error-free dream run.

I have used the script to upload TBs of data, and we rarely got this error when operating in Tokyo and eu-west, but it is quite frequent in the Frankfurt (eu-central) region.
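
For what it's worth, the retry/resume part of the script has roughly the shape sketched below (heavily simplified; upload_part is a placeholder for our own helper that pushes one chunk to Glacier, and the backoff values are only an example):

import time

def upload_all_parts(parts, upload_part, delay=30):
    # parts is a list of already-prepared chunks; upload_part is a placeholder.
    i = 0
    while i < len(parts):
        try:
            upload_part(parts[i])
            i += 1                        # advance only after a successful part
            delay = 30                    # reset the backoff after a good part
        except Exception:                 # the 408s surface here as boto exceptions
            time.sleep(delay)             # back off, then resume from part i
            delay = min(delay * 2, 600)   # cap the exponential backoff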

@williamoverton

Any workarounds for this?
