This repository has been archived by the owner on Sep 23, 2023. It is now read-only.

qiime-deploy behind proxy #57

Open
phuseman opened this issue Jan 27, 2014 · 6 comments

Comments

@phuseman

In my work environment I have to use a proxy. This causes some problems downloading software with qiime-deploy.
The environment variables ($http_proxy, $https_proxy, $ftp_proxy, $all_proxy) are all set appropriately, but the download_file() function in lib/util.py fails for some downloads. The following tools are not deployed:

drisee, ea-utils, tornado, pyzmq, setuptools, MySQL-python, pyqi, sphinx, biom-format, emperor, pynast, tax2tree, qiime, qiime-galaxy, galaxy

(It might have something to do with the https:// links?!)

I could replace the Python urllib download with a nasty hack that calls wget instead, but maybe one of you could investigate this and possibly fix it.

Another thing: behind the proxy, I have problems accessing git://github... URLs. This can be circumvented by using https://github... instead.
Git can do this rewrite automatically with the following config:

git config --global url.https://github.com/.insteadOf git://github.com/

However, it would be easier for people if you put the https URLs of GitHub directly in the corresponding qiime-deploy-conf files.
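
As a rough illustration of the rewrite that the insteadOf setting performs, here is a minimal Python 3 sketch. The function name `github_git_to_https` is hypothetical and not part of qiime-deploy; it only handles the exact `git://github.com/` prefix mentioned above and leaves every other URL untouched.

```python
GIT_PREFIX = "git://github.com/"
HTTPS_PREFIX = "https://github.com/"


def github_git_to_https(url):
    """Return url with a leading git://github.com/ swapped for https://github.com/.

    Hypothetical helper for illustration; URLs with any other prefix
    are returned unchanged.
    """
    if url.startswith(GIT_PREFIX):
        return HTTPS_PREFIX + url[len(GIT_PREFIX):]
    return url
```

Applied to each repository URL in a config file, this would have the same effect as the git setting, without requiring users to change their global git configuration.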

Best,
Peter

@phuseman
Author

Short follow up:
It seems that the retrieve() function in urllib.URLopener() is not able to download https:// links via proxy:

Python 2.7.5+ (default, Sep 19 2013, 13:48:49) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.getproxies()
{'ftp': 'http://proxy:3128', 'all': 'http://proxy:3128', 'http': 'http://proxy:3128', 'https': 'http://proxy:3128'}
# Proxy is set
>>> test = urllib.URLopener()
>>> test.retrieve("http://www.google.com", "test.html")
('test.html', <httplib.HTTPMessage instance at 0x7f22ed118878>)
# ^^^ normal links do work
>>> test.retrieve("https://www.google.de", "test_https.html")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 240, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 359, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 376, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 381, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 501, 'Not Implemented', <httplib.HTTPMessage instance at 0x7f22eae8bfc8>)
# ^^^ https links are not implemented

@antgonza
Member

When I have encountered this problem in the past, I downloaded the
failing packages to my computer, started an HTTP server (a couple of
clicks on a Mac), changed the config file to look for those packages on my
computer, and deployed; not pretty, but a solution. Anyway, I agree that
there should be a better way to handle this ...
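
For anyone without a point-and-click server, the same workaround can be approximated with Python's standard library. This is only a sketch in Python 3 (the thread itself uses Python 2.7); `serve_directory` is a hypothetical helper name, and the directory holding the pre-downloaded packages is whatever you choose.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=0):
    """Serve the files in directory over HTTP on 127.0.0.1.

    port=0 lets the operating system pick a free port; read the chosen
    port from server.server_address[1]. The server runs in a daemon
    thread, so call server.shutdown() when the deployment is done.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

The package URLs in the config file would then point at http://127.0.0.1:PORT/FILENAME, with PORT taken from the returned server object.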

@phuseman
Author

Here is the nasty hack to use wget instead (though this might not be that helpful for Mac users):

diff --git a/lib/util.py b/lib/util.py
index 18f515f..9bff11a 100644
--- a/lib/util.py
+++ b/lib/util.py
@@ -462,8 +462,10 @@ def download_file(URL, dest_dir, local_file, num_retries = 4):
     rc = 1
     while download_failed > 0:
         try:
-            tmpLocalFP, headers = url_opener.retrieve(URL, \
-                                                      tmpDownloadFP)
+#            tmpLocalFP, headers = url_opener.retrieve(URL, \
+#                                                      tmpDownloadFP)
+            downlStr = 'wget %s -O %s' % (URL, tmpDownloadFP)
+            (downlStatus, downlOut) = commands.getstatusoutput(downlStr)
             os.rename(tmpDownloadFP, localFP)
             rc = 0
         except IOError, msg:

I am not confident enough in Python to judge, but might it be possible to use urllib2 or the requests package?
For example like sketched here: http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
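
A minimal sketch of the urllib2-style approach, written against Python 3's `urllib.request` (the successor to urllib2), might look like the following. It is not the code used in qiime-deploy: the name `fetch_via_proxy` and its parameters are hypothetical. Passing `proxies=None` makes `ProxyHandler` read the `*_proxy` environment variables, and to my knowledge this handler chain tunnels https requests through the proxy rather than sending a plain GET.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_via_proxy(url, dest_path, proxies=None, timeout=30):
    """Download url to dest_path and return dest_path.

    proxies=None: take proxy settings from the environment
    (http_proxy, https_proxy, ...). Pass a dict such as
    {"https": "http://proxy:3128"} to override, or {} for no proxy.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temporary file first so a failed download never
    # leaves a partial file at dest_path.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            with opener.open(url, timeout=timeout) as response:
                shutil.copyfileobj(response, out)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

Compared with shelling out to wget, this keeps the download inside Python and does not depend on an external program being installed.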

Best,
Peter

@phuseman
Author

I found out that it works when using the plain FancyURLopener class. I provided a fix, see my pull request.

Best,
Peter

@phuseman
Author

It seems that my fix did not solve the problem, sorry. Maybe a solution using urllib2 or the requests package works better.

Peter

@phuseman phuseman reopened this Jan 29, 2014
@phuseman
Author

Ok, I did some more investigation. Downloading with urllib tries to send the following:

GET https://google.com HTTP/1.0
User-Agent: Python-urllib/1.17

The proxy answers:

HTTP/1.0 501 Not Implemented
Server: squid/2.7.STABLE7

Downloading with wget, however, sends:

CONNECT google.com:443 HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)

thus establishing a proper https connection:

HTTP/1.0 200 Connection established
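
To make the difference between the two request styles concrete, here is a small Python 3 sketch that builds the first line a client sends to the proxy in each case. The function `proxy_request_line` is hypothetical and only mirrors the request lines captured above; it does not open any connection.

```python
from urllib.parse import urlsplit

# Default ports used when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def proxy_request_line(url, use_tunnel):
    """Return the request line sent to the proxy for url.

    use_tunnel=True  -> CONNECT host:port (what wget sends for https)
    use_tunnel=False -> GET with the absolute URL (what urllib sent)
    """
    parts = urlsplit(url)
    if use_tunnel:
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        return "CONNECT %s:%d HTTP/1.1" % (parts.hostname, port)
    return "GET %s HTTP/1.0" % url


print(proxy_request_line("https://google.com", use_tunnel=True))
# -> CONNECT google.com:443 HTTP/1.1
print(proxy_request_line("https://google.com", use_tunnel=False))
# -> GET https://google.com HTTP/1.0
```

With CONNECT, the proxy only relays bytes and the TLS handshake happens end to end with the target host. With the plain GET form, the proxy itself would have to fetch the https URL, which this squid instance refuses with 501.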

I found a solution involving urllib2 that seems to work better. After testing I will generate a pull request with the fix.

Best,
Peter

phuseman added a commit to phuseman/qiime-deploy that referenced this issue Jan 30, 2014
…sible

Replaced the urllib method of downloading files by code that uses urllib2. It seems that urllib is not able to download files from https urls.
See here qiime#57
phuseman added a commit to phuseman/qiime-deploy that referenced this issue Feb 6, 2014
…sible

Replaced the urllib method of downloading files by code that uses urllib2. It seems that urllib is not able to download files from https urls.
See here qiime#57