-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support setting proxy from environment variables for request session #823
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, but we'll need to discuss whether or not this is an appropriate/safe way to address #501. As I've commented there, you might be able to address the issue by exporting the appropriate env var, as there are cases where they are automatically used, and thus no need to explicitly set them on the session. Specifically, requests
uses urllib
, which recognizes the proxy env vars.
For cases where the proxy env vars are not used, that's because the session's trust_env
property is set to False
, indicating that the env should not be trusted. Therefore, the code in this PR ignores the trust_env
property, which is perhaps not the most secure approach to solving this.
Thanks for informing! I didn't notice the So how about adding a method like |
It's set to
The auth logic in this library has arguably become a bit "messy," so we'll need to make a thorough review of the code to determine a reasonable approach, which might very well be roughly as simple as your suggestion. |
I have double checked the proxy env vars, and it didn't work. I guess the case is that I'm using earthaccess/earthaccess/auth.py Lines 200 to 216 in 4aa1223
|
Can you try creating a |
I tried both My test script import earthaccess
earthaccess.login(strategy="netrc")
session = earthaccess.get_requests_https_session()
print(session.headers)
print(session.trust_env) Outputs
|
Don't bother looking at that for the moment. That may be misleading. Instead, try to do the thing that currently isn't working for you. For example, if you have a script to do something with earthaccess that currently fails due to not making use of the proxy, and the script is named
where HOST and PORT are whatever is appropriate for your proxy. I am able so run a local proxy on my laptop at http://localhost:8080 and I am able to login and download via earthaccess. |
I have a simple download script: import argparse
import earthaccess
import os
def download(date_start, date_end, shortname):
earthaccess.login()
results = earthaccess.search_data(
short_name=shortname,
cloud_hosted=True,
temporal=(date_start, date_end),
)
session = earthaccess.get_requests_https_session()
print("http_proxy:", os.environ["http_proxy"])
print("https_proxy:", os.environ["https_proxy"])
print(session.headers)
print(session.trust_env)
earthaccess.download(results, f"download/{shortname}")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--shortname", type=str)
parser.add_argument("--date-start", type=str)
parser.add_argument("--date-end", type=str)
args = parser.parse_args()
download(args.date_start, args.date_end, args.shortname) And I run with below command on Windows Powershell, however, the download has failed with corrupted files, while $env:HTTP_PROXY="http://127.0.0.1:7890"; $env:HTTPS_PROXY="http://127.0.0.1:7890"; python .\test_scripts.py --shortname GOCI_L2_OC --date-start 2015-01-01 --date-end 2015-01-02 Outputs
However, when I switched to the script provided in #501 (comment), it can download 16 complete files through the same proxy. |
Thank you. That's all helpful information. |
@Isotr0py, I believe this might suffice as a workaround for you for the moment, so please let me know if this works for you. I've tweaked your sample code from above. The basic idea is that I'm setting the proxies on the session obtained from This might be a way around modifying earthaccess code directly, so let me know if it does the trick. import argparse
import os
from functools import cache
from typing import Callable
from typing_extensions import ParamSpec
import earthaccess
import requests
P = ParamSpec("P")
def set_proxies(f: Callable[P, requests.Session]) -> Callable[P, requests.Session]:
def go(*args: P.args, **kwargs: P.kwargs) -> requests.Session:
s = f(*args, **kwargs)
s.proxies.update(
{
k: v
for k in ("HTTP_PROXY", "HTTPS_PROXY")
if (v := os.environ.get(k, os.environ.get(k.casefold())))
}
)
return s
return go
def download(date_start, date_end, shortname):
earthaccess.login()
results = earthaccess.search_data(
short_name=shortname,
cloud_hosted=True,
temporal=(date_start, date_end),
)
auth: earthaccess.Auth = earthaccess.__store__.auth
auth.get_session = cache(set_proxies(auth.get_session))
session = earthaccess.get_requests_https_session()
print("http_proxy:", os.environ.get("http_proxy"))
print("https_proxy:", os.environ.get("https_proxy"))
print(f"{session.proxies=}")
print(session.headers)
print(session.trust_env)
earthaccess.download(results, f"download/{shortname}")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--shortname", type=str)
parser.add_argument("--date-start", type=str)
parser.add_argument("--date-end", type=str)
args = parser.parse_args()
download(args.date_start, args.date_end, args.shortname) |
@chuckwondo The above script didn't work, because the In fact, this script worked well after I modified the def set_proxies(f: Callable[P, requests.Session]) -> Callable[P, requests.Session]:
def go(*args: P.args, **kwargs: P.kwargs) -> requests.Session:
s = f(*args, **kwargs)
# session use 'http' and `https` as proxies dict keys
s.proxies.update(
{
k.split("_")[0].lower(): v
for k in ("HTTP_PROXY", "HTTPS_PROXY")
if (v := os.environ.get(k, os.environ.get(k.casefold())))
}
)
return s
return go Outputs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Isotr0py, the example I provided was for you to use instead of this code change. I'm glad it works for you (with the small tweak you made to it).
We still do not want to make this particular PR change internally because, as I mentioned earlier, this completely ignores cases where the session's trust_env
property is set to False
. We want to respect that setting and instead find a better way to give the caller more control over the session settings themselves.
Please close this PR, as this is a larger issue that the earthaccess team needs to discuss and find a solution for.
Awesome! I'm glad my suggestion (with your minor tweak) worked for you! Personally, I'd probably tweak it like so, but just a personal preference: def set_proxies(f: Callable[P, requests.Session]) -> Callable[P, requests.Session]:
def go(*args: P.args, **kwargs: P.kwargs) -> requests.Session:
s = f(*args, **kwargs)
s.proxies.update(
{
scheme: v
for scheme in ("http", "https")
if (
v := os.environ.get(
k := f"{scheme}_proxy", os.environ.get(k.upper())
)
)
}
)
return s
return go |
Fix #501
Motivation
earthaccess
is a little bit tricky refer to the example code in Allow custom proxy settings with requests sessions #501.Description:
Pull Request (PR) draft checklist - click to expand
contributing documentation
before getting started.
title such as "Add testing details to the contributor section of the README".
Example PRs: #763
example
closes #1
. SeeGitHub docs - Linking a pull request to an issue.
CHANGELOG.md
with details about your change in a section titled## Unreleased
. If such a section does not exist, please create one. FollowCommon Changelog for your additions.
Example PRs: #763
README.md
with details of changes to theearthaccess interface, if any. Consider new environment variables, function names,
decorators, etc.
Click the "Ready for review" button at the bottom of the "Conversation" tab in GitHub
once these requirements are fulfilled. Don't worry if you see any test failures in
GitHub at this point!
Pull Request (PR) merge checklist - click to expand
Please do your best to complete these requirements! If you need help with any of these
requirements, you can ping the
@nsidc/earthaccess-support
team in a comment and wewill help you out!
Request containing "pre-commit.ci autofix" to automate this.
📚 Documentation preview 📚: https://earthaccess--823.org.readthedocs.build/en/823/