Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-local Session Management and Cookie Reuse to Address EDL DSE issue #909

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

hailiangzhang
Copy link

@hailiangzhang hailiangzhang commented Dec 21, 2024

This pull request addresses a DSE issue reported by Earthdata Login (EDL) during data download requests involving large amount of granules through EarthAccess. One issue was traced to EarthAccess creating a new session for each download instead of reusing the existing session cookie, which is intended to reduce the number of requests to EDL by the DAAC data distribution endpoint (TEA). This PR introduces changes to enhance session cookie handling within each thread, avoiding EDL requests from TEA in subsequent authenticated data downloads.

Key modifications in this PR include:

  • Thread-local Session Management: Added _clone_session_in_local_thread method to clone and store sessions in thread-local storage.
  • Cookie Reuse in Downloads: Adapted _download_file method to utilize thread-local session cookies and reduce redundant session creation within each thread.
  • Session Handling: Updated get_requests_session to retrieve an existing session or raise an error if none exists.

Notes:

  • This PR does not address the cookie reuse issue for same-region access. I can create a separate PR for that later.
  • I did not add or adjust unit tests because the current test coverage is limited and does not include the modified methods.
  • I have issues running integration tests, so I am uncertain about their status with these changes.
  • I didn't adjust the following PR template since I am not if my PR is fully applicable to it. Please advise if adjustments or compliance with the template are required.
Pull Request (PR) draft checklist - click to expand
  • Please review our
    contributing documentation
    before getting started.
  • Populate a descriptive title. For example, instead of "Updated README.md", use a
    title such as "Add testing details to the contributor section of the README".
    Example PRs: #763
  • Populate the body of the pull request with:
  • Update CHANGELOG.md with details about your change in a section titled
    ## Unreleased. If such a section does not exist, please create one. Follow
    Common Changelog for your additions.
    Example PRs: #763
  • Update the documentation and/or the README.md with details of changes to the
    earthaccess interface, if any. Consider new environment variables, function names,
    decorators, etc.

Click the "Ready for review" button at the bottom of the "Conversation" tab in GitHub
once these requirements are fulfilled. Don't worry if you see any test failures in
GitHub at this point!

Pull Request (PR) merge checklist - click to expand

Please do your best to complete these requirements! If you need help with any of these
requirements, you can ping the @nsidc/earthaccess-support team in a comment and we
will help you out!

  • Add unit tests for any new features.
  • Apply formatting and linting autofixes. You can add a GitHub comment in this Pull
    Request containing "pre-commit.ci autofix" to automate this.
  • Ensure all automated PR checks (seen at the bottom of the "conversation" tab) pass.
  • Get at least one approving review.

📚 Documentation preview 📚: https://earthaccess--909.org.readthedocs.build/en/909/

Copy link

Binder 👈 Launch a binder notebook on this branch for commit 9930d84

I will automatically update this comment whenever this PR is modified

@hailiangzhang
Copy link
Author

hailiangzhang commented Dec 23, 2024

Following my earlier note, This PR does not address the cookie reuse issue for same-region access. I can create a separate PR for that later., after looking closer into the code, for same-region direct S3 access, it appears we are not using multi-threading, and authentication occurs only once. Therefore, it should not trigger my EDL DSE (unless I am overlooking something).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants