replace cookie approach with netrc in amsr
srinijammula committed Dec 14, 2024
1 parent 74a217e commit 0eecc93
Showing 3 changed files with 35 additions and 48 deletions.
73 changes: 35 additions & 38 deletions book/chapters/amsr.ipynb
@@ -172,13 +172,11 @@
"id": "9268fe3d",
"metadata": {},
"source": [
"## 3.6.2 Automated Script for Secure File Download with Wget from a List of URLs\n",
"## 3.6.2 Automated Secure File Download from a List of URLs Using `.netrc`\n",
"\n",
"The following Python script automates the download of files by looping through the links obtained in `3.6.1`.\n",
"\n",
"- `work_dir`: Specifies the working directory where the download links and files will be stored.\n",
"- `input_file`: The text file containing the list of URLs to be downloaded.\n",
"- `base_wget_command`: The core command used to download files with options for authentication, session management, and secure connections.\n",
"- `output_directory`: The folder where the downloaded files will be saved.\n",
"- Loop: Iterates over each URL in the input file, downloads the file, and reports any URL that fails.\n"
]
@@ -188,15 +186,16 @@
"id": "c7d4a959",
"metadata": {},
"source": [
"> **Note**: Before proceeding with the download, ensure you have logged in to the [Earthdata](https://urs.earthdata.nasa.gov) website. You will need to retrieve the session cookies to authenticate your `wget` requests.\n",
"> **Note**: Before proceeding with the download, ensure you have registered an account on the [Earthdata](https://urs.earthdata.nasa.gov) website. You need to store your Earthdata username and password in a `.netrc` file.\n",
"> \n",
"> 1. Visit the [Earthdata Login](https://urs.earthdata.nasa.gov) and log in with your credentials.\n",
"> 2. After logging in, use a browser extension like [EditThisCookie](https://chromewebstore.google.com/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg) to export the session cookies.\n",
"> 3. If the cookies are exported in JSON format, convert them to the standard `wget` format. You can use tools like the [Cookie converter](http://www.linuxonly.nl/docs/60/159_Convert_cookies_txt_format.html) or manually extract the relevant cookies.\n",
"> 4. Save the cookies in a file named `cookies.txt` in the appropriate format as required by `wget`.\n",
"> 1. Create a `.netrc` file in your home directory, for example with `nano ~/.netrc`.\n",
"> 2. Paste the following lines into the file, replacing the placeholders with your Earthdata username and password:\n",
"> \n",
">    ```\n",
">    machine urs.earthdata.nasa.gov\n",
">    login <your-username>\n",
">    password <your-password>\n",
">    ```\n",
"> \n",
"> 3. Save the file and make it readable only by you, e.g. with `chmod 600 ~/.netrc`.\n",
"> \n",
"> The cookies are crucial for authenticating your requests and ensuring successful downloads.\n",
"> In the code snippet below, the output is shown for only 6 files."
"> The `.netrc` file supplies the credentials that `requests` uses to authenticate each download request; without a valid entry for `urs.earthdata.nasa.gov`, the downloads will fail."
]
},
{
@@ -219,35 +218,32 @@
}
],
"source": [
"%%bash\n",
"#!/bin/bash\n",
"\n",
"# Specify the file containing the download links\n",
"input_file=\"../data/download_links_updated.txt\"\n",
"\n",
"# Specify the cookies file location\n",
"cookies_file=\"../data/cookies.txt\"\n",
"\n",
"# Ensure the cookies file exists (assumes it's been generated previously)\n",
"if [ ! -f \"$cookies_file\" ]; then\n",
" echo \"Cookies file not found: $cookies_file\"\n",
" exit 1\n",
"fi\n",
"\n",
"# Specify the base wget command with common options\n",
"base_wget_command=\"wget --load-cookies $cookies_file --save-cookies mycookies.txt --keep-session-cookies --no-check-certificate --progress=bar:force --quiet\"\n",
"\n",
"# Specify the output directory for downloaded files\n",
"output_directory=\"../data/gridmet_test_run/amsr\"\n",
"\n",
"# Ensure the output directory exists\n",
"mkdir -p \"$output_directory\"\n",
"import os\n",
"import requests\n",
"\n",
"# Specify the file containing the download links and the output directory\n",
"input_file = \"../data/download_links.txt\"\n",
"output_directory = \"../data/gridmet_test_run/amsr\"\n",
"os.makedirs(output_directory, exist_ok=True)\n",
"\n",
"# Stop with an error if the input file is missing\n",
"if not os.path.exists(input_file):\n",
"    raise FileNotFoundError(f\"Input file not found: {input_file}\")\n",
"\n",
"# Read URLs from the input file, skipping blank lines\n",
"with open(input_file, \"r\") as f:\n",
"    urls = [line.strip() for line in f if line.strip()]\n",
"\n",
"for url in urls:\n",
"    # With no explicit auth argument, requests reads credentials from ~/.netrc\n",
"    with requests.get(url, stream=True) as response:\n",
"        if response.status_code == 200:\n",
"            file_name = os.path.join(output_directory, os.path.basename(url))\n",
"            with open(file_name, \"wb\") as file:\n",
"                for chunk in response.iter_content(chunk_size=1024):\n",
"                    file.write(chunk)\n",
"            print(f\"Downloaded: {file_name}\")\n",
"        else:\n",
"            print(f\"Failed to download {url}. HTTP Status Code: {response.status_code}\")\n",
"\n",
"# Loop through each line (URL) in the input file and download it using wget\n",
"while IFS= read -r url || [[ -n \"$url\" ]]; do\n",
" echo \"Downloading: $url\"\n",
" $base_wget_command -P \"$output_directory\" \"$url\"\n",
"done < \"$input_file\"\n"
"print(\"Finished processing all URLs; check the messages above for any failed downloads.\")"
]
},
{
@@ -1529,6 +1525,7 @@
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import csv\n",
"import h5py\n",
"import shutil\n",
6 changes: 0 additions & 6 deletions book/data/amsr/mycookies.txt

This file was deleted.

4 changes: 0 additions & 4 deletions book/data/cookies.txt

This file was deleted.
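The `.netrc` steps in the notebook note above can also be scripted and checked before a long download loop starts. The following is a minimal sketch, not part of this commit, built on Python's standard `netrc` module (the same file format `requests` falls back to when no explicit `auth` is passed). The helper names `write_netrc` and `check_netrc` are hypothetical, and `write_netrc` overwrites whatever path it is given, so the demo writes to a scratch file rather than the real `~/.netrc`.

```python
import netrc
import os
import stat
import tempfile

# Host name taken from the note above; requests looks up .netrc entries by host.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def write_netrc(path, username, password, host=EARTHDATA_HOST):
    """Hypothetical helper: write a one-machine .netrc file (overwriting `path`)
    and restrict it to owner read/write (0600) on POSIX systems."""
    with open(path, "w") as f:
        f.write(f"machine {host}\nlogin {username}\npassword {password}\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def check_netrc(path, host=EARTHDATA_HOST):
    """Hypothetical helper: parse `path` with the standard netrc module and
    return the login stored for `host`, raising ValueError if none is found."""
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        raise ValueError(f"No entry for {host} in {path}")
    login, _account, _password = entry
    return login


if __name__ == "__main__":
    # Demonstrate on a scratch file instead of the real ~/.netrc.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = os.path.join(tmp, ".netrc")
        write_netrc(scratch, "your-username", "your-password")
        print(check_netrc(scratch))  # prints: your-username
```

Calling `check_netrc(os.path.expanduser("~/.netrc"))` on the real file is a quick way to catch a typo in the `machine` line or a missing entry before any request is sent.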
