Merge pull request #48 from geo-smart/amsr-netrc
replace cookie approach with netrc in amsr
srinijammula authored Dec 14, 2024
2 parents 9761f89 + 0eecc93 commit a8169d2
Showing 3 changed files with 35 additions and 48 deletions.
73 changes: 35 additions & 38 deletions book/chapters/amsr.ipynb
@@ -172,13 +172,11 @@
"id": "9268fe3d",
"metadata": {},
"source": [
- "## 3.6.2 Automated Script for Secure File Download with Wget from a List of URLs\n",
+ "## 3.6.2 Automated Script for Secure File Download with netrc from a List of URLs\n",
"\n",
  "The following script automates the downloading of files by looping through the links obtained in `3.6.1`.\n",
"\n",
"- `work_dir`: Specifies the working directory where the download links and files will be stored.\n",
"- `input_file`: The text file containing the list of URLs to be downloaded.\n",
- "- `base_wget_command`: The core command used to download files with options for authentication, session management, and secure connections.\n",
"- `output_directory`: The folder where the downloaded files will be saved.\n",
"- Loop: Iterates over each URL in the input file, ensuring all files are downloaded.\n"
]
@@ -188,15 +186,16 @@
"id": "c7d4a959",
"metadata": {},
"source": [
- "> **Note**: Before proceeding with the download, ensure you have logged in to the [Earthdata](https://urs.earthdata.nasa.gov) website. You will need to retrieve the session cookies to authenticate your `wget` requests.\n",
+ "> **Note**: Before proceeding with the download, ensure you have registered on the [Earthdata](https://urs.earthdata.nasa.gov) website. Your username and password must be stored in a `.netrc` file.\n",
"> \n",
- "> 1. Visit the [Earthdata Login](https://urs.earthdata.nasa.gov) and log in with your credentials.\n",
- "> 2. After logging in, use a browser extension like [EditThisCookie](https://chromewebstore.google.com/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg) to export the session cookies.\n",
- "> 3. If the cookies are exported in JSON format, convert them to the standard `wget` format. You can use tools like the [Cookie converter](http://www.linuxonly.nl/docs/60/159_Convert_cookies_txt_format.html) or manually extract the relevant cookies.\n",
- "> 4. Save the cookies in a file named `cookies.txt` in the appropriate format as required by `wget`.\n",
+ "> 1. Create a `.netrc` file in your home directory, for example with `nano ~/.netrc`.\n",
+ "> 2. Copy the following into your `.netrc`, replacing the placeholders with your own credentials:\n",
+ ">    machine urs.earthdata.nasa.gov\n",
+ ">    login <your-username>\n",
+ ">    password <your-password>\n",
+ "> 3. Save the file.\n",
"> \n",
- "> The cookies are crucial for authenticating your requests and ensuring successful downloads.\n",
- "> In the code snippet below, the output is shown for only 6 files."
+ "> The netrc file is crucial for authenticating your requests and ensuring successful downloads."
]
},
{
@@ -219,35 +218,32 @@
}
],
"source": [
- "%%bash\n",
- "#!/bin/bash\n",
- "\n",
- "# Specify the file containing the download links\n",
- "input_file=\"../data/download_links_updated.txt\"\n",
- "\n",
- "# Specify the cookies file location\n",
- "cookies_file=\"../data/cookies.txt\"\n",
- "\n",
- "# Ensure the cookies file exists (assumes it's been generated previously)\n",
- "if [ ! -f \"$cookies_file\" ]; then\n",
- "    echo \"Cookies file not found: $cookies_file\"\n",
- "    exit 1\n",
- "fi\n",
- "\n",
- "# Specify the base wget command with common options\n",
- "base_wget_command=\"wget --load-cookies $cookies_file --save-cookies mycookies.txt --keep-session-cookies --no-check-certificate --progress=bar:force --quiet\"\n",
- "\n",
- "# Specify the output directory for downloaded files\n",
- "output_directory=\"../data/gridmet_test_run/amsr\"\n",
- "\n",
- "# Ensure the output directory exists\n",
- "mkdir -p \"$output_directory\"\n",
+ "input_file = \"../data/download_links.txt\"\n",
+ "output_directory = \"../data/gridmet_test_run/amsr\"\n",
+ "os.makedirs(output_directory, exist_ok=True)\n",
+ "\n",
+ "# Check if the input file exists\n",
+ "if not os.path.exists(input_file):\n",
+ "    print(f\"Input file not found: {input_file}\")\n",
+ "    exit(1)\n",
+ "\n",
+ "# Read URLs from the input file\n",
+ "with open(input_file, \"r\") as f:\n",
+ "    urls = [line.strip() for line in f if line.strip()]\n",
+ "\n",
+ "for url in urls:\n",
+ "    # Automatically use .netrc for authentication\n",
+ "    response = requests.get(url, stream=True)\n",
+ "    if response.status_code == 200:\n",
+ "        file_name = os.path.join(output_directory, os.path.basename(url))\n",
+ "        with open(file_name, \"wb\") as file:\n",
+ "            for chunk in response.iter_content(chunk_size=1024):\n",
+ "                file.write(chunk)\n",
+ "        print(f\"Downloaded: {file_name}\")\n",
+ "    else:\n",
+ "        print(f\"Failed to download {url}. HTTP Status Code: {response.status_code}\")\n",
"\n",
- "# Loop through each line (URL) in the input file and download it using wget\n",
- "while IFS= read -r url || [[ -n \"$url\" ]]; do\n",
- "    echo \"Downloading: $url\"\n",
- "    $base_wget_command -P \"$output_directory\" \"$url\"\n",
- "done < \"$input_file\"\n"
+ "print(\"All files downloaded successfully.\")"
]
},
{
@@ -1529,6 +1525,7 @@
"outputs": [],
"source": [
"import os\n",
+ "import requests\n",
"import csv\n",
"import h5py\n",
"import shutil\n",
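The new cell relies on `requests` falling back to `~/.netrc` for authentication when no explicit credentials are passed. A quick way to confirm the file is present and parseable before running the download loop is the standard library's `netrc` module. The sketch below is not part of the commit; `earthdata_credentials` is an illustrative helper name.

```python
import os
from netrc import netrc, NetrcParseError

EARTHDATA_HOST = "urs.earthdata.nasa.gov"

def earthdata_credentials(netrc_path=None):
    """Return (login, password) for the Earthdata host, or None if unavailable."""
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return None
    try:
        # authenticators() returns (login, account, password) or None
        entry = netrc(path).authenticators(EARTHDATA_HOST)
    except NetrcParseError:
        # Malformed file (or insecure permissions on the default ~/.netrc)
        return None
    if entry is None:
        return None
    login, _account, password = entry
    return login, password

if __name__ == "__main__":
    creds = earthdata_credentials()
    if creds is None:
        print(f"No usable .netrc entry for {EARTHDATA_HOST}")
    else:
        print(f"Found .netrc credentials for user {creds[0]}")
```

Running this once before the download cell surfaces a missing or malformed `.netrc` early, instead of as a string of 401 responses mid-download.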
6 changes: 0 additions & 6 deletions book/data/amsr/mycookies.txt

This file was deleted.

4 changes: 0 additions & 4 deletions book/data/cookies.txt

This file was deleted.

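For reuse outside the notebook, the new download cell could be factored into a function that takes the HTTP session as a parameter, so the same loop works with a real `requests.Session()` (which inherits the `.netrc` fallback) or a stub in tests. This is a sketch under those assumptions, not code from the commit; `download_all` is a hypothetical name.

```python
import os

def download_all(urls, output_directory, session):
    """Stream each URL to output_directory; return (downloaded_paths, failures)."""
    os.makedirs(output_directory, exist_ok=True)
    downloaded, failed = [], []
    for url in urls:
        # The session supplies authentication; requests falls back to ~/.netrc
        # when no explicit auth is given and trust_env is enabled.
        response = session.get(url, stream=True, timeout=60)
        if response.status_code == 200:
            file_name = os.path.join(output_directory, os.path.basename(url))
            with open(file_name, "wb") as fh:
                for chunk in response.iter_content(chunk_size=8192):
                    fh.write(chunk)
            downloaded.append(file_name)
        else:
            failed.append((url, response.status_code))
    return downloaded, failed
```

A session also reuses the connection and carries the Earthdata redirect cookies across files, which the per-request `requests.get` calls in the cell do not.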