download script for different SICE datasets #1

Open
jasonebox opened this issue Dec 19, 2021 · 4 comments

@jasonebox

The SICE dataverse currently has 5 datasets: https://dataverse01.geus.dk/dataverse/sice?q=&types=datasets&sort=dateSort&order=desc&page=1

The script should therefore discriminate between them, and/or the header of the script should list what they are.

@AdrienWehrle
Member

The download script currently lives directly in the GEUS-SICE/SICE repository. I think only one script is needed for this, so there is no need for a whole new directory IMO.
We could simply list the persistent IDs of each dataset here: https://github.com/GEUS-SICE/SICE/blob/master/SICE_dataverse_download.py#L24
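
One way to let a single script discriminate between the datasets is a small lookup table from a short label to a persistent ID. The sketch below is only an illustration: the labels and DOIs are placeholders (the real SICE identifiers are not listed in this thread), and `persistent_id` is a hypothetical helper, not a function from `SICE_dataverse_download.py`.

```python
# Hypothetical registry. Labels and DOIs are placeholders and must be
# replaced with the real persistent IDs from the SICE dataverse.
SICE_DATASETS = {
    "dataset_a": "doi:10.0000/placeholder/a",
    "dataset_b": "doi:10.0000/placeholder/b",
}


def persistent_id(label: str) -> str:
    """Return the persistent ID registered for a dataset label."""
    try:
        return SICE_DATASETS[label]
    except KeyError:
        # List the known labels so the user can see what is available.
        known = ", ".join(sorted(SICE_DATASETS))
        raise ValueError(
            f"unknown dataset {label!r}; known datasets: {known}"
        ) from None
```

Keeping the table at the top of the script would also serve as the header listing that the issue asks for.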

@mankoff
Contributor

mankoff commented Dec 19, 2021

In early 2022 we're completing a dataverse upgrade that should make this script unnecessary. One-line wget commands will then fetch datasets and support all of the features the script provides, such as inclusion and exclusion wildcards to fetch or skip certain years, file names, etc.

@mankoff
Contributor

mankoff commented Jun 29, 2022

The upgraded dataverse provides a dirIndex view to the data files. This makes it easy to download everything with wget or a web-browser downloader such as DownThemAll, and reduces the need for more complicated bash or Python scripts to fetch data.

Please see http://doi.org/10.22008/promice/data/ice_discharge/d/v02 for the text I'm currently putting in the Notes section on all of the datasets. I include the HTML version of the text below. Choose "Edit" from the "..." menu on this comment to see the raw HTML.

This wiki page (by @BaptisteVandecrux ?) should also be updated:
https://github.com/GEUS-SICE/dataverse-io/wiki/How-to:-download-all-files-in-a-dataverse


Direct link to most recent files: https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/promice/data/ice_discharge/d/v02
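
The direct link above follows a fixed pattern, so it can be built from a DOI. The sketch below assumes that other datasets on the same dataverse use the same endpoint as this example; `dirindex_url` and `DIRINDEX_ENDPOINT` are hypothetical names introduced only for illustration.

```python
from urllib.parse import quote

# Endpoint copied from the direct link in this comment. Reusing it for
# other datasets is an assumption.
DIRINDEX_ENDPOINT = "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex"


def dirindex_url(doi: str) -> str:
    """Build the dirindex URL for a DOI, with or without a 'doi:' prefix."""
    pid = doi if doi.startswith("doi:") else f"doi:{doi}"
    # Keep ':' and '/' unescaped so the result matches the link format above.
    return f"{DIRINDEX_ENDPOINT}?persistentId={quote(pid, safe=':/')}"
```

For example, `dirindex_url("10.22008/promice/data/ice_discharge/d/v02")` reproduces the direct link given above.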




wget download command:

wget -r -e robots=off -nH --cut-dirs=3 --content-disposition "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/promice/data/ice_discharge/d/v02"
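
For anyone who still wants to drive the download from Python, the same command can be assembled as an argument list. This is a minimal sketch: the base flags are copied from the command above, while `-A` and `-R` are wget's standard accept and reject list options, added here to illustrate the inclusion and exclusion wildcards mentioned earlier in the thread. Whether those patterns match as intended against the dirindex links has not been verified here, and `wget_args` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def wget_args(
    url: str,
    accept: Optional[Sequence[str]] = None,
    reject: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the wget argument list for a recursive dirindex download."""
    # Base flags copied from the wget command in this comment.
    args = ["wget", "-r", "-e", "robots=off", "-nH", "--cut-dirs=3",
            "--content-disposition"]
    if accept:
        # -A takes a comma-separated list of patterns to keep.
        args += ["-A", ",".join(accept)]
    if reject:
        # -R takes a comma-separated list of patterns to skip.
        args += ["-R", ",".join(reject)]
    args.append(url)
    return args
```

The list can be passed to `subprocess.run(wget_args(url, accept=["*2021*"]), check=True)`, where the `*2021*` pattern is only an illustrative example. Passing a list rather than a shell string avoids having to quote the `?` and `:` characters in the URL.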

@mankoff closed this as completed Jun 29, 2022
@mankoff
Contributor

mankoff commented Jun 29, 2022

Actually, re-opening until the relevant text/pages/code have been updated.

@mankoff reopened this Jun 29, 2022

3 participants