Merge pull request #498 from nsidc/update-documentation
Improve documentation structure
mfisher87 authored May 8, 2024
2 parents a678b4b + 52e70c0 commit 5a3180e
Showing 18 changed files with 421 additions and 1,504 deletions.
154 changes: 11 additions & 143 deletions README.md
@@ -1,3 +1,5 @@
# _earthaccess_

<p align="center">
<img alt="earthaccess, a python library to search, download or stream NASA Earth science data with just a few lines of code" src="https://user-images.githubusercontent.com/717735/205517116-7a5d0f41-7acc-441e-94ba-2e541bfb7fc8.png" width="70%" align="center" />
</p>
@@ -30,166 +32,32 @@

</p>

## **Overview**

*earthaccess* is a **python library to search, download or stream NASA Earth science data** with just a few lines of code.


In the age of cloud computing, the power of open science only reaches its full potential if we have easy-to-use workflows that facilitate research in an inclusive, efficient and reproducible way. Unfortunately —as it stands today— scientists and students alike face a steep learning curve adapting to systems that have grown too complex and end up spending more time on the technicalities of the tools, cloud and NASA APIs than focusing on their important science.

During several workshops organized by [NASA Openscapes](https://nasa-openscapes.github.io/events.html), the need to provide easy-to-use tools to our users became evident. Open science is a collaborative effort; it involves people from different technical backgrounds, and the data analysis to solve the pressing problems we face cannot be limited by the complexity of the underlying systems. Therefore, providing easy access to NASA Earthdata regardless of the data storage location (hosted within or outside of the cloud) is the main motivation behind this Python library.

## **Installing earthaccess**

You will need Python 3.8 or higher installed.

Install the latest release using conda

```bash
conda install -c conda-forge earthaccess
```
`earthaccess` is a python library to **search for**, and **download** or **stream** NASA Earth science data with just a few lines of code.

Using Pip

```bash
pip install earthaccess
```
Visit [our documentation](https://earthaccess.readthedocs.io/en/latest) to learn more!

Try it in your browser without installing anything! [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nsidc/earthaccess/main)


## **Usage**


With *earthaccess* we can log in, search for, and download data with a few lines of code. More importantly, our code will work the same way whether we run it in the cloud or on our laptop. ***earthaccess*** handles authentication with [NASA's Earthdata Login (EDL)](https://urs.earthdata.nasa.gov), search using NASA's [CMR](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), and access through [`fsspec`](https://github.com/fsspec/filesystem_spec).

The only requirement to use this library is to open a free account with NASA [EDL](https://urs.earthdata.nasa.gov).


### **Authentication**

By default, `earthaccess` will automatically look for your EDL account credentials in two locations:

1. A `~/.netrc` file
2. `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables

If neither of these options is configured, you can authenticate by calling the `earthaccess.login()` function
and manually entering your EDL account credentials.

```python
import earthaccess

earthaccess.login()
```

Note you can pass `persist=True` to `earthaccess.login()` to have the EDL account credentials you enter
automatically saved to a `~/.netrc` file for future use.
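
The lookup described above can be pictured with a small standard-library sketch. This is not the code `earthaccess` uses internally: the helper name `find_edl_credential_source` is hypothetical, and checking `~/.netrc` before the environment variables only mirrors the order in which the two locations are listed here, which may differ from the library's actual order.

```python
import netrc
import os
from pathlib import Path

# Host name of NASA Earthdata Login, as linked from this README.
EDL_HOST = "urs.earthdata.nasa.gov"


def find_edl_credential_source(environ=None, netrc_path=None):
    """Return "netrc", "environment", or None (hypothetical helper).

    Only reports where credentials could be found; it does not log in.
    """
    environ = os.environ if environ is None else environ
    netrc_path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"

    # 1. A netrc file containing an entry for the EDL host.
    if netrc_path.is_file():
        try:
            if netrc.netrc(str(netrc_path)).authenticators(EDL_HOST):
                return "netrc"
        except (netrc.NetrcParseError, OSError):
            pass  # unreadable or malformed file: fall through to the next source

    # 2. Both environment variables set to non-empty values.
    if environ.get("EARTHDATA_USERNAME") and environ.get("EARTHDATA_PASSWORD"):
        return "environment"

    return None
```

If the helper returns `None`, neither location is configured, which is the situation where calling `earthaccess.login()` interactively applies.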


Once you are authenticated with NASA EDL you can:

* Get a file from a DAAC using a `fsspec` session.
* Request temporary S3 credentials from a particular DAAC (needed to download or stream data from an S3 bucket in the cloud).
* Use the library to download or stream data directly from S3.
* Regenerate CMR tokens (used for restricted datasets)


### **Searching for data**

Once we have selected our dataset, we can search for its data granules using *doi*, *short_name* or *concept_id*.
If we are not sure how to search for a particular dataset, we can start with the ["Introducing NASA earthaccess"](https://nsidc.github.io/earthaccess/tutorials/demo/#querying-for-datasets) tutorial or the [NASA Earthdata Search portal](https://search.earthdata.nasa.gov/). For a complete list of search parameters, see the extended [API documentation](https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/).

```python

results = earthaccess.search_data(
    short_name='SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205',
    cloud_hosted=True,
    bounding_box=(-10, 20, 10, 50),
    temporal=("1999-02", "2019-03"),
    count=10
)


```
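
A search like the one above can be wrapped in a small helper that checks the arguments before any request is sent. The following is a sketch under stated assumptions: `build_search_kwargs` and `run_search` are hypothetical names, and the bounding box is assumed to be ordered as (west, south, east, north), i.e. lower-left longitude and latitude followed by upper-right longitude and latitude. Only `earthaccess.login()` and `earthaccess.search_data()` come from the library itself.

```python
def build_search_kwargs(short_name, bounding_box, temporal, count=10, cloud_hosted=True):
    """Validate and assemble keyword arguments for a granule search (hypothetical helper)."""
    west, south, east, north = bounding_box
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    start, end = temporal  # e.g. ("1999-02", "2019-03")
    return {
        "short_name": short_name,
        "cloud_hosted": cloud_hosted,
        "bounding_box": (west, south, east, north),
        "temporal": (start, end),
        "count": count,
    }


def run_search(**kwargs):
    """Log in and run the search; needs earthaccess installed and network access."""
    # Imported lazily so the validation helper works without earthaccess installed.
    import earthaccess

    earthaccess.login()
    return earthaccess.search_data(**kwargs)
```

Calling `run_search(**build_search_kwargs(...))` with the values from the example above would issue the same query, with the argument checks applied first.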
## How to Get Started with `earthaccess`

Now that we have our results we can do multiple things: we can iterate over them to get HTTP (or S3) links, we can download the files to a local folder, or we can open these files and stream their content directly to other libraries, e.g. xarray.
Visit [our quick start guide](https://earthaccess.readthedocs.io/en/latest/quick-start.html) to learn how to install and see a simple example of using `earthaccess`.

### **Accessing the data**

**Option 1: Using the data links**

If we already have a workflow in place for downloading our data, we can use *earthaccess* as a search-only library and get HTTP links from our query results. This could be the case if our current workflow uses a different language and we only need the links as input.

```python

# If the dataset is cloud hosted, S3 links will be available. The access parameter accepts "direct" or "external"; direct access is only possible if you are in the us-west-2 region in the cloud.
data_links = [granule.data_links(access="direct") for granule in results]

# or if the data is an on-prem dataset
data_links = [granule.data_links(access="external") for granule in results]

```

> Note: *earthaccess* can get S3 credentials for us, or authenticated HTTP sessions in case we want to use them with a different library.
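
When the links are handed to another tool, it can help to flatten the per-granule lists and separate S3 links from HTTP(S) links. The sketch below uses only the standard library. The helper name and the sample URLs are hypothetical, and it assumes each call to `data_links()` yields a list of URL strings.

```python
from itertools import chain
from urllib.parse import urlparse


def split_links_by_scheme(links_per_granule):
    """Flatten per-granule link lists and group them by URL scheme (hypothetical helper)."""
    grouped = {"s3": [], "https": []}
    for link in chain.from_iterable(links_per_granule):
        scheme = urlparse(link).scheme.lower()
        if scheme == "s3":
            grouped["s3"].append(link)
        elif scheme in ("http", "https"):
            grouped["https"].append(link)
        # Links with any other scheme are ignored in this sketch.
    return grouped


# Made-up links standing in for the `data_links` list built above.
example = [
    ["s3://example-bucket/granule_001.nc"],
    ["https://example.org/data/granule_002.nc"],
]
grouped = split_links_by_scheme(example)
```

The resulting lists can then be written to a file or passed to a downloader written in another language.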
**Option 2: Download data to a local folder**

This option is practical if you have the necessary space available on disk. The *earthaccess* library will print out the approximate size of the download and its progress.
```python
files = earthaccess.download(results, "./local_folder")

```

**Option 3: Direct S3 Access - Stream data directly to xarray**

This method works best if you are in the same Amazon Web Services (AWS) region as the data (us-west-2) and you are working with gridded datasets (processing level 3 and above).

```python
import xarray as xr

files = earthaccess.open(results)

ds = xr.open_mfdataset(files)

```

And that's it! Just one line of code, and this same piece of code will also work for data that are not hosted in the cloud, i.e. located at NASA storage centers.


> More examples coming soon!

### Compatibility
## Compatibility

Only **Python 3.8+** is supported.


## How to Contribute to `earthaccess`

If you want to contribute to `earthaccess`, check out the [Contributing Guide](https://earthaccess.readthedocs.io/en/latest/contributing/).

## Contributors

[![Contributors](https://contrib.rocks/image?repo=nsidc/earthaccess)](https://github.com/nsidc/earthaccess/graphs/contributors)

## Contributing Guide
### Contributors

Welcome! 😊👋
[![Contributors](https://contrib.rocks/image?repo=nsidc/earthaccess)](https://github.com/nsidc/earthaccess/graphs/contributors)

> Please see the [Contributing Guide](CONTRIBUTING.md).

### [Project Board](https://github.com/nsidc/earthdata/discussions).

### Glossary

<a href="https://www.earthdata.nasa.gov/learn/glossary">NASA Earth Science Glossary</a>

## License

earthaccess is licensed under the MIT license. See [LICENSE](LICENSE.txt).

## Level of Support

<div><img src="https://raw.githubusercontent.com/nsidc/earthdata/main/docs/nsidc-logo.png" width="84px" align="left" text-align="middle"/>
<br>
This repository is supported by a joint effort of NSIDC, NASA DAACs, and the Earth science community, and we welcome any contribution in the form of issue submissions, pull requests, or discussions. Issues labeled as https://github.com/nsidc/earthaccess/labels/good%20first%20issue are a great place to get started.
</div>

62 changes: 62 additions & 0 deletions docs/contributing/development.md
@@ -0,0 +1,62 @@
# Development environment setup

1. Fork [nsidc/earthaccess](https://github.com/nsidc/earthaccess)
1. Clone your fork (`git clone git@github.com:{my-username}/earthaccess`)

`earthaccess` uses Poetry to build and publish the package to PyPI, the de facto Python package repository. To develop new features or fix bugs, we need to set up a virtual environment and install the library locally. We can accomplish this with Conda and Poetry, or just with Poetry. Both workflows achieve the same result.

### Using Conda

If we have `mamba` (or `conda`) installed, we can use the environment file included in the `ci` folder. This will install all the libraries we need (including Poetry) to start developing `earthaccess`:

```bash
mamba env update -f ci/environment-dev.yml
mamba activate earthaccess-dev
poetry install
```

After activating our environment and installing the library with Poetry, we can run Jupyter Lab and start testing the local distribution, or we can use `make` to run the tests and lint the code.
Now we can create a feature branch and push those changes to our fork!

### Using Poetry

If we want to use Poetry, first we need to [install it](https://python-poetry.org/docs/#installation). After installing Poetry we can use the same workflow we used for Conda. First, we install the library locally:

```bash
poetry install
```

and now we can run the local Jupyter Lab and run the scripts etc. using Poetry:

```bash
poetry run jupyter lab
```

!!! note

    You may need to use `poetry run make ...` to run commands in the environment.

### Managing Dependencies

If you need to add a new dependency, you should do the following:

- Run `poetry add <package>` for a required (non-development) dependency
- Run `poetry add --group=dev <package>` for a development dependency, such
as a testing or code analysis dependency

Both commands add an entry to `pyproject.toml` with a version that is
compatible with the rest of the dependencies. However, `poetry` pins versions
with a caret (`^`), which is not what we want. Therefore, you must locate the
new entry in `pyproject.toml` and change the `^` to `>=`. (See
[poetry-relax](https://github.com/zanieb/poetry-relax) for the reasoning behind
this.)

In addition, you must also add a corresponding entry to
`ci/environment-mindeps.yaml`. You'll notice in this file that required
dependencies should be pinned exactly to the versions specified in
`pyproject.toml` (after changing `^` to `>=` there), and that development
dependencies should be left unpinned.

Finally, for _development dependencies only_, you must add an entry to
`ci/environment-dev.yaml` with the same version constraint as in
`pyproject.toml`.
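
The caret-to-`>=` change described above is a manual edit to `pyproject.toml`. As a minimal illustration of what that edit does to a single constraint string, here is a sketch with a hypothetical helper, `relax_caret`. It is not part of Poetry or of this project's tooling, and it only handles plain numeric versions.

```python
import re

# Matches a caret constraint on a plain numeric version, e.g. "^1.2.3" or "^0.4".
_CARET = re.compile(r"^\^(?P<version>\d+(?:\.\d+)*)$")


def relax_caret(constraint: str) -> str:
    """Turn "^X.Y.Z" into ">=X.Y.Z"; return anything else unchanged."""
    match = _CARET.match(constraint.strip())
    if match is None:
        return constraint
    return f">={match.group('version')}"


relaxed = relax_caret("^1.2.3")
```

A caret constraint such as `^1.2.3` also implies an upper bound (below `2.0.0`), which is the part that the relaxed `>=1.2.3` form drops.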
107 changes: 9 additions & 98 deletions docs/contributing/index.md
@@ -1,78 +1,18 @@
# Contributing

When contributing to this repository, please first discuss the change you wish to make via issue,
email, or any other method with the owners of this repository before making a change.
When contributing to this repository, please first discuss the change you wish to make
with the community and maintainers via
[a GitHub issue](https://github.com/nsidc/earthaccess/issues),
[a GitHub Discussion](https://github.com/nsidc/earthaccess/discussions),
or [any other method](our-meet-ups.md).

Please note that we have a [code of conduct](./CODE_OF_CONDUCT.md). Please follow it in all of your interactions with the project.

## Development environment

1. Fork [nsidc/earthaccess](https://github.com/nsidc/earthaccess)
1. Clone your fork (`git clone git@github.com:{my-username}/earthaccess`)

`earthaccess` uses Poetry to build and publish the package to PyPI, the de facto Python package repository. To develop new features or fix bugs, we need to set up a virtual environment and install the library locally. We can accomplish this with Poetry and/or Conda.

### Using Conda

If we have `mamba` (or `conda`) installed, we can use the environment file included in the `ci` folder. This will install all the libraries we need (including Poetry) to start developing `earthaccess`:

```bash
mamba env update -f ci/environment-dev.yml
mamba activate earthaccess-dev
poetry install
```

After activating our environment and installing the library with Poetry, we can run Jupyter Lab and start testing the local distribution, or we can use `make` to run the tests and lint the code.
Now we can create a feature branch and push those changes to our fork!

### Using Poetry

If we want to use Poetry, first we need to [install it](https://python-poetry.org/docs/#installation). After installing Poetry we can use the same workflow we used for Conda. First, we install the library locally:

```bash
poetry install
```

and now we can run the local Jupyter Lab and run the scripts etc. using Poetry:

```bash
poetry run jupyter lab
```

!!! note

    You may need to use `poetry run make ...` to run commands in the environment.

### Managing Dependencies

If you need to add a dependency, you should do the following:

- Run `poetry add <package>` for a required (non-development) dependency
- Run `poetry add --group=dev <package>` for a development dependency, such
as a testing or code analysis dependency

Both commands will add an entry to `pyproject.toml` with a version that is
compatible with the rest of the dependencies. However, `poetry` pins versions
with a caret (`^`), which is not what we want. Therefore, you must locate the
new entry in `pyproject.toml` and change the `^` to `>=`. (See
[poetry-relax](https://github.com/zanieb/poetry-relax) for the reasoning behind
this.)

In addition, you must also add a corresponding entry to
`ci/environment-mindeps.yaml`. You'll notice in that file that required
dependencies should be pinned exactly to the versions specified in
`pyproject.toml` (after changing `^` to `>=` there), and that development
dependencies should be left unpinned.

Finally, for _development dependencies only_, you must add an entry to
`ci/environment-dev.yaml` with the same version constraint as in
`pyproject.toml`.
Please note that we have a [code of conduct](/CODE_OF_CONDUCT.md). Please follow it in all of your interactions with the project.

## First Steps to contribute

- Read the documentation
- Fork this repo (see "Development environment" section above for more)
- Install environment (see "Development environment" section above for more)
- Read the documentation!
- Fork this repo and set up development environment (see
[development environment documentation](./development.md) for details)
- Run the unit tests successfully in `main` branch:
- `make test`

@@ -144,32 +84,3 @@ the stubs appear under `stubs/cmr`.
1. You may merge the Pull Request once you have the sign-off of another
developer, or if you do not have permission to do that, you may request the
reviewer to merge it for you.

## Release process

> :memo: The versioning scheme we use is [SemVer](http://semver.org/). Note that until
> we agree we're ready for v1.0.0, we will not increment the major version.
1. Ensure all desired features are merged to `main` branch and `CHANGELOG.md` is updated.
1. Use `bump-my-version` to increase the version number in all needed places, e.g. to
increase the minor version (`1.2.3` to `1.3.0`):

```plain
bump-my-version bump minor
```

1. Push a tag on the new commit containing the version number, prefixed with `v`, e.g.
`v1.3.0`.
1. [Create a new GitHub Release](https://github.com/nsidc/earthaccess/releases/new). We
hand-curate our release notes to be valuable to humans. Please do not auto-generate
release notes and aim for consistency with the GitHub Release descriptions from other
releases.

> :gear: After the GitHub release is published, multiple automations will trigger:
>
> - Zenodo will create a new DOI.
> - GitHub Actions will publish a PyPI release.
> :memo: `earthaccess` is published to conda-forge through the
> [earthdata-feedstock](https://github.com/conda-forge/earthdata-feedstock), as this
> project was renamed early in its life. The conda package is named `earthaccess`.
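
The version arithmetic behind the release steps above can be sketched in a few lines. This is only an illustration of SemVer bumping and the `v`-prefixed tag convention. It is not how `bump-my-version` is implemented, and the function names `bump_version` and `release_tag` are hypothetical.

```python
def bump_version(version: str, part: str) -> str:
    """Increase one component of a MAJOR.MINOR.PATCH version, resetting the lower ones."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def release_tag(version: str) -> str:
    """Tag name for a release: the version number prefixed with "v"."""
    return f"v{version}"


new_version = bump_version("1.2.3", "minor")
print(new_version, release_tag(new_version))  # prints "1.3.0 v1.3.0"
```

This matches the example in the list above, where a minor bump takes `1.2.3` to `1.3.0` and the pushed tag is `v1.3.0`.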
21 changes: 21 additions & 0 deletions docs/contributing/our-meet-ups.md
@@ -0,0 +1,21 @@
# How to collaborate with the _earthaccess_ team

## Bi-weekly (alternating weeks) _earthaccess_ hack days

???+ info "How to get invited"

    For an invitation to our recurring hack day meeting, please visit our
    [announcement thread on GitHub Discussions](https://github.com/nsidc/earthaccess/discussions/440#)
    and make a comment to request a calendar invitation and Zoom link.


Hack days...

* Occur on alternating Tuesdays from 11 AM to 1 PM Mountain Time.
* Are self-determining; you can work on what sounds fun to you!
* Are supportive; _earthaccess_ developers, maintainers, and community managers will
be present on the call. We welcome and aim to foster new contributions and community members.
* Include live demos on request!

For a glimpse into the work we do on a typical hack day, please visit our
[hack day share-out space in GitHub Discussions](https://github.com/nsidc/earthaccess/discussions/categories/hack-days)!