Add more options to run interactively on the cloud #171

Closed
scottyhq opened this issue May 30, 2023 · 21 comments

Comments

@scottyhq
Contributor

With Google dropping credits for mybinder.org recently, I've noticed that launching sessions is indeed less reliable:
https://blog.jupyter.org/mybinder-org-reducing-capacity-c93ccfc6413f

It would be good to document running this content on other "free" platforms such as:

@dcherian
Contributor

dcherian commented Jun 1, 2023

Another option is a pyscript / thebe thing potentially? I don't know what the state of affairs is here.

@dcherian dcherian mentioned this issue Jun 1, 2023
34 tasks
@lsetiawan
Member

I just discovered this the other day, and it might be promising here: https://jupyterlite.readthedocs.io/en/latest/. It uses pyodide and can run JupyterLab in the browser. It does use the user's local computing resources like a regular web app, so technically it's not fully "cloud".

@dcherian
Contributor

dcherian commented Jun 2, 2023

The tutorial content is mostly local datasets downloaded using pooch or synthetic datasets, so that would be totally fine.

@scottyhq
Contributor Author

scottyhq commented Jun 2, 2023

Did a quick test with Google Colab (which I admittedly haven't used much). It's not really well set up for a directory of notebooks as far as I can tell, nor for conda environments! The default runtime has the following versions pre-installed:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.11 (main, Apr  5 2023, 14:15:10) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.107+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.12.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.12.1
distributed: 2022.12.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.2.2
mypy: None
IPython: 7.34.0
sphinx: 3.5.4

So many of the notebooks could be executed, but not all. A simple pip install zarr flox would work if you only need a few libraries. Installing a full-fledged conda environment is slow and cumbersome, and it has to be repeated for each notebook:

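# Cell 1: install condacolab; the kernel restarts automatically when this finishes
!pip install -q condacolab
import condacolab
condacolab.install()

# Cell 2: after the restart, confirm the conda installation is usable
import condacolab
condacolab.check()

# Cell 3: install the tutorial environment into "base"
# NOTE: this will take a while, be patient!
!mamba env update --quiet --name="base" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"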
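
For the lighter pip route mentioned above, one option is to probe the runtime first and install only what is missing. This is a minimal sketch and has not been tried on Colab here; the helper name `missing_packages` and the example list are hypothetical, and it assumes each package's import name matches its pip name (true for zarr and flox, not for every package).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of import names a notebook might need.
needed = ["xarray", "zarr", "flox"]
to_install = missing_packages(needed)

if to_install:
    # In a Colab cell this command would be run with a leading "!".
    print("pip install -q " + " ".join(to_install))
else:
    print("all requested packages are already importable")
```

This avoids the conda setup entirely for notebooks that only need a couple of extra libraries, at the cost of not reproducing the locked environment.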

@scottyhq
Contributor Author

scottyhq commented Jun 2, 2023

Adding CI to publish a Docker image to GHCR would be nice. It would make it easier to run locally for people who like Docker, and also to run on GitHub Codespaces.

@scottyhq
Contributor Author

scottyhq commented Jun 2, 2023

AWS StudioLab is more straightforward because you have a full-fledged normal JupyterLab interface (file browser, multiple notebooks, a terminal). You still have to install the locked environment as a manual step, as the standard environment does not come with xarray. A bonus of using StudioLab compared to BinderHub is that content and environments persist across sessions.

Open In SageMaker Studio Lab

Note the link syntax, similar to BinderHub above: https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb

mamba env create --name="xarray-tutorial" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"
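
Since the launch links for these services all follow a repo/branch/path pattern, they could be generated in one place for the docs. The sketch below is illustrative only: `launch_urls` is a hypothetical helper, the StudioLab pattern is copied from the link above, and the Binder and Colab patterns are assumptions based on those services' public URL schemes and should be checked before use.

```python
from urllib.parse import quote


def launch_urls(org, repo, branch, notebook_path):
    """Build launch links for one notebook on three hosted services."""
    gh = f"{org}/{repo}"
    return {
        # Binder takes the notebook as a URL-encoded query parameter (assumed pattern).
        "binder": (
            f"https://mybinder.org/v2/gh/{gh}/{branch}"
            f"?labpath={quote(notebook_path, safe='')}"
        ),
        # Colab mirrors the GitHub blob path (assumed pattern).
        "colab": (
            f"https://colab.research.google.com/github/{gh}/blob/{branch}/{notebook_path}"
        ),
        # StudioLab pattern as used in the link above.
        "studiolab": (
            f"https://studiolab.sagemaker.aws/import/github/{gh}/blob/{branch}/{notebook_path}"
        ),
    }


urls = launch_urls(
    "xarray-contrib",
    "xarray-tutorial",
    "main",
    "overview/fundamental-path/index.ipynb",
)
for service, url in urls.items():
    print(f"{service}: {url}")
```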

@lsetiawan
Member

Do you know if we need to pay for AWS StudioLab? It's asking me to log in.

@scottyhq
Contributor Author

scottyhq commented Jun 2, 2023

Do you know if we need to pay for AWS StudioLab? It's asking me to login.

You do have to create an account, unlike Binder & Colab, but it is free and no credit card is required. They impose daily usage limits (I think 12-hour sessions). We'll want to check resource limits and make sure the notebooks all actually run.

@lsetiawan
Member

Gotcha, sounds good. Posting a link to their FAQ here: https://studiolab.sagemaker.aws/faq. There's a waitlist to make a new account? At least that's what their FAQ says.

@scottyhq
Contributor Author

scottyhq commented Jun 2, 2023

Oh didn't realize that!

Q: Why is there a waiting list to get an account?
We are limiting the number of new account registrations at this time to ensure a high quality of experience for all users.

Q: How long do I have to wait for my account request to get approved?
Account requests are typically approved within 1 to 5 business days.

That's definitely a deal-breaker for large tutorials where we likely won't be able to engage with participants beforehand to sign up. Will be good to know if you do get access in 1-2 days @lsetiawan !

@lsetiawan
Member

Update: I was approved in 2 minutes, and setting up the account took about 5 minutes. Right now, though, it's not straightforward how to spin up the index notebook with the supplied conda environment... will have to investigate that more. I think this is potentially a great way to run the tutorial. If we can get access to the people attending the tutorials, there can be some time to notify the participants to get an AWS StudioLab account.

@lsetiawan
Member

Update 2: Looks like it's not very straightforward to open index.ipynb. Going to https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb doesn't automatically clone the repo and spin up the environment. There are a lot of steps to do by hand: cloning the entire repo, creating a custom environment from the conda env YAML file (like the instructions in https://github.com/aws/studio-lab-examples/blob/main/custom-environments/custom_environment.ipynb), which takes forever because it doesn't have mamba, and then navigating to index.ipynb and opening it. I feel like this is a lot of steps and I'm spoiled by mybinder, but what do you think @scottyhq?
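
The manual steps described above could be collected into one small script run from a StudioLab terminal or notebook. This is only a sketch under assumptions: `setup_commands` and `run_setup` are hypothetical helpers, the environment file path is taken from the raw URL used earlier in this thread, and it uses conda because mamba was reported missing. It defaults to a dry run that just prints the commands.

```python
import subprocess

REPO_URL = "https://github.com/xarray-contrib/xarray-tutorial"


def setup_commands(env_name="xarray-tutorial", clone_dir="xarray-tutorial"):
    """Return the shell commands (as argv lists) for the manual setup steps."""
    return [
        # Step 1: clone the whole tutorial repository.
        ["git", "clone", REPO_URL, clone_dir],
        # Step 2: create the environment from the repo's conda environment file.
        ["conda", "env", "create", "--name", env_name,
         "--file", f"{clone_dir}/conda/environment.yml"],
    ]


def run_setup(dry_run=True):
    """Print each command, and execute it unless dry_run is True."""
    for cmd in setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run only: shows what would be executed without touching the system.
run_setup(dry_run=True)
```

Opening overview/fundamental-path/index.ipynb and selecting the new environment's kernel would still be a manual step in the file browser.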

@scottyhq
Contributor Author

Thanks for looking into it @lsetiawan! Agreed that StudioLab is a bit tricky. In the end we'll have a couple of options, each with pros and cons, that we can document on one of the website pages. I think near-term we should try out JupyterLite and Codespaces too.

@dcherian
Contributor

Great stuff! Eventually, it would be good to summarize what you've learned about the pros and cons of each option here: https://tutorial.xarray.dev/overview/get-started.html

@lsetiawan
Member

Linking the comment from @dcherian here: #170 (comment).

Currently Quansight is offering to host Nebari for the tutorial, and I think we should definitely take them up on that, as Nebari is a really great system for this kind of thing IMO. I'll fill out the form for this. Looks like I need help answering a few questions about specs.

  1. I think these 2 options are enough for the tutorial (these are the default machines they're offering)
    Small (2 CPUs, 8 GB RAM)
    Medium (4 CPUs, 16 GB RAM)

  2. I assume we don't need a GPU instance; it doesn't look like any of the tutorials use one.

@scottyhq Could you confirm the above? Thanks!

@dcherian
Contributor

I'll fill out the form for this.

Thanks!

I think yes on (1), (2). We could optionally use GPUs but it isn't necessary.

@scottyhq
Contributor Author

We should manage with "small", but let's go ahead and request medium, since some of the content will focus on dask and having a bit more than is typically available on binder systems would be nice :) No GPUs necessary.
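
To give a rough feel for what the two sizes would mean for the dask content, here is a back-of-the-envelope sketch. `worker_plan` is a hypothetical helper, and the one-worker-per-CPU layout and 2 GB reserve for the notebook server are assumptions, not measured values. The returned keys match keyword arguments accepted by `dask.distributed.LocalCluster`, but nothing here imports dask.

```python
def worker_plan(cpus, ram_gb, reserve_gb=2.0):
    """Split a machine into one single-threaded dask worker per CPU.

    reserve_gb is memory held back for JupyterLab and the OS (an assumed value).
    """
    if ram_gb <= reserve_gb:
        raise ValueError("not enough memory left for workers")
    per_worker_gb = (ram_gb - reserve_gb) / cpus
    return {
        "n_workers": cpus,
        "threads_per_worker": 1,
        "memory_limit": f"{per_worker_gb:.1f}GB",
    }


# Instance sizes quoted in this thread.
for label, cpus, ram_gb in [("small", 2, 8), ("medium", 4, 16)]:
    print(label, worker_plan(cpus, ram_gb))
```

Under these assumptions the per-worker memory is similar on both sizes, and the main gain from "medium" is twice as many workers.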

@dcherian
Contributor

I asked what the dask team was planning to do and got the following responses from Naty Clementi and Jacob Tomlinson:

  1. Naty: We were planning on running mostly local, but we talked about the chance of using coiled notebooks + jupyter-repo2docker to get everything on the image. https://blog.coiled.io/blog/coiled-notebooks.html
  2. Jacob: When I run RAPIDS tutorials I usually stand up my own Binder because I need to add GPUs to the nodes, it's pretty quick and easy to do, especially if you're just running vanilla Binder without the GPU stuff.

@lsetiawan
Member

@scottyhq and I discussed in person going forward with GitHub Codespaces, and now there's PR #184 for this setup, specifically for SciPy 2023.

@lsetiawan
Member

Quansight has hosted a Nebari instance for the workshop, which can be found at https://scipy.quansight.dev/

@scottyhq
Contributor Author

After #184 we have the ability to run interactive sessions on either mybinder.org or GitHub Codespaces.
