setup jupyterhub config #78

Merged
merged 5 commits into from
Sep 5, 2024
30 changes: 25 additions & 5 deletions Dockerfile
@@ -13,7 +13,19 @@ RUN groupadd -r spark && useradd -r -g spark spark_user

RUN apt-get update && apt-get install -y \
# GCC required to resolve error during JupyterLab installation: psutil could not be installed from sources because gcc is not installed.
gcc curl git npm nodejs graphviz graphviz-dev libgdal-dev build-essential python3-dev\
gcc \
curl \
git \
wget \
vim \
npm \
nodejs \
graphviz \
graphviz-dev \
libgdal-dev \
build-essential \
python3-dev \
sudo \
&& rm -rf /var/lib/apt/lists/*

ENV HADOOP_AWS_VER=3.3.4
@@ -42,22 +54,26 @@ RUN pipenv sync --system

RUN chown -R spark_user:spark /opt/bitnami

# Set up Jupyter Lab directories
# Set up JupyterLab directories
ENV JUPYTER_CONFIG_DIR=/.jupyter
ENV JUPYTER_RUNTIME_DIR=/.jupyter/runtime
ENV JUPYTER_DATA_DIR=/.jupyter/data
RUN mkdir -p ${JUPYTER_CONFIG_DIR} ${JUPYTER_RUNTIME_DIR} ${JUPYTER_DATA_DIR}
RUN chown -R spark_user:spark /.jupyter

# Set up Jupyter Hub directories
# Set up JupyterHub directories
ENV JUPYTERHUB_CONFIG_DIR=/srv/jupyterhub
RUN mkdir -p ${JUPYTERHUB_CONFIG_DIR}
COPY ./src/notebook_utils/startup.py ${JUPYTERHUB_CONFIG_DIR}/startup.py
COPY ./config/jupyterhub_config.py ${JUPYTERHUB_CONFIG_DIR}/jupyterhub_config.py
COPY ./scripts/spawn_notebook.sh ${JUPYTERHUB_CONFIG_DIR}/spawn_notebook.sh
RUN chmod +x ${JUPYTERHUB_CONFIG_DIR}/spawn_notebook.sh
RUN chown -R spark_user:spark ${JUPYTERHUB_CONFIG_DIR}

# Jupyter Hub user home directory
RUN mkdir -p /jupyterhub/users_home
RUN chown -R spark_user:spark /jupyterhub/users_home
ENV JUPYTERHUB_USER_HOME=/jupyterhub/users_home
RUN mkdir -p $JUPYTERHUB_USER_HOME
RUN chown -R spark_user:spark $JUPYTERHUB_USER_HOME

RUN npm install -g configurable-http-proxy

@@ -82,6 +98,10 @@ ENV CDM_SHARED_DIR=/cdm_shared_workspace
RUN mkdir -p ${CDM_SHARED_DIR} && chmod -R 777 ${CDM_SHARED_DIR}
RUN chown -R spark_user:spark $CDM_SHARED_DIR

# Allow spark_user to use sudo without a password
Comment (Tianhao-Gu, Collaborator, Author, Sep 5, 2024):

JupyterHub requires system users to exist for its operations. Configuring JupyterHub to run as a non-root user is quite challenging, because that user must manage other system users (home directories, virtual environments, etc.). As a temporary solution, I want to give spark_user sudo access. I am actually considering switching to root if this becomes too cumbersome.

# TODO: use `sudospawner` in JupyterHub to avoid this (https://jupyterhub.readthedocs.io/en/stable/howto/configuration/config-sudo.html)
RUN echo "spark_user ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

# Switch back to non-root user
USER spark_user
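
The TODO above links to JupyterHub's sudo how-to. As a rough sketch of where that route leads, the Hub-side change is a single spawner setting. This assumes the `sudospawner` package is installed and that a narrower sudoers rule replaces the blanket `NOPASSWD: ALL` entry, as the linked page describes. It would replace, not extend, the `VirtualEnvSpawner` introduced in this PR, so it is shown only as an illustration and not as the author's plan.

```python
# jupyterhub_config.py fragment (assumption: the `sudospawner` package is installed)
c = get_config()  # noqa: F821 (injected by JupyterHub when it loads the config file)

# Start single-user servers through the sudospawner mediator, so the Hub user
# needs sudo rights for that one command only, not for everything.
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
```

This fragment only makes sense inside a config file loaded with `jupyterhub -f`; it is not meant to be run on its own.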

27 changes: 27 additions & 0 deletions config/jupyterhub_config.py
@@ -0,0 +1,27 @@
"""
This is the JupyterHub configuration file. It is used to configure the JupyterHub server.
Refer to the JupyterHub documentation for more information:
https://jupyterhub.readthedocs.io/en/latest/tutorial/getting-started/config-basics.html
https://jupyterhub.readthedocs.io/en/stable/reference/config-reference.html
"""
import os

from jupyterhub_config.custom_spawner import VirtualEnvSpawner

c = get_config()

# Set the authenticator class
# TODO: Change the authenticator class to a secure one (e.g. GitHubOAuthenticator)
c.JupyterHub.authenticator_class = 'jupyterhub.auth.DummyAuthenticator'
c.Authenticator.allowed_users = {'spark_user', 'test_user1', 'test_user2'}
Comment (Collaborator, Author):
Non-existent system users (test_user1, test_user2) will be automatically created later in custom_spawner.py.
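
As a minimal sketch of what "automatically created" could look like, the helpers below check the system user database and build a `useradd` command. The function names and the default home root are assumptions made for illustration; the actual logic is deferred to `custom_spawner.py` in later PRs and may differ.

```python
import pwd
import subprocess


def system_user_exists(username: str) -> bool:
    """Return True if `username` is present in the system user database."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


def useradd_command(username: str, home_root: str) -> list:
    """Build (but do not run) a useradd command for a new hub user."""
    # -m creates the home directory, -d sets where it lives.
    return ["sudo", "useradd", "-m", "-d", f"{home_root}/{username}", username]


def ensure_system_user(username: str, home_root: str = "/jupyterhub/users_home") -> bool:
    """Create the user if it is missing. Returns True if a user was created."""
    if system_user_exists(username):
        return False
    # Hypothetical: relies on the passwordless sudo granted in the Dockerfile.
    subprocess.run(useradd_command(username, home_root), check=True)
    return True
```

Keeping command construction separate from execution makes the behavior easy to unit test without touching the real user database.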

Comment (Collaborator, Author):
Eventually I want to switch to OAuth and phase out user/password auth completely, perhaps allowing only admin users to log in with a password.

Comment (Member):
Presumably, long term, we'd swap out all the auth systems for KBase auth.

Comment (Collaborator, Author):
Yes, the plan is to use KBase auth eventually.

c.DummyAuthenticator.password = os.environ['JUPYTERHUB_ADMIN_PASSWORD']

c.Authenticator.admin_users = {'spark_user'}

c.JupyterHub.spawner_class = VirtualEnvSpawner

# Set the JupyterHub IP address and port
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.port = int(os.getenv('NOTEBOOK_PORT'))

c.JupyterHub.log_level = 'DEBUG'
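
One thing worth noting about the config above: `int(os.getenv('NOTEBOOK_PORT'))` raises a `TypeError` when the variable is unset, because `os.getenv` returns `None`, and `os.environ['JUPYTERHUB_ADMIN_PASSWORD']` raises a bare `KeyError`. A possible way to fail with clearer messages is sketched below; the helper names `required_env` and `port_from_env` are hypothetical and not part of this PR.

```python
import os


def required_env(name: str, env=os.environ) -> str:
    """Return a non-empty environment variable or fail with a clear message."""
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name} must be set")
    return value


def port_from_env(name: str, env=os.environ) -> int:
    """Parse a TCP port from the environment, rejecting missing or invalid values."""
    raw = required_env(name, env)
    try:
        port = int(raw)
    except ValueError:
        raise RuntimeError(f"{name} must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise RuntimeError(f"{name} must be between 1 and 65535, got {port}")
    return port
```

In the config file these could be used as `c.JupyterHub.port = port_from_env('NOTEBOOK_PORT')` and `c.DummyAuthenticator.password = required_env('JUPYTERHUB_ADMIN_PASSWORD')`.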
4 changes: 3 additions & 1 deletion docker-compose.yaml
@@ -196,7 +196,7 @@ services:
- JUPYTER_MODE=jupyterhub
- YARN_RESOURCE_MANAGER_URL=http://yarn-resourcemanager:8032
- SPARK_MASTER_URL=spark://spark-master:7077
- SPARK_DRIVER_HOST=dev-jupterhub
- SPARK_DRIVER_HOST=dev-jupyterhub
- MINIO_URL=http://minio:9002
- MINIO_ACCESS_KEY=minio-readwrite
- MINIO_SECRET_KEY=minio123
@@ -207,6 +207,7 @@ services:
- POSTGRES_DB=hive
- POSTGRES_URL=postgres:5432
- USAGE_MODE=dev
- JUPYTERHUB_ADMIN_PASSWORD=testpassword123
volumes:
- ./cdr/cdm/jupyter:/cdm_shared_workspace
- ./cdr/cdm/jupyter/jupyterhub/users_home:/jupyterhub/users_home
@@ -237,6 +238,7 @@ services:
- POSTGRES_PASSWORD=hivepassword
- POSTGRES_DB=hive
- POSTGRES_URL=postgres:5432
- JUPYTERHUB_ADMIN_PASSWORD=testpassword123
volumes:
- ./cdr/cdm/jupyter/jupyterhub/users_home:/jupyterhub/users_home

2 changes: 1 addition & 1 deletion scripts/notebook_entrypoint.sh
@@ -28,7 +28,7 @@ if [ "$JUPYTER_MODE" = "jupyterlab" ]; then
elif [ "$JUPYTER_MODE" = "jupyterhub" ]; then
echo "starting jupyterhub"

echo "TO BE IMPLEMENTED"
jupyterhub -f "$JUPYTERHUB_CONFIG_DIR"/jupyterhub_config.py
else
echo "ERROR: JUPYTER_MODE is not set to jupyterlab or jupyterhub. Please set JUPYTER_MODE to either jupyterlab or jupyterhub."
exit 1
9 changes: 9 additions & 0 deletions scripts/spawn_notebook.sh
@@ -0,0 +1,9 @@
#!/bin/bash

USERNAME=${JUPYTERHUB_USER}
Comment (Collaborator, Author):
JUPYTERHUB_USER will be set in custom_spawner.py in later PRs.


echo "Starting Jupyter Notebook for user: $USERNAME"
cd $JUPYTERHUB_USER_HOME/$USERNAME
Comment (Collaborator, Author):
User home dir will also be created in custom_spawner.py in future PRs.


# Start the notebook server with current user
exec jupyterhub-singleuser
17 changes: 17 additions & 0 deletions src/jupyterhub_config/custom_spawner.py
@@ -0,0 +1,17 @@
from jupyterhub.spawner import SimpleLocalProcessSpawner


class VirtualEnvSpawner(SimpleLocalProcessSpawner):
    """
    A custom JupyterHub spawner that creates and manages a virtual environment
    for each user, configuring their workspace based on their admin status.
    """

    def start(self):
        """
        Start the JupyterHub server for the user. This method ensures that the
        user's directory and virtual environment are set up, configures environment
        variables, and sets the notebook directory before starting the server.
        """

        return super().start()
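
The `start` docstring describes steps that this PR does not implement yet. A minimal sketch of those steps, using only the standard library, might look like the following. All helper names, the `.venv` directory name, and the chosen environment variables are assumptions for illustration; JupyterHub's base spawner may already set some of these itself, and the real implementation is left to later PRs.

```python
import os
import venv
from pathlib import Path


def ensure_user_dir(home_root, username) -> Path:
    """Create the per-user workspace directory if it does not exist yet."""
    user_dir = Path(home_root) / username
    user_dir.mkdir(parents=True, exist_ok=True)
    return user_dir


def ensure_venv(user_dir: Path, name: str = ".venv") -> Path:
    """Create a virtual environment inside the user's directory, once."""
    env_dir = user_dir / name
    # pyvenv.cfg is written by the venv module, so it marks an existing env.
    if not (env_dir / "pyvenv.cfg").exists():
        venv.create(env_dir, with_pip=False)
    return env_dir


def spawn_environment(username, user_dir, env_dir, base_env=None) -> dict:
    """Return a copy of the environment pointing at the user's venv."""
    env = dict(base_env if base_env is not None else os.environ)
    bin_dir = Path(env_dir) / "bin"
    env["JUPYTERHUB_USER"] = username
    env["HOME"] = str(user_dir)
    env["VIRTUAL_ENV"] = str(env_dir)
    # Put the venv's executables ahead of the system ones.
    env["PATH"] = f"{bin_dir}{os.pathsep}{env.get('PATH', '')}"
    return env
```

In a spawner subclass, values like these would typically be merged into `self.environment` before calling `super().start()`.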
1 change: 1 addition & 0 deletions test/src/jupyterhub_config/custom_spawner_test.py
@@ -0,0 +1 @@
from jupyterhub_config.custom_spawner import *