Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add JUPYTER_MODE for JupyterHub #77

Merged
merged 5 commits into from
Sep 3, 2024
Merged

add JUPYTER_MODE for JupyterHub #77

merged 5 commits into from
Sep 3, 2024

Conversation

Tianhao-Gu
Copy link
Collaborator

No description provided.

@@ -179,6 +181,65 @@ services:
volumes:
- ./cdr/cdm/jupyter/user_shared_workspace:/cdm_shared_workspace/user_shared_workspace

dev_jupyterhub:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably only need one jupyterhub instance eventually and control permission based on different user groups. But I haven't completely sort it out yet.

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.76%. Comparing base (886ab19) to head (2ecedd6).
Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #77   +/-   ##
=======================================
  Coverage   48.76%   48.76%           
=======================================
  Files           4        4           
  Lines         121      121           
=======================================
  Hits           59       59           
  Misses         62       62           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@MrCreosote MrCreosote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this is just the initial hub setup and there's more work to come

Dockerfile Outdated Show resolved Hide resolved
docker-compose.yaml Outdated Show resolved Hide resolved
Comment on lines +236 to +239
- POSTGRES_USER=hive
- POSTGRES_PASSWORD=hivepassword
- POSTGRES_DB=hive
- POSTGRES_URL=postgres:5432
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really the point of this PR but it makes me nervous that users have access to these creds...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea. We do have a todo item for this.

# TODO: create postgres user w/ only write access to the hive tables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even that would make me nervous. It means that any user could blow away the hive tables

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the todo item to

# TODO: create postgres user r/ only read access to the hive tables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users won't need to create tables?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess another question is if users and devs are all in the same jupyterhub instance is it possible to have different environments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users won't since they don't have minIO write permission. I am hoping it can be configured. But TBD.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so users = read only, devs = write, essentially.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could ever get the remote metastore working that might provide some protection as well, not sure

Comment on lines 240 to 241
volumes:
- ./cdr/cdm/jupyter/home:/home
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No shared workspace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. Each user will have their own dir and python environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should dev_jupyterhub have the shared workspace mounted in that case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both instances mount - ./cdr/cdm/jupyter/jupyterhub/users_home:/jupyterhub/users_home. Each user will have their own dir inside of /jupyterhub/users_home and create notebook there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that dev-jupyterhub also mounts /cdr/cdm/jupyter:/cdm_shared_workspace. I'm asking if that line needs to be there or if it should be removed to match user_juypterhub

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues with using - ./cdr/cdm/jupyter:/cdm_shared_workspace for the dev environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying there's an issue, I just wasn't sure why it was there. can you remind me what's in the hive_metastore and cdm-postgres dirs? Are they both still relevant with the standalone postgres service?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cdm-postgres is not longer used, it's not getting updated for 2 months.
In hive_metastore there are <db_name>.db files. I think they are used when users run spark select statement.

Screenshot 2024-09-03 at 3 01 27 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe cdm-postgres was the local DB, and hive_metastore is cache data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly sure. But I want to worry about those later.

@Tianhao-Gu Tianhao-Gu merged commit a208fa7 into main Sep 3, 2024
9 of 10 checks passed
@Tianhao-Gu Tianhao-Gu deleted the dev_jupyterhub branch September 3, 2024 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants