Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Run dataget as non-root user UID 1000 #13

Closed
wants to merge 7 commits into from
Closed

Conversation

surchs
Copy link
Collaborator

@surchs surchs commented Dec 19, 2024

Changes proposed in this pull request:

  • run container as user with UID 1000 (usually first user on UNIX)

Checklist

This section is for the PR reviewer

  • PR has an interpretable title with a prefix ([ENH], [FIX], [REF], [TST], [CI], [MNT], [INF], [MODEL], [DOC]) (see our Contributing Guidelines for more info)
  • PR links to GitHub issue with mention Closes #XXXX
  • Tests pass
  • Checks pass

For new features:

  • Tests have been added

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

@alyssadai alyssadai self-requested a review January 9, 2025 06:54
Copy link
Contributor

@alyssadai alyssadai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @surchs! This is fancy stuff. Left a few 🍒 and clarification comments, but overall looks good👍

Dockerfile Outdated
Comment on lines 31 to 32
RUN git config --global user.email "[email protected]" && \
git config --global user.name "Bagel User"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍒 for clarity:

Suggested change
RUN git config --global user.email "bageluser@neuroabgel.org" && \
git config --global user.name "Bagel User"
RUN git config --global user.email "user@neurobagel.org" && \
git config --global user.name "Neurobagel User"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we also configure these variables above in the datalad install RUN step - maybe we can remove one set of these git config steps to minimize redundancy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH, good catch! I didn't realize this and added the git config below because I got an error that the config was missing. However the config is scoped to the user, and because we now change the user to USERNAME it likely fails.

I can fix this by moving the git config below the user change line. But it makes me wonder if this container is going to work when I change the user at runtime with the -u flag

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated
@@ -14,4 +22,13 @@ RUN apt-get update -qq && \
datalad wtf

COPY ./entrypoint.sh /entrypoint.sh

# Change ownership of the entrypoint script
RUN chown $USERNAME:$USERNAME /entrypoint.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, even after changing ownership here, the script should still be executable by all users, correct?

i.e., if we override the user in the container at runtime, they should still be able to run entrypoint.sh despite it still being owned by neurobagel_user?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that's a good point! I will add execution permission for everyone instead

Dockerfile Outdated
Comment on lines 10 to 11
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool, but I don't find it super intuitive. To make it easier for ourselves if we need to revisit these settings, how about a comment briefly describing the type of user/user group you are trying to create? e.g., Creating a user in the container which is the same as the first created user on the host system, with a home directory (?), and adding it to a new group with the same name as the user.

Also, I encountered the option --system or -r for groupadd/useradd, which seems to be specifically creating users for running services rather than login/interactive use (no home directory) - see https://serverfault.com/a/350932 and https://www.linuxquestions.org/questions/linux-newbie-8/useradd-r-option-and-system-account-question-892978/ for example. I believe system users are created by default with UIDs < 1000, so it would be an alternative to explicitly setting UID/GID. Is this something worth looking into? Or would having a home directory be actually important in our case, since we are presumably doing some git operations during the datalad install step? (If so, I think the current approach still makes sense)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comment, agree that this can help. I don't see a benefit of using the -r flag you looked up. On a cursory read it seems that it might rather make things more complicated as it seems to be a user for deamons, not actual users. An extra home directory is cheap as well, so I don't think we should do more things more complicated than necessary

@surchs
Copy link
Collaborator Author

surchs commented Jan 11, 2025

So the tricky thing with this PR is that it will now always run the container as the user with UID 1000. Unless of course the user manually changes the ID. But there won't be a nice warning message if they don't change the ID and also don't happen to be the user with UID 1000, e.g. on a shared system. If the user with UID does not have write permission to the output directory specified for dataget, then the container will just fail with a "Permission denied" error.

That may not be a huge problem, because most likely they wouldn't be allowed to user docker on a shared system anyway. But the question is: how useful is this change going to be if we:

  1. Expect people to change the UID anyway (e.g. with the new instructions in [ENH] Add -u flag to dataget instruction query-tool#435) or
  2. Expect that people run the container on their personal laptop - where running it as root is not going to be such a big headache either.

So I'm wondering a bit how clever it is to merge this change here - or if we should just leave it at changing the default instructions so users always specify the uid:gid they want to run the container as

@surchs
Copy link
Collaborator Author

surchs commented Jan 13, 2025

After discussion we decided that it makes more sense to keep the container run as root by default, as most users will either override the user now or be root on the host anyway. For shared systems, we'll add a singularity command example in the query tool

@surchs surchs closed this Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Downloaded data are owned by root
2 participants