Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mounts on images [CLI-4] - image.add_local_python_package #2343

Merged
merged 49 commits into from
Nov 21, 2024

Conversation

freider
Copy link
Contributor

@freider freider commented Oct 16, 2024

Adds the notion of a "mount layer" on Image, which is a way for image objects to track mounts that should be attached to any container that uses the image.
To prevent issues with overlapping file system changes in case of image steps that modify files specified in mount layers, a mount layer is always "materialized" into an actual image layer using "COPY" commands in case there is a subsequent non-mount layer added to the image.

For now, the only place we use these mount layers is in the new image.add_local_python_packages() method on image, but the plan is to roll out changes that replace all current user usages of Mount (add_local_dir, add_local_file etc.) with mount layers and eventually get rid of Mounts as an external concept.

Note that since the attached mounts are entirely a client-side concept, a hypothetical Image.from_id or similar would not be able to recover the mount information, but I don't think we use that anywhere where that would make sense.

Backward/forward compatibility checks ---

Check these boxes or delete any item (or this section) if not relevant for this PR.

  • Client+Server: this change is compatible with old servers
  • Client forward compatibility: this change ensures client can accept data intended for later versions of itself

Note on protobuf: protobuf message changes in one place may have impact to
multiple entities (client, server, worker, database). See points above.


Changelog

  • Adds Image.add_local_python_packages which works similarly to Mount.from_local_python_packages but for images.

@freider freider changed the base branch from main to freider/image-mount-watch-cli-219 November 6, 2024 15:16
modal/image.py Outdated
raise InvalidError(
textwrap.dedent(
"""
It's recommended to run any `image.add_*` commands for adding local resources last
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions for how to make this wall of text easy to understand would be appreciated :D

@freider freider requested a review from mwaskom November 6, 2024 15:48
Base automatically changed from freider/image-mount-watch-cli-219 to main November 12, 2024 13:32
@freider freider changed the title Mounts on images [CLI-4] - image.add_local_python_package Mounts on images [CLI-4] - image.attach_local_python_package Nov 15, 2024
@freider freider changed the title Mounts on images [CLI-4] - image.attach_local_python_package Mounts on images [CLI-4] - image.add_local_python_package Nov 19, 2024
modal/image.py Outdated
raise InvalidError(
textwrap.dedent(
"""
An image tried to run a build step after using `image.add_local_*` to include local files.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwaskom any feedback on this error message copy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related but one sort of annoying thing is that verbose error messages end up looking bad in the terminal since we print them twice:

image

Also can we error on the side of shorter lines? (Or longer lines and let rich handle the wrapping?) With narrow terminals this gets a little garbled:
image

@@ -633,6 +634,7 @@ def from_args(
if proxy:
# HACK: remove this once we stop using ssh tunnels for this.
if image:
# TODO(elias): this will cause an error if users use prior `.add_local_*` commands without copy=True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the autossh installation - checking with @pawalt and @luiscape now

Copy link
Contributor

@mwaskom mwaskom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably do a little more here to help users understand this. Some of that would be aligning on clear terminology and using it consistently. It would also be good to add a conceptual explanation to the "Guide" page about containers (although it's getting so long now 😠; we need to refactor it) that we can point to.

Also the more I think about it, I wonder if it's misleading to talk about this in terms of "Python packages" when it really is just "add an entire directory tree where a Python module resides". Although that would run counter to my desire for a shorter method name 😁. But maybe this is a good opportunity to think a little bit more about the semantics of this helper function?

modal/image.py Outdated
self2._mounts = tuple(base_image._mounts) + (mount,)
self2._used_local_mounts = base_image._used_local_mounts | ({mount} if mount.is_local() else set())

return _Image._from_loader(_load, "ImageWithMounts()", deps=lambda: [base_image, mount])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ImageWithMounts() a new class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they just use the regular Image class - this is the repr string. Might have changed the repr it here for debugging reasons (would be nice to fix our object reprs to actually be meaningful/using some data from the specification)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(would be nice to fix our object reprs to actually be meaningful/using some data from the specification)
+1 yeah have been wanting to do this for a while, they could be a lot more userful

modal/image.py Outdated
@@ -553,6 +624,23 @@ def build_dockerfile(version: ImageBuilderVersion) -> DockerfileSpec:
context_mount=mount,
)

def add_local_python_packages(self, *packages: Union[str, Path], copy: bool = False) -> "_Image":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a Python API, would add_local_packages save users some typing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but what if the user has non-python packages locally (.deb?) that they want to add? 😬

I agree it's a bit on the long end, but this also reflects the existing Mount.from_local_python_packages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some more thinking... see below

modal/image.py Outdated
@@ -553,6 +624,23 @@ def build_dockerfile(version: ImageBuilderVersion) -> DockerfileSpec:
context_mount=mount,
)

def add_local_python_packages(self, *packages: Union[str, Path], copy: bool = False) -> "_Image":
"""Attaches local Python packages to the container running the image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we avoid introducing new terminology ("attach") if we're not otherwise going to use it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that surprised me when testing — but maybe it shouldn't have — is that adding a local package (really a third-party package; I was just reaching for something quick) didn't include its dependencies.

That makes sense from one perspective (this operation is more like copying a directory with some sugar to reference the directory via PYTHONPATH) but I wonder if it will surprise users? Since "adding a package" usually has some notion of dependency resolution.

Copy link
Contributor Author

@freider freider Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Attach" was an oversight when I renamed it again, thanks for spotting!

Yeah, adding third party packages or packages that have real install recipes via this method isn't intended, but I can see how people would do that. The main intent is for it to be used for directories that are part of the user's current python projet, and the interface has been using the module name as a convenience for pointing to that directory via the current sys.path.

Basically a short form for: add_local_dir(lookup_package("my_package"), filter=modal.file_filters.PyFiles) (assuming we had lookup_package() and filters...

I have separately been thinking that it would be nice to just point to a "project directory" when setting up the user's image, which is very similar to this but using file paths, and maybe that's a better way to do this?
The problem with that is that specifying file system paths in a portable way is so un-ergonomic for even simple cases, e.g:

E.g.

Image.debian_slim().add_local_dir(
    Path(__file__).parent,
    remote_path="/root/some_root_package_name",
    condition=python_mount_condition
)

Preferably I'd want it to work like above with default arguments of using:

Image.debian_slim().add_project_dir(".")  # use directory of current file as project dir - "standard", only include python files (or .modalignore?)

And then have some very easy ways to customize this:

Image.debian_slim().add_project_dir("..")  # use parent directory as root, using relative paths from the current file's dir or cwd? (the latter probably cleanest to implement)
Image.debian_slim().add_project_dir(".", filter=modal.filters.AllFiles)  # use all files instead of just python files
# this is where we could also take advantage of new glob/modalignore support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename add_local_python_packages to a "private _add_local_python_packages for now to get the rest of the changes merged in while we discuss the best way of "adding my local python files"

modal/image.py Outdated
Comment on lines 633 to 634
By default (copy=False) the packages are layered on top of the image when
the container starts up and not built as an image layer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble parsing "packages are layered ... not built as an image layer".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote:

Adds Python Package Files to Containers

Adds all files from the specified Python packages to containers running the image.

Packages are added to the /root directory, which is on the PYTHONPATH of any executed Modal functions.

By default (copy=False), the files are added to containers on startup and are not built into the actual image, which speeds up deployment.

Set copy=True to copy the files into an image layer at build time instead. This can slow down iteration since it requires a rebuild of the image and any subsequent build steps whenever the included files change, but it is required if you want to run additional build steps after this one.

Note: This excludes all dot-prefixed subdirectories or files and all .pyc/__pycache__ files. To add full directories with finer control, use .add_local_dir() instead.



def hydrate_image(img, client):
# there should be a more straight forward way to do this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

modal/image.py Outdated
Comment on lines 283 to 284
self._used_local_mounts = frozenset()
self._mounts = ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between _used_local_mounts and _mounts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, there is a docstring further up, but I renamed them now to make it clearer:

_used_local_mounts -> _serve_mounts # used for modal serve mount watching - also includes any context mounts for non-add operation, like copy_...
_mounts -> _deferred_mounts # these are the new "lazy" add to container on startup-mounts introduced by this pr

modal/image.py Outdated
Comment on lines 630 to 631
Packages are added to the /root directory which is on the PYTHONPATH of any
executed Modal functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify that this is in the container?

Also might want to use capitalization for our proper nouns to Jono stays happy :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which nouns? Function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And Image?

modal/image.py Outdated
raise InvalidError(
textwrap.dedent(
"""
An image tried to run a build step after using `image.add_local_*` to include local files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related but one sort of annoying thing is that verbose error messages end up looking bad in the terminal since we print them twice:

image

Also can we error on the side of shorter lines? (Or longer lines and let rich handle the wrapping?) With narrow terminals this gets a little garbled:
image

modal/image.py Outdated
Comment on lines 340 to 341
filechange. Modal will then mount these files as a thin layer when starting your container, saving build
time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "thin layer" terminology used anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, changed

required package files change, but it allows you to run additional build
steps after this operation.
"""
mount = _Mount.from_local_python_packages(*packages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we take the opportunity to break with the existing behavior of including data assets and everything else within the directory containing the Python package, at least by default? That's one of our most surprising and problem-causing behaviors IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the current behavior for explicit from_local_packages and by this code. The only things that are excluded are .-prefixed files/directories and __pycache__/.pyc.

It's the auto mounts today that exclude everything that's not a .py-file. Very consistent, I know...

Copy link
Contributor

@mwaskom mwaskom Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the auto mounts today that exclude everything that's not a .py-file.

I thought auto mounting now included most files? That's how users get themselves into trouble when they automount a project directory that has a lot of data in it.

I agree that you're mostly mirroring the existing behavior of Mount.from_local_python_packages but I'm wondering should we take this opportunity to break things about that behavior that have proven problematic.

(I also think we should add the dockerignore conditions to these interfaces, which will help a bit, but my suspicion is that it's easier to debug "this add_local_python_packages command is not including my data files" than "my Image build is really slow and seems to be uploading a lot of stuff". But I'm not totally sure.)

@freider freider merged commit 49d2962 into main Nov 21, 2024
21 checks passed
@freider freider deleted the freider/mount-on-image branch November 21, 2024 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants