-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mounts on images [CLI-4] - image.add_local_python_package
#2343
Conversation
modal/image.py
Outdated
raise InvalidError( | ||
textwrap.dedent( | ||
""" | ||
It's recommended to run any `image.add_*` commands for adding local resources last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions for how to make this wall of text easy to understand would be appreciated :D
image.add_local_python_package
image.attach_local_python_package
image.attach_local_python_package
image.add_local_python_package
modal/image.py
Outdated
raise InvalidError( | ||
textwrap.dedent( | ||
""" | ||
An image tried to run a build step after using `image.add_local_*` to include local files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mwaskom any feedback on this error message copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -633,6 +634,7 @@ def from_args( | |||
if proxy: | |||
# HACK: remove this once we stop using ssh tunnels for this. | |||
if image: | |||
# TODO(elias): this will cause an error if users use prior `.add_local_*` commands without copy=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can probably do a little more here to help users understand this. Some of that would be aligning on clear terminology and using it consistently. It would also be good to add a conceptual explanation to the "Guide" page about containers (although it's getting so long now 😠; we need to refactor it) that we can point to.
Also the more I think about it, I wonder if it's misleading to talk about this in terms of "Python packages" when it really is just "add an entire directory tree where a Python module resides". Although that would run counter to my desire for a shorter method name 😁. But maybe this is a good opportunity to think a little bit more about the semantics of this helper function?
modal/image.py
Outdated
self2._mounts = tuple(base_image._mounts) + (mount,) | ||
self2._used_local_mounts = base_image._used_local_mounts | ({mount} if mount.is_local() else set()) | ||
|
||
return _Image._from_loader(_load, "ImageWithMounts()", deps=lambda: [base_image, mount]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ImageWithMounts()
a new class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they just use the regular Image class - this is the repr string. Might have changed the repr it here for debugging reasons (would be nice to fix our object reprs to actually be meaningful/using some data from the specification)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(would be nice to fix our object reprs to actually be meaningful/using some data from the specification)
+1 yeah have been wanting to do this for a while, they could be a lot more userful
modal/image.py
Outdated
@@ -553,6 +624,23 @@ def build_dockerfile(version: ImageBuilderVersion) -> DockerfileSpec: | |||
context_mount=mount, | |||
) | |||
|
|||
def add_local_python_packages(self, *packages: Union[str, Path], copy: bool = False) -> "_Image": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a Python API, would add_local_packages
save users some typing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but what if the user has non-python packages locally (.deb?) that they want to add? 😬
I agree it's a bit on the long end, but this also reflects the existing Mount.from_local_python_packages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some more thinking... see below
modal/image.py
Outdated
@@ -553,6 +624,23 @@ def build_dockerfile(version: ImageBuilderVersion) -> DockerfileSpec: | |||
context_mount=mount, | |||
) | |||
|
|||
def add_local_python_packages(self, *packages: Union[str, Path], copy: bool = False) -> "_Image": | |||
"""Attaches local Python packages to the container running the image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we avoid introducing new terminology ("attach") if we're not otherwise going to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that surprised me when testing — but maybe it shouldn't have — is that adding a local package (really a third-party package; I was just reaching for something quick) didn't include its dependencies.
That makes sense from one perspective (this operation is more like copying a directory with some sugar to reference the directory via PYTHONPATH) but I wonder if it will surprise users? Since "adding a package" usually has some notion of dependency resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Attach" was an oversight when I renamed it again, thanks for spotting!
Yeah, adding third party packages or packages that have real install recipes via this method isn't intended, but I can see how people would do that. The main intent is for it to be used for directories that are part of the user's current python projet, and the interface has been using the module name as a convenience for pointing to that directory via the current sys.path.
Basically a short form for: add_local_dir(lookup_package("my_package"), filter=modal.file_filters.PyFiles)
(assuming we had lookup_package()
and filters...
I have separately been thinking that it would be nice to just point to a "project directory" when setting up the user's image, which is very similar to this but using file paths, and maybe that's a better way to do this?
The problem with that is that specifying file system paths in a portable way is so un-ergonomic for even simple cases, e.g:
E.g.
Image.debian_slim().add_local_dir(
Path(__file__).parent,
remote_path="/root/some_root_package_name",
condition=python_mount_condition
)
Preferably I'd want it to work like above with default arguments of using:
Image.debian_slim().add_project_dir(".") # use directory of current file as project dir - "standard", only include python files (or .modalignore?)
And then have some very easy ways to customize this:
Image.debian_slim().add_project_dir("..") # use parent directory as root, using relative paths from the current file's dir or cwd? (the latter probably cleanest to implement)
Image.debian_slim().add_project_dir(".", filter=modal.filters.AllFiles) # use all files instead of just python files
# this is where we could also take advantage of new glob/modalignore support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename add_local_python_packages
to a "private _add_local_python_packages
for now to get the rest of the changes merged in while we discuss the best way of "adding my local python files"
modal/image.py
Outdated
By default (copy=False) the packages are layered on top of the image when | ||
the container starts up and not built as an image layer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble parsing "packages are layered ... not built as an image layer".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote:
Adds Python Package Files to Containers
Adds all files from the specified Python packages to containers running the image.
Packages are added to the /root
directory, which is on the PYTHONPATH
of any executed Modal functions.
By default (copy=False
), the files are added to containers on startup and are not built into the actual image, which speeds up deployment.
Set copy=True
to copy the files into an image layer at build time instead. This can slow down iteration since it requires a rebuild of the image and any subsequent build steps whenever the included files change, but it is required if you want to run additional build steps after this one.
Note: This excludes all dot-prefixed subdirectories or files and all .pyc
/__pycache__
files. To add full directories with finer control, use .add_local_dir()
instead.
|
||
|
||
def hydrate_image(img, client): | ||
# there should be a more straight forward way to do this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
modal/image.py
Outdated
self._used_local_mounts = frozenset() | ||
self._mounts = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between _used_local_mounts
and _mounts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there is a docstring further up, but I renamed them now to make it clearer:
_used_local_mounts
-> _serve_mounts
# used for modal serve mount watching - also includes any context mounts for non-add operation, like copy_...
_mounts
-> _deferred_mounts
# these are the new "lazy" add to container on startup-mounts introduced by this pr
modal/image.py
Outdated
Packages are added to the /root directory which is on the PYTHONPATH of any | ||
executed Modal functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify that this is in the container?
Also might want to use capitalization for our proper nouns to Jono stays happy :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which nouns? Function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Image?
modal/image.py
Outdated
raise InvalidError( | ||
textwrap.dedent( | ||
""" | ||
An image tried to run a build step after using `image.add_local_*` to include local files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modal/image.py
Outdated
filechange. Modal will then mount these files as a thin layer when starting your container, saving build | ||
time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "thin layer" terminology used anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, changed
required package files change, but it allows you to run additional build | ||
steps after this operation. | ||
""" | ||
mount = _Mount.from_local_python_packages(*packages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we take the opportunity to break with the existing behavior of including data assets and everything else within the directory containing the Python package, at least by default? That's one of our most surprising and problem-causing behaviors IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the current behavior for explicit from_local_packages
and by this code. The only things that are excluded are .
-prefixed files/directories and __pycache__
/.pyc
.
It's the auto mounts today that exclude everything that's not a .py-file. Very consistent, I know...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the auto mounts today that exclude everything that's not a .py-file.
I thought auto mounting now included most files? That's how users get themselves into trouble when they automount a project directory that has a lot of data in it.
I agree that you're mostly mirroring the existing behavior of Mount.from_local_python_packages
but I'm wondering should we take this opportunity to break things about that behavior that have proven problematic.
(I also think we should add the dockerignore conditions to these interfaces, which will help a bit, but my suspicion is that it's easier to debug "this add_local_python_packages
command is not including my data files" than "my Image build is really slow and seems to be uploading a lot of stuff". But I'm not totally sure.)
Adds the notion of a "mount layer" on Image, which is a way for image objects to track mounts that should be attached to any container that uses the image.
To prevent issues with overlapping file system changes in case of image steps that modify files specified in mount layers, a mount layer is always "materialized" into an actual image layer using "COPY" commands in case there is a subsequent non-mount layer added to the image.
For now, the only place we use these mount layers is in the new
image.add_local_python_packages()
method on image, but the plan is to roll out changes that replace all current user usages ofMount
(add_local_dir, add_local_file etc.) with mount layers and eventually get rid of Mounts as an external concept.Note that since the attached mounts are entirely a client-side concept, a hypothetical
Image.from_id
or similar would not be able to recover the mount information, but I don't think we use that anywhere where that would make sense.Backward/forward compatibility checks
---Check these boxes or delete any item (or this section) if not relevant for this PR.
Note on protobuf: protobuf message changes in one place may have impact to
multiple entities (client, server, worker, database). See points above.
Changelog
Image.add_local_python_packages
which works similarly toMount.from_local_python_packages
but for images.