Support excluding packages from the build #95

Open
neilmehta24 opened this issue Dec 2, 2024 · 3 comments
Labels
Category: Enhancement New feature or request

Comments

@neilmehta24
Member

Python packages can transitively pull in many packages that are not strictly required to run an application. Including these packages in the published layer means that our users have to download large packages they don't necessarily need. Layers should be able to specify in their venvstacks.toml configuration which packages should not be installed as part of the build or publish step. Ideally, excluding a package would also leave out any packages that are only required by the excluded package.

Marking a package as excluded should not affect environment solving; it should only affect the output directory. This is so that an application layer doesn't install a package that was excluded by the framework layer: the package still belongs to the framework layer, but is not emitted into the build directory anywhere.

The overall goal of this is to reduce the size of layers deployed to our users' machines.

neilmehta24 added the Category: Enhancement label Dec 2, 2024
@ncoghlan
Collaborator

ncoghlan commented Dec 3, 2024

That makes sense. Longer term, we could even do something similar to what treeshaker does: https://pypi.org/project/treeshaker/ (treeshaker itself wouldn't be the right solution, but something along those lines should be feasible, with application layer shaking based on the launch module contents, and framework and runtime layer shaking based on the full set of defined application layers that depend on them).

For manual exclusions, pack_venv already filters out some files during the export process with shutil.ignore_patterns.
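For reference, a minimal sketch of how `shutil.ignore_patterns` plugs into `shutil.copytree`. The function name `export_env` and the default patterns are illustrative assumptions, not the actual `pack_venv` implementation:

```python
from __future__ import annotations

import shutil
from pathlib import Path


def export_env(
    src: Path,
    dst: Path,
    patterns: tuple[str, ...] = ("__pycache__", "*.pyc"),  # hypothetical defaults
) -> None:
    """Copy src to dst, skipping entries whose names match any glob pattern."""
    # ignore_patterns returns a callable that copytree consults once per
    # directory; matching entry names are skipped at every depth of the tree.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*patterns))
```

Because the patterns are matched against bare entry names at every level, this approach cannot restrict an exclusion to the top level of site-packages.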

Due to the import-package-vs-dist-package ambiguity in Python, as well as the import-module-vs-import-package situation, I'm leaning towards making this two separate settings on the layer:

  • exclude_import_name: rather than using the result of ignore_patterns directly, env exports will have a dedicated copytree filtering function that excludes directories and files in the site-packages folder whose names (ignoring any file extension) match the given name. Distributions will still claim to be installed in the deployed environment, but some of their files will be missing.
  • exclude_dist_package: this would run importlib.metadata.files in the build environment for each of the given distribution names, and use that to get a full list of files to be excluded from the export process. This would also exclude the installation metadata, so the distribution won't even claim to be installed in the deployed environment.
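
A rough sketch of what the `exclude_import_name` filter could look like as a `copytree` ignore callable. The helper name `make_import_name_filter` is hypothetical, and treating everything after the first dot as the extension is a simplifying assumption (it would also match a sibling entry such as `name.libs`):

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path
from typing import Callable, Iterable


def make_import_name_filter(
    site_packages: Path, excluded_names: Iterable[str]
) -> Callable[[str, list[str]], set[str]]:
    """Build a copytree ignore callable (hypothetical helper for this sketch)."""
    excluded = frozenset(excluded_names)
    top_level = os.path.abspath(site_packages)

    def _ignore(directory: str, entries: list[str]) -> set[str]:
        # Only top-level site-packages entries are import names; nested
        # files that happen to share a name are left alone.
        if os.path.abspath(directory) != top_level:
            return set()
        # Assumption: the import name is everything before the first dot, so
        # "foo", "foo.py" and "foo.cpython-312-x86_64-linux-gnu.so" all match
        # "foo", while "foo-1.0.dist-info" does not.
        return {entry for entry in entries if entry.split(".", 1)[0] in excluded}

    return _ignore


def export_site_packages(src: Path, dst: Path, excluded_names: Iterable[str]) -> None:
    """Copy site-packages, leaving out the excluded import names."""
    shutil.copytree(src, dst, ignore=make_import_name_filter(src, excluded_names))
```

Since the `.dist-info` directories survive the copy, the distributions still report as installed, which matches the behaviour described for this option.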

@neilmehta24
Member Author

exclude_dist_package seems like the one most relevant to us right now.
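
To make the `exclude_dist_package` idea concrete, here is a minimal sketch of collecting the files to skip via `importlib.metadata.files`. The function name `files_to_exclude` is an assumption, and the sketch queries whichever environment it runs in, so a real implementation would need to run it inside (or point it at) the build environment:

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path
from typing import Iterable


def files_to_exclude(dist_names: Iterable[str]) -> set[Path]:
    """Return resolved paths of every file recorded for the named distributions."""
    excluded: set[Path] = set()
    for name in dist_names:
        # Raises PackageNotFoundError if the distribution is not installed.
        recorded = metadata.files(name)
        if recorded is None:
            # No RECORD (or equivalent) metadata, so the file list is unknown.
            raise RuntimeError(f"no file listing available for {name!r}")
        for package_path in recorded:
            # locate() turns the recorded relative path into an absolute one.
            excluded.add(Path(package_path.locate()).resolve())
    return excluded
```

The recorded files include the `.dist-info` metadata itself, which is why the distribution would no longer claim to be installed after the export.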

@neilmehta24
Member Author

exclude_dist_package gets us 90% of what we need. But how difficult is it to determine whether other dependencies can be transitively excluded? For example, if we say exclude_dist_package=["X"] and package "Y" was only installed due to the requirement from package "X", can we exclude package "Y" too?
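
One way to frame the question is as graph reachability: a package can be dropped if nothing that is still kept requires it. A minimal sketch, assuming the dependency graph is already available as a plain dict (the function name and input shape are hypothetical; in practice the graph might be derived from something like `importlib.metadata.requires`, with extras and environment markers handled separately):

```python
from __future__ import annotations


def transitively_excludable(
    requires: dict[str, set[str]],  # package -> its direct dependencies
    roots: set[str],                # packages the layer requests directly
    excluded: set[str],             # packages named in exclude_dist_package
) -> set[str]:
    """Return every package that is unreachable once the exclusions are applied."""
    still_needed: set[str] = set()
    pending = [name for name in roots if name not in excluded]
    while pending:
        name = pending.pop()
        if name in still_needed or name in excluded:
            continue  # already visited, or explicitly cut out of the graph
        still_needed.add(name)
        pending.extend(requires.get(name, ()))
    all_packages = set(requires).union(*requires.values())
    # Whatever is installed but no longer reachable can be left out,
    # including the explicitly excluded packages themselves.
    return all_packages - still_needed


graph = {"app": {"X", "Z"}, "X": {"Y", "Z"}, "Y": set(), "Z": set()}
print(sorted(transitively_excludable(graph, {"app"}, {"X"})))  # ['X', 'Y']
```

In the example, "Y" is only reachable through "X" and so is dropped with it, while "Z" is kept because "app" also requires it directly.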

ncoghlan changed the title from "The ability to exclude packages from the build" to "Support excluding packages from the build" Dec 6, 2024