-
-
Notifications
You must be signed in to change notification settings - Fork 640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for packaging python AWS Lambda layers #19123
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, yay! Thanks for making our Lambda support miles better.
The implementation looks great to me. In response to your question:
It does seem weird for a layer to have an entry point, and I didn't quite get your point about macros, or why in your example the layer would care about foo.py at all (assuming that is the entry point). Surely the layer doesn't know about entry points that use it?
It seems like "a lambda uses a layer" is modeled by a dep, not an entry point. And it would be great for Pants to provide lambda publishing that prevents the user from having to generate metadata that just recapitulates these dependencies.
But there are a few caveats:
-
Order does matter for layers - they are merged in the order in which they are specified in the function's metadata, and so to create that metadata we would have to preserve the order. Does the dep field preserve order?
-
A function can only have 5 layers (not sure if that includes the function itself).
-
There's no transitivity - a layer can't depend on other layers.
So we'd have to do some validation if we model these as deps. The alternative is to give the lambda target a field that is a flat list of layers.
And in the mix is dep inference - is that likely to be possible or useful here?
Ah, I think you're thinking about this more expansively than me: I was just thinking of Layers as just another 'leaf' package type that's independent of any particular Lambda, and it's up to users to create them and deploy them in the right way... One way to do that would be to have macro
# macros.py
def lambda_with_layer(*, name, **kwargs):
python_awslambda(
name=name,
**kwargs,
include_requirements=False
)
python_awslambda(
name=f"{name}-layer",
**kwargs,
layout="layer",
) # BUILD
python_sources()
lambda_with_layer(
name="foo-function",
entry_point="./foo.py:handler"
) Then, the deploy infra (Terraform or CDK or whatever) would deploy a lambda by referring to the appropriate per-Lambda Layer package and the source Lambda package (e.g. in my work repo, we'd have a CDK construct that takes the name and constructs a For this particular use case, it's convenient to be able to specify what to build for the Layer in the same way as for the Lambda, so the macro can take
I wasn't thinking of the connection between layer and function as a pants level thing... and I'm not sure it fully can be, as I think pants can't/won't be inserting itself into the deploy pipeline to deploy a Layer and then add it to the appropriate Functions... As in, I'm not sure how to translate a pants-level dependency between a Function and a Layer into a deployed one, other than a human writing that logic into their deploy infra manually? (At least, not for 'interesting' deploys, e.g. a lambda as part of a broader CDK/CloudFormation stack) But some sort of pants smarts does seem very valuable. E.g. hypothetically pants could exclude from the lambda (or other layers) any sources/requirements that have already been included in earlier layers: # a 'core' layer that contains things used by literally every lambda
python_awslambda(
name="core-layer",
dependencies=["boto3", "some-logging-library"]
layout="layer",
)
# any extra dependencies required for foo.py
python_awslambda(
name="specific-layer",
entry_point="./foo.py:func",
layers=[":core-layer"], # pants-level transitive dependency, flattened into any users of :specific-layer
layout="layer",
)
python_awslambda(
name="lambda"
entry_point="./foo.py:func",
layers=[":specific-layer"], # automatically exclude boto3 or anything else in :core-layer or :specific-layer
) (Still using the single-target and In any case, I feel that might be worth considering as a 'value add'/follow-up on top of the raw layer building functionality, since e.g. someone might want to be publishing layers directly for others to consume, or otherwise need fine-grained control.
I think, mostly, these have to be managed by users: it's up to them to arrange their layers to not need more than 6 layer (i.e. if they split it up to much, they'll have to unspilt), and similarly for putting them in the correct order.
I'm not sure how one might infer dependencies usefully, because the breakdown of packages into Layers/Functions feels like a human-level thing, but I'm more than happy to hear ideas! |
Got it, yeah, fair to just have a layer be a standalone leaf thing and it's up to the user to combine layers into a function in the lambda metadata. We can defer any further cleverness to future improvements. But I still don't see why the layer needs an entry point field. A macro can still create one, no? It would just have to slightly futz with the kwargs, which is fine. I'd be reluctant to have inaccurate modeling just for the convenience of a high-level macro. So I think maybe a new target type does make sense (it's composed of the same fields, except for entry point, so it's not a lot of boilerplate): |
Sounds good. I guess a macro can take two arguments and compose them: # macros.py
def lambda_with_layer(*, name, entry_point_file, entry_point_func, **kwargs):
python_awslambda(
name=name,
entry_point="{entry_point_file}:{entry_point_func}",
**kwargs,
include_requirements=False
)
python_awslambda_layer(
name=f"{name}-layer",
**kwargs,
dependencies=[entry_point_file],
) So the outcome of this might be:
I think this will mean the Python import path ( |
That, or the macro can copy and modify the kwargs dict.
Now I'm confused again. Why does a layer need an entry point specification? This seems like the one thing that a function has and a layer doesn't. But in any case, the module:func should surely work? |
Ah sorry, I meant "specifying where to start when computing the transitive set of sources/requirements", provided via the layer target's |
I think just start with all the explicit dependencies, no? |
ba36803
to
87c7f1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated to introduce a new python_aws_lambda_layer
target. This feels better, other than the implementation being a bit non-DRY. Thanks for brainstorming.
I've rewritten the PR description to match this new state.
|
||
|
||
@dataclass(frozen=True) | ||
class PythonAwsLambdaLayerFieldSet(PackageFieldSet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and the rules below) end up rather non-DRY, but the repetition is generally uninteresting (just splat the fieldset field into the matching request field). Let me know if there's a good way to DRY it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared base class for the FieldSet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done that, although the more interesting duplication is constructing the BuildPythonFaaSRequest
and I don't know of a good way to DRY that.
PythonAwsLambdaIncludeRequirements, | ||
PythonAwsLambdaIncludeSources, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it didn't hurt to have a include_sources=
field on python_awslambda
functions in addition to layers, so this is in the shared fields. Let me know if you'd prefer it to be specific to layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit weird not to, I think? What would it mean to have a function without sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one could have sources in a layer and only (a single?) third party dep in a layer, and the specify something in env vars to direct that dep to call into the sources...
That's pretty weird, and I don't know of any concrete example that actually does this (or why one would want the third-party dep in the function without the sources...). I've moved include_sources
to just python_aws_lambda_layer
to reduce the permutations/simplify the docs.
87c7f1d
to
33de0bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in #19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (#18880, #19123). Fixes #19067
This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in pantsbuild#19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (pantsbuild#18880, pantsbuild#19123). Fixes pantsbuild#19067
This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in #19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (#18880, #19123). Fixes #19067
This renames the `python_awslambda` target to `python_aws_lambda_function`. This is to reduce ambiguity now that Pants has support for building layers via `python_aws_lambda_layer` (#19123), in addition to functions, and also aligns with the `python_google_cloud_function` target too. This change has a single release deprecation since it's easy to fix, and keeps it inline with the layout switch over (e.g. #19122), but it's also easy to keep around for longer. This PR essentially just does a search and replace on the documentation, it doesn't update the docs to include info about layers. I'm working on that separately.
This updates the AWS Lambda and Google Cloud Functions documentation to be appropriate for Pants 2.18, which includes: - updates for the migration from Lambdex to 'native'/zip (continuing #19067 and #19122) - describing how to build a Lambda layer (documenting #19123) - updating the AWS Lambda docs for renaming from `python_awslambda` to `python_aws_lambda_function` (finishing off #19216)
Issue tracking one possible improvement: #19256 |
This ensures the `python_aws_lambda_layer` target type exists when activating the `pants.backend.awslambda.python` backend. This was missed from #19123, oops!
In yet more follow up to #19123 and #19550 for #18880, this ensures the `PythonAwsLambdaLayerFieldSet` is part of the `PackageFieldSet` union, so that `pants package path/to:some-layer` actually works. I clearly didn't test this properly in #19123 or #19550, but now I have: in a separate repo `PANTS_SOURCE=~/... pants package path/to:some-layer` produces `dist/path.to/some-layer.zip`, with the expected contents. 🎉
This fixes #18880 by making pants able to create a AWS Lambda package with the layout expected by a "Layer" (https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html). For Python, this the same as a normal Lambda, except within a
python/
:python/cowsay/__init__.py
will be importable viaimport cowsay
.The big win here is easily separating requirements from sources:
Packaging that will result in
path.to/lambda.zip
that contains only first-party sources, andpath.to/layer.zip
that contains only third-party requirements. This results in faster builds, less cache usage, and smaller deploy packages when only changing first-party sources. This side-steps some of the slowness in #19076. For the example in that PR:That is, the first-party-only lambda package ~1000× smaller, and is ~2× faster to build than even the Lambdex version.
This uses a separate target,
python_aws_lambda_layer
. The target has its inputs configured using thedependencies=[...]
field. For instance, the example above is saying create a lambda using all of the third-party requirements required (transitively) by./foo.py
, and none of the first-party sources.(The initial implementation just added a
layout="layer"
option to the existingpython_awslambda
target, but, per the discussion in this PR, that was deemed unnecessarily confusing, e.g. it'd keep thehandler
field around, which is meaningless for a layer.)Follow-up not handled here:
python_awslambda
target topython_aws_lambda_function
to be clearer (NB. I proposing also taking the chance to add an underscore in the name, which I've done with thepython_aws_lambda_layer
target. Let me know if there's a strong reason for theawslambda
name without an underscore. I note the GCF target ispython_google_cloud_function
)python_aws_lambda_function
specify which layers it will be used with, and thus automatically exclude the contents of the layer from the function (e.g. the example above could hypothetically replaceinclude_requirements=False
withlayers=[":layer"]
)The commits are individually reviewable.