Add support for packaging python AWS Lambda layers #19123

huonw · 2023-05-23T22:48:52Z

This fixes #18880 by making pants able to create a AWS Lambda package with the layout expected by a "Layer" (https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html). For Python, this the same as a normal Lambda, except within a python/: python/cowsay/__init__.py will be importable via import cowsay.

The big win here is easily separating requirements from sources:

# path/to/BUILD
python_sources()

python_awslambda(
  name="lambda",
  entry_point="./foo.py:handler",
  include_requirements=False
)
python_aws_lambda_layer(
  name="layer",
  dependencies=["./foo.py"],
  include_requirements=True,
  include_sources=False,
)

Packaging that will result in path.to/lambda.zip that contains only first-party sources, and path.to/layer.zip that contains only third-party requirements. This results in faster builds, less cache usage, and smaller deploy packages when only changing first-party sources. This side-steps some of the slowness in #19076. For the example in that PR:

metric	Lambdex	zip, no layers	using zip + layers (this PR)
init time on cold start	2.3-2.5s	1.3-1.4s	not yet tested
compressed size	24.6MB	23.8MB	28KB (lambda), 23.8MB (layer)
uncompressed size	117.8MB	115.8MB	72KB (lambda), 115.7MB (layer)
PEX-construction build time	~5s	~5s	~1.5s (lambda), ~5s (layer)
PEX-postprocessing build time	0.14s	4.8s	1s (lambda), ~5s (layer)

That is, the first-party-only lambda package ~1000× smaller, and is ~2× faster to build than even the Lambdex version.

This uses a separate target, python_aws_lambda_layer. The target has its inputs configured using the dependencies=[...] field. For instance, the example above is saying create a lambda using all of the third-party requirements required (transitively) by ./foo.py, and none of the first-party sources.

(The initial implementation just added a layout="layer" option to the existing python_awslambda target, but, per the discussion in this PR, that was deemed unnecessarily confusing, e.g. it'd keep the handler field around, which is meaningless for a layer.)

Follow-up not handled here:

for 2.18:
- documentation
- renaming the python_awslambda target to python_aws_lambda_function to be clearer (NB. I proposing also taking the chance to add an underscore in the name, which I've done with the python_aws_lambda_layer target. Let me know if there's a strong reason for the awslambda name without an underscore. I note the GCF target is python_google_cloud_function)
not necessarily for 2.18:
- potentially, adding "sizzle" like the ability to have python_aws_lambda_function specify which layers it will be used with, and thus automatically exclude the contents of the layer from the function (e.g. the example above could hypothetically replace include_requirements=False with layers=[":layer"])

The commits are individually reviewable.

benjyw

First of all, yay! Thanks for making our Lambda support miles better.

The implementation looks great to me. In response to your question:

It does seem weird for a layer to have an entry point, and I didn't quite get your point about macros, or why in your example the layer would care about foo.py at all (assuming that is the entry point). Surely the layer doesn't know about entry points that use it?

It seems like "a lambda uses a layer" is modeled by a dep, not an entry point. And it would be great for Pants to provide lambda publishing that prevents the user from having to generate metadata that just recapitulates these dependencies.

But there are a few caveats:

Order does matter for layers - they are merged in the order in which they are specified in the function's metadata, and so to create that metadata we would have to preserve the order. Does the dep field preserve order?
A function can only have 5 layers (not sure if that includes the function itself).
There's no transitivity - a layer can't depend on other layers.

So we'd have to do some validation if we model these as deps. The alternative is to give the lambda target a field that is a flat list of layers.

And in the mix is dep inference - is that likely to be possible or useful here?

src/python/pants/backend/awslambda/python/target_types.py

huonw · 2023-05-26T00:07:21Z

It does seem weird for a layer to have an entry point, and I didn't quite get your point about macros, or why in your example the layer would care about foo.py at all (assuming that is the entry point). Surely the layer doesn't know about entry points that use it?

Ah, I think you're thinking about this more expansively than me: I was just thinking of Layers as just another 'leaf' package type that's independent of any particular Lambda, and it's up to users to create them and deploy them in the right way...

One way to do that would be to have macro lambda_with_layer (or whatever) that splits each given lambda into two targets:

a Layer for all third party deps
a Function for all first party source

# macros.py
def lambda_with_layer(*, name, **kwargs):
  python_awslambda(
    name=name,
    **kwargs,
    include_requirements=False
  )
  python_awslambda(
    name=f"{name}-layer",
    **kwargs,
    layout="layer",
  )

# BUILD
python_sources()

lambda_with_layer(
   name="foo-function",
   entry_point="./foo.py:handler"
)

Then, the deploy infra (Terraform or CDK or whatever) would deploy a lambda by referring to the appropriate per-Lambda Layer package and the source Lambda package (e.g. in my work repo, we'd have a CDK construct that takes the name and constructs a Layer with {name}-layer.zip and a Function with the {name}.zip, and configures the Function to use the Layer).

For this particular use case, it's convenient to be able to specify what to build for the Layer in the same way as for the Lambda, so the macro can take entry_point="./foo.py:func and pass that to both targets. That is, the layer doesn't really have an entry point, it's just saying "build a layer using this entry point as the starting point for what to include".

It seems like "a lambda uses a layer" is modeled by a dep, not an entry point. And it would be great for Pants to provide lambda publishing that prevents the user from having to generate metadata that just recapitulates these dependencies.

I wasn't thinking of the connection between layer and function as a pants level thing... and I'm not sure it fully can be, as I think pants can't/won't be inserting itself into the deploy pipeline to deploy a Layer and then add it to the appropriate Functions... As in, I'm not sure how to translate a pants-level dependency between a Function and a Layer into a deployed one, other than a human writing that logic into their deploy infra manually? (At least, not for 'interesting' deploys, e.g. a lambda as part of a broader CDK/CloudFormation stack)

But some sort of pants smarts does seem very valuable. E.g. hypothetically pants could exclude from the lambda (or other layers) any sources/requirements that have already been included in earlier layers:

# a 'core' layer that contains things used by literally every lambda
python_awslambda(
  name="core-layer",
  dependencies=["boto3", "some-logging-library"]
  layout="layer",
)

# any extra dependencies required for foo.py
python_awslambda(
   name="specific-layer",
   entry_point="./foo.py:func",
   layers=[":core-layer"], # pants-level transitive dependency, flattened into any users of :specific-layer
   layout="layer",
)

python_awslambda(
  name="lambda"
  entry_point="./foo.py:func",
  layers=[":specific-layer"], # automatically exclude boto3 or anything else in :core-layer or :specific-layer
)

(Still using the single-target and entry_point phrasing, but that can be changed.)

In any case, I feel that might be worth considering as a 'value add'/follow-up on top of the raw layer building functionality, since e.g. someone might want to be publishing layers directly for others to consume, or otherwise need fine-grained control.

But there are a few caveats:

I think, mostly, these have to be managed by users: it's up to them to arrange their layers to not need more than 6 layer (i.e. if they split it up to much, they'll have to unspilt), and similarly for putting them in the correct order.

And in the mix is dep inference - is that likely to be possible or useful here?

I'm not sure how one might infer dependencies usefully, because the breakdown of packages into Layers/Functions feels like a human-level thing, but I'm more than happy to hear ideas!

src/python/pants/backend/awslambda/python/target_types.py

benjyw · 2023-05-26T19:59:38Z

Got it, yeah, fair to just have a layer be a standalone leaf thing and it's up to the user to combine layers into a function in the lambda metadata. We can defer any further cleverness to future improvements.

But I still don't see why the layer needs an entry point field. A macro can still create one, no? It would just have to slightly futz with the kwargs, which is fine. I'd be reluctant to have inaccurate modeling just for the convenience of a high-level macro.

So I think maybe a new target type does make sense (it's composed of the same fields, except for entry point, so it's not a lot of boilerplate): python_awslambda_layer, and python_awslambda_function can be an alias for python_awslambda until we deprecate that name?

huonw · 2023-05-26T22:00:05Z

Sounds good.

I guess a macro can take two arguments and compose them:

# macros.py
def lambda_with_layer(*, name, entry_point_file, entry_point_func, **kwargs):
  python_awslambda(
    name=name,
    entry_point="{entry_point_file}:{entry_point_func}",
    **kwargs,
    include_requirements=False
  )
  python_awslambda_layer(
    name=f"{name}-layer",
    **kwargs,
    dependencies=[entry_point_file],
  )

So the outcome of this might be:

separate target for layers
a layer is constructed only using inputs from (transitive) dependencies without a dedicated field for "entry point" or whatever
Follow-up: rename the function target

I think this will mean the Python import path (module.path:func) style of entry point specification won't work for layers: a macro like the above will always have to use entry_point_file="./module/path.py", because dependencies=["module.path"] doesn't work (I think?). This is probably okay!

benjyw · 2023-05-26T22:21:18Z

I guess a macro can take two arguments and compose them:

That, or the macro can copy and modify the kwargs dict.

I think this will mean the Python import path (module.path:func) style of entry point specification won't work for layers

Now I'm confused again. Why does a layer need an entry point specification? This seems like the one thing that a function has and a layer doesn't. But in any case, the module:func should surely work?

huonw · 2023-05-26T23:56:12Z

Ah sorry, I meant "specifying where to start when computing the transitive set of sources/requirements", provided via the layer target's dependencies, as in the macro example.

benjyw · 2023-05-27T00:40:12Z

Ah sorry, I meant "specifying where to start when computing the transitive set of sources/requirements", provided via the layer target's dependencies, as in the macro example.

I think just start with all the explicit dependencies, no?

huonw

I've updated to introduce a new python_aws_lambda_layer target. This feels better, other than the implementation being a bit non-DRY. Thanks for brainstorming.

I've rewritten the PR description to match this new state.

huonw · 2023-05-28T11:37:03Z

src/python/pants/backend/awslambda/python/rules.py

+
+
+@dataclass(frozen=True)
+class PythonAwsLambdaLayerFieldSet(PackageFieldSet):


This (and the rules below) end up rather non-DRY, but the repetition is generally uninteresting (just splat the fieldset field into the matching request field). Let me know if there's a good way to DRY it more.

Shared base class for the FieldSet?

I've done that, although the more interesting duplication is constructing the BuildPythonFaaSRequest and I don't know of a good way to DRY that.

huonw · 2023-05-28T11:38:05Z

src/python/pants/backend/awslambda/python/target_types.py

        PythonAwsLambdaIncludeRequirements,
+        PythonAwsLambdaIncludeSources,


I thought it didn't hurt to have a include_sources= field on python_awslambda functions in addition to layers, so this is in the shared fields. Let me know if you'd prefer it to be specific to layers.

It is a bit weird not to, I think? What would it mean to have a function without sources?

I guess one could have sources in a layer and only (a single?) third party dep in a layer, and the specify something in env vars to direct that dep to call into the sources...

That's pretty weird, and I don't know of any concrete example that actually does this (or why one would want the third-party dep in the function without the sources...). I've moved include_sources to just python_aws_lambda_layer to reduce the permutations/simplify the docs.

benjyw

Great stuff!

This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in #19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (#18880, #19123). Fixes #19067

This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in pantsbuild#19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (pantsbuild#18880, pantsbuild#19123). Fixes pantsbuild#19067

This adjusts the AWS Lambda and Google Cloud Function documentation for the new Zip layout, added in #19076 and targeted for 2.17. This PR is just what's required for 2.17, ready to cherry-pick. The "Migrating" section is written with this in mind. It will require adjustment for 2.18 to reflect the change in defaults, and, hopefully, support for AWS Lambda Layers (#18880, #19123). Fixes #19067

This renames the `python_awslambda` target to `python_aws_lambda_function`. This is to reduce ambiguity now that Pants has support for building layers via `python_aws_lambda_layer` (#19123), in addition to functions, and also aligns with the `python_google_cloud_function` target too. This change has a single release deprecation since it's easy to fix, and keeps it inline with the layout switch over (e.g. #19122), but it's also easy to keep around for longer. This PR essentially just does a search and replace on the documentation, it doesn't update the docs to include info about layers. I'm working on that separately.

This updates the AWS Lambda and Google Cloud Functions documentation to be appropriate for Pants 2.18, which includes: - updates for the migration from Lambdex to 'native'/zip (continuing #19067 and #19122) - describing how to build a Lambda layer (documenting #19123) - updating the AWS Lambda docs for renaming from `python_awslambda` to `python_aws_lambda_function` (finishing off #19216)

huonw · 2023-06-06T03:15:37Z

We can defer any further cleverness to future improvements

Issue tracking one possible improvement: #19256

This ensures the `python_aws_lambda_layer` target type exists when activating the `pants.backend.awslambda.python` backend. This was missed from #19123, oops!

In yet more follow up to #19123 and #19550 for #18880, this ensures the `PythonAwsLambdaLayerFieldSet` is part of the `PackageFieldSet` union, so that `pants package path/to:some-layer` actually works. I clearly didn't test this properly in #19123 or #19550, but now I have: in a separate repo `PANTS_SOURCE=~/... pants package path/to:some-layer` produces `dist/path.to/some-layer.zip`, with the expected contents. 🎉

huonw added the category:new feature label May 23, 2023

huonw marked this pull request as ready for review May 23, 2023 23:24

huonw requested a review from benjyw May 23, 2023 23:24

huonw changed the title ~~Add support for Lambda layers~~ Add support for packaging python AWS Lambda layers May 23, 2023

benjyw reviewed May 25, 2023

View reviewed changes

src/python/pants/backend/awslambda/python/target_types.py Outdated Show resolved Hide resolved

huonw commented May 26, 2023

View reviewed changes

src/python/pants/backend/awslambda/python/target_types.py Outdated Show resolved Hide resolved

huonw mentioned this pull request May 28, 2023

Adjust FaaS docs for layout = "zip" #19180

Merged

huonw added 2 commits May 28, 2023 21:10

Support pex3 venv create --prefix=...

84c1762

Add include_sources, prefix_in_artifact fields, optional reexport

0dee042

huonw force-pushed the feature/18880-lambda-layers branch 3 times, most recently from ba36803 to 87c7f1d Compare May 28, 2023 11:43

huonw commented May 28, 2023

View reviewed changes

huonw requested a review from benjyw May 28, 2023 11:44

Implement python_aws_lambda_target, and include_sources

33de0bb

huonw force-pushed the feature/18880-lambda-layers branch from 87c7f1d to 33de0bb Compare May 28, 2023 22:58

Review: only sources for layers, base class for field set

57adb0b

benjyw approved these changes May 30, 2023

View reviewed changes

huonw merged commit 318c853 into pantsbuild:main May 30, 2023

huonw deleted the feature/18880-lambda-layers branch May 30, 2023 05:48

huonw mentioned this pull request May 30, 2023

Adjust FaaS docs for layout = "zip" (Cherry-pick of #19180) #19198

Merged

huonw mentioned this pull request Jun 1, 2023

Rename python_awslambda to python_aws_lambda_function #19216

Merged

huonw mentioned this pull request Jun 1, 2023

Update FaaS docs for 2.18, including Lambda layers #19217

Merged

huonw mentioned this pull request Jun 6, 2023

Allow Lambda targets to point to python_aws_lambda_layers and exclude code automatically #19256

Open

huonw mentioned this pull request Aug 5, 2023

Register the python_aws_lambda_layer target type #19550

Merged

huonw added a commit that referenced this pull request Aug 6, 2023

Register the python_aws_lambda_layer target type (#19550)

fa5cf61

This ensures the `python_aws_lambda_layer` target type exists when activating the `pants.backend.awslambda.python` backend. This was missed from #19123, oops!

huonw mentioned this pull request Aug 7, 2023

Mark python_aws_lambda_layer as packageable #19562

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for packaging python AWS Lambda layers #19123

Add support for packaging python AWS Lambda layers #19123

huonw commented May 23, 2023 •

edited

Loading

benjyw left a comment

huonw commented May 26, 2023 •

edited

Loading

benjyw commented May 26, 2023

huonw commented May 26, 2023

benjyw commented May 26, 2023

huonw commented May 26, 2023 •

edited

Loading

benjyw commented May 27, 2023

huonw left a comment

huonw May 28, 2023

benjyw May 28, 2023

huonw May 29, 2023

huonw May 28, 2023

benjyw May 28, 2023

huonw May 29, 2023 •

edited

Loading

benjyw left a comment

huonw commented Jun 6, 2023



		@dataclass(frozen=True)
		class PythonAwsLambdaLayerFieldSet(PackageFieldSet):

		PythonAwsLambdaIncludeRequirements,
		PythonAwsLambdaIncludeSources,

Add support for packaging python AWS Lambda layers #19123

Add support for packaging python AWS Lambda layers #19123

Conversation

huonw commented May 23, 2023 • edited Loading

benjyw left a comment

Choose a reason for hiding this comment

huonw commented May 26, 2023 • edited Loading

benjyw commented May 26, 2023

huonw commented May 26, 2023

benjyw commented May 26, 2023

huonw commented May 26, 2023 • edited Loading

benjyw commented May 27, 2023

huonw left a comment

Choose a reason for hiding this comment

huonw May 28, 2023

Choose a reason for hiding this comment

benjyw May 28, 2023

Choose a reason for hiding this comment

huonw May 29, 2023

Choose a reason for hiding this comment

huonw May 28, 2023

Choose a reason for hiding this comment

benjyw May 28, 2023

Choose a reason for hiding this comment

huonw May 29, 2023 • edited Loading

Choose a reason for hiding this comment

benjyw left a comment

Choose a reason for hiding this comment

huonw commented Jun 6, 2023

huonw commented May 23, 2023 •

edited

Loading

huonw commented May 26, 2023 •

edited

Loading

huonw commented May 26, 2023 •

edited

Loading

huonw May 29, 2023 •

edited

Loading