-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migration for new data package format #122
Conversation
dogversioning
commented
Oct 4, 2024
- Adds a migration for data packages (and column_types)
- Adds two new items to data packages
- Some cleanup of log names, and (hopefully) an addition to prevent errors in uploading.
☂️ Python Coverage
Overall Coverage
New Files
Modified Files
|
template.yaml
Outdated
@@ -77,11 +77,12 @@ Resources: | |||
Properties: | |||
FunctionName: !Sub 'CumulusAggFetchAuthorizer-${DeployStage}' | |||
Handler: src/handlers/site_upload/api_gateway_authorizer.lambda_handler | |||
Layers: [arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python311:17] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i don't think this :should: be happening, but as near as I can tell: the reason uploads (and also some other bits of the upload pipeline/dashboard apis) broke is because in python 3.11 lambdas, the isolation between lambdas is different? Layers referencing a handler which didn't use pandas were previously fine, but now it seems like, since every lambda can access the entire projects code, without this layer they fail on importing pandas?
I'll look into this and figure out if there's a better isolation pathway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why would an import pandas
line even be hit in this case? Is the lambda now running a different code path because of a layer change?
That seems concerning (but maybe could happen if we now are pulling in a newer version of some pip package that added a pandas dep?).
Did we accidentally introduce an unexpected dependency on pandas ourselves and that's what's causing this? Like, do we have stacktraces for where the import comes from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no stack trace in the error message, but in the light of a fresh day with morning brain and doing it by hand, you're right - the dependency on pandas was added to the shared functions.py, which most everything imports.
I could move that function to a different shared location?
I was thinking about this anyway when this occured, but in light of this being the root cause, and now that we have the environment for it, i'll add an issue about running a regression against the APIs.
enums.ColumnTypesKeys.COLUMN_TYPES_FORMAT_VERSION.value: "1", | ||
enums.ColumnTypesKeys.COLUMN_TYPES_FORMAT_VERSION.value: "2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain to a n00b what implications this has?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At :some: point - it might be useful for a downstream consumer to realize the format has changed. Right now, it's primarily documentation.
This version change does mean that insight needs to be updated to expect a different kind of data back.
template.yaml
Outdated
@@ -77,11 +77,12 @@ Resources: | |||
Properties: | |||
FunctionName: !Sub 'CumulusAggFetchAuthorizer-${DeployStage}' | |||
Handler: src/handlers/site_upload/api_gateway_authorizer.lambda_handler | |||
Layers: [arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python311:17] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why would an import pandas
line even be hit in this case? Is the lambda now running a different code path because of a layer change?
That seems concerning (but maybe could happen if we now are pulling in a newer version of some pip package that added a pandas dep?).
Did we accidentally introduce an unexpected dependency on pandas ourselves and that's what's causing this? Like, do we have stacktraces for where the import comes from?