Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration for new data package format #122

Merged
merged 3 commits into from
Oct 7, 2024

Conversation

dogversioning
Copy link
Contributor

  • Adds a migration for data packages (and column_types)
  • Adds two new items to data packages
  • Some cleanup of log names, and (hopefully) an addition to prevent errors in uploading.

Copy link

github-actions bot commented Oct 4, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
657 640 97% 90% 🟢

New Files

File Coverage Status
src/handlers/shared/pandas_functions.py 100% 🟢
TOTAL 100% 🟢

Modified Files

File Coverage Status
src/handlers/shared/functions.py 94% 🟢
src/handlers/site_upload/cache_api.py 100% 🟢
src/handlers/site_upload/powerset_merge.py 97% 🟢
TOTAL 97% 🟢

updated for commit: 31017da by action🐍

template.yaml Outdated
@@ -77,11 +77,12 @@ Resources:
Properties:
FunctionName: !Sub 'CumulusAggFetchAuthorizer-${DeployStage}'
Handler: src/handlers/site_upload/api_gateway_authorizer.lambda_handler
Layers: [arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python311:17]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i don't think this :should: be happening, but as near as I can tell: the reason uploads (and also some other bits of the upload pipeline/dashboard apis) broke is because in python 3.11 lambdas, the isolation between lambdas is different? Layers referencing a handler which didn't use pandas were previously fine, but now it seems like, since every lambda can access the entire projects code, without this layer they fail on importing pandas?

I'll look into this and figure out if there's a better isolation pathway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would an import pandas line even be hit in this case? Is the lambda now running a different code path because of a layer change?

That seems concerning (but maybe could happen if we now are pulling in a newer version of some pip package that added a pandas dep?).

Did we accidentally introduce an unexpected dependency on pandas ourselves and that's what's causing this? Like, do we have stacktraces for where the import comes from?

Copy link
Contributor Author

@dogversioning dogversioning Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no stack trace in the error message, but in the light of a fresh day with morning brain and doing it by hand, you're right - the dependency on pandas was added to the shared functions.py, which most everything imports.

I could move that function to a different shared location?

I was thinking about this anyway when this occured, but in light of this being the root cause, and now that we have the environment for it, i'll add an issue about running a regression against the APIs.

enums.ColumnTypesKeys.COLUMN_TYPES_FORMAT_VERSION.value: "1",
enums.ColumnTypesKeys.COLUMN_TYPES_FORMAT_VERSION.value: "2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to a n00b what implications this has?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At :some: point - it might be useful for a downstream consumer to realize the format has changed. Right now, it's primarily documentation.

This version change does mean that insight needs to be updated to expect a different kind of data back.

template.yaml Outdated
@@ -77,11 +77,12 @@ Resources:
Properties:
FunctionName: !Sub 'CumulusAggFetchAuthorizer-${DeployStage}'
Handler: src/handlers/site_upload/api_gateway_authorizer.lambda_handler
Layers: [arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python311:17]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would an import pandas line even be hit in this case? Is the lambda now running a different code path because of a layer change?

That seems concerning (but maybe could happen if we now are pulling in a newer version of some pip package that added a pandas dep?).

Did we accidentally introduce an unexpected dependency on pandas ourselves and that's what's causing this? Like, do we have stacktraces for where the import comes from?

@dogversioning dogversioning merged commit 5dbb54d into main Oct 7, 2024
2 checks passed
@dogversioning dogversioning deleted the mg/dp-migrate-upload-bugfix branch October 7, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants