[SCHEMATIC-163] Catch error when manifest is generated and existing one doesn't have entityId #1551

Merged
BWMac merged 5 commits into develop from bwmac/SCHEMATIC-163/error-message-update
Dec 2, 2024

Conversation

BWMac
Collaborator

@BWMac BWMac commented Nov 21, 2024

Description:

When a user generates a manifest and existing manifests are used to populate the new one, those existing manifests must have the entityId column. Recently a user hit an uninformative KeyError caused by a manifest that was missing this column. The most likely cause is that they either created and uploaded the manifest manually, or generated it with schematic, removed the entityId column, and uploaded it outside of schematic. Going forward, we should raise a more detailed error message in these cases so that users can be guided back to the correct schematic workflow.

This PR adds error handling and a detailed error message to SynapseStorage._get_file_entityIds for these cases. I also added unit tests for _get_file_entityIds, since it previously had none.
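To illustrate the kind of check described above, here is a minimal sketch, not the actual schematic code. It assumes a manifest can be treated as CSV text with a Filename column; the helper names read_manifest and get_file_entity_ids, the choice of ValueError, and the message wording are all hypothetical.

```python
import csv
import io


def read_manifest(csv_text):
    """Parse manifest CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def get_file_entity_ids(columns, rows):
    """Map each Filename to its entityId, failing loudly if the column is absent."""
    if "entityId" not in columns:
        # Without this check, row["entityId"] below would raise a bare
        # KeyError: 'entityId', which gives the user no hint about the fix.
        # ValueError and this message are assumptions made for the sketch.
        raise ValueError(
            "The existing manifest has no 'entityId' column. "
            "Generate and submit manifests with schematic so the column is kept."
        )
    return {row["Filename"]: row["entityId"] for row in rows}


# Usage: a manifest that still has its entityId column.
columns, rows = read_manifest("Filename,entityId\ndata/a.fastq,syn123\n")
print(get_file_entity_ids(columns, rows))
```

The point of checking the column up front is that the failure names the missing column and the expected workflow, rather than surfacing as a lookup error deep inside the function.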

Notes:

  • I reproduced the error described in the ticket, added the error handling, and tested again, confirming that the new error type and message are raised in this case.

Update (11/26):

I have updated this PR so that SynapseStorage.updateDatasetManifestFiles checks whether the provided manifest has the entityId column and, if it does not, triggers generation of a new manifest.
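The fallback described in this update could look roughly like the sketch below. It is only an illustration under stated assumptions: the manifest is modeled as a plain dict of column name to values, and manifest_for_update and the regenerate callback are hypothetical names, not part of schematic.

```python
def manifest_for_update(manifest, regenerate):
    """Return (manifest, regenerated) where the manifest is safe to update.

    manifest:   dict mapping column name -> list of cell values (assumed shape)
    regenerate: zero-argument callable that builds a fresh manifest (hypothetical)
    """
    if "entityId" not in manifest:
        # The stored manifest cannot be matched back to Synapse files,
        # so fall back to generating a new one instead of failing.
        return regenerate(), True
    return manifest, False


# Usage: a manifest whose entityId column was removed triggers regeneration.
fresh = {"Filename": ["data/a.fastq"], "entityId": ["syn123"]}
result, regenerated = manifest_for_update({"Filename": ["data/a.fastq"]}, lambda: fresh)
print(regenerated)
```

Returning a flag alongside the manifest is a design choice made for this sketch so that the caller can log or report that a regeneration happened.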

Testing:

I used the instructions in the ticket to reproduce the error, and then applied this latest change:

When I run

schematic manifest -c test_flat_config.yml get -dt BulkRNA-seqAssay -d syn64109617 -s -a

I get a manifest with annotations filled out.

When I run

schematic manifest -c test_flat_config.yml get -dt BulkRNA-seqAssay -d syn64109617 -s

I get a manifest without annotations filled out.

@BWMac BWMac marked this pull request as ready for review November 21, 2024 21:07
@BWMac BWMac requested a review from a team as a code owner November 21, 2024 21:07
Collaborator

@BryanFauble BryanFauble left a comment

LGTM!

Member

@thomasyu888 thomasyu888 left a comment

I'll count on @GiaJordan to leave any comments in case we miss anything

schematic/store/synapse.py
Contributor

@GiaJordan GiaJordan left a comment

Just the one comment but otherwise looks good!

@BWMac BWMac marked this pull request as draft November 22, 2024 20:11
@BWMac BWMac marked this pull request as ready for review November 26, 2024 20:50
@BWMac BWMac merged commit c681346 into develop Dec 2, 2024
8 checks passed
@thomasyu888 thomasyu888 deleted the bwmac/SCHEMATIC-163/error-message-update branch December 11, 2024 04:12
andrewelamb added a commit that referenced this pull request Dec 16, 2024
* add new tests

* add unit tests

* ran black

* Update schematic/models/validate_attribute.py

Co-authored-by: BryanFauble <[email protected]>

* added tests

* Update README.md

* Update README.md

* add unit tests

* run black

* Update README.md

* temp commit

* remove old tests

* [FDS-2386] Synapse entity tracking and code concurrency updates (#1505)

* [FDS-2386] Synapse entity tracking and code concurrency updates

* ran black

* Update CODEOWNERS

* updated data model type rules to include error param

* fix validate type attribute to use msg level param

* added error handling

* run black

* create Node class

* set up Node class so that nodes with no displayName fields cause an error on creation

* ran black

* ran mypy

* added new configs for CLI tests

* added new manifests for testing CLI commands

* automate manual CLI tests

* ran black

* Update CODEOWNERS

* Update scan_repo.yml

* Update .github/CODEOWNERS

* Update .github/workflows/scan_repo.yml

* Attach additional telemetry data to OTEL traces (#1519)

* Attach additional telemetry data to OTEL traces

* feat: added tracing for cross manifest validation and file name validation  (#1509)

* add tracing for GX validation

* temp commit

* Updating contribution doc to expect squash and merge (#1534)

* [FDS-2491] Integration tests for Schematic API Test plan (#1512)

Integration tests for Schematic API Test plan

* [FDS-2500] Add Integration Tests for: Manifest Validation (#1516)

* Add Integration Tests for: Manifest Validation

* [FDS-2449] Lock `sphinx` version and update `poetry.lock` (#1530)

Also install `typing-extensions` in the build

* manual test files now being saved in manifests folder

* manual test files now being saved in manifests folder

* remove lines to delete json files that were under git control

* ran black

* add try finally blocks to remove created files

* ran black

* add lines to remove created json files

* Update file annotation store process to require filename be present in order to annotate file

* add lines to remove created json files

* Revert "Update file annotation store process to require filename be present in order to annotate file"

This reverts commit f57c718.

* Don't attempt to annotate the table

* add code in finally blocks to reset config to default values, when tests change them

* complete submit manifest command test

* ran black

* add test for bug case

* update test for table tidiness

* remove unused import

* remove etag column if already present when building temp file view

* catch all exceptions to switch to sequential mode

* update test for updated data

* Revert "update test for updated data"

This reverts commit 255e3c0.

* Revert "catch all exceptions to switch to sequential mode"

This reverts commit 68b0b24.

* catch ValueErrors as well

* Updates for integration test failures (#1537)

* Updates for integration test failures, Config file reset and scope changes

* add todos for removing config resets

* [FDS-2525] Authenticated export of telemetry data (#1527)

* Authenticated export of telemetry data, updating to HTTP otel library

* temp reduce tests

* restore tests

* uncomment tests

* redid how files are deleted, manual tests values are set

* ran black

* [SCHEMATIC-157] Make some dependencies required to avoid `schematic CLI` commands from potentially erroring when doing a pip install (#1540)

* Make otel flash non-optional

* Add dependencies as non-optional

* Include schematic_api for now (#1547)

* update toml version to 24.11.1 (#1548)

* [SCHEMATIC-193] Support exporting telemetry data from GH integration test runs (#1550)

* Support exporting telemetry data from GH run via access token retrieved via oauth2

* [SCHEMATIC-30, SCHEMATIC-200] Add version to click cli / use pathlib.Path module for checking cache size (#1542)

* Add version to click cli

* Add version

* Run black

* Reformat

* Fix

* Update schematic/schemas/data_model_parser.py

* Add test for check_synapse_cache_size

* Reformat

* Fix tests

* Remove unused parameter

* Install all-extras for now

* Make otel flash non-optional

* Update dockerfile

* Add dependencies as non-optional

* Update pyproject toml

* Fix trivy issue

* Add service version

* Run black

* Move all utils.general tests into separate folder

* Use pre-commit

* Add updates to contribution doc

* Fix

* Add service version to log provider

---------

Co-authored-by: BryanFauble <[email protected]>

* [SCHEMATIC-212] Prevent traces from being combined (#1552)

* Set instance id in github CI run, uninstrument flask auto during integration test run

* [SCHEMATIC-163] Catch error when manifest is generated and existing one doesn't have `entityId` (#1551)

* adds error handling

* adds unit tests for _get_file_entityIds

* updates error message

* adds entityid check to parent func

* updates docstring

* [SCHEMATIC-183] Use paths from file view for manifest generation (#1529)

source manifest file paths from synapse fileviews at generation

* [SCHEMATIC-214] Wrap pandas functions to support not including `None` with the NA values argument (#1553)

* Wrap pandas functions to support not including `None` with the NA values argument

* Ignore types

* pylint issues

* ordering of ignore

* Add to integration test to cover none in a manifest

* Add additional test for manifest

* [SCHEMATIC-210] Add attribute to nones data model (#1555)

Update example_test_nones.model.csv component and add new invalid manifest with nones

* first commit

* ran black

* add test for validateModelManifest

* [SCHEMATIC-214] change data model and component (#1556)

* add valid values to Patient attributes

* update data model

* add test manifests

* update test for new model

* update test for new valid value

* change test to use new manifests

* remove unneeded test file

* revert file

* revert file

* change tests to use new manifests

* remove unneeded manifests

* ran black

* add tests back in

* ran black

* revert manifest

* Split up valid and errored test as separate testing functions

* Remove unused import

---------

Co-authored-by: Gianna Jordan <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>

* incremented package version number

* Update publish.yml

* Update test.yml

* Update api_test.yml

* Update pdoc.yml

* Update version.py

* updates publish.yml (#1558) (#1561)

Co-authored-by: Brad Macdonald <[email protected]>

---------

Co-authored-by: BryanFauble <[email protected]>
Co-authored-by: Jenny V Medina <[email protected]>
Co-authored-by: Thomas Yu <[email protected]>
Co-authored-by: Lingling <[email protected]>
Co-authored-by: GiaJordan <[email protected]>
Co-authored-by: Brad Macdonald <[email protected]>
Co-authored-by: Gianna Jordan <[email protected]>
5 participants