ensure the AI Company ID to Entity ID data has only distinct rows #116

Merged (1 commit) on Feb 10, 2024
3 changes: 2 additions & 1 deletion run_pacta_data_preparation.R
@@ -243,7 +243,8 @@ factset_entity_id__ar_company_id <-
   select(
     factset_entity_id = "factset_id",
     ar_company_id = "company_id"
-  )
+  ) %>%
+  distinct()
Comment on lines +246 to +247
Member

Nit: should we warn about this? Something like "X input contains duplicate rows, removing".
Maybe not important.
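
A warning along these lines could be sketched as below. This is only an illustration of the idea: `distinct_with_warning()` and the toy mapping are hypothetical and are not part of this PR or of pacta.data.preparation.

```r
library(dplyr)

# Hypothetical helper: drop exact duplicate rows, warning if any were found.
distinct_with_warning <- function(data, input_name = "input") {
  n_dupes <- sum(duplicated(data))
  if (n_dupes > 0) {
    warning(
      sprintf("%s contains %d duplicate row(s), removing", input_name, n_dupes),
      call. = FALSE
    )
  }
  distinct(data)
}

# Toy mapping with one repeated row (made-up IDs, for illustration only).
mapping <- tibble(
  factset_entity_id = c("A1", "A1", "B2"),
  ar_company_id = c(1L, 1L, 2L)
)

# Emits a warning and returns the two distinct rows.
deduped <- distinct_with_warning(mapping, "factset_entity_id__ar_company_id")
```

Counting with `duplicated()` before calling `distinct()` keeps the output identical to the merged change while still reporting that the input was not clean.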

Member Author

@cjyetman, Feb 9, 2024

🤷 Maybe? I think we should be able to expect that none of these inputs contains random duplicate lines.

I've already gone down the rabbit hole of imagining adding validation functions to pacta.data.preparation for all of these inputs and all of the configuration options 😈
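
For a sense of what one such validation function might look like, here is a minimal base R sketch. `validate_id_mapping()` is a hypothetical name, not an existing pacta.data.preparation function, and the checks shown (required columns, no duplicate rows) are assumptions about what would be worth validating.

```r
# Hypothetical validator, not an existing pacta.data.preparation function.
# Returns a character vector of problems; an empty vector means no issues found.
validate_id_mapping <- function(data, required_cols) {
  if (!is.data.frame(data)) {
    return("input is not a data frame")
  }
  problems <- character(0)
  missing_cols <- setdiff(required_cols, names(data))
  if (length(missing_cols) > 0) {
    problems <- c(
      problems,
      paste("missing columns:", paste(missing_cols, collapse = ", "))
    )
  }
  if (anyDuplicated(data) > 0) {
    problems <- c(problems, "contains duplicate rows")
  }
  problems
}

# Made-up input with one repeated row.
raw <- data.frame(
  factset_id = c("A1", "A1", "B2"),
  company_id = c(1L, 1L, 2L)
)

validate_id_mapping(raw, c("factset_id", "company_id"))
# returns "contains duplicate rows"
```

Returning the list of problems rather than stopping immediately leaves it to the caller to decide whether to warn, abort, or clean the input.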

Member

Yeah, I mean, in my perfect world, once #94 is closed, I'd imagine that simply adding a "warning" message in the wrapper function whenever this kind of wonkiness happens would be sufficient.

 readRDS(factset_entity_info_path) %>%
   pacta.data.preparation::prepare_entity_info(factset_entity_id__ar_company_id) %>%
   saveRDS(file.path(data_prep_outputs_path, "entity_info.rds"))
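
To illustrate why duplicate rows in the ID mapping can matter for a step like this one, the sketch below uses made-up data and assumes the downstream function behaves like a left join on `factset_entity_id`. That is an assumption for illustration, not a description of how `prepare_entity_info()` is actually implemented.

```r
library(dplyr)

# Made-up entity info and ID mapping; the mapping has one repeated row.
entity_info <- tibble(
  factset_entity_id = c("A1", "B2"),
  entity_name = c("Alpha", "Beta")
)
mapping <- tibble(
  factset_entity_id = c("A1", "A1", "B2"),
  ar_company_id = c(1L, 1L, 2L)
)

# Assumption: the downstream step acts like a left join on factset_entity_id.
joined_raw <- left_join(entity_info, mapping, by = "factset_entity_id")
joined_distinct <- left_join(entity_info, distinct(mapping), by = "factset_entity_id")

nrow(joined_raw)      # 3: the "A1" entity appears twice
nrow(joined_distinct) # 2: one row per entity
```

Under that assumption, each duplicated mapping row multiplies the matching entity rows in the output, which is the kind of silent inflation that `distinct()` guards against.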