Replace uses of DataFrame with MetricflowDataTable #1235

Merged Jun 3, 2024 (15 commits)

Conversation

plypaul
Contributor

@plypaul plypaul commented May 31, 2024

Description

This replacement is needed for the later removal of the pandas dependency.

@cla-bot cla-bot bot added the cla:yes label May 31, 2024
@plypaul plypaul changed the title from "Replace uses of Dataframe with MetricflowDataTable" to "Replace uses of DataFrame with MetricflowDataTable" May 31, 2024
@plypaul plypaul force-pushed the p--py312--04 branch 2 times, most recently from c5e8281 to a07a1c6 Compare May 31, 2024 02:01
@plypaul plypaul marked this pull request as ready for review May 31, 2024 02:19
@tlento tlento added the "Run Tests With Other SQL Engines" label (runs the test suite against the SQL engines in our target environment) Jun 2, 2024
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 2, 2024 20:14 — with GitHub Actions Inactive
@tlento tlento had a problem deploying to DW_INTEGRATION_TESTS June 2, 2024 20:14 — with GitHub Actions Failure
@tlento tlento self-requested a review June 2, 2024 20:14
Contributor

@tlento tlento left a comment

Thanks for splitting this up into reviewable chunks. Note the change doesn't work on Snowflake due to their default behavior of SHOUTING ALL THE COLUMN NAMES.

- dim_vals = result_dataframe[result_dataframe.columns[~result_dataframe.columns.isin(metric_names)]].iloc[:, 0]
- return sorted([str(val) for val in dim_vals])
+ return sorted([str(val) for val in query_result.result_df.column_values_iterator(0)])
Contributor

Oh nice. Do we know for sure the dimension will always come first, or should we get rid of the magic number and do something like query_result.result_df.column_values_iterator(query_result.result_df.column_name_index(get_group_by_values))?

The pandas operation was skipping all of the metric columns.

Contributor Author

Yeah, the SQL is rendered with a specified column order that puts the dimension values first:

def as_tuple(self) -> Tuple[SqlSelectColumn, ...]:

It would be better to do a lookup, but mapping the name to the output column is not as straightforward as it should be since the output column name can be different from the input.
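The trade-off in this thread, positional access versus skipping the metric columns by name, can be sketched as follows. This is a minimal illustration and not MetricFlow's code: `SimpleTable` is a hypothetical stand-in for `MetricflowDataTable`, and only the method names `column_values_iterator` and `column_name_index` come from the comments above, so their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class SimpleTable:
    """Hypothetical stand-in for MetricflowDataTable (illustration only)."""

    column_names: Tuple[str, ...]
    rows: Tuple[Tuple[Any, ...], ...]

    def column_name_index(self, column_name: str) -> int:
        # Assumed behavior: position of the column with this exact name.
        return self.column_names.index(column_name)

    def column_values_iterator(self, column_index: int) -> Iterator[Any]:
        # Assumed behavior: yield the values of one column, row by row.
        return (row[column_index] for row in self.rows)


def first_non_metric_values(table: SimpleTable, metric_names: Sequence[str]) -> List[str]:
    """Mimic the old pandas logic: skip metric columns, take the first one left."""
    metric_set = set(metric_names)
    for index, name in enumerate(table.column_names):
        if name not in metric_set:
            return sorted(str(value) for value in table.column_values_iterator(index))
    raise ValueError("table has no non-metric column")


table = SimpleTable(
    column_names=("listing__country", "bookings"),
    rows=(("us", 3), ("ca", 1)),
)

# Positional access relies on the dimension column being rendered first.
positional = sorted(str(value) for value in table.column_values_iterator(0))

# Name-based access does not rely on column order, only on knowing the metric names.
by_name = first_non_metric_values(table, metric_names=["bookings"])
```

Both approaches agree whenever the dimension column is rendered first, which is the ordering guarantee described above.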

"""Helper method to get the engine-specific type value.

The dtype dict here is non-exhaustive but should be adequate for our needs.
"""
# TODO: add type handling for string/bool/bigint types for all engines
- if dtype == "string" or dtype == "object":
+ column_type = column_description.column_type
+ if column_type is str:
Contributor

Oh this is so much better than the magic string comparisons....
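As a rough sketch of why dispatching on Python types reads better than comparing dtype strings, the lookup below maps a column's Python type to an engine type name. The mapping, the function name `sql_type_for`, and the SQL type names are all assumptions made for illustration; the PR's actual helper and its per-engine types are not shown in this conversation.

```python
import datetime
from typing import Dict

# Hypothetical mapping for illustration; real engines use their own type names.
_PYTHON_TYPE_TO_SQL_TYPE: Dict[type, str] = {
    str: "VARCHAR",
    bool: "BOOLEAN",
    int: "BIGINT",
    float: "DOUBLE",
    datetime.datetime: "TIMESTAMP",
}


def sql_type_for(column_type: type) -> str:
    """Return the (assumed) SQL type name for a Python column type."""
    try:
        # Exact-type lookup, so bool is not confused with int.
        return _PYTHON_TYPE_TO_SQL_TYPE[column_type]
    except KeyError:
        raise ValueError(f"Unsupported column type: {column_type!r}") from None
```

A typo in a type object fails immediately as a `NameError`, whereas a typo in a string such as `"strng"` silently falls through to another branch.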

- assert df.columns.tolist() == [col]
- assert set(df[col]) == vals
+ assert df.column_count == 1
+ assert df.column_names == (col,)
Contributor

This is case sensitive, which doesn't work with Snowflake.

Contributor Author

Updated.
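One way to make such an assertion engine-agnostic is to normalize case on both sides before comparing, since Snowflake returns unquoted identifiers in upper case. The helper name `normalized` and the sample column names are hypothetical; the conversation does not show how the test was actually updated.

```python
from typing import Sequence, Tuple


def normalized(column_names: Sequence[str]) -> Tuple[str, ...]:
    """Lower-case column names so comparisons ignore engine-specific casing."""
    return tuple(name.lower() for name in column_names)


expected_names = ("listing__country",)
snowflake_names = ("LISTING__COUNTRY",)  # what an upper-casing engine might return

# A direct comparison fails on casing alone; the normalized one does not.
exact_match = snowflake_names == expected_names
case_insensitive_match = normalized(snowflake_names) == normalized(expected_names)
```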

- | 2020-01-03 00:00:00 | 5 | 1 | 0 |
+ metric_time__day    listing__capacity_latest    bookings    instant_bookings
+ ------------------  --------------------------  ----------  ------------------
+ 2019-12-01          5                           1           0
Contributor

Oh interesting. Is that because the datetime/dateutil objects don't format in 0 values for time?

Contributor Author

This is actually due to a case in the text formatting logic. In retrospect, let me remove that.
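For context, Python's own `str()` on a `datetime` always includes the time component, so a date-only rendering has to come from an explicit special case. The function below is one plausible shape of such a case, written purely as an assumption; the formatting logic in the PR is not shown here, and the name `format_cell` is hypothetical.

```python
import datetime


def format_cell(value: object) -> str:
    """Render a table cell as text, dropping the time part when it is midnight."""
    if isinstance(value, datetime.datetime):
        if value.time() == datetime.time.min:
            # Special case: a midnight timestamp is shown as a bare date.
            return value.date().isoformat()
        return value.isoformat(sep=" ")
    return str(value)


midnight = datetime.datetime(2019, 12, 1)
default_text = str(midnight)           # standard library rendering keeps the time
special_text = format_cell(midnight)   # special-cased rendering drops it
```

Removing the special case would make every timestamp render with its time component, which keeps snapshot output uniform across rows.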

@plypaul plypaul added and removed the "Run Tests With Other SQL Engines" label Jun 3, 2024
@plypaul plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 16:20 — with GitHub Actions Inactive
@plypaul plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 17:16 — with GitHub Actions Inactive
@plypaul plypaul had a problem deploying to DW_INTEGRATION_TESTS June 3, 2024 17:16 — with GitHub Actions Failure
@plypaul plypaul added the "Reload Test Data in SQL Engines" label (should be run when test data changes) and removed the "Run Tests With Other SQL Engines" label Jun 3, 2024
@plypaul plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 18:45 — with GitHub Actions Inactive
@github-actions github-actions bot removed the "Reload Test Data in SQL Engines" label Jun 3, 2024
@plypaul plypaul added the "Run Tests With Other SQL Engines" label Jun 3, 2024
@plypaul plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 19:36 — with GitHub Actions Inactive
@plypaul plypaul had a problem deploying to DW_INTEGRATION_TESTS June 3, 2024 19:36 — with GitHub Actions Failure
@plypaul plypaul added and removed the "Run Tests With Other SQL Engines" label Jun 3, 2024
@plypaul plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 19:49 — with GitHub Actions Inactive
@github-actions github-actions bot removed the "Run Tests With Other SQL Engines" label Jun 3, 2024
@plypaul plypaul requested a review from tlento June 3, 2024 20:45
Contributor

@tlento tlento left a comment

:shipit:

@plypaul plypaul merged commit 8ecf93a into main Jun 3, 2024
30 checks passed
@plypaul plypaul deleted the p--py312--04 branch June 3, 2024 20:57