Replace uses of `DataFrame` with `MetricflowDataTable` #1235

plypaul · 2024-05-31T01:49:54Z

Description

This replacement is needed for the later removal of the pandas dependency.


tlento

Thanks for splitting this up into reviewable chunks. Note the change doesn't work on Snowflake due to their default behavior of SHOUTING ALL THE COLUMN NAMES.

tlento · 2024-05-31T17:49:06Z

metricflow/engine/metricflow_engine.py

-        dim_vals = result_dataframe[result_dataframe.columns[~result_dataframe.columns.isin(metric_names)]].iloc[:, 0]
-
-        return sorted([str(val) for val in dim_vals])
+        return sorted([str(val) for val in query_result.result_df.column_values_iterator(0)])


Oh nice. Do we know for sure the dimension will always come first, or should we get rid of the magic number and do something like `query_result.result_df.column_values_iterator(query_result.result_df.column_name_index(get_group_by_values))`?

The pandas operation was skipping all of the metric columns.

Yeah, when the SQL is rendered, the columns follow a specified order with the dimension values first.
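To make the trade-off in this thread concrete: the removed pandas expression kept only the columns not in `metric_names` and took the first of them, while the new code reads column 0 directly and relies on the rendered SQL placing the dimension first. The sketch below contrasts the two in plain Python. `SimpleTable` is a hypothetical stand-in that borrows the method names `column_values_iterator` and `column_name_index` from the diff and the review comment; it is not the real `MetricflowDataTable` implementation, and the column names and rows are made-up example data.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class SimpleTable:
    """Hypothetical row-oriented table, used only for illustration."""

    column_names: Tuple[str, ...]
    rows: Tuple[Tuple[Any, ...], ...]

    def column_name_index(self, name: str) -> int:
        # Raises ValueError if the column is missing.
        return self.column_names.index(name)

    def column_values_iterator(self, column_index: int) -> Iterator[Any]:
        for row in self.rows:
            yield row[column_index]


def first_non_metric_values(table: SimpleTable, metric_names: Sequence[str]) -> List[str]:
    """Mirrors the removed pandas logic: skip metric columns, use the first remaining one."""
    index = next(i for i, name in enumerate(table.column_names) if name not in metric_names)
    return sorted(str(v) for v in table.column_values_iterator(index))


def positional_values(table: SimpleTable) -> List[str]:
    """Mirrors the new logic: assumes the dimension column is rendered first."""
    return sorted(str(v) for v in table.column_values_iterator(0))


table = SimpleTable(
    column_names=("listing__country", "bookings"),
    rows=(("us", 3), ("ca", 1), ("mx", 2)),
)
assert first_non_metric_values(table, ["bookings"]) == positional_values(table)
print(positional_values(table))  # ['ca', 'mx', 'us']
```

The two approaches agree only while the ordering guarantee holds; if a metric column were ever rendered first, `positional_values` would silently return metric values, which is the risk the magic-number question is pointing at.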

metricflow/metricflow/plan_conversion/select_column_gen.py

Line 36 in bbd2901

def as_tuple(self) -> Tuple[SqlSelectColumn, ...]:

It would be better to do a lookup by name, but mapping the name to the output column is not as straightforward as it should be, since the output column name can differ from the input column name.
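The difficulty described here is that a result column is named by its output alias, not by whatever name the caller started with. A minimal sketch, assuming each select column carries an input name and an output alias: the lookup has to go through the select list rather than the result's column names. `SelectColumn` and `output_index_for_input` are hypothetical; the real `SqlSelectColumn` referenced above is not reproduced here, and the alias `bookings_total` is invented for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SelectColumn:
    """Hypothetical rendered SELECT column: the name it came from plus its output alias."""

    input_name: str
    column_alias: str


def output_index_for_input(select_columns: Sequence[SelectColumn], input_name: str) -> int:
    """Return the output position of the column produced from `input_name`.

    Searching the result's column names for `input_name` fails whenever the
    alias differs, so the search goes through the select list instead.
    """
    for index, column in enumerate(select_columns):
        if column.input_name == input_name:
            return index
    raise KeyError(f"No select column produced from {input_name!r}")


select_columns = (
    SelectColumn(input_name="metric_time__day", column_alias="metric_time__day"),
    SelectColumn(input_name="bookings", column_alias="bookings_total"),
)
result_column_names = tuple(c.column_alias for c in select_columns)

assert "bookings" not in result_column_names  # a naive name lookup would miss it
print(output_index_for_input(select_columns, "bookings"))  # 1
```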

tlento · 2024-05-31T17:55:13Z

tests_metricflow/fixtures/sql_clients/adapter_backed_ddl_client.py

        """Helper method to get the engine-specific type value.

        The dtype dict here is non-exhaustive but should be adequate for our needs.
        """
        # TODO: add type handling for string/bool/bigint types for all engines
-        if dtype == "string" or dtype == "object":
+        column_type = column_description.column_type
+        if column_type is str:


Oh, this is so much better than the magic string comparisons...
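The improvement praised here is comparing a Python type object (`column_type is str`) instead of pandas dtype strings such as `"string"` or `"object"`. A minimal sketch of that idea, assuming a per-engine table keyed by Python type: `engine_type_for` and the mapping are hypothetical and are not the fixture's actual code, though the listed type names are valid DuckDB and BigQuery types.

```python
import datetime
from typing import Dict, Type

# Hypothetical per-engine mapping; the real fixture may choose different types.
_ENGINE_TYPE_NAMES: Dict[str, Dict[Type, str]] = {
    "duckdb": {
        str: "VARCHAR",
        int: "BIGINT",
        float: "DOUBLE",
        bool: "BOOLEAN",
        datetime.datetime: "TIMESTAMP",
    },
    "bigquery": {
        str: "STRING",
        int: "INT64",
        float: "FLOAT64",
        bool: "BOOL",
        datetime.datetime: "DATETIME",
    },
}


def engine_type_for(engine: str, column_type: Type) -> str:
    """Map a Python column type to an engine-specific SQL type name.

    Dict lookup uses the exact type object, so `bool` is not confused with
    `int` even though `bool` subclasses `int`.
    """
    try:
        return _ENGINE_TYPE_NAMES[engine][column_type]
    except KeyError as e:
        raise ValueError(f"Unsupported type {column_type!r} for engine {engine!r}") from e


print(engine_type_for("bigquery", str))  # STRING
```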

tlento · 2024-06-02T20:11:04Z

tests_metricflow/sql_clients/test_sql_client.py

-    assert df.columns.tolist() == [col]
-    assert set(df[col]) == vals
+    assert df.column_count == 1
+    assert df.column_names == (col,)


This assertion is case-sensitive, which doesn't work with Snowflake.
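Snowflake stores and returns unquoted identifiers in upper case, so a column created as `col` comes back as `COL` and an exact tuple comparison fails. One way to make such a test engine-agnostic is to normalize case before comparing. The helpers below are hypothetical; the thread does not show how the test was actually changed.

```python
from typing import Sequence, Tuple


def normalized_column_names(column_names: Sequence[str]) -> Tuple[str, ...]:
    """Lower-case column names so assertions don't depend on engine casing rules."""
    return tuple(name.lower() for name in column_names)


def assert_single_column(column_names: Sequence[str], expected: str) -> None:
    """Check for exactly one column whose name matches `expected`, ignoring case."""
    assert len(column_names) == 1, f"Expected 1 column, got {len(column_names)}"
    assert normalized_column_names(column_names) == (expected.lower(),), (
        f"Expected column {expected!r}, got {tuple(column_names)!r}"
    )


assert_single_column(("COL",), "col")  # passes for a Snowflake-style result
assert_single_column(("col",), "col")  # passes for engines that keep lower case
```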

tlento · 2024-06-02T20:16:50Z

tests_metricflow/snapshots/test_cli.py/str/BigQuery/test_saved_query__cli_output.txt

-| 2020-01-03 00:00:00 |                          5 |          1 |                  0 |
+metric_time__day      listing__capacity_latest    bookings    instant_bookings
+------------------  --------------------------  ----------  ------------------
+2019-12-01                                   5           1                   0


Oh, interesting. Is that because the datetime/dateutil objects don't render zero-valued time components?

This is actually due to a special case in the text-formatting logic. In retrospect, let me remove that.
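The reply above attributes the missing `00:00:00` to the text formatter rather than to the datetime objects themselves. The sketch below shows how one special case in a cell formatter can produce date-only output, and what removing it looks like. Both function names are hypothetical, and the actual MetricFlow formatting code is not shown in this thread.

```python
import datetime
from typing import Any


def format_cell_with_special_case(value: Any) -> str:
    """Hypothetical formatter that drops the time part of datetimes at exactly midnight."""
    if isinstance(value, datetime.datetime) and value.time() == datetime.time(0, 0):
        return value.date().isoformat()
    return str(value)


def format_cell_uniform(value: Any) -> str:
    """Hypothetical formatter without the special case: datetimes keep their time part."""
    return str(value)


day = datetime.datetime(2020, 1, 3)
print(format_cell_with_special_case(day))  # 2020-01-03
print(format_cell_uniform(day))  # 2020-01-03 00:00:00
```

Dropping the special case keeps every datetime column in a uniform width and format, at the cost of slightly noisier output for day-granularity results.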


/* PR_START p--py312 04 */ Add assertion for integration tests to verify a table was returned.

91bb0fb

plypaul added the Skip Changelog label May 31, 2024

cla-bot bot added the cla:yes label May 31, 2024

plypaul changed the title ~~Replace uses of Dataframe with MetricflowDataTable~~ Replace uses of DataFrame with MetricflowDataTable May 31, 2024

plypaul force-pushed the p--py312--04 branch 2 times, most recently from c5e8281 to a07a1c6 May 31, 2024 02:01

plypaul marked this pull request as ready for review May 31, 2024 02:19

plypaul added 12 commits May 31, 2024 10:50

Use .text_format() for data table to string conversion.

b291444

Migrate various uses of dataframes to the data table.

e464d74

Update SqlTableSnapshot to use MetricFlowDataTable.

eeb57c1

Update snapshots for new test-table format.

152b6ce

Rename strings similar to "dataframe".

3ed8f0c

Update snapshots for WriteToResultDataframeNode rename.

95e0e7e

Rename write_to_dataframe.py to write_to_data_table.py.

ae53659

Rename SelectSqlQueryToDataFrameTask to `SelectSqlQueryToDataTableTask`.

47f4866

Update snapshots for rename to SelectSqlQueryToDataTableTask.

1808631

Rename as_df to as_data_table.

b6a313a

Update SnowflakeInferenceContextProvider to use correct types.

a3c2110

Update the CLI to use corresponding methods for the data table.

0ece9ef

plypaul force-pushed the p--py312--04 branch from a07a1c6 to 0ece9ef May 31, 2024 17:51

tlento added the Run Tests With Other SQL Engines label Jun 2, 2024

tlento temporarily deployed to DW_INTEGRATION_TESTS June 2, 2024 20:14 — with GitHub Actions Inactive

tlento had a problem deploying to DW_INTEGRATION_TESTS June 2, 2024 20:14 — with GitHub Actions Failure

tlento self-requested a review June 2, 2024 20:14

tlento reviewed Jun 2, 2024


plypaul added and removed the Run Tests With Other SQL Engines label Jun 3, 2024

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 16:20 — with GitHub Actions Inactive

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 17:16 — with GitHub Actions Inactive

plypaul had a problem deploying to DW_INTEGRATION_TESTS June 3, 2024 17:16 — with GitHub Actions Failure

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 17:16 — with GitHub Actions Inactive

plypaul added the Reload Test Data in SQL Engines label and removed the Run Tests With Other SQL Engines label Jun 3, 2024

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 18:45 — with GitHub Actions Inactive

github-actions bot removed the Reload Test Data in SQL Engines label Jun 3, 2024

plypaul added the Run Tests With Other SQL Engines label Jun 3, 2024

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 19:36 — with GitHub Actions Inactive

plypaul had a problem deploying to DW_INTEGRATION_TESTS June 3, 2024 19:36 — with GitHub Actions Failure

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 19:36 — with GitHub Actions Inactive

plypaul added 2 commits June 3, 2024 12:48

Address comments.

19beaba

Update snapshots.

027afbb

plypaul force-pushed the p--py312--04 branch from b7f5509 to 027afbb Jun 3, 2024 19:48

plypaul added and removed the Run Tests With Other SQL Engines label Jun 3, 2024

plypaul temporarily deployed to DW_INTEGRATION_TESTS June 3, 2024 19:49 — with GitHub Actions Inactive

github-actions bot removed the Run Tests With Other SQL Engines label Jun 3, 2024

plypaul requested a review from tlento June 3, 2024 20:45

tlento approved these changes Jun 3, 2024


plypaul merged commit 8ecf93a into main Jun 3, 2024
30 checks passed

plypaul deleted the p--py312--04 branch June 3, 2024 20:57


plypaul commented May 31, 2024

tlento left a comment

tlento May 31, 2024

plypaul Jun 3, 2024

tlento May 31, 2024

tlento Jun 2, 2024

plypaul Jun 3, 2024

tlento Jun 2, 2024

plypaul Jun 3, 2024

tlento left a comment
