Replace uses of DataFrame with MetricflowDataTable #1235
Conversation
Thanks for splitting this up into reviewable chunks. Note the change doesn't work on Snowflake due to their default behavior of SHOUTING ALL THE COLUMN NAMES.
-    dim_vals = result_dataframe[result_dataframe.columns[~result_dataframe.columns.isin(metric_names)]].iloc[:, 0]
-    return sorted([str(val) for val in dim_vals])
+    return sorted([str(val) for val in query_result.result_df.column_values_iterator(0)])
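As a hedged illustration of the new access pattern, the snippet below fakes just enough of MetricflowDataTable to run. Only column_values_iterator appears in the diff above; its exact signature in MetricFlow is an assumption here.

from typing import Iterator, Sequence, Tuple


class FakeDataTable:
    """Hypothetical stand-in for MetricflowDataTable, not the real class."""

    def __init__(self, column_names: Tuple[str, ...], rows: Sequence[Tuple]) -> None:
        self.column_names = column_names
        self.rows = rows

    def column_values_iterator(self, column_index: int) -> Iterator:
        # Yield the value at the given column index for each row, top to bottom.
        return (row[column_index] for row in self.rows)


table = FakeDataTable(("listing__country", "bookings"), [("us", 2), ("fr", 1)])
# Mirrors the new call site: column 0 holds the dimension values.
print(sorted(str(val) for val in table.column_values_iterator(0)))  # ['fr', 'us']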
Oh nice. Do we know for sure the dimension will always come first, or should we get rid of the magic number and do something like query_result.result_df.column_values_iterator(query_result.result_df.column_name_index(get_group_by_values))?
The pandas operation was skipping all of the metric columns.
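For reference, a small runnable illustration of what the removed pandas expression did (the data here is made up): the boolean mask keeps every column whose name is not a metric name, and .iloc[:, 0] takes the first of those.

import pandas as pd

metric_names = ["bookings"]
result_dataframe = pd.DataFrame({"listing__country": ["us", "fr"], "bookings": [2, 1]})

# Mask out metric columns, then take the first remaining (dimension) column.
dim_vals = result_dataframe[
    result_dataframe.columns[~result_dataframe.columns.isin(metric_names)]
].iloc[:, 0]
print(sorted(str(val) for val in dim_vals))  # ['fr', 'us']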
Yeah, there's a specified order when the SQL is rendered to have the dimension values first:

def as_tuple(self) -> Tuple[SqlSelectColumn, ...]:
It would be better to do a lookup, but mapping the name to the output column is not as straightforward as it should be since the output column name can be different from the input.
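A hedged sketch of that ordering contract: the real method linked above returns SqlSelectColumn objects, but the string-based stand-in below shows just the dimensions-before-metrics guarantee that makes index 0 safe.

from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeSelectColumnSet:
    """Hypothetical stand-in; not MetricFlow's actual column-set class."""

    dimension_columns: Tuple[str, ...]
    metric_columns: Tuple[str, ...]

    def as_tuple(self) -> Tuple[str, ...]:
        # Dimensions first, then metrics -- the order the rendered SQL relies on.
        return self.dimension_columns + self.metric_columns


columns = FakeSelectColumnSet(("listing__country",), ("bookings", "instant_bookings"))
assert columns.as_tuple()[0] == "listing__country"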
"""Helper method to get the engine-specific type value. | ||
|
||
The dtype dict here is non-exhaustive but should be adequate for our needs. | ||
""" | ||
# TODO: add type handling for string/bool/bigint types for all engines | ||
if dtype == "string" or dtype == "object": | ||
column_type = column_description.column_type | ||
if column_type is str: |
Oh this is so much better than the magic string comparisons....
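A hedged sketch of the pattern being praised: keying an engine-type lookup on the column's Python type instead of comparing dtype strings like "object". The mapping below is illustrative, not MetricFlow's actual table.

import datetime

# Hypothetical mapping; real engines and type names differ per dialect.
_SQL_TYPE_BY_PYTHON_TYPE = {
    str: "VARCHAR",
    bool: "BOOLEAN",
    int: "BIGINT",
    float: "DOUBLE",
    datetime.datetime: "TIMESTAMP",
}


def sql_type_for(column_type: type) -> str:
    """Return an engine-specific type name for a column's Python type."""
    try:
        return _SQL_TYPE_BY_PYTHON_TYPE[column_type]
    except KeyError:
        raise ValueError(f"No SQL type mapping for {column_type!r}") from None


assert sql_type_for(str) == "VARCHAR"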
-    assert df.columns.tolist() == [col]
-    assert set(df[col]) == vals
+    assert df.column_count == 1
+    assert df.column_names == (col,)
This is case sensitive, which doesn't work with Snowflake.
Updated.
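For context, a hedged sketch of a case-insensitive comparison (the helper name is made up; the PR's actual fix may differ). Snowflake upper-cases unquoted identifiers, so an exact-case check on column names fails there.

from typing import Tuple


def assert_column_names_match(actual: Tuple[str, ...], expected: Tuple[str, ...]) -> None:
    # Normalize case so Snowflake's upper-cased column names still compare equal.
    assert tuple(n.lower() for n in actual) == tuple(n.lower() for n in expected)


assert_column_names_match(("BOOKINGS",), ("bookings",))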
-    | 2020-01-03 00:00:00 | 5 | 1 | 0 |
+    metric_time__day    listing__capacity_latest    bookings    instant_bookings
+    ------------------  --------------------------  ----------  ------------------
+    2019-12-01          5                           1           0
Oh interesting. Is that because the datetime/dateutil objects don't include zero time values when formatted?
This is actually due to a special case in the text formatting logic. In retrospect, let me remove that.
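A hedged guess at the kind of special case being described, mirroring the snapshot diff where the zero time-of-day disappears; this is illustrative, not the PR's actual formatting code.

import datetime


def format_cell(value: object) -> str:
    if isinstance(value, datetime.datetime) and value.time() == datetime.time():
        # Render midnight timestamps as a bare date: 2019-12-01 00:00:00 -> 2019-12-01.
        return value.date().isoformat()
    return str(value)


assert format_cell(datetime.datetime(2019, 12, 1)) == "2019-12-01"
assert format_cell(datetime.datetime(2019, 12, 1, 12, 30)) == "2019-12-01 12:30:00"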
Description

This replacement is needed for the later removal of the pandas dependency.