Clarify the expectations for when static data is included and make it configurable #7667

Closed
jleibs opened this issue Oct 9, 2024 · 2 comments
Labels
enhancement New feature or request feat-dataframe-api Everything related to the dataframe API

Comments

jleibs commented Oct 9, 2024

When static data is included in the result set it can be confusing.

When static data is returned, it may show up in row 0, but that row has no index values, so users may need extra filtering on their side.

Proposal

  • Add a new enum along the lines of include_static to QueryExpression. The enum should have 3 values:
    • AUTO: include a static row at row 0 only if there is static data in the dataset (the default).
    • ALWAYS: row 0 will be the static row regardless of whether there is static data.
    • NEVER: never include an explicit row for the static data; it will only be accessible via LatestAt.
  • For AUTO / ALWAYS, the choice to include the static data is independent of any value of filtered_index_range.
  • When using filtered_index_values or using_index_values:
    • The static data will only be included if the input includes TimeInt::STATIC.
    • If ALWAYS and the input has no TimeInt::STATIC, return an error.
    • If NEVER and the input has a TimeInt::STATIC, return an error.
  • Note that we can implement a helper such as view(...).static_only(), which is just an alias for:
    include_static=ALWAYS, filtered_index_values=[TimeInt::STATIC]
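
The rules above can be sketched as a small decision function. This is only an illustration of the proposal, not the actual API: `IncludeStatic`, `STATIC`, and `should_emit_static_row` are hypothetical names, and `STATIC` is a stand-in for TimeInt::STATIC. The sketch also assumes that an explicit TimeInt::STATIC in the index values is enough to request the static row under AUTO, which the proposal does not spell out.

```python
from enum import Enum
from typing import Optional, Sequence

# Hypothetical stand-in for TimeInt::STATIC (assumption, not the real sentinel).
STATIC = object()


class IncludeStatic(Enum):
    """Hypothetical enum mirroring the proposed include_static setting."""

    AUTO = "auto"
    ALWAYS = "always"
    NEVER = "never"


def should_emit_static_row(
    mode: IncludeStatic,
    dataset_has_static: bool,
    index_values: Optional[Sequence[object]] = None,
) -> bool:
    """Return True if row 0 of the result should be the static row.

    filtered_index_range is deliberately not a parameter: per the proposal,
    it never influences whether static data is included.
    """
    if index_values is not None:
        # filtered_index_values / using_index_values: only an explicit
        # STATIC entry in the input selects the static row.
        requested = any(v is STATIC for v in index_values)
        if mode is IncludeStatic.ALWAYS and not requested:
            raise ValueError("ALWAYS requires TimeInt::STATIC in the index values")
        if mode is IncludeStatic.NEVER and requested:
            raise ValueError("NEVER conflicts with TimeInt::STATIC in the index values")
        return requested

    if mode is IncludeStatic.ALWAYS:
        return True
    if mode is IncludeStatic.NEVER:
        return False
    # AUTO: include the static row only if the dataset has static data.
    return dataset_has_static
```

Under this sketch, a `static_only()` helper would amount to calling the function with `IncludeStatic.ALWAYS` and `index_values=[STATIC]`.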

For 0.19

This does not need to be done for 0.19. Let's make the behavior match AUTO as default and add the configurability later.

  • If filtered_index_values or using_index_values are set, then the choice to include static data is determined by an explicit TimeInt::STATIC in the input.
  • Otherwise, if the view contains any column which includes static data, then row 0 will be the Static row, and this data will be returned regardless of the range specified by filtered_index_range
  • We should clearly document: if there is static data in the results, it will always be in row 0, and that will be indicated by the TimeInt for the index column being null.
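
Given that documented convention, the user-side filtering could look roughly like the following. This is a minimal sketch over plain Python dictionaries rather than the real dataframe types; `split_static_row` and the `frame` / `color` column names are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def split_static_row(
    rows: list[Row], index_column: str
) -> tuple[Optional[Row], list[Row]]:
    """Separate the static row (null index at row 0) from the temporal rows."""
    # Use an explicit None check: an index value of 0 is a valid, non-static index.
    if rows and rows[0].get(index_column) is None:
        return rows[0], rows[1:]
    return None, rows


# Hypothetical result set: row 0 is static because its index is null.
rows = [
    {"frame": None, "color": "red"},
    {"frame": 0, "color": "green"},
    {"frame": 1, "color": "blue"},
]
static_row, temporal_rows = split_static_row(rows, "frame")
```

Only row 0 is inspected because the convention guarantees that static data, when present, is always the first row.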

jleibs commented Oct 9, 2024

Thinking about this more I'm now in favor of the alternative proposal:

teh-cmc commented Oct 10, 2024

teh-cmc closed this as not planned Oct 10, 2024