Include a view.schema() API and improved support for static data #7754

Merged
merged 10 commits into from
Oct 16, 2024

Conversation

jleibs

@jleibs jleibs commented Oct 15, 2024

What

  • Resolves: Make it easy to select only static / only non-static columns #7601
  • Makes it possible to see the schema of just a view
  • Adds optional params to view creation for:
    • include_semantically_empty_columns,
    • include_indicator_columns,
    • include_tombstone_columns,
  • Includes the ability to check whether columns are static
  • Adds a select_static API variant for getting just the static data from a view
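
As a rough illustration of the filtering flags listed above, here is a small self-contained model in plain Python. It is not the rerun implementation: the `Column` class, the `view_schema` and `static_columns` helpers, and the `False` defaults are all assumptions made for this sketch. Only the three `include_*` parameter names come from the description.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor, invented for this sketch."""
    name: str
    is_static: bool = False
    is_indicator: bool = False
    is_tombstone: bool = False
    is_semantically_empty: bool = False


def view_schema(
    columns: Iterable[Column],
    *,
    include_semantically_empty_columns: bool = False,  # default is an assumption
    include_indicator_columns: bool = False,           # default is an assumption
    include_tombstone_columns: bool = False,           # default is an assumption
) -> List[Column]:
    """Return the columns a view would expose under the given flags."""
    kept = []
    for col in columns:
        if col.is_semantically_empty and not include_semantically_empty_columns:
            continue
        if col.is_indicator and not include_indicator_columns:
            continue
        if col.is_tombstone and not include_tombstone_columns:
            continue
        kept.append(col)
    return kept


def static_columns(schema: Iterable[Column]) -> List[Column]:
    """Keep only the static columns of a schema (a stand-in for select_static)."""
    return [col for col in schema if col.is_static]
```

In this model, callers inspect the filtered schema first and then narrow it to static columns, which mirrors the "check whether columns are static" and "select only static data" items above.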

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • I have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@jleibs jleibs added the 🐍 Python API (Python logging API) and feat-dataframe-api (Everything related to the dataframe API) labels Oct 15, 2024
@jleibs jleibs marked this pull request as ready for review October 15, 2024 17:14
@jleibs jleibs changed the title Include a view.schema() API and params for view-level content filtering Include a view.schema() API and improved support for static data Oct 15, 2024
@jleibs jleibs force-pushed the jleibs/view_schema branch from edf0c96 to 50ab88b Compare October 15, 2024 21:23
@teh-cmc teh-cmc self-requested a review October 16, 2024 07:28
@teh-cmc teh-cmc merged commit 85cceb9 into main Oct 16, 2024
72 checks passed
@teh-cmc teh-cmc deleted the jleibs/view_schema branch October 16, 2024 07:40
teh-cmc pushed a commit that referenced this pull request Oct 16, 2024
### What
Based on top of:
- #7754
- Will need rebase after merging ^

This tries to alleviate a possible footgun where a user creates what
appears to be a valid view expression that only includes static data.
In these cases `.select()` won't produce any data, since there are no
row-providing columns.

There are many possible ways to end up in this state, but the logic here
should be unlikely to produce false warnings while still providing a
reasonable degree of user safety.

If the user:
- Writes a content expression that only matches static content
- AND writes a select statement that queries static data
- AND does not call `using_index_values(...)`

Then we will produce a warning.
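
The three conditions above can be sketched as a single predicate. This is a minimal model only: the function names, the boolean inputs, and the warning text are hypothetical and are not taken from the rerun codebase.

```python
import warnings


def should_warn_static_only(
    contents_match_only_static: bool,
    selects_static_data: bool,
    uses_index_values: bool,
) -> bool:
    """True when all three conditions from the list above hold."""
    return (
        contents_match_only_static
        and selects_static_data
        and not uses_index_values
    )


def check_query(
    contents_match_only_static: bool,
    selects_static_data: bool,
    uses_index_values: bool,
) -> None:
    """Emit a warning for the static-only case; otherwise do nothing."""
    if should_warn_static_only(
        contents_match_only_static, selects_static_data, uses_index_values
    ):
        # Hypothetical message text, written for this sketch.
        warnings.warn(
            "View contains only static data, so select() will return no rows.",
            UserWarning,
            stacklevel=2,
        )
```

Because the conditions are combined with AND, supplying explicit index values or matching any non-static content is enough to suppress the warning.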

The most likely false positive, where this would introduce a spurious
warning, is a user querying for a mixture of static and non-static data
in a circumstance where sometimes none of the non-static data is logged
and the user expects to (correctly) get no rows in this case. However,
these circumstances generally imply a more advanced user who could work
around it with a mixed query + join anyway.

Future work:
- #7759