Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expressify lower_bound and upper_bound in is_between #1672

Merged
merged 6 commits into from
Dec 29, 2024

Conversation

MarcoGorelli
Copy link
Member

What type of PR is this? (check all applicable

closes #1659

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Comment on lines -351 to -375
def sum(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).sum()

def mean(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).mean()

def median(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).median()

def max(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).max()

def min(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).min()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by - it's redundat to define all of these in the CompliantExprs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I love to see net negative in PRs πŸ‘Œ

@MarcoGorelli MarcoGorelli added the enhancement New feature or request label Dec 28, 2024
@MarcoGorelli MarcoGorelli marked this pull request as ready for review December 28, 2024 21:42
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor comment on the unit test.

The approach used here to expressify is_between seems generic enough to be used elsewhere, is that right?! I somehow thought it was going to be fairly more complex, yet extract_compliant together with <>_and_extract_native seem all we need πŸ™ŒπŸΌπŸš€

Comment on lines -351 to -375
def sum(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).sum()

def mean(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).mean()

def median(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).median()

def max(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).max()

def min(self: Self, *column_names: str) -> ArrowExpr:
return ArrowExpr.from_column_names(
*column_names, backend_version=self._backend_version, version=self._version
).min()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I love to see net negative in PRs πŸ‘Œ

def test_is_between_expressified(constructor: Constructor) -> None:
data = {"a": [1, 4, 2, 5], "b": [0, 5, 2, 4], "c": [9, 9, 9, 9]}
df = nw.from_native(constructor(data))
result = df.select(nw.col("a").is_between(nw.col("b"), nw.col("c")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test with a generic expression instead of just col?
Random proposal that would not change expected_dict:

Suggested change
result = df.select(nw.col("a").is_between(nw.col("b"), nw.col("c")))
result = df.select(nw.col("a").is_between(nw.col("b") * 0.9, nw.col("c") - 1))

@FBruzzesi
Copy link
Member

On a second thought, we are creating some asymmetry between what we can pass to Series.is_between vs Expr.is_between, aren't we? I would expect to be able to pass series' to Series.is_between, e.g.

some_series.is_between(lower_bound_series, upper_bound_series)

Apologies if this is already possible, if so we may want to add a test case for it

@MarcoGorelli
Copy link
Member Author

thanks for your review! yup, this is what we do in __eq__ to accept either scalars or expressions / series, we can almost certainly do it in more places (and probably reuse more code)

@MarcoGorelli MarcoGorelli merged commit 2dd4480 into narwhals-dev:main Dec 29, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: is_between incorrect behaviour (TypeError: cannot create expression literal for value of type Expr)
2 participants