[Bug]: is_between incorrect behaviour (TypeError: cannot create expression literal for value of type Expr) #1659

Closed
sergiocalde94 opened this issue Dec 26, 2024 · 6 comments · Fixed by #1672

Comments

@sergiocalde94

Describe the bug

When attempting to use nw.lit to create literal expressions for filtering dates in a Narwhals DataFrame, the following error occurs:

TypeError: cannot create expression literal for value of type Expr.

This happens specifically when using nw.lit in combination with .filter() and .is_between() for date-based comparisons. However, casting and aliasing the literals, calling .to_native(), and then filtering with the native Polars API resolves the issue (but I can't do this, since I want dataframe-agnostic filtering).

Steps or code to reproduce the bug

Setup:

import polars as pl
import narwhals as nw

start_train_date = "2024-01-01"
end_train_date = "2024-03-05"

df_polars = pl.DataFrame({
    "application_started_at": [
        "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01", "2024-04-06",
    ]
})

df_native = nw.from_native(df_polars)

then:

# This will raise the TypeError
df_native.select(
    nw.col("application_started_at").cast(nw.Date).dt.date(),
    nw.lit(start_train_date).cast(nw.Date).alias("a"),
    nw.lit(end_train_date).cast(nw.Date).alias("b")
).filter(
    nw.col("application_started_at").is_between(nw.col("a"), nw.col("b"), closed="both")
).to_native()

to get the error and:

col_time = "application_started_at"

(
    df_native
    .select(
        nw.col(col_time).cast(nw.Date).dt.date(),
        nw.lit(start_train_date).cast(nw.Date).alias("a"),
        nw.lit(end_train_date).cast(nw.Date).alias("b")
    )
    .to_native()
    .filter(
        pl.col(col_time).is_between(pl.col("a"), pl.col("b"), closed="both")
    )
)

to get the expected results.

Expected results

application_started_at  a           b
2024-01-01              2024-01-01  2024-03-05
2024-02-01              2024-01-01  2024-03-05
2024-03-01              2024-01-01  2024-03-05

Actual results

TypeError: cannot create expression literal for value of type Expr.

Please run narwhals.show_version() and enter the output below.

1.19.1

Relevant log output

TypeError: cannot create expression literal for value of type Expr.

Hint: Pass `allow_object=True` to accept any value and create a literal of type Object.
File <command-332563390273393>, line 19
      4 df_polars = pl.DataFrame({
      5     "application_started_at": [
      6         "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01", "2024-04-06",
      7     ]
      8 })
     10 df_native = nw.from_native(df_polars)
     12 (
     13     df_native
     14     .select(
     15         nw.col(col_time).cast(nw.Date).dt.date(),
     16         nw.lit(start_train_date).cast(nw.Date).alias("a"),
     17         nw.lit(end_train_date).cast(nw.Date).alias("b")
     18     )
---> 19     .filter(
     20         nw.col(col_time).is_between(nw.col("a"), nw.col("b"), closed="both")
     21     )
     22     .to_native()
     23 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-ad765748-653a-475d-a18a-358020328a68/lib/python3.11/site-packages/polars/functions/lit.py:179, in lit(value, dtype, allow_object)
    176 except AttributeError:
    177     item = value
--> 179 return wrap_expr(plr.lit(item, allow_object, is_scalar=True))
@MarcoGorelli
Member

Thanks for the report!

@sergiocalde94
Author

You're welcome! Thanks for this fantastic tool. BTW, I am using this as a temporary fix:

(
    df_native
    .select(
        nw.col(col_time).cast(nw.Date).dt.date(),
        nw.lit(start_train_date).cast(nw.Date).alias("a"),
        nw.lit(end_train_date).cast(nw.Date).alias("b")
    )
    .filter(
        nw.col(col_time) >= nw.col("a"),
        nw.col(col_time) <= nw.col("b"),
    )
    .to_native()
)

@MarcoGorelli
Member

Thanks! Will try to include a fix in next Monday's release. Just out of interest, what are you using Narwhals for?

another workaround might be

    .filter(
        nw.col(col_time).is_between(start_time, end_time, closed="both")
    )

@sergiocalde94
Author

I prefer to use the other one so I don't create any new column :)

I am using Narwhals for an internal library that deals with ML preprocessing. Our team used to work using pandas but some of us are migrating to Polars. This way we can use both :)

@sergiocalde94
Author

BTW I just saw that when using that function with pandas I can't use the date thing: NotImplementedError: Date dtype only supported for pyarrow-backed data types in pandas.

I imagine that, for now, if the format is YYYY-MM-DD, I can use < and > since the values are comparable as strings, right?
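A small pure-Python sketch of why this should hold, using the sample values from the report. The helper names are hypothetical and only for illustration. It relies on the strings being zero-padded YYYY-MM-DD, in which case lexicographic order matches chronological order; the last line shows that the shortcut breaks as soon as the padding is missing:

```python
from datetime import date


def in_range_as_strings(value: str, start: str, end: str) -> bool:
    """Inclusive range check using plain string comparison."""
    return start <= value <= end


def in_range_as_dates(value: str, start: str, end: str) -> bool:
    """Inclusive range check after parsing each string into a date."""
    return date.fromisoformat(start) <= date.fromisoformat(value) <= date.fromisoformat(end)


samples = ["2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01", "2024-04-06"]
start, end = "2024-01-01", "2024-03-05"

kept_as_strings = [s for s in samples if in_range_as_strings(s, start, end)]
kept_as_dates = [s for s in samples if in_range_as_dates(s, start, end)]

# Both approaches keep the same three rows for zero-padded ISO dates.
print(kept_as_strings == kept_as_dates)

# Caveat: without zero padding the string order is wrong,
# because "2" sorts after "1" even though February precedes October.
unpadded_in_order = "2024-2-01" <= "2024-10-01"
print(unpadded_in_order)
```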

@FBruzzesi
Member

Hey @sergiocalde94, thanks for reporting the issue! #730 goes in a similar direction.

Allowing for expressions is extremely flexible but not fully supported just yet! Maybe it could be in scope as one of the 2025 goals 😉

3 participants