docs: pandas boolean content tabs (#1394)
FBruzzesi authored Nov 17, 2024
1 parent f8f3683 commit 1fb7b85
Showing 2 changed files with 61 additions and 36 deletions.
2 changes: 1 addition & 1 deletion docs/backcompat.md
@@ -134,7 +134,7 @@ before making any change.
```

To check if a dtype is a datetime (regardless of `time_unit` or `time_zone`)
we recommend using `==` instead, as that works consistenty
we recommend using `==` instead, as that works consistently
across namespaces:

```python exec="1" source="above" session="backcompat"
95 changes: 60 additions & 35 deletions docs/pandas_like_concepts/boolean.md
@@ -6,56 +6,81 @@ For example, if you do `nw.col('a')*2`, then:
- Values which were non-null get multiplied by 2.
- Null values stay null.

```python exec="1" source="above" session="boolean" result="python"
```python exec="1" source="above" session="boolean"
import narwhals as nw

import pandas as pd
import polars as pl
import pyarrow as pa
from narwhals.typing import FrameT

data = {"a": [1.4, None, 4.2]}
print("pandas output")
print(nw.from_native(pd.DataFrame(data)).with_columns(b=nw.col("a") * 2).to_native())
print("\nPolars output")
print(nw.from_native(pl.DataFrame(data)).with_columns(b=nw.col("a") * 2).to_native())
print("\nPyArrow output")
print(nw.from_native(pa.table(data)).with_columns(b=nw.col("a") * 2).to_native())


def multiplication(df: FrameT) -> FrameT:
    return nw.from_native(df).with_columns((nw.col("a") * 2).alias("a*2")).to_native()
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import pandas as pd

    df = pd.DataFrame(data)
    print(multiplication(df))
    ```

=== "Polars (eager)"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import polars as pl

    df = pl.DataFrame(data)
    print(multiplication(df))
    ```

=== "PyArrow"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import pyarrow as pa

    table = pa.table(data)
    print(multiplication(table))
    ```

What do we do, however, when the result column is boolean? For
example, `nw.col('a') > 0`?
Unfortunately, this is backend-dependent:

- for all backends except pandas, null values are preserved
- for pandas, this depends on the dtype backend:
    - for PyArrow dtypes and pandas nullable dtypes, null
      values are preserved
    - for the classic NumPy dtypes, null values are typically
      filled in with `False`.
    - for PyArrow dtypes and pandas nullable dtypes, null values are preserved
    - for the classic NumPy dtypes, null values are typically filled in with `False`.

pandas is generally moving towards nullable dtypes, and they
[may become the default in the future](https://github.com/pandas-dev/pandas/pull/58988),
so we hope that the classical NumPy dtypes not supporting null values will just
be a temporary legacy pandas issue which will eventually go
away anyway.
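
The dtype dependence described above can be checked directly in pandas. The following is a minimal sketch and not part of this diff: it bypasses Narwhals entirely and only assumes that pandas is installed. The variable names (`classic`, `nullable`, `classic_mask`, `nullable_mask`) are illustrative.

```python
import pandas as pd

data = {"a": [1.4, None, 4.2]}

# Classic NumPy-backed float64 column: the missing value is stored as NaN.
classic = pd.DataFrame(data)
# pandas nullable dtype: the missing value is stored as pd.NA.
nullable = pd.DataFrame(data, dtype="Float64")

classic_mask = classic["a"] > 2
nullable_mask = nullable["a"] > 2

# With NumPy dtypes the comparison yields a plain bool column,
# so the missing row comes out as False.
print(classic_mask.dtype, classic_mask.tolist())
# With nullable dtypes the result is a nullable boolean column,
# so the missing row is still reported as missing.
print(nullable_mask.dtype, nullable_mask.isna().tolist())
```

In the first case the null is silently turned into `False`; in the second it survives the comparison, which matches what the other backends do.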

```python exec="1" source="above" session="boolean" result="python"
print("pandas output")
print(nw.from_native(pd.DataFrame(data)).with_columns(b=nw.col("a") > 2).to_native())
print("\npandas (nullable dtypes) output")
print(
    nw.from_native(pd.DataFrame(data, dtype="Float64"))
    .with_columns(b=nw.col("a") > 2)
    .to_native()
)
print("\npandas (pyarrow dtypes) output")
print(
    nw.from_native(pd.DataFrame(data, dtype="Float64[pyarrow]"))
    .with_columns(b=nw.col("a") > 2)
    .to_native()
)
print("\nPolars output")
print(nw.from_native(pl.DataFrame(data)).with_columns(b=nw.col("a") > 2).to_native())
print("\nPyArrow output")
print(nw.from_native(pa.table(data)).with_columns(b=nw.col("a") > 2).to_native())
```
```python exec="1" source="above" session="boolean"
def comparison(df: FrameT) -> FrameT:
    return nw.from_native(df).with_columns((nw.col("a") > 2).alias("a>2")).to_native()
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import pandas as pd

    df = pd.DataFrame(data)
    print(comparison(df))
    ```

=== "Polars (eager)"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import polars as pl

    df = pl.DataFrame(data)
    print(comparison(df))
    ```

=== "PyArrow"
    ```python exec="true" source="material-block" result="python" session="boolean"
    import pyarrow as pa

    table = pa.table(data)
    print(comparison(table))
    ```
