diff --git a/docs/backcompat.md b/docs/backcompat.md
index 807d51ed6..e44c38ebf 100644
--- a/docs/backcompat.md
+++ b/docs/backcompat.md
@@ -134,7 +134,7 @@ before making any change.
 ```
 
 To check if a dtype is a datetime (regardless of `time_unit` or `time_zone`)
-we recommend using `==` instead, as that works consistenty
+we recommend using `==` instead, as that works consistently
 across namespaces:
 
 ```python exec="1" source="above" session="backcompat"
diff --git a/docs/pandas_like_concepts/boolean.md b/docs/pandas_like_concepts/boolean.md
index fe60904cb..c1b434f57 100644
--- a/docs/pandas_like_concepts/boolean.md
+++ b/docs/pandas_like_concepts/boolean.md
@@ -6,32 +6,49 @@ For example, if you do `nw.col('a')*2`, then:
 - Values which were non-null get multiplied by 2.
 - Null values stay null.
 
-```python exec="1" source="above" session="boolean" result="python"
+```python exec="1" source="above" session="boolean"
 import narwhals as nw
-
-import pandas as pd
-import polars as pl
-import pyarrow as pa
+from narwhals.typing import FrameT
 
 data = {"a": [1.4, None, 4.2]}
-print("pandas output")
-print(nw.from_native(pd.DataFrame(data)).with_columns(b=nw.col("a") * 2).to_native())
-print("\nPolars output")
-print(nw.from_native(pl.DataFrame(data)).with_columns(b=nw.col("a") * 2).to_native())
-print("\nPyArrow output")
-print(nw.from_native(pa.table(data)).with_columns(b=nw.col("a") * 2).to_native())
+
+
+def multiplication(df: FrameT) -> FrameT:
+    return nw.from_native(df).with_columns((nw.col("a") * 2).alias("a*2")).to_native()
 ```
 
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import pandas as pd
+
+    df = pd.DataFrame(data)
+    print(multiplication(df))
+    ```
+
+=== "Polars (eager)"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import polars as pl
+
+    df = pl.DataFrame(data)
+    print(multiplication(df))
+    ```
+
+=== "PyArrow"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import pyarrow as pa
+
+    table = pa.table(data)
+    print(multiplication(table))
+    ```
+
 What do we do, however, when the result column is boolean? For
 example, `nw.col('a') > 0`?
 Unfortunately, this is backend-dependent:
 
 - for all backends except pandas, null values are preserved
 - for pandas, this depends on the dtype backend:
-    - for PyArrow dtypes and pandas nullable dtypes, null
-    values are preserved
-    - for the classic NumPy dtypes, null values are typically
-    filled in with `False`.
+    - for PyArrow dtypes and pandas nullable dtypes, null values are preserved
+    - for the classic NumPy dtypes, null values are typically filled in with `False`.
 
 pandas is generally moving towards nullable dtypes, and they
 [may become the default in the future](https://github.com/pandas-dev/pandas/pull/58988),
@@ -39,23 +56,31 @@ so we hope that the classical NumPy dtypes not supporting null values will just
 be a temporary legacy pandas issue which will eventually go
 away anyway.
 
-```python exec="1" source="above" session="boolean" result="python"
-print("pandas output")
-print(nw.from_native(pd.DataFrame(data)).with_columns(b=nw.col("a") > 2).to_native())
-print("\npandas (nullable dtypes) output")
-print(
-    nw.from_native(pd.DataFrame(data, dtype="Float64"))
-    .with_columns(b=nw.col("a") > 2)
-    .to_native()
-)
-print("\npandas (pyarrow dtypes) output")
-print(
-    nw.from_native(pd.DataFrame(data, dtype="Float64[pyarrow]"))
-    .with_columns(b=nw.col("a") > 2)
-    .to_native()
-)
-print("\nPolars output")
-print(nw.from_native(pl.DataFrame(data)).with_columns(b=nw.col("a") > 2).to_native())
-print("\nPyArrow output")
-print(nw.from_native(pa.table(data)).with_columns(b=nw.col("a") > 2).to_native())
-```
\ No newline at end of file
+```python exec="1" source="above" session="boolean"
+def comparison(df: FrameT) -> FrameT:
+    return nw.from_native(df).with_columns((nw.col("a") > 2).alias("a>2")).to_native()
+```
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import pandas as pd
+
+    df = pd.DataFrame(data)
+    print(comparison(df))
+    ```
+
+=== "Polars (eager)"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import polars as pl
+
+    df = pl.DataFrame(data)
+    print(comparison(df))
+    ```
+
+=== "PyArrow"
+    ```python exec="true" source="material-block" result="python" session="boolean"
+    import pyarrow as pa
+
+    table = pa.table(data)
+    print(comparison(table))
+    ```
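
Note on the backend-dependent comparison described in `boolean.md`: the two behaviours can be modelled in plain Python, without any dataframe library. The sketch below is only an illustration of the semantics, not narwhals or pandas code. It assumes that classic NumPy float columns store a missing value as `NaN` (so any ordering comparison against it is `False`), while null-preserving backends propagate the null. The helper names `compare_numpy_style` and `compare_null_preserving` are hypothetical.

```python
import math

# Same sample column as in the docs: one missing value in the middle.
data = [1.4, None, 4.2]


def compare_numpy_style(values, threshold):
    """Model of classic NumPy dtypes: missing values are stored as NaN,
    and any ordering comparison involving NaN evaluates to False."""
    as_floats = [math.nan if v is None else v for v in values]
    return [v > threshold for v in as_floats]


def compare_null_preserving(values, threshold):
    """Model of null-preserving backends: a null input gives a null output."""
    return [None if v is None else v > threshold for v in values]


print(compare_numpy_style(data, 2))      # [False, False, True]
print(compare_null_preserving(data, 2))  # [False, None, True]
```

The middle element is the only one that differs: the NaN-based model reports `False`, whereas the null-preserving model keeps the result missing.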