Commit

add api reference
MarcoGorelli committed Mar 24, 2024
1 parent 3a7c77d commit e32916f
Showing 17 changed files with 414 additions and 61 deletions.
26 changes: 26 additions & 0 deletions docs/api-reference/dataframe.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# `narwhals.DataFrame`

::: narwhals.dataframe.DataFrame
    handler: python
    options:
      members:
        - to_pandas
        - to_numpy
        - shape
        - __getitem__
        - to_dict
        - schema
        - columns
        - with_columns
        - select
        - rename
        - head
        - drop
        - unique
        - filter
        - group_by
        - sort
        - join
      show_root_heading: false
      show_source: false
      show_bases: false
Empty file.
22 changes: 22 additions & 0 deletions docs/api-reference/lazyframe.md
@@ -0,0 +1,22 @@
# `narwhals.LazyFrame`

::: narwhals.dataframe.LazyFrame
    handler: python
    options:
      members:
        - schema
        - columns
        - with_columns
        - select
        - rename
        - head
        - drop
        - unique
        - filter
        - group_by
        - sort
        - join
        - collect
      show_root_heading: false
      show_source: false
      show_bases: false
12 changes: 12 additions & 0 deletions docs/api-reference/narwhals.md
@@ -0,0 +1,12 @@
# `narwhals`

Here are the top-level functions available in Narwhals.

::: narwhals
    handler: python
    options:
      members:
        - from_native
        - to_native
      #show_root_heading: false
      show_source: false
Empty file added docs/api-reference/series.md
Empty file.
81 changes: 70 additions & 11 deletions docs/basics/column.md
@@ -1,19 +1,25 @@
# Column
# Series

In [dataframe.md](dataframe.md), you learned how to write a dataframe-agnostic function.

We only used DataFrame methods there - but what if we need to operate on its columns?

## Extracting a column
Note that Polars does not have lazy columns. If you need to operate on columns as part of
a dataframe operation, you should use expressions - but if you need to extract a single
column, you need to ensure that you start with an eager `DataFrame`. To do that, we'll
use the `nw.DataFrame` constructor, as opposed to `nw.from_native`.

## Extracting a series

## Example 1: filter based on a column's values
## Example 1: filter based on a series' values

This can stay lazy, so we just use `nw.from_native`:

```python exec="1" source="above" session="ex1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = nw.from_native(df)
    df_s = df_s.filter(nw.col('a') > 0)
    return nw.to_native(df_s)
```
@@ -26,25 +32,32 @@ def my_func(df):
print(my_func(df))
```

=== "Polars"
=== "Polars (eager)"
```python exec="true" source="material-block" result="python" session="ex1"
import polars as pl

df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df))
```

=== "Polars (lazy)"
```python exec="true" source="material-block" result="python" session="ex1"
import polars as pl

df = pl.LazyFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df).collect())
```

## Example 2: multiply a column's values by a constant

Let's write a dataframe-agnostic function which multiplies the values in column
`'a'` by 2.
`'a'` by 2. This can also stay lazy.

```python exec="1" source="above" session="ex2"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = nw.from_native(df)
    df_s = df_s.with_columns(nw.col('a')*2)
    return nw.to_native(df_s)
```
@@ -57,22 +70,30 @@ def my_func(df):
print(my_func(df))
```

=== "Polars"
=== "Polars (eager)"
```python exec="true" source="material-block" result="python" session="ex2"
import polars as pl

df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df))
```

=== "Polars (lazy)"
```python exec="true" source="material-block" result="python" session="ex2"
import polars as pl

df = pl.LazyFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df).collect())
```

Note that column `'a'` was overwritten. If we had wanted to add a new column called `'c'` containing column `'a'`'s
values multiplied by 2, we could have used `Column.rename`:
values multiplied by 2, we could have used `Series.alias`:

```python exec="1" source="above" session="ex2.1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = nw.from_native(df)
    df_s = df_s.with_columns((nw.col('a')*2).alias('c'))
    return nw.to_native(df_s)
```
@@ -85,10 +106,48 @@ def my_func(df):
print(my_func(df))
```

=== "Polars"
=== "Polars (eager)"
```python exec="true" source="material-block" result="python" session="ex2.1"
import polars as pl

df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df))
```

=== "Polars (lazy)"
```python exec="true" source="material-block" result="python" session="ex2.1"
import polars as pl

df = pl.LazyFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df).collect())
```

## Example 3: finding the mean of a column as a scalar

Now, we want to find the mean of column `'a'`, and we need it as a Python scalar.
This means that computation cannot stay lazy - it must execute!
Therefore, instead of `nw.from_native`, we'll use `nw.DataFrame`.

```python exec="1" source="above" session="ex2"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    return df_s['a'].mean()
```

=== "pandas"
```python exec="true" source="material-block" result="python" session="ex2"
import pandas as pd

df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df))
```

=== "Polars (eager)"
```python exec="true" source="material-block" result="python" session="ex2"
import polars as pl

df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
print(my_func(df))
```
7 changes: 5 additions & 2 deletions docs/basics/complete_example.md
@@ -18,7 +18,10 @@ stored them in attributes `self.means` and `self.std_devs`.

The general strategy will be:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
1. Initialise a Narwhals DataFrame or LazyFrame by passing your dataframe to `nw.from_native`.

Note: if you need eager execution, use `nw.DataFrame` instead.

2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

@@ -27,7 +30,7 @@ import narwhals as nw

class StandardScalar:
    def transform(self, df):
        df = nw.DataFrame(df)
        df = nw.from_native(df)
        df = df.with_columns(
            (nw.col(col) - self._means[col]) / self._std_devs[col]
            for col in df.columns
19 changes: 15 additions & 4 deletions docs/basics/dataframe.md
@@ -2,7 +2,10 @@

To write a dataframe-agnostic function, the steps you'll want to follow are:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
1. Initialise a Narwhals DataFrame or LazyFrame by passing your dataframe to `nw.from_native`.

Note: if you need eager execution, use `nw.DataFrame` instead.

2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

@@ -16,9 +19,9 @@ import narwhals as nw

def func(df):
    # 1. Create a Narwhals dataframe
    df_s = nw.DataFrame(df)
    df_s = nw.from_native(df)
    # 2. Use the subset of the Polars API supported by Narwhals
    df_s = df_s.group_by('a').agg(nw.col('b').mean())
    df_s = df_s.group_by('a').agg(nw.col('b').mean()).sort('a')
    # 3. Return a dataframe in the user's original library
    return nw.to_native(df_s)
```
@@ -32,10 +35,18 @@ Let's try it out:
print(func(df))
```

=== "Polars"
=== "Polars (eager)"
```python exec="true" source="material-block" result="python" session="df_ex1"
import polars as pl

df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
```

=== "Polars (lazy)"
```python exec="true" source="material-block" result="python" session="df_ex1"
import polars as pl

df = pl.LazyFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df).collect())
```
43 changes: 43 additions & 0 deletions docs/generate_members.py
@@ -0,0 +1,43 @@
# ruff: noqa
# type: ignore
import sys

sys.path.append("..")

import pandas as pd
import polars as pl

pd_series = pd.Series([1], name="a").__column_consortium_standard__()
pl_series = pl.Series("a", [1]).__column_consortium_standard__()
pd_df = pd.DataFrame({"a": [1]}).__dataframe_consortium_standard__()
pl_df = pl.DataFrame({"a": [1]}).__dataframe_consortium_standard__()
pd_scalar = pd_df.col("a").mean()
pl_scalar = pl_df.col("a").mean()
pd_namespace = pd_df.__dataframe_namespace__()
pl_namespace = pl_df.__dataframe_namespace__()

for name, obj in [
    ("pandas-column.md", pd_series),
    ("polars-column.md", pl_series),
    ("pandas-dataframe.md", pd_df),
    ("polars-dataframe.md", pl_df),
    ("pandas-scalar.md", pd_scalar),
    ("polars-scalar.md", pl_scalar),
    # The namespace pages list the namespace objects, not the scalars.
    ("pandas-namespace.md", pd_namespace),
    ("polars-namespace.md", pl_namespace),
]:
    # Keep public names and dunders; drop single-underscore (private) names.
    members = [
        i for i in obj.__dir__() if not (i.startswith("_") and not i.startswith("__"))
    ]

    with open(name) as fd:
        content = fd.read()

    members_txt = "\n - ".join(sorted(members)) + "\n "

    # Replace everything between "members:" and "show_signature" in the page.
    start = content.index("members:")
    end = content.index("show_signature")
    content = content[:start] + f"members:\n - {members_txt}" + content[end:]

    with open(name, "w") as fd:
        fd.write(content)
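
The filter-and-splice logic in `docs/generate_members.py` can be tried on an in-memory string, without pandas, Polars, or any files on disk. The following is a minimal sketch, not part of the commit: `public_members`, `splice_members`, `Demo`, and the sample page are hypothetical names, the indentation widths are assumptions, and `dir()` is used in place of `__dir__()`.

```python
def public_members(obj):
    """Return sorted attribute names, dropping single-underscore (private) ones.

    Dunder names such as ``__getitem__`` are kept, mirroring the script's filter.
    """
    return sorted(
        name
        for name in dir(obj)
        if not (name.startswith("_") and not name.startswith("__"))
    )


def splice_members(content, members, item_indent="        ", key_indent="      "):
    """Replace the text between ``members:`` and ``show_signature`` in ``content``."""
    # str.index raises ValueError if either marker is missing from the page.
    start = content.index("members:")
    end = content.index("show_signature")
    items = "".join(f"{item_indent}- {m}\n" for m in members)
    return content[:start] + "members:\n" + items + key_indent + content[end:]


class Demo:
    """Hypothetical object with one public, one private, and one dunder method."""

    def visible(self): ...
    def _hidden(self): ...
    def __len__(self): return 0


# Hypothetical page content; the indentation widths are assumptions.
page = (
    "::: some.module.Thing\n"
    "    options:\n"
    "      members:\n"
    "        - stale_entry\n"
    "      show_signature: false\n"
)

# Restrict to the three names defined on Demo so the output stays short.
names = [n for n in public_members(Demo()) if n in {"visible", "_hidden", "__len__"}]
print(splice_members(page, names))
```

Because both markers are located with `str.index`, a page that lacks `show_signature` makes the splice fail with `ValueError` rather than silently writing a malformed file.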
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -11,6 +11,6 @@ Then, if you start the Python REPL and see the following:
```python
>>> import narwhals
>>> narwhals.__version__
'0.4.1'
'0.6.9'
```
then installation worked correctly!
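
The same check can be made outside the REPL by reading the installed package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name and is not part of Narwhals.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if Narwhals is installed, otherwise None.
print(installed_version("narwhals"))
```

A result of `None` means the package is not installed in the environment that ran the script.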
24 changes: 10 additions & 14 deletions docs/quick_start.md
@@ -13,31 +13,27 @@ Then, please install the following:

Create a Python file `t.py` with the following content:

```python
```python exec="1" source="above" session="quickstart" result="python"
import pandas as pd
import polars as pl
import narwhals as nw


def my_function(df_any):
    df = nw.DataFrame(df_any)
    column_names = df.column_names
    df = nw.from_native(df_any)
    column_names = df.columns
    return column_names


df_pandas = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_polars = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print('pandas result: ', my_function(df_pandas))
print('Polars result: ', my_function(df_polars))
print('pandas output')
print(my_function(df_pandas))
print('Polars output')
print(my_function(df_polars))
```

If you run `python t.py` and your output looks like this:
```
pandas result: ['a', 'b']
Polars result: ['a', 'b']
```

then all your installations worked perfectly.

Let's learn about what you just did, and what Narwhals can do for you.
If you run `python t.py` then your output should look like the above. This is the simplest possible example of a dataframe-agnostic
function - as we'll soon see, we can do much more advanced things.
Let's learn about what you just did, and what Narwhals can do for you!
4 changes: 1 addition & 3 deletions docs/reference.md → docs/related.md
@@ -1,6 +1,4 @@
# Reference

Here are some related projects.
# Related projects

## Dataframe Interchange Protocol

1 change: 0 additions & 1 deletion docs/requirements-docs.txt
@@ -1,5 +1,4 @@
markdown-exec[ansi]
mkdocs
mkdocs-material
mkdocstrings
mkdocstrings[python]