-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make plotly-express dataframe agnostic via narwhals #4790
Changes from 93 commits
9873e97
0389591
ba93236
421fc1d
ca5c820
a6aab24
7665f10
ec4f250
7e0d4c2
5543638
cd0dab7
f334b32
e5eb949
7747e30
ac00b36
da80c5b
8a72ba1
aeff203
2041bef
71473f1
9f74c38
28587c9
1bb2448
f45addf
5341759
dfc957c
90f2667
fb58d1b
ddb3b35
37ce302
4da8768
210e01a
c00525e
3486a3e
c0ce093
0eb6951
844a6a9
0c27789
0e6ff78
c2337c9
2cc5d7b
1b27487
23a23be
7968cff
cf76721
91db84b
5c6772e
9ec3f9e
400a624
1aa5163
594ded0
6676061
d7d2884
d6ee676
82c114d
0ceabc1
c9b626e
87841d1
ffa7b3b
3ba19ae
a70146b
1fa9fe4
0103aa6
673d141
b858ed8
bbcf438
49efae2
a36bc24
c119153
7416407
1867f6f
d3a28c0
e6e9994
64b8c70
3f6b383
755aea8
6f18021
4d62e73
a770fd8
7d6f7d6
b8c10ec
490b64a
f7fd4c9
8753acb
1429e6f
878d4db
192e0a8
de6761c
bcfef68
519cc68
e5520a7
b855352
51e2b23
7ef9f28
e9a367d
12fed31
27b2996
f27f959
126a79d
7735366
b6516b4
6f1389f
db22268
b514c01
ce8fb9a
e47827e
9a9283a
2630a5a
fef6dbe
48c7f62
18cc11c
d94cbf7
c320c46
afdb31f
68ab52a
b8ccec4
f102998
2df0427
55a0178
bb327d5
7d611fb
a22a7be
44a52e5
fc74b2e
499e2fa
d2e1008
742b2ec
269dea6
b1dc48d
17fb96f
9f2c55b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,9 +8,6 @@ | |
exposed as part of the public API for documentation purposes. | ||
""" | ||
|
||
import pandas as pd | ||
import numpy as np | ||
|
||
__all__ = ["ols", "lowess", "rolling", "ewm", "expanding"] | ||
|
||
|
||
|
@@ -32,6 +29,8 @@ def ols(trendline_options, x_raw, x, y, x_label, y_label, non_missing): | |
respect to the base 10 logarithm of the input. Note that this means no zeros can | ||
be present in the input. | ||
""" | ||
import numpy as np | ||
|
||
valid_options = ["add_constant", "log_x", "log_y"] | ||
for k in trendline_options.keys(): | ||
if k not in valid_options: | ||
|
@@ -110,11 +109,25 @@ def lowess(trendline_options, x_raw, x, y, x_label, y_label, non_missing): | |
|
||
|
||
def _pandas(mode, trendline_options, x_raw, y, non_missing): | ||
import numpy as np | ||
|
||
try: | ||
import pandas as pd | ||
except ImportError: | ||
msg = "Trendline requires pandas to be installed" | ||
raise ImportError(msg) | ||
|
||
modes = dict(rolling="Rolling", ewm="Exponentially Weighted", expanding="Expanding") | ||
trendline_options = trendline_options.copy() | ||
function_name = trendline_options.pop("function", "mean") | ||
function_args = trendline_options.pop("function_args", dict()) | ||
series = pd.Series(y, index=x_raw) | ||
|
||
series = pd.Series(np.copy(y), index=x_raw.to_pandas()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @MarcoGorelli for old numpy versions, arrow arrays and polars series
when then it is used by pandas operations. I wonder if we There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For my own curiosity why do we need Pandas still here? Does Narwhals provide a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do not support If and when that will be the case, then pandas can be completely optional. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think there's a way to implement rolling functions in numpy, especially with some of the extra options We'll add them to Narwhals though, we'd talked about it last month but decided that at the time it wasn't the top priority - but it is on the roadmap There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it. This approach makes sense to me -- I opened #4834 to keep track of this as a follow-up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
rather than an option, does it makes sense here to check |
||
|
||
# TODO: Narwhals Series/DataFrame do not support rolling, ewm nor expanding, therefore | ||
# it fallbacks to pandas Series independently of the original type. | ||
# Plotly issue: https://github.com/plotly/plotly.py/issues/4834 | ||
# Narwhals issue: https://github.com/narwhals-dev/narwhals/issues/1254 | ||
agg = getattr(series, mode) # e.g. series.rolling | ||
agg_obj = agg(**trendline_options) # e.g. series.rolling(**opts) | ||
function = getattr(agg_obj, function_name) # e.g. series.rolling(**opts).mean | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
import pandas as pd | ||
import polars as pl | ||
import pyarrow as pa | ||
import pytest | ||
|
||
from narwhals.typing import IntoDataFrame | ||
from narwhals.utils import parse_version | ||
|
||
|
||
def pandas_constructor(obj) -> IntoDataFrame: | ||
return pd.DataFrame(obj) # type: ignore[no-any-return] | ||
|
||
|
||
def pandas_nullable_constructor(obj) -> IntoDataFrame: | ||
return pd.DataFrame(obj).convert_dtypes(dtype_backend="numpy_nullable") # type: ignore[no-any-return] | ||
|
||
|
||
def pandas_pyarrow_constructor(obj) -> IntoDataFrame: | ||
return pd.DataFrame(obj).convert_dtypes(dtype_backend="pyarrow") # type: ignore[no-any-return] | ||
|
||
|
||
def polars_eager_constructor(obj) -> IntoDataFrame: | ||
return pl.DataFrame(obj) | ||
|
||
|
||
def pyarrow_table_constructor(obj) -> IntoDataFrame: | ||
return pa.table(obj) # type: ignore[no-any-return] | ||
|
||
|
||
constructors = [polars_eager_constructor, pyarrow_table_constructor, pandas_constructor] | ||
|
||
if parse_version(pd.__version__) >= parse_version("2.0.0"): | ||
constructors.extend( | ||
[ | ||
pandas_nullable_constructor, | ||
pandas_pyarrow_constructor, | ||
] | ||
) | ||
|
||
|
||
@pytest.fixture(params=constructors) | ||
def constructor(request: pytest.FixtureRequest): | ||
return request.param # type: ignore[no-any-return] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning stability and testing purposes, wondering if it's better to load specific version instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you suggest to pin a specific narwhals version in the requirements?
We try to make some promises of non-breaking changes with the stable API. Additionally, we test each change against the main/master branches of the downstream libraries that adopted narwhals, in a github action.
Of course this is not perfect, but gives us some confidence when we make a change and a release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @archmoj ! nice to meet you, and thanks for taking a look 🙏
I'd advise against pinning the Narwhals version exactly, and that's because it might conflict with other tools which are using Narwhals
For example, Altair currently has
narwhals>=1.5.2
, and this PR hasnarwhals>=1.12.0
. So, a user wanting to install both just needs to havenarwhals>=1.12.0
, and there's no issuesIf Plotly were to pin Narwhals exactly (say,
narwhals==1.12.0
), and Altair were to do the same (say,narwhals==1.10.0
), then users wouldn't be able to install both Altair and Plotly together, which would be a pityIndeed, as @FBruzzesi says, the intention behind
narwhals.stable.v1
is to have a stable and perfectly backwards-compatible API, in order to give downstream libraries (e.g. Plotly, Altair, Marimo, ...) the confidence to not pin the Narwhals version exactly, allowing for multiple major libraries to have Narwhals as a dependency without forbidding users to install them all togetherFurthermore, for major libraries, we run downstream tests in the Narwhals CI to check that any change we add won't break things for you
Happy to discuss further if you like of course (and to take a call if you'd like to discuss this or other topics) 🤗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you very much for the clarification.
Please a comment including this information in this file so that we resolve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following comment under the import be enough?
It's a TL;DR of what Marco mentioned