feat: make plotly-express dataframe agnostic via narwhals #4790

Merged Nov 13, 2024 (131 commits)
Changes from 121 commits

Commits
9873e97
non core changes
FBruzzesi Sep 28, 2024
0389591
_core overhaul
FBruzzesi Sep 28, 2024
ba93236
some _core fixes
FBruzzesi Sep 28, 2024
421fc1d
tests replace sort_index(axis=1)
FBruzzesi Sep 28, 2024
ca5c820
reset_index in concat and allow any object to pandas
FBruzzesi Sep 28, 2024
a6aab24
trendline prep
FBruzzesi Sep 29, 2024
7665f10
WIP Index
FBruzzesi Sep 29, 2024
ec4f250
clean from breakpoints
FBruzzesi Sep 29, 2024
7e0d4c2
some tests fix
FBruzzesi Sep 29, 2024
5543638
hotfix and tests output to pandas
FBruzzesi Sep 29, 2024
cd0dab7
FIX: columns never as index
FBruzzesi Sep 29, 2024
f334b32
getting there with the tests
FBruzzesi Sep 29, 2024
e5eb949
get_column instead of pandas slicing, unix to seconds
FBruzzesi Sep 30, 2024
7747e30
bump narhwals, hierarchy fastpath
FBruzzesi Oct 1, 2024
ac00b36
fix to_unindexed_series
FBruzzesi Oct 1, 2024
da80c5b
fix trendline
FBruzzesi Oct 1, 2024
8a72ba1
rm numpy dep in _core
FBruzzesi Oct 2, 2024
aeff203
fix: _check_dataframe_all_leaves
FBruzzesi Oct 2, 2024
2041bef
(maybe) fix to_unindexed_series
FBruzzesi Oct 2, 2024
71473f1
(maybe) fix to_unindexed_series
FBruzzesi Oct 2, 2024
9f74c38
started tests with constructor
FBruzzesi Oct 2, 2024
28587c9
added constructor to all tests
FBruzzesi Oct 2, 2024
1bb2448
added some comments for fixme
FBruzzesi Oct 2, 2024
f45addf
to_py_scalar and more tests
FBruzzesi Oct 3, 2024
5341759
dealing with exceptions and tests
FBruzzesi Oct 3, 2024
dfc957c
bump version, sort(...,nulls_last=True)
FBruzzesi Oct 4, 2024
90f2667
We did it: no more dups in group by :D
FBruzzesi Oct 4, 2024
fb58d1b
concat_str
FBruzzesi Oct 5, 2024
ddb3b35
fix test_several_dataframes
FBruzzesi Oct 5, 2024
37ce302
dedups customdata
FBruzzesi Oct 5, 2024
4da8768
getting there
FBruzzesi Oct 6, 2024
210e01a
xfail pyarrow chunked-array because name-less
FBruzzesi Oct 6, 2024
c00525e
all green with edge narhwals
FBruzzesi Oct 6, 2024
3486a3e
add pandas nullable constructors in tests
FBruzzesi Oct 7, 2024
c0ce093
bump narwhals and address todos
FBruzzesi Oct 9, 2024
0eb6951
check narwhals installation
FBruzzesi Oct 9, 2024
844a6a9
rm unused comments
FBruzzesi Oct 9, 2024
0c27789
rm unused code
FBruzzesi Oct 9, 2024
0e6ff78
add pyarrow and narwhals to requirements_39_pandas_2_optional
FBruzzesi Oct 9, 2024
c2337c9
requirements, test requirements optional
FBruzzesi Oct 15, 2024
2cc5d7b
refactor tests
FBruzzesi Oct 15, 2024
1b27487
address feedbacks
FBruzzesi Oct 15, 2024
23a23be
typos
FBruzzesi Oct 15, 2024
7968cff
conftest
FBruzzesi Oct 15, 2024
cf76721
merge master
FBruzzesi Oct 15, 2024
91db84b
mock interchange
FBruzzesi Oct 15, 2024
5c6772e
optional requirements
FBruzzesi Oct 15, 2024
9ec3f9e
move conftest in express folder
FBruzzesi Oct 15, 2024
400a624
hotfix and figure_factory hexbin
FBruzzesi Oct 15, 2024
1aa5163
old versions, polars[timezone], hotfix
FBruzzesi Oct 16, 2024
594ded0
fix frame value in hexbin
FBruzzesi Oct 16, 2024
6676061
copy numpy array
FBruzzesi Oct 16, 2024
d7d2884
hotfix hexbin mapbox
FBruzzesi Oct 16, 2024
d6ee676
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Oct 16, 2024
82c114d
fix test
FBruzzesi Oct 17, 2024
0ceabc1
Merge branch 'plotly:master' into plotly-with-narwhals
FBruzzesi Oct 17, 2024
c9b626e
use lazy in process_dataframe_hierarchy
FBruzzesi Oct 17, 2024
87841d1
fix custom sort in process_dataframe_pie
FBruzzesi Oct 18, 2024
ffa7b3b
Merge branch 'master' into plotly-with-narwhals
archmoj Oct 21, 2024
3ba19ae
bump version and adjust core
FBruzzesi Oct 21, 2024
a70146b
use dtype.is_numeric
FBruzzesi Oct 22, 2024
1fa9fe4
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Oct 22, 2024
0103aa6
revert test
FBruzzesi Oct 22, 2024
673d141
Merge branch 'plotly-with-narwhals' of https://github.com/FBruzzesi/p…
FBruzzesi Oct 22, 2024
b858ed8
feedback adjustments
FBruzzesi Oct 23, 2024
bbcf438
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Oct 23, 2024
49efae2
raise if numpy is missing, conftest fix, typo
FBruzzesi Oct 25, 2024
a36bc24
__plotly_n_unique__
FBruzzesi Oct 25, 2024
c119153
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Oct 25, 2024
7416407
format
FBruzzesi Oct 25, 2024
1867f6f
format
FBruzzesi Oct 25, 2024
d3a28c0
feedback adjustments
FBruzzesi Oct 27, 2024
e6e9994
use drop_null_keys, some pandas fastpaths
MarcoGorelli Oct 25, 2024
64b8c70
bump narwhals version
MarcoGorelli Oct 27, 2024
3f6b383
some improvements by Marco
FBruzzesi Oct 27, 2024
755aea8
format and pyspark path
FBruzzesi Oct 27, 2024
6f18021
add narwhals to requirements core
FBruzzesi Oct 27, 2024
4d62e73
Update packages/python/plotly/plotly/express/_core.py
FBruzzesi Oct 28, 2024
a770fd8
refactor checking for df
MarcoGorelli Oct 29, 2024
7d6f7d6
pushdown only for interchange libraries, sort out test
MarcoGorelli Oct 29, 2024
b8c10ec
Update packages/python/plotly/plotly/express/_core.py
MarcoGorelli Oct 29, 2024
490b64a
fixup
MarcoGorelli Oct 29, 2024
f7fd4c9
Merge remote-tracking branch 'origin/plotly-with-narwhals' into plotl…
MarcoGorelli Oct 29, 2024
8753acb
lint
MarcoGorelli Oct 29, 2024
1429e6f
bump narwhals version
MarcoGorelli Oct 29, 2024
878d4db
refactor checking for df and bump version
FBruzzesi Oct 29, 2024
192e0a8
use token in process_dataframe_hierarchy
FBruzzesi Oct 29, 2024
de6761c
Range(label=...) for px.funnel
FBruzzesi Oct 29, 2024
bcfef68
improve error message and in-line comments
FBruzzesi Oct 30, 2024
519cc68
better comments
FBruzzesi Oct 30, 2024
e5520a7
rm unused import and fix typo
FBruzzesi Oct 31, 2024
b855352
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Oct 31, 2024
51e2b23
make sure column + token is unique, replace **{} with .alias()
FBruzzesi Oct 31, 2024
7ef9f28
WIP
FBruzzesi Oct 31, 2024
e9a367d
WIP
FBruzzesi Oct 31, 2024
12fed31
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Nov 1, 2024
27b2996
use nw.get_native_namespace
FBruzzesi Nov 1, 2024
f27f959
Merge branch 'plotly-with-narwhals' of https://github.com/FBruzzesi/p…
FBruzzesi Nov 1, 2024
126a79d
Merge branch 'master' into feat/dataframe-agnostic-data
FBruzzesi Nov 1, 2024
7735366
add narwhals in various requirements
FBruzzesi Nov 1, 2024
b6516b4
docstrings
FBruzzesi Nov 1, 2024
6f1389f
rm type hints, change post_agg to use alias
FBruzzesi Nov 1, 2024
db22268
feedback adjustments
FBruzzesi Nov 1, 2024
b514c01
move imports out, fix pyarrow
FBruzzesi Nov 1, 2024
ce8fb9a
rm unused narwhals wrapper
FBruzzesi Nov 1, 2024
e47827e
comment about stable api
FBruzzesi Nov 1, 2024
9a9283a
update changelog
FBruzzesi Nov 1, 2024
2630a5a
fixup time zone handling
MarcoGorelli Nov 1, 2024
fef6dbe
modin and cudf
FBruzzesi Nov 3, 2024
48c7f62
defensive from_native call
FBruzzesi Nov 4, 2024
18cc11c
typo
FBruzzesi Nov 4, 2024
d94cbf7
fixup timezones
FBruzzesi Nov 4, 2024
c320c46
move from object to datetime dtype in _plotly_utils/test/validators
FBruzzesi Nov 4, 2024
afdb31f
simplify ecdfnorm
MarcoGorelli Nov 4, 2024
68ab52a
Merge pull request #4 from MarcoGorelli/ecdf-mode-perf
FBruzzesi Nov 5, 2024
b8ccec4
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Nov 5, 2024
f102998
rm to_py_scalar call in for loop -> fix Pie performances
FBruzzesi Nov 5, 2024
2df0427
Merge branch 'plotly-with-narwhals' of https://github.com/FBruzzesi/p…
FBruzzesi Nov 5, 2024
55a0178
Merge branch 'master' into feat/dataframe-agnostic-data
FBruzzesi Nov 5, 2024
bb327d5
merge feat/dataframe-agnostic-data
FBruzzesi Nov 5, 2024
7d611fb
use return_type directly when building datasets
FBruzzesi Nov 5, 2024
a22a7be
stocks date to string and test_trendline_on_timeseries fix
FBruzzesi Nov 6, 2024
44a52e5
merge master and rm FIXME comment
FBruzzesi Nov 7, 2024
fc74b2e
do not repeat new_series unnecessarely
FBruzzesi Nov 8, 2024
499e2fa
bump version, use numpy for range
FBruzzesi Nov 8, 2024
d2e1008
trigger ci now that new version is published
FBruzzesi Nov 8, 2024
742b2ec
add narwhals to np2_optional.txt
FBruzzesi Nov 8, 2024
269dea6
version
FBruzzesi Nov 8, 2024
b1dc48d
Merge branch 'master' into plotly-with-narwhals
MarcoGorelli Nov 12, 2024
17fb96f
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Nov 12, 2024
9f2c55b
Merge branch 'master' into plotly-with-narwhals
FBruzzesi Nov 13, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -8,8 +8,9 @@ This project adheres to [Semantic Versioning](http://semver.org/).

### Updated

- Updated plotly.py to use base64 encoding of arrays in plotly JSON to improve performance.
- Updated plotly.py to use base64 encoding of arrays in plotly JSON to improve performance.
- Add `subtitle` attribute to all Plotly Express traces
- Make plotly-express dataframe agnostic via Narwhals [#4790](https://github.com/plotly/plotly.py/pull/4790)

## [5.24.1] - 2024-09-12

52 changes: 22 additions & 30 deletions packages/python/plotly/_plotly_utils/basevalidators.py
@@ -8,6 +8,7 @@
import re
import sys
import warnings
import narwhals.stable.v1 as nw
Contributor

For stability and testing purposes, I'm wondering whether it would be better to load a specific version instead?

Contributor Author

@FBruzzesi FBruzzesi Oct 31, 2024

Would you suggest pinning a specific Narwhals version in the requirements?

We try to make some promises of non-breaking changes with the stable API. Additionally, we test each change against the main/master branches of the downstream libraries that adopted Narwhals, in a GitHub Action.

Of course this is not perfect, but it gives us some confidence when we make a change and cut a release.

Contributor

hi @archmoj ! nice to meet you, and thanks for taking a look 🙏

I'd advise against pinning the Narwhals version exactly, and that's because it might conflict with other tools which are using Narwhals

For example, Altair currently has narwhals>=1.5.2, and this PR has narwhals>=1.12.0. So, a user wanting to install both just needs to have narwhals>=1.12.0, and there are no issues

If Plotly were to pin Narwhals exactly (say, narwhals==1.12.0), and Altair were to do the same (say, narwhals==1.10.0), then users wouldn't be able to install both Altair and Plotly together, which would be a pity

Indeed, as @FBruzzesi says, the intention behind narwhals.stable.v1 is to have a stable and perfectly backwards-compatible API. That gives downstream libraries (e.g. Plotly, Altair, Marimo, ...) the confidence not to pin the Narwhals version exactly, so multiple major libraries can depend on Narwhals without preventing users from installing them all together

Furthermore, for major libraries, we run downstream tests in the Narwhals CI to check that any change we add won't break things for you

Happy to discuss further if you like of course (and to take a call if you'd like to discuss this or other topics) 🤗
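
The pinning argument above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names (`parse`, `satisfies`, `installable`) and the candidate version list are made up for the example, and a real resolver such as pip's handles far more specifier syntax than the `>=` and `==` operators covered here.

```python
def parse(version):
    """Turn '1.12.0' into a comparable tuple (1, 12, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, requirement):
    """Check a version against a single '>=' or '==' requirement string."""
    for op in (">=", "=="):
        if requirement.startswith(op):
            target = parse(requirement[len(op):])
            if op == ">=":
                return parse(version) >= target
            return parse(version) == target
    raise ValueError(f"unsupported requirement: {requirement}")


def installable(candidates, requirements):
    """Return the candidate versions that satisfy every requirement."""
    return [v for v in candidates if all(satisfies(v, r) for r in requirements)]


# Hypothetical set of released versions, chosen only for illustration.
candidates = ["1.5.2", "1.10.0", "1.12.0", "1.13.2"]

# Lower bounds from two libraries: anything at or above the higher bound works.
lower_bounds = installable(candidates, [">=1.5.2", ">=1.12.0"])

# Exact pins from two libraries: no single version can satisfy both.
exact_pins = installable(candidates, ["==1.12.0", "==1.10.0"])

print(lower_bounds)  # ['1.12.0', '1.13.2']
print(exact_pins)    # []
```

With lower bounds the two requirement sets overlap, while two different exact pins leave nothing to install, which is the conflict described in the comment.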

Contributor

Thank you very much for the clarification.
Please add a comment including this information in this file so that we can resolve it.

Contributor Author

@FBruzzesi FBruzzesi Nov 1, 2024

Would the following comment under the import be enough?

import narwhals.stable.v1 as nw
# The reason to use narwhals.stable.v1 is to have a stable and perfectly 
# backwards-compatible API, hence the confidence to not pin the Narwhals version exactly,
# allowing for multiple major libraries to have Narwhals as a dependency without
# forbidding users to install them all together due to dependency conflicts.

It's a TL;DR of what Marco mentioned


from _plotly_utils.optional_imports import get_module

@@ -72,8 +73,6 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
"""
np = get_module("numpy")

# Don't force pandas to be loaded, we only want to know if it's already loaded
pd = get_module("pandas", should_load=False)
assert np is not None

# ### Process kind ###
@@ -93,34 +92,26 @@ def copy_to_readonly_numpy_array(v, kind=None, force_numeric=False):
"O": "object",
}

# Handle pandas Series and Index objects
if pd and isinstance(v, (pd.Series, pd.Index)):
if v.dtype.kind in numeric_kinds:
# Get the numeric numpy array so we use fast path below
v = v.values
elif v.dtype.kind == "M":
# Convert datetime Series/Index to numpy array of datetimes
if isinstance(v, pd.Series):
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
# Series.dt.to_pydatetime will return Index[object]
# https://github.com/pandas-dev/pandas/pull/52459
v = np.array(v.dt.to_pydatetime())
else:
# DatetimeIndex
v = v.to_pydatetime()
elif pd and isinstance(v, pd.DataFrame) and len(set(v.dtypes)) == 1:
dtype = v.dtypes.tolist()[0]
if dtype.kind in numeric_kinds:
v = v.values
elif dtype.kind == "M":
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
# Series.dt.to_pydatetime will return Index[object]
# https://github.com/pandas-dev/pandas/pull/52459
v = [
np.array(row.dt.to_pydatetime()).tolist() for i, row in v.iterrows()
]
# With `pass_through=True`, the original object will be returned if unable to convert
# to a Narwhals DataFrame or Series.
v = nw.from_native(v, allow_series=True, pass_through=True)

if isinstance(v, nw.Series):
if v.dtype == nw.Datetime and v.dtype.time_zone is not None:
# Remove time zone so that local time is displayed
v = v.dt.replace_time_zone(None).to_numpy()
else:
v = v.to_numpy()
elif isinstance(v, nw.DataFrame):
schema = v.schema
overrides = {}
for key, val in schema.items():
if val == nw.Datetime and val.time_zone is not None:
# Remove time zone so that local time is displayed
overrides[key] = nw.col(key).dt.replace_time_zone(None)
if overrides:
v = v.with_columns(**overrides)
v = v.to_numpy()

if not isinstance(v, np.ndarray):
# v has its own logic on how to convert itself into a numpy array
@@ -193,6 +184,7 @@ def is_homogeneous_array(v):
np
and isinstance(v, np.ndarray)
or (pd and isinstance(v, (pd.Series, pd.Index)))
or (isinstance(v, nw.Series))
):
return True
if is_numpy_convertable(v):
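
The "Remove time zone so that local time is displayed" comments in the hunk above describe dropping the time zone label while keeping the wall-clock reading. Here is a minimal standard-library sketch of that idea, on the assumption that it is analogous to what `dt.replace_time_zone(None)` does for a Narwhals column. The timestamp and the UTC-05:00 offset are made-up sample values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample value: 09:30 in a UTC-05:00 zone.
eastern = timezone(timedelta(hours=-5))
aware = datetime(2024, 11, 1, 9, 30, tzinfo=eastern)

# Dropping the tzinfo keeps the wall-clock reading (09:30) as a naive datetime.
local_naive = aware.replace(tzinfo=None)

# Converting to UTC first would instead shift the reading to 14:30.
utc_naive = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(local_naive)  # 2024-11-01 09:30:00
print(utc_naive)    # 2024-11-01 14:30:00
```

The first form is the one that matches the stated intent: a plotted axis would show the time as it read in the original zone, not its UTC equivalent.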
@@ -73,12 +73,13 @@ def color_categorical_pandas(request, pandas_type):
def dates_array(request):
return np.array(
[
datetime(year=2013, month=10, day=10),
datetime(year=2013, month=11, day=10),
datetime(year=2013, month=12, day=10),
datetime(year=2014, month=1, day=10),
datetime(year=2014, month=2, day=10),
]
"2013-10-10",
"2013-11-10",
"2013-12-10",
"2014-01-10",
"2014-02-10",
],
dtype="datetime64[ns]",
)


@@ -183,7 +184,7 @@ def test_data_array_validator_dates_series(
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert res.dtype == "<M8[ns]"

# Check values
np.testing.assert_array_equal(res, dates_array)
@@ -200,7 +201,7 @@ def test_data_array_validator_dates_dataframe(
assert isinstance(res, np.ndarray)

# Check dtype
assert res.dtype == "object"
assert res.dtype == "<M8[ns]"

# Check values
np.testing.assert_array_equal(res, dates_array.reshape(len(dates_array), 1))
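
The fixture and assertion changes in this test file move from object-dtype arrays of Python `datetime` values to native `datetime64[ns]` arrays. The difference can be sketched with NumPy alone, assuming NumPy is installed. The variable names are illustrative and only two of the fixture's dates are reused.

```python
from datetime import datetime

import numpy as np

# Old-style fixture: Python datetime objects give an object-dtype array.
as_objects = np.array([datetime(2013, 10, 10), datetime(2013, 11, 10)])

# New-style fixture: ISO strings parsed into native datetime64[ns] values.
as_datetime64 = np.array(["2013-10-10", "2013-11-10"], dtype="datetime64[ns]")

print(as_objects.dtype)     # object
print(as_datetime64.dtype)  # datetime64[ns]
```

`<M8[ns]`, as used in the updated assertions, is the explicit little-endian spelling of `datetime64[ns]`, so the comparison holds on little-endian platforms.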
1 change: 1 addition & 0 deletions packages/python/plotly/optional-requirements.txt
@@ -39,6 +39,7 @@ ipython

## pandas deps for some matplotlib functionality ##
pandas
narwhals>=1.13.2

## scipy deps for some FigureFactory functions ##
scipy