Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plotly with pandas>=3.0 fails to raise name conflict error #4837

Closed
MarcoGorelli opened this issue Oct 28, 2024 · 3 comments
Closed

plotly with pandas>=3.0 fails to raise name conflict error #4837

MarcoGorelli opened this issue Oct 28, 2024 · 3 comments
Assignees
Labels
bug something broken P2 considered for next cycle

Comments

@MarcoGorelli
Copy link
Contributor

MarcoGorelli commented Oct 28, 2024

import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(x=[0, 1], y=[1, 10], z=[0.1, 0.8], money=[100, 200]))
df2 = pd.DataFrame(dict(time=[23, 26], money=[100, 200]))
fig = px.scatter(df, x="z", y=df2.money, size=df.y)

With pandas 2.2.3:

Traceback (most recent call last):
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/marimo/_runtime/executor.py", line 157, in execute_cell
    exec(cell.body, glbls)
  Cell marimo://trying_plotly.py#cell=cell-0
, line 5, in <module>
    fig = px.scatter(df, x="z", y=df2.money, size=df.y)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_chart_types.py", line 66, in scatter
    return make_figure(args=locals(), constructor=go.Scatter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 2117, in make_figure
    args = build_dataframe(args, constructor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1513, in build_dataframe
    df_output, wide_id_vars = process_args_into_dataframe(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1271, in process_args_into_dataframe
    col_name = _check_name_not_reserved(field, reserved_names)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/scratch/.venv/lib/python3.12/site-packages/plotly/express/_core.py", line 1006, in _check_name_not_reserved
    raise NameError(
NameError: A name conflict was encountered for argument 'y'. A column or index with name 'y' is ambiguous.

With the latest pandas nightly (installable with pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas) it just plots, without raising

Image

The difference is due to pandas no longer caching __getitem__ for columns:

in pandas 3.0+

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [3]: df['a'] is df['a']
Out[3]: False

in pandas 2.2.3

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [3]: df['a'] is df['a']
Out[3]: True
@gvwilson gvwilson added bug something broken P1 needed for current cycle labels Oct 28, 2024
@emilykl
Copy link
Contributor

emilykl commented Oct 28, 2024

@MarcoGorelli I can't tell whether size in the generated chart matches df2.money or df.y -- could you post a version where money=[200, 100] in order to clarify?

If the former, this is definitely a bug; if the latter, this seems fine -- I'm guessing it's the former though.

Do you happen to have a link to the Pandas PR containing this behavior change? I see a few related to __getitem__ in the changelog but can't find the exact one.

@gvwilson gvwilson added P2 considered for next cycle and removed P1 needed for current cycle labels Oct 29, 2024
@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Oct 29, 2024

hey - sure, if I do that, the output does actually look correct:

Image

I reported this because there's a test which checks that this raises:

with pytest.raises(NameError) as err_msg:
fig = px.scatter(df, x="z", y=df2.money, size=df.y)
assert "A name conflict was encountered for argument 'y'" in str(err_msg.value)

and it would no longer raise with pandas 3.0. However, to be honest I can't see what's ambiguous here, it looks well-defined.

So, either this is a false positive and this check for reserved names can be removed, or it breaks something else?

Looks like this was introduced in #1768, and the is vs equals topics was brought up #1768 (comment) . The people involved there don't seem to be active in Plotly any more, reckon it's OK to ask them?


The pandas PR is pandas-dev/pandas#56614 (though the removal of the item cache doesn't seem mentioned - maybe it's just considered an internal thing which users weren't meant to be relying on in the first place)

@MarcoGorelli
Copy link
Contributor Author

closed by #4790

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken P2 considered for next cycle
Projects
None yet
Development

No branches or pull requests

3 participants