feat(python!): Use Altair in DataFrame.plot #17995
def line(
    self,
    x: str | Any | None = None,
    y: str | Any | None = None,
    color: str | Any | None = None,
    order: str | Any | None = None,
    tooltip: str | Any | None = None,
    *args: Any,
    **kwargs: Any,
) -> alt.Chart:
Hi @dangotbanned - may I ask for your input here please?
- which do you think are the most common types of plots which are worth explicitly making functions for? Functionality would be unaffected, they would just work better with tab completion
- how would you suggest typing the various arguments? Does Altair have public type hints?
- Any Altair maintainers you'd suggest looping into the discussion?
Thanks 🙏
Thanks for the ping, happy to help where I can @MarcoGorelli
Couple of resources up top that I think could be useful:
Will respond to each question in a separate comment 👍
1
- which do you think are the most common types of plots which are worth explicitly making functions for? Functionality would be unaffected, they would just work better with tab completion
Can't speak for everyone, but for a reduced selection:
- Bar
- Line (Line, Trail)
- Scatter (Circle, Point, Square, Image, Text)
- Area (Area, Rect)
- Tick
- Boxplot
- Geoshape
- Field-dependent, but super important for those who need it
Looking at hvPlot, there are a few methods/chart types I'd need to do some digging to work out the equivalent in altair (if there is one).
However, my suggestion would be using the names defined there, both for compatibility when switching backends and to reduce the number of methods.
Examples
Haven't covered everything here, but it's a start:
hvPlotTabular -> altair.Chart
- (bar|barh) -> mark_bar
- box -> mark_boxplot
- scatter -> mark_(circle|point|square|image|text)
- labels -> mark_text
- points -> mark_point
- line -> mark_(line|trail)
- (polygons|paths) -> mark_geoshape
- (area|heatmap) -> mark_area
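The mapping above could be held in a small lookup table. This is only a sketch: `HVPLOT_TO_ALTAIR_MARK`, `mark_method_for`, and `apply_mark` are hypothetical names, and where the list offers several marks (scatter, line) the single default chosen here is an assumption rather than anything decided in this thread.

```python
from typing import Any

# Hypothetical lookup table built from the pairs listed above.
# Where several marks were listed, one default is picked (an assumption).
HVPLOT_TO_ALTAIR_MARK = {
    "bar": "mark_bar",
    "barh": "mark_bar",
    "box": "mark_boxplot",
    "scatter": "mark_point",  # could also be circle, square, image or text
    "labels": "mark_text",
    "points": "mark_point",
    "line": "mark_line",  # could also be trail
    "polygons": "mark_geoshape",
    "paths": "mark_geoshape",
    "area": "mark_area",
    "heatmap": "mark_area",  # as listed above
}


def mark_method_for(kind: str) -> str:
    """Return the name of the chart method to call for an hvPlot-style kind."""
    try:
        return HVPLOT_TO_ALTAIR_MARK[kind]
    except KeyError:
        raise ValueError(f"no Altair mark known for plot kind {kind!r}") from None


def apply_mark(chart: Any, kind: str, **mark_kwargs: Any) -> Any:
    """Look up the mark method on any chart-like object and call it."""
    return getattr(chart, mark_method_for(kind))(**mark_kwargs)
```

With Altair installed, `apply_mark(alt.Chart(df), "barh")` would be equivalent to `alt.Chart(df).mark_bar()`; orientation would still have to come from the encoding.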
2
- how would you suggest typing the various arguments? Does Altair have public type hints?
I might update this later after thinking on it some more.
Yeah they've been there since 5.2.0 but will be improved for altair>=5.4.0 with https://github.com/vega/altair/blob/main/altair/vegalite/v5/schema/_typing.py
For altair the model is quite different to matplotlib-style functions, but .encode() would be where to start.
Something like:
# Annotation from `.encode()`
# y: Optional[str | Y | Map | YDatum | YValue] = Undefined
# Don't name it this pls
TypeForY = str | Mapping[str, Any] | Any
I wouldn't worry about any altair-specific types here.
Spelling them out won't have an impact on attribute access of the result
3
- Any Altair maintainers you'd suggest looping into the discussion?
For typing @binste but really anyone from vega/altair#3452 I think would be interested (time-permitting)
Back again after thinking @MarcoGorelli
Feel free to rename things, but I came up with this for the typing
Super long code block
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping, Union

from typing_extensions import TypeAlias, TypedDict, Unpack

if TYPE_CHECKING:
    import altair as alt
    import narwhals.stable.v1 as nw

ChannelType: TypeAlias = Union[str, Mapping[str, Any], Any]


class EncodeKwds(TypedDict, total=False):
    angle: ChannelType
    color: ChannelType
    column: ChannelType
    description: ChannelType
    detail: ChannelType | list[Any]
    facet: ChannelType
    fill: ChannelType
    fillOpacity: ChannelType
    href: ChannelType
    key: ChannelType
    latitude: ChannelType
    latitude2: ChannelType
    longitude: ChannelType
    longitude2: ChannelType
    opacity: ChannelType
    order: ChannelType | list[Any]
    radius: ChannelType
    radius2: ChannelType
    row: ChannelType
    shape: ChannelType
    size: ChannelType
    stroke: ChannelType
    strokeDash: ChannelType
    strokeOpacity: ChannelType
    strokeWidth: ChannelType
    text: ChannelType
    theta: ChannelType
    theta2: ChannelType
    tooltip: ChannelType | list[Any]
    url: ChannelType
    x: ChannelType
    x2: ChannelType
    xError: ChannelType
    xError2: ChannelType
    xOffset: ChannelType
    y: ChannelType
    y2: ChannelType
    yError: ChannelType
    yError2: ChannelType
    yOffset: ChannelType


class Plot:
    chart: alt.Chart

    def __init__(self, df: nw.DataFrame) -> None:
        import altair as alt

        self.chart = alt.Chart(df)

    def line(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        order: ChannelType | list[Any] | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: Unpack[EncodeKwds],
    ) -> alt.Chart: ...

Which checks out below. You can use

def test_plot_typing() -> None:
    from typing import cast

    from typing_extensions import reveal_type

    plot = cast(Plot, "test")
    reveal_type(plot)  # Type of "plot" is "Plot"
    example_1 = plot.line(x="col 1")
    reveal_type(example_1)  # Type of "example_1" is "Chart"
    example_2 = plot.line("col 1", "col 2")
    reveal_type(example_2)  # Type of "example_2" is "Chart"
    example_err = plot.line("col 1", "col 2", x="col 3")
    reveal_type(example_err)  # Type of "example_err" is "Any"

At least for VSCode, you get the expanded docs on hover:
You could then repeat the
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #17995 +/- ##
==========================================
- Coverage 79.80% 79.79% -0.01%
==========================================
Files 1497 1499 +2
Lines 200379 200464 +85
Branches 2841 2864 +23
==========================================
+ Hits 159913 159966 +53
- Misses 39941 39952 +11
- Partials 525 546 +21
☔ View full report in Codecov by Sentry.
Thanks for the ping! I'm of course already a big fan of this PR ;) Let me know if I can help!
I find @dangotbanned's suggestion regarding typing a reasonable compromise so that there are some useful type hints but using the Any escape hatch instead of typing out all of them explicitly. However, if you fully want to mirror the type hints in altair with all altair-specific classes, I think we could expose those. Maybe something like altair.typing.XChannelType, ...
Thanks both for comments! One slight hesitation I have about adding such a large class as |
So far we did not explicitly expose the types in Altair, even keeping many of them in private modules, as we first wanted to use them for a while before others rely on it. But I think now, also thanks to the recent improvements done by @dangotbanned, we could expose the most relevant ones. This could include a Any thoughts on this @dangotbanned? I think I could spend some time this weekend to think through which types we'd want to expose and how but I'd also very much appreciate your input and help if you want to. Maybe we can even get it into Altair 5.4 and release that in the next 1-2 weeks and so Polars 1.4 could have that as a minimum dependency and use the types. |
Fully agree on the autogenerated
Happy to discuss in an
@MarcoGorelli Maybe with fewer positional-args, but this was what I had in mind back in vega/altair#3452 (comment)
Code block
from __future__ import annotations

import sys
from typing import Any, Generic, TypeVar

import narwhals.stable.v1 as nw

if sys.version_info >= (3, 12):
    from typing import Protocol, runtime_checkable
else:
    from typing_extensions import Protocol, runtime_checkable

T_Plot = TypeVar("T_Plot")


@runtime_checkable
class SupportsPlot(Generic[T_Plot], Protocol):
    chart: T_Plot

    def __init__(self, df: nw.DataFrame) -> None: ...

    def bar(
        self,
        x: Any | None = None,
        y: Any | None = None,
        color: Any | None = None,
        tooltip: Any | None = None,
        /,
        **kwargs: Any,
    ) -> T_Plot: ...

    def line(
        self,
        x: Any | None = None,
        y: Any | None = None,
        color: Any | None = None,
        order: Any | None = None,
        tooltip: Any | None = None,
        /,
        **kwargs: Any,
    ) -> T_Plot: ...

So on the It would also allow decisions like #17995 (comment) to be made in
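To make the protocol idea concrete, here is a minimal sketch of how a `runtime_checkable` protocol accepts any backend with the right shape. `DictBackend` and `NotABackend` are hypothetical stand-ins that return plain dicts instead of charts; the protocol is trimmed to two methods and is not the definition proposed above.

```python
from typing import Any, Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class SupportsPlot(Protocol[T]):
    """Trimmed-down version of the protocol above (two methods only)."""

    chart: T

    def bar(self, *args: Any, **kwargs: Any) -> T: ...
    def line(self, *args: Any, **kwargs: Any) -> T: ...


class DictBackend:
    """Hypothetical backend: its 'chart' is a plain dict, not a real plot."""

    def __init__(self, rows: list) -> None:
        self.chart: dict = {"data": rows}

    def _spec(self, mark: str, **encoding: Any) -> dict:
        # Combine the stored data with a mark name and the encoding kwargs.
        return {**self.chart, "mark": mark, "encoding": encoding}

    def bar(self, *args: Any, **kwargs: Any) -> dict:
        return self._spec("bar", **kwargs)

    def line(self, *args: Any, **kwargs: Any) -> dict:
        return self._spec("line", **kwargs)


class NotABackend:
    """Has a chart attribute but no plotting methods."""

    chart = None


backend = DictBackend([{"a": 1, "b": 2}])
spec = backend.line(x="a", y="b")
```

Note that `isinstance(backend, SupportsPlot)` only checks that the members exist at runtime; it says nothing about their signatures or return types, which is part of why each backend returning a different object remains a typing problem.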
Ooh i like where things are going 😍
This sounds good, we just need to be careful to learn lessons from the pandas plotting backends and why altair-pandas was abandoned. I think it was probably because:
How would that work? In this PR we're essentially deferring the whole implementation to Altair - if you have time/interest, do you fancy opening a separate PR to show how it would work? If you'd like to talk things over (which may be a good idea if we're coordinating changes across projects, which is never easy), feel free to book some time at https://calendly.com/marcogorelli |
Implementation-wise, I cannot contribute much and while not involved I have been following historical developments in pandas and Vega-Altair from the sidelines:
|
@MarcoGorelli @mattijn Really appreciate your thorough responses, I'll do my homework reading up on all of your links and follow up For now, I can say I'd want to go in with the most minimal + simple definition for a
For
import altair as alt
import polars as pl

df = pl.DataFrame()
...
chart = alt.Chart(df).mark_line().encode(...)  # <----
TLDR: Simple idea got complex 👎
Edit: Leaving this here for future reference, but no longer pushing for this path.
@MarcoGorelli I'm starting to think I've bitten off more than I can chew with this one 😞 I guess I'll run through some stuff, maybe it sparks an idea for someone else.
We've got lots of examples of this in However I don't think this is sufficient for the task, given that each backend would be returning vastly different objects. Probably a fair assumption that a user would be calling Digging through
AFAIK this would still rely on lots of library-specific code and some IR, which I was hoping to avoid. Something I hadn't seen, but thought could be explored is using library-specific stubs. Did some experimenting with
Code block
# hypothetical `.pyi`, located external to `polars`
# ruff: noqa: F401
import sys
import typing as t
import typing_extensions as te
from typing import Any, Generic, TypeVar

import narwhals.stable.v1 as nw
import polars as pl
import seaborn as sns
from matplotlib.axes import Axes
import altair as alt

if sys.version_info >= (3, 12):
    from typing import Protocol, runtime_checkable
else:
    from typing_extensions import Protocol, runtime_checkable

if t.TYPE_CHECKING:
    import matplotlib as mpl
    import seaborn.categorical as sns_c
    from matplotlib.axes import Axes

ChannelType: te.TypeAlias = str | t.Mapping[str, Any] | Any

class EncodeKwds(te.TypedDict, total=False):
    angle: ChannelType
    color: ChannelType
    column: ChannelType
    description: ChannelType
    detail: ChannelType | list[Any]
    facet: ChannelType
    fill: ChannelType
    fillOpacity: ChannelType
    href: ChannelType
    key: ChannelType
    latitude: ChannelType
    latitude2: ChannelType
    longitude: ChannelType
    longitude2: ChannelType
    opacity: ChannelType
    order: ChannelType | list[Any]
    radius: ChannelType
    radius2: ChannelType
    row: ChannelType
    shape: ChannelType
    size: ChannelType
    stroke: ChannelType
    strokeDash: ChannelType
    strokeOpacity: ChannelType
    strokeWidth: ChannelType
    text: ChannelType
    theta: ChannelType
    theta2: ChannelType
    tooltip: ChannelType | list[Any]
    url: ChannelType
    x: ChannelType
    x2: ChannelType
    xError: ChannelType
    xError2: ChannelType
    xOffset: ChannelType
    y: ChannelType
    y2: ChannelType
    yError: ChannelType
    yError2: ChannelType
    yOffset: ChannelType

T = TypeVar("T")

@runtime_checkable
class SupportsPlot(Generic[T], Protocol):
    backend: t.ClassVar[te.LiteralString]
    chart: T
    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(self, *args: Any, **kwargs: Any) -> T: ...
    def bar(self, *args: Any, **kwargs: Any) -> T: ...
    def line(self, *args: Any, **kwargs: Any) -> T: ...
    def scatter(self, *args: Any, **kwargs: Any) -> T: ...

@runtime_checkable
class AltairPlot(SupportsPlot[alt.ChartType]):
    backend: t.ClassVar[te.LiteralString] = "altair"
    chart: T
    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def bar(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def line(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        order: ChannelType | list[Any] | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def scatter(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        size: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...

@runtime_checkable
class SeabornPlot(SupportsPlot[Axes]):
    backend: t.ClassVar[te.LiteralString] = "seaborn"
    chart: T
    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def bar(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def line(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def scatter(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...

Not sure how you'd convince a type checker of which
Final idea is to call in @max-muoto 👋 for thoughts on the soundness of any of the above.
Thanks all for comments 🙏! I do like how issues such as this one bring different projects together 💯 Totally agree on not adding code to Altair which isn't directly useful to Altair itself. The only request I'd have is public types as mentioned in #17995 (comment), but even then, it's hardly essential Regarding customisability of results 🔧 : I'd say that if anyone wants fully customisable results, they should use Altair (or their favourite plotting lib) directly. The advantage of
Furthermore, having some built-in I think the fully-customisable backends part is becoming too complex too quickly. No other plotting library is close to (as far as I can tell) supporting Polars natively without extra heavy dependencies. I'd suggest to:
|
Absolutely agree @MarcoGorelli, will do my best to support you with this in |
You could provide a link to https://altair-viz.github.io/user_guide/customization.html#chart-themes in the docs, for users who simply want different (but consistent) defaults |
Very interesting reading through all the comments and links 😄 +1 on Marco's summary: expose some types publicly in Altair, wait with standardisation until other plotting libraries are being considered as well. I'll work on the public types soon.
I think this is something useful to consider early on! The default theme of Altair/Vega-Lite feels a bit dated for my taste but changing it in Altair should be well thought through and be part of a major release. In Polars, we'd have the opportunity to spruce it up a bit from the beginning. Personally, I use something close to https://gist.github.com/binste/b4042fa76a89d72d45cbbb9355ec6906 which only requires minimal modifications. Streamlit have their own theme as well enabled by default |
Cool to see this being implemented in Polars and an interesting discussion to follow! I would be inclined to agree with what @MarcoGorelli said regarding a fully-customisable backends becoming too complex too quickly and think it is a good idea to outsource any type of customization as much as possible. In addition to switching from |
FYI, Altair 5.4.0 is out now including the removal of the dependencies on numpy, pandas, and toolz + with a new |
# Calling `plot` the first time is slow
# https://github.com/pola-rs/polars/issues/13500
pytestmark = pytest.mark.slow
importing Altair is about 70 times faster* than importing hvplot, so I think we can remove this slow marker
*timed by performing time python -c 'import altair' and time python -c 'import hvplot' 7 times each, and finding the ratio of the smallest "real time" results for both
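The measurement described in the footnote can be reproduced with a short script. This is a sketch only: `best_import_time` and `import_time_ratio` are hypothetical helper names, and, like `time python -c '...'`, each timing includes interpreter start-up, so results will vary by machine.

```python
import subprocess
import sys
import time


def best_import_time(module: str, repeats: int = 7) -> float:
    """Smallest wall-clock time (seconds) to run `python -c 'import <module>'`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A fresh interpreter per run, so nothing is cached in sys.modules.
        subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            check=True,
            capture_output=True,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


def import_time_ratio(slow: str, fast: str, repeats: int = 7) -> float:
    """Ratio of the best import times of two modules."""
    return best_import_time(slow, repeats) / best_import_time(fast, repeats)
```

With both libraries installed, `import_time_ratio("hvplot", "altair")` would give the kind of ratio quoted above.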
Thanks a lot @MarcoGorelli and @altair team. Really great effort and great addition. 🙌
One advantage of hvplot / bokeh is the free interactivity. |
thanks @v1gnesh - coming soon 😉 vega/altair#3394 In the meantime, if you'd like to keep using hvplot, you can just add |
@v1gnesh what type of interactivity is important to you? For example tooltips, panning/zooming, brushes, etc |
@dwootton The collection of functions in the side, ex: zoom, reset zoom, selection tool, hand tool. |
@v1gnesh Just a heads up that you can already achieve zoom, reset zoom, and panning ("hand tool" in hvplot) in altair via the When you say "free interactivity", do you mean that you would like this to be the default behavior without having to type |
@joelostblom Thanks for the links. Yup, whenever it makes sense for it to be the default, at least I would prefer interactive becoming the default. |
The new default plots from Altair have reduced the interactivity of plots. I've gone back to using Was this a premature move considering Altair is missing essential features:
Was the main intent to deliver a basic plotting experience without adding many dependencies? |
that's totally fair - it's easy to go back to hvplot if that works for you
having said that - stay tuned, more developments may be on their way 👀 😉 |
@mjmdavis Thanks for the feedback! While box zoom is not available (vega/vega-lite#4742), you should be able to hover a data point to show additional info in a tooltip as per #18625. If that's not what you mean, could you elaborate on what you expect to happen when hovering? I'm also curious exactly what you are referring to with "resize plot", do you mean something like dragging in the corner of the plot to resize it? You are currently able to resize plots with e.g.
So, my use case is mostly data exploration in jupyter notebooks. For this, I've become quite fond of ipympl. My biggest problem there is that it can be tricky to get it to work with different kernels and there's frequently a 10 minute dance to get things working. The default plot however has the basic zoom to selection functionality that is very useful when dealing with complex signals. And it's convenient to not have to re-run code to change the size of the plot as you resize your screen. Vega-Lite and hvplot definitely benefit from being able to show plots in saved notebooks! There are a lot of considerations here so it's hard to please everyone. My 2c is that it's nice to be able to do some quick GUI based exploration when plotting with default settings.
Some context behind this: since vega/altair#3452, Altair supports Polars natively, without any extra heavy dependencies (no pandas, no NumPy, no PyArrow). Altair is a very popular and widely used library, with excellent docs and static typing - hence, I think it'd be best suited as Polars' default plotting backend.
DataFrame.plot was marked as "unstable" so this change can technically be made in Polars 1.5.0 (previously 1.4.0). What I've implemented here is a very thin layer on top of Altair, so it should be both convenient to users and easy to maintain.
For existing users wishing to preserve HvPlot plots, all they need to do is apply the diff
So, the impact on users should be fairly small
HvPlot maintainers have been extra-friendly and helpful (especially with answering user questions in Discord). I think it'd be good to still mention them in the docstring (and also to help users for whom this represents an API change), and recommend their library in the "visualisation" section of the user guide
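As a rough illustration of what "a very thin layer" can mean, the sketch below forwards plot-kind names to the matching `mark_*` method of a chart-like object and passes everything else to `.encode()`. `ThinPlot` is a hypothetical name and this is not the code from the PR; it only assumes the chart object exposes `mark_<kind>()` and `.encode()` methods, as `alt.Chart` does.

```python
from typing import Any, Callable


class ThinPlot:
    """Hypothetical plotting namespace; a sketch, not the Polars implementation."""

    def __init__(self, chart: Any) -> None:
        # With Altair this would be e.g. alt.Chart(df).
        self._chart = chart

    # Explicit methods so the most common kinds show up in tab completion.
    def line(self, *args: Any, **kwargs: Any) -> Any:
        return self._chart.mark_line().encode(*args, **kwargs)

    def point(self, *args: Any, **kwargs: Any) -> Any:
        return self._chart.mark_point().encode(*args, **kwargs)

    def __getattr__(self, kind: str) -> Callable[..., Any]:
        # Only reached when normal attribute lookup fails.
        if kind.startswith("_"):
            # Avoid infinite recursion if _chart itself is missing.
            raise AttributeError(kind)
        mark = getattr(self._chart, f"mark_{kind}", None)
        if mark is None:
            raise AttributeError(f"unknown plot kind {kind!r}")

        def plot(*args: Any, **kwargs: Any) -> Any:
            return mark().encode(*args, **kwargs)

        return plot
```

Under these assumptions, `ThinPlot(alt.Chart(df)).bar(x="a", y="b")` would resolve to `alt.Chart(df).mark_bar().encode(x="a", y="b")`, so anything Altair adds later is available without new wrapper code.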
Demo
DataFrame (here source is a polars.DataFrame):
Series plots work too:
Tab-complete works well, making this well-suited to EDA:
TODO
- line and point, so users get good tab completion
- figure out static typing: done
F.A.Q.: what about other plotting backends?
Maybe, in the future, the plotting backend could be configurable in pl.Config. But I think that's an orthogonal issue and can be done/discussed separately. Plotting will stay "unstable" for the time being