diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 25ef7f533..71e24fa96 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -40,8 +40,8 @@ jobs: if: ${{ matrix.python-version == '3.8' }} run: | pip uninstall -y pyarrow vegafusion vegafusion-python-embed - - name: Maybe install lowest supported Pandas version - # We install the lowest supported Pandas version for one job to test that + - name: Maybe install lowest supported pandas version + # We install the lowest supported pandas version for one job to test that # it still works. Downgrade to the oldest versions of pandas and numpy that include # Python 3.8 wheels, so only run this job for Python 3.8 if: ${{ matrix.python-version == '3.8' }} diff --git a/altair/utils/core.py b/altair/utils/core.py index f365ce962..28601db3c 100644 --- a/altair/utils/core.py +++ b/altair/utils/core.py @@ -343,7 +343,7 @@ def to_list_if_array(val): if dtype_name == "category": # Work around bug in to_json for categorical types in older versions # of pandas as they do not properly convert NaN values to null in to_json. - # We can probably remove this part once we require Pandas >= 1.0 + # We can probably remove this part once we require pandas >= 1.0 col = df[col_name].astype(object) df[col_name] = col.where(col.notnull(), None) elif dtype_name == "string": diff --git a/doc/case_studies/exploring-weather.rst b/doc/case_studies/exploring-weather.rst index b041ba212..bc9776787 100644 --- a/doc/case_studies/exploring-weather.rst +++ b/doc/case_studies/exploring-weather.rst @@ -17,7 +17,7 @@ The dataset is a CSV file with columns for the temperature wind speed (in meter/second), and weather type. We have one row for each day from January 1st, 2012 to December 31st, 2015. -Altair is designed to work with data in the form of Pandas_ +Altair is designed to work with data in the form of pandas_ dataframes, and contains a loader for this and other built-in datasets: .. altair-plot:: @@ -28,7 +28,7 @@ dataframes, and contains a loader for this and other built-in datasets: df = data.seattle_weather() df.head() -The data is loaded from the web and stored in a Pandas DataFrame, and from +The data is loaded from the web and stored in a pandas DataFrame, and from here we can explore it with Altair. Let’s start by looking at the precipitation, using tick marks to see the @@ -135,7 +135,7 @@ Note that this calculation doesn't actually do any data manipulation in Python, but rather encodes and stores the operations within the plot specification, where they will be calculated by the renderer. -Of course, the same calculation could be done by using Pandas manipulations to +Of course, the same calculation could be done by using pandas manipulations to explicitly add a column to the dataframe; the disadvantage there is that the derived values would have to be stored in the plot specification rather than computed on-demand in the browser. @@ -265,4 +265,4 @@ You can find more visualizations in the :ref:`example-gallery`. If you want to further customize your charts, you can refer to Altair's :ref:`api`. -.. _Pandas: http://pandas.pydata.org/ +.. _pandas: http://pandas.pydata.org/ diff --git a/doc/getting_started/project_philosophy.rst b/doc/getting_started/project_philosophy.rst index 7681df893..1457c9cf1 100644 --- a/doc/getting_started/project_philosophy.rst +++ b/doc/getting_started/project_philosophy.rst @@ -8,7 +8,7 @@ Many excellent plotting libraries exist in Python, including: * `Seaborn `_ * `Lightning `_ * `Plotly `_ -* `Pandas built-in plotting `_ +* `pandas built-in plotting `_ * `HoloViews `_ * `VisPy `_ * `pygg `_ diff --git a/doc/getting_started/starting.rst b/doc/getting_started/starting.rst index 61fc80147..a9b4ba470 100644 --- a/doc/getting_started/starting.rst +++ b/doc/getting_started/starting.rst @@ -29,10 +29,10 @@ Here is the outline of this basic tutorial: The Data -------- -Data in Altair is built around the Pandas Dataframe. One of the defining +Data in Altair is built around the pandas Dataframe. One of the defining characteristics of statistical visualization is that it begins with `tidy `_ -Dataframes. For the purposes of this tutorial, we'll start by importing Pandas +Dataframes. For the purposes of this tutorial, we'll start by importing pandas and creating a simple DataFrame to visualize, with a categorical variable in column a and a numerical variable in column b: diff --git a/doc/releases/changes.rst b/doc/releases/changes.rst index 7bbb5abe2..db4ad5c82 100644 --- a/doc/releases/changes.rst +++ b/doc/releases/changes.rst @@ -23,7 +23,7 @@ Version 5.1.2 (released Oct 3, 2023) Bug Fixes ~~~~~~~~~ -- Remove usage of deprecated Pandas parameter ``convert_dtypes`` (#3191) +- Remove usage of deprecated pandas parameter ``convert_dtypes`` (#3191) - Fix encoding type inference for boolean columns when pyarrow is installed (#3210) Version 5.1.1 (released August 30, 2023) diff --git a/doc/user_guide/data.rst b/doc/user_guide/data.rst index 6958f3f62..48a09db29 100644 --- a/doc/user_guide/data.rst +++ b/doc/user_guide/data.rst @@ -15,7 +15,7 @@ and :class:`FacetChart`) accepts a dataset as its first argument. There are many different ways of specifying a dataset: -- as a `Pandas DataFrame `_ +- as a `pandas DataFrame `_ - as a DataFrame that supports the DataFrame Interchange Protocol (contains a ``__dataframe__`` attribute), e.g. polars and pyarrow. This is experimental. - as a :class:`Data` or related object (i.e. :class:`UrlData`, :class:`InlineData`, :class:`NamedData`) - as a url string pointing to a ``json`` or ``csv`` formatted text file @@ -81,7 +81,7 @@ Similarly, we must also specify the data type when referencing data by URL: Encodings and their associated types are further discussed in :ref:`user-guide-encoding`. Below we go into more detail about the different ways of specifying data in an Altair chart. -Pandas DataFrame +pandas DataFrame ~~~~~~~~~~~~~~~~ .. _data-in-index: @@ -102,7 +102,7 @@ At times, relevant data appears in the index. For example: data.head() If you would like the index to be available to the chart, you can explicitly -turn it into a column using the ``reset_index()`` method of Pandas dataframes: +turn it into a column using the ``reset_index()`` method of pandas dataframes: .. altair-plot:: @@ -114,7 +114,7 @@ turn it into a column using the ``reset_index()`` method of Pandas dataframes: If the index object does not have a ``name`` attribute set, the resulting column will be called ``"index"``. More information is available in the -`Pandas documentation `_. +`pandas documentation `_. .. _data-long-vs-wide: @@ -193,11 +193,11 @@ step within the chart itself. We will detail to two approaches below. .. _data-converting-long-form: -Converting with Pandas +Converting with pandas """""""""""""""""""""" -This sort of data manipulation can be done as a preprocessing step using Pandas_, +This sort of data manipulation can be done as a preprocessing step using pandas_, and is discussed in detail in the `Reshaping and Pivot Tables`_ section of the -Pandas documentation. +pandas documentation. For converting wide-form data to the long-form data used by Altair, the ``melt`` method of dataframes can be used. The first argument to ``melt`` is the column @@ -210,7 +210,7 @@ be optionally specified: wide_form.melt('Date', var_name='company', value_name='price') -For more information on the ``melt`` method, see the `Pandas melt documentation`_. +For more information on the ``melt`` method, see the `pandas melt documentation`_. In case you would like to undo this operation and convert from long-form back to wide-form, the ``pivot`` method of dataframes is useful. @@ -220,7 +220,7 @@ to wide-form, the ``pivot`` method of dataframes is useful. long_form.pivot(index='Date', columns='company', values='price').reset_index() -For more information on the ``pivot`` method, see the `Pandas pivot documentation`_. +For more information on the ``pivot`` method, see the `pandas pivot documentation`_. Converting with Fold Transform """""""""""""""""""""""""""""" @@ -307,9 +307,9 @@ created using Altair's :func:`sphere` generator function. Here is an example: alt.layer(background, lines).project('naturalEarth1') -.. _Pandas: http://pandas.pydata.org/ -.. _Pandas pivot documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html -.. _Pandas melt documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.melt.html#pandas.DataFrame.melt +.. _pandas: http://pandas.pydata.org/ +.. _pandas pivot documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html +.. _pandas melt documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.melt.html#pandas.DataFrame.melt .. _Reshaping and Pivot Tables: https://pandas.pydata.org/pandas-docs/stable/reshaping.html diff --git a/doc/user_guide/data_transformers.rst b/doc/user_guide/data_transformers.rst index 44a569974..a4820cf5e 100644 --- a/doc/user_guide/data_transformers.rst +++ b/doc/user_guide/data_transformers.rst @@ -6,7 +6,7 @@ Data Transformers Before a Vega-Lite or Vega specification can be passed to a renderer, it typically has to be transformed in a number of ways: -* Pandas Dataframe has to be sanitized and serialized to JSON. +* pandas Dataframe has to be sanitized and serialized to JSON. * The rows of a Dataframe might need to be sampled or limited to a maximum number. * The Dataframe might be written to a ``.csv`` of ``.json`` file for performance reasons. @@ -19,7 +19,7 @@ These data transformations are managed by the data transformation API of Altair. API of Vega and Vega-Lite. A data transformer is a Python function that takes a Vega-Lite data ``dict`` or -Pandas ``DataFrame`` and returns a transformed version of either of these types:: +pandas ``DataFrame`` and returns a transformed version of either of these types:: from typing import Union Data = Union[dict, pd.DataFrame] @@ -30,7 +30,7 @@ Pandas ``DataFrame`` and returns a transformed version of either of these types: Dataset Consolidation ~~~~~~~~~~~~~~~~~~~~~ -Datasets passed as Pandas dataframes can be represented in the chart in two +Datasets passed as pandas dataframes can be represented in the chart in two ways: - As literal dataset values in the ``data`` attribute at any level of the diff --git a/doc/user_guide/encodings/index.rst b/doc/user_guide/encodings/index.rst index 56db07138..62bcc51a8 100644 --- a/doc/user_guide/encodings/index.rst +++ b/doc/user_guide/encodings/index.rst @@ -279,7 +279,7 @@ in some data structures. The recommended thing to do when you have special characters in a column name is to rename your columns. -For example, in Pandas you could replace ``:`` with ``_`` +For example, in pandas you could replace ``:`` with ``_`` via ``df.rename(columns = lambda x: x.replace(':', '_'))``. If you don't want to rename your columns you will need to escape the special characters using a backslash: diff --git a/doc/user_guide/internals.rst b/doc/user_guide/internals.rst index 831c29cc3..c5773dad2 100644 --- a/doc/user_guide/internals.rst +++ b/doc/user_guide/internals.rst @@ -195,7 +195,7 @@ you can use the :meth:`~Chart.from_dict` method to construct the chart object: With a bit more effort and some judicious copying and pasting, we can manually convert this into more idiomatic Altair code for the same chart, -including constructing a Pandas dataframe from the data values: +including constructing a pandas dataframe from the data values: .. altair-plot:: diff --git a/doc/user_guide/large_datasets.rst b/doc/user_guide/large_datasets.rst index 2f21934ae..18376da41 100644 --- a/doc/user_guide/large_datasets.rst +++ b/doc/user_guide/large_datasets.rst @@ -290,10 +290,10 @@ whereas `vl-convert`_ is expected to provide the better performance. .. _preaggregate-and-filter: -Preaggregate and Filter in Pandas +Preaggregate and Filter in pandas ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another common approach is to perform data transformations such as aggregations -and filters using Pandas before passing the data to Altair. +and filters using pandas before passing the data to Altair. For example, to create a bar chart for the ``barley`` dataset summing up ``yield`` grouped by ``site``, it is convenient to pass the unaggregated data to Altair: @@ -322,7 +322,7 @@ only the necessary columns: y=alt.Y("site:N").sort("-x") ) -You could also precalculate the sum in Pandas which would reduce the size of the dataset even more: +You could also precalculate the sum in pandas which would reduce the size of the dataset even more: .. altair-plot:: @@ -357,7 +357,7 @@ in Altair. color=alt.Color("Origin").legend(None) ) -If you have a lot of data, you can perform the necessary calculations in Pandas and only +If you have a lot of data, you can perform the necessary calculations in pandas and only pass the resulting summary statistics to Altair. First, let's define a few parameters where ``k`` stands for the multiplier which is used diff --git a/doc/user_guide/times_and_dates.rst b/doc/user_guide/times_and_dates.rst index 066e4d032..c1db1f526 100644 --- a/doc/user_guide/times_and_dates.rst +++ b/doc/user_guide/times_and_dates.rst @@ -13,11 +13,11 @@ Altair and Vega-Lite do their best to ensure that dates are interpreted and visualized in a consistent way. -Altair and Pandas Datetimes +Altair and pandas Datetimes --------------------------- -Altair is designed to work best with `Pandas timeseries`_. A standard -timezone-agnostic date/time column in a Pandas dataframe will be both +Altair is designed to work best with `pandas timeseries`_. A standard +timezone-agnostic date/time column in a pandas dataframe will be both interpreted and displayed as local user time. For example, here is a dataset containing hourly temperatures measured in Seattle: @@ -91,7 +91,7 @@ time of the browser that does the rendering. If you would like your dates to instead be time-zone aware, you can set the timezone explicitly in the input dataframe. Since Seattle is in the -``US/Pacific`` timezone, we can localize the timestamps in Pandas as follows: +``US/Pacific`` timezone, we can localize the timestamps in pandas as follows: .. altair-plot:: :output: repr @@ -141,7 +141,7 @@ regardless of the system location: To make your charts as portable as possible (even in non-ES6 browsers which parse timezone-agnostic times as UTC), you can explicitly work -in UTC time, both on the Pandas side and on the Vega-Lite side: +in UTC time, both on the pandas side and on the Vega-Lite side: .. altair-plot:: @@ -155,7 +155,7 @@ in UTC time, both on the Pandas side and on the Vega-Lite side: ) This is somewhat less convenient than the default behavior for timezone-agnostic -dates, in which both Pandas and Vega-Lite assume times are local +dates, in which both pandas and Vega-Lite assume times are local (except in non-ES6 browsers; see :ref:`note-browser-compliance`), but it gets around browser incompatibilities by explicitly working in UTC, which gives similar results even in older browsers. @@ -223,5 +223,5 @@ it is ES6-compliant or because your computer locale happens to be set to the UTC+0 (GMT) timezone. .. _Coordinated Universal Time (UTC): https://en.wikipedia.org/wiki/Coordinated_Universal_Time -.. _Pandas timeseries: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html +.. _pandas timeseries: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html .. _ECMAScript 6: http://www.ecma-international.org/ecma-262/6.0/ diff --git a/doc/user_guide/transform/index.rst b/doc/user_guide/transform/index.rst index 8b197ea17..541922e4d 100644 --- a/doc/user_guide/transform/index.rst +++ b/doc/user_guide/transform/index.rst @@ -7,12 +7,12 @@ Data Transformations It is often necessary to transform or filter data in the process of visualizing it. In Altair you can do this one of two ways: -1. Before the chart definition, using standard Pandas data transformations. +1. Before the chart definition, using standard pandas data transformations. 2. Within the chart definition, using Vega-Lite's data transformation tools. In most cases, we suggest that you use the first approach, because it is more straightforward to those who are familiar with data manipulation in Python, and -because the Pandas package offers much more flexibility than Vega-Lite in +because the pandas package offers much more flexibility than Vega-Lite in available data manipulations. The second approach becomes useful when the data source is not a dataframe, but, diff --git a/doc/user_guide/transform/lookup.rst b/doc/user_guide/transform/lookup.rst index 9337da64b..ab7bb550f 100644 --- a/doc/user_guide/transform/lookup.rst +++ b/doc/user_guide/transform/lookup.rst @@ -47,12 +47,12 @@ We know how to visualize each of these datasets separately; for example: If we would like to plot features that reference both datasets (for example, the average age within each group), we need to combine the two datasets. This can be done either as a data preprocessing step, using tools available -in Pandas, or as part of the visualization using a :class:`~LookupTransform` +in pandas, or as part of the visualization using a :class:`~LookupTransform` in Altair. Combining Datasets with pandas.merge ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Pandas provides a wide range of tools for merging and joining datasets; see +pandas provides a wide range of tools for merging and joining datasets; see `Merge, Join, and Concatenate `_ for some detailed examples. For the above data, we can merge the data and create a combined chart as follows: @@ -76,7 +76,7 @@ Combining Datasets with a Lookup Transform ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For some data sources (e.g. data available at a URL, or data that is streaming), it is desirable to have a means of joining data without having to download -it for pre-processing in Pandas. +it for pre-processing in pandas. This is where Altair's :meth:`~Chart.transform_lookup` comes in. To reproduce the above combined plot by combining datasets within the chart specification itself, we can do the following: