docs: Improved docs on Transforms #2655

Open

tempdata73 wants to merge 5 commits into vega:main from tempdata73:improve-agg-doc

Contributor

tempdata73 commented Jul 10, 2022

Notes:

I didn't use vl's example of using argmax on the movies dataset because this whole page uses the cars dataset as a guide. I felt like using the former would break the flow of thought. Nonetheless, I can happily incorporate it if you guys prefer that one.

Sorry for messing up the other pull request (#2654), but I finally fixed my commits and branches.


          improved documentation on agg funcs

7ceec5a

Contributor

mattijn commented Jul 11, 2022

Thanks for the PR! No problem of messing up the commits. Me or @joelostblom will do a review somewhere in coming days.

betaigeuze mentioned this pull request

improve documentation on aggregation #2645

Open

dangotbanned linked an issue

that may be closed by this pull request

improve documentation on aggregation #2645

Open

dangotbanned requested review from joelostblom and mattijn

December 23, 2024 13:27

Member

dangotbanned commented Dec 23, 2024 (edited)

> Thanks for the PR! No problem of messing up the commits. Me or @joelostblom will do a review somewhere in coming days.

@mattijn, @joelostblom I'm combing through old issues and came across this PR that apparently closes #2645

Obviously we'd need to get this branch up-to-date, but I wanted to check-in to see if this had actually resolved #2645?

Update

I think I've got this conflict-free now with main 😌

dangotbanned changed the title ~~Improved docs on Transforms~~ docs: Improved docs on Transforms

dangotbanned added the documentation label

dangotbanned added 4 commits

December 23, 2024 19:05


          chore: copy encoding.rst rename

cb79d5d

From vega@dfb11f5#diff-3f8dbb48ec3017cd5b2722c66cd989b66c7832d8627e30474a20b0b6048f192b


          Merge remote-tracking branch 'upstream/main' into pr/tempdata73/2655

7f82821


          fix: apply changes on top of main

50ad1a5

Previous merge was super messy, due to 2 year old PR


          revert: Undo removal of trailing comma

aa6b486

dangotbanned requested a review from dsmedia

December 23, 2024 19:46

Member

dangotbanned commented Dec 23, 2024

@dsmedia don't feel obligated to review this, just curious if you had any thoughts - since you've done a few doc PRs before?

Contributor

dsmedia commented Dec 23, 2024

> @dsmedia don't feel obligated to review this, just curious if you had any thoughts - since you've done a few doc PRs before?

Sure. Will have a look this evening.

dsmedia reviewed

Contributor

dsmedia left a comment

Great doc additions! I've made some recommendations / edits here for consideration.

doc/user_guide/transform/aggregate.rst

Comment on lines +80 to +81

		Note: As mentioned in :doc:`../data`, this approach of transforming the
		data with Pandas is preferable if we already have the DataFrame at hand.

Contributor

dsmedia Dec 24, 2024

Consider 1) being more explicit about what exactly is meant by the term "at hand" and 2) being upfront in this sentence about the reason or reasons for Pandas transformations being preferable when the DataFrame is "at hand" (automatic type inference? something else also?)

Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

doc/user_guide/transform/aggregate.rst

Comment on lines +94 to +96

+              It is possible for aggregate functions to not
+              have an argument. In this case, aggregation will be performed on the column
+              used in the other axis.

Contributor

dsmedia Dec 24, 2024

Suggested change

      
-              It is possible for aggregate functions to not
-              have an argument. In this case, aggregation will be performed on the column
-              used in the other axis.
+              Aggregate functions can be used without arguments.
+              In such cases, the function will automatically aggregate
+              the data from the column specified in the other axis.

doc/user_guide/transform/aggregate.rst

+              :code:`missing`, :code:`distinct` and :code:`valid`) are the ones that get
+              the most out of this feature.
+              Argmin / Argmax

Contributor

dsmedia Dec 24, 2024

Suggested change

      
-              Argmin / Argmax
+              Argmin and Argmax Functions

doc/user_guide/transform/aggregate.rst

Comment on lines +119 to +147

+              Both :code:`argmin` and :code:`argmax` aggregate functions can only be used
+              with the :meth:`~Chart.transform_aggregate` method. Trying to use their
+              respective shorthand notations will result in an error. This is due to the fact
+              that either :code:`argmin` or :code:`argmax` functions return an object, not
+              values.  This object then specifies the values to be selected from other
+              columns when encoding.  One can think of the returned object as being a
+              dictionary, while the column serves the purpose of being a key, which then
+              obtains its respective value.
+              The true value of these functions is appreciated when we want to compare the
+              most **distinctive** samples from two sets of data with respect to another set
+              of data.
+              As an example, suppose we want to compare the weight of the strongest cars,
+              with respect to their country/region of origin. This can be done using
+              :code:`argmax`:
+              .. altair-plot::
+                 alt.Chart(cars).mark_bar().encode(
+                    x='greatest_hp[Weight_in_lbs]:Q',
+                    y='Origin:N'
+                 ).transform_aggregate(
+                    greatest_hp='argmax(Horsepower)',
+                    groupby=['Origin']
+                 )
+              It is clear that Japan's strongest car is also the lightest, while that of USA
+              is the heaviest.

Contributor

dsmedia Dec 24, 2024

Suggested change

      
-              Both :code:`argmin` and :code:`argmax` aggregate functions can only be used
-              with the :meth:`~Chart.transform_aggregate` method. Trying to use their
-              respective shorthand notations will result in an error. This is due to the fact
-              that either :code:`argmin` or :code:`argmax` functions return an object, not
-              values.  This object then specifies the values to be selected from other
-              columns when encoding.  One can think of the returned object as being a
-              dictionary, while the column serves the purpose of being a key, which then
-              obtains its respective value.
-              The true value of these functions is appreciated when we want to compare the
-              most **distinctive** samples from two sets of data with respect to another set
-              of data.
-              As an example, suppose we want to compare the weight of the strongest cars,
-              with respect to their country/region of origin. This can be done using
-              :code:`argmax`:
-              .. altair-plot::
-                 alt.Chart(cars).mark_bar().encode(
-                    x='greatest_hp[Weight_in_lbs]:Q',
-                    y='Origin:N'
-                 ).transform_aggregate(
-                    greatest_hp='argmax(Horsepower)',
-                    groupby=['Origin']
-                 )
-              It is clear that Japan's strongest car is also the lightest, while that of USA
-              is the heaviest.
+              The :code:`argmin` and :code:`argmax` functions help you find values from
+              one field that correspond to the minimum or maximum values in another
+              field. For example, you might want to find the production budget of
+              movies that earned the highest gross revenue in each genre.
+              These functions must be used with the :meth:`~Chart.transform_aggregate`
+              method rather than their shorthand notations. They return objects that act
+              as selectors for values in other columns, rather than returning values
+              directly. You can think of the returned object as a dictionary where the
+              column serves as a key to retrieve corresponding values.
+              To illustrate this, let's compare the weights of cars with the highest
+              horsepower across different regions of origin:
+              .. altair-plot::
+                 alt.Chart(cars).mark_bar().encode(
+                    x='greatest_hp[Weight_in_lbs]:Q',
+                    y='Origin:N'
+                 ).transform_aggregate(
+                    greatest_hp='argmax(Horsepower)',
+                    groupby=['Origin']
+                 )
+              This visualization reveals an interesting contrast: among cars with the
+              highest horsepower in their respective regions, Japanese cars are notably
+              lighter, while American cars are substantially heavier.
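
The "returned object as a dictionary" analogy in the suggested text may be easier to follow with a plain-Python sketch. This is only an illustration of the idea, not how Altair or Vega-Lite implement the aggregate: the helper `argmax_by_group` and the toy rows are invented for this example and are not values from the cars dataset.

```python
from collections import defaultdict

# Invented toy rows for illustration only; NOT values from the cars dataset.
rows = [
    {"Origin": "A", "Horsepower": 120, "Weight_in_lbs": 3000},
    {"Origin": "A", "Horsepower": 200, "Weight_in_lbs": 4200},
    {"Origin": "B", "Horsepower": 90, "Weight_in_lbs": 2100},
    {"Origin": "B", "Horsepower": 110, "Weight_in_lbs": 2400},
]


def argmax_by_group(rows, field, groupby):
    """Hypothetical helper: return {group: row}, keeping the whole record
    that has the largest value of `field` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row)
    return {
        key: max(members, key=lambda r: r[field])
        for key, members in groups.items()
    }


greatest_hp = argmax_by_group(rows, field="Horsepower", groupby="Origin")

# The aggregate yields a record per group, not a number, so a second lookup
# picks the value to plot. This mirrors 'greatest_hp[Weight_in_lbs]' above.
weights = {origin: row["Weight_in_lbs"] for origin, row in greatest_hp.items()}
print(weights)  # {'A': 4200, 'B': 2400}
```

Because each group maps to a full record, the result has to be indexed by a column name before it can be encoded, which is the role the bracket syntax plays in the chart.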

doc/user_guide/transform/aggregate.rst

+              argmin     An input data object containing the minimum field value.                     N/A
+              argmax     An input data object containing the maximum field value.                     :ref:`gallery_line_chart_with_custom_legend`
+              average    The mean (average) field value. Identical to mean.                           :ref:`gallery_layer_line_color_rule`
+              count      The total count of data objects in the group.                                :ref:`gallery_simple_heatmap`

Contributor

dsmedia Dec 24, 2024

Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?
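
A small plain-Python sketch of the distinction that note draws, in case it helps with wording. It assumes the Vega-Lite descriptions of `count` (counts data objects) and `valid` (counts field values that are not null or NaN); the functions and the toy rows below are invented stand-ins, not library code.

```python
import math

# Invented toy rows; None stands in for a missing Horsepower value.
rows = [
    {"Name": "car-1", "Horsepower": 130.0},
    {"Name": "car-2", "Horsepower": None},
    {"Name": "car-3", "Horsepower": 95.0},
]


def count(rows, field=None):
    """Count records in the group; `field` is accepted but deliberately ignored."""
    return len(rows)


def valid(rows, field):
    """Count records whose `field` is neither None nor NaN."""
    def is_valid(value):
        if value is None:
            return False
        return not (isinstance(value, float) and math.isnan(value))

    return sum(1 for row in rows if is_valid(row[field]))


print(count(rows), count(rows, "Horsepower"))  # same result with or without a field
print(valid(rows, "Horsepower"), valid(rows, "Name"))  # depends on the field
```

Here `count` gives the same answer whichever field is passed, while `valid` changes with the field, which is the behavior the note describes.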

doc/user_guide/transform/aggregate.rst

Comment on lines +171 to +173

+              =========  ===========================================================================  =====================================
+              Aggregate  Description                                                                  Example
+              =========  ===========================================================================  =====================================

Contributor

dsmedia Dec 24, 2024

The Vega-Lite docs appear to list these in a more logical (if implicit) order, starting with count-related functions (including count, valid, values, missing, and distinct), moving to basic mathematical operations (sum, product), then to central tendency measures (mean/average, variance/variancep, stdev/stdevp, stderr, median), followed by distribution statistics (q1, q3, ci0, ci1), and finally ending with range functions (min/argmin, max/argmax). The ordering here appears to be alphabetical, though not strictly so (e.g. ci01). I would have a slight preference for the Vega-Lite-style functional organization scheme (and with explicit headings for the categories).

doc/user_guide/transform/aggregate.rst

    
            @@ -8,7 +8,7 @@ There are two ways to aggregate data within Altair: within the encoding itself,
               or using a top level aggregate transform.

               The aggregate property of a field definition can be used to compute aggregate
-              summary statistics (e.g., median, min, max) over groups of data.
+              summary statistics (e.g., :code:`median`, :code:`min`, :code:`max`) over groups of data.
               If at least one fields in the specified encoding channels contain aggregate,

Contributor

dsmedia Dec 24, 2024 (edited)

Re: the sentence beginning, "If at least one fields..." --> I think this sentence could be rewritten while we're at it
