Merge pull request #41 from alk-lbinet/doc-d
Docs update
alk-lbinet authored Jun 29, 2020
2 parents 8ca896c + 64b0c1d commit da42e1e
Showing 15 changed files with 1,222 additions and 991 deletions.
11 changes: 0 additions & 11 deletions docs/source/advanced-usage.rst

This file was deleted.

4 changes: 1 addition & 3 deletions docs/source/index.rst
@@ -13,7 +13,6 @@ pandagg

introduction
user-guide
-advanced-usage
Tutorial dataset <IMDB>
API reference <reference/pandagg>
Contributing <CONTRIBUTING>
@@ -43,8 +42,7 @@ Alternatively, you can grab the latest source code from `GitHub <https://github.
Usage
*****

-The :doc:`user-guide` is the place to go to learn how to use the library and
-accomplish common tasks. The more in-depth :doc:`advanced-usage` guide is the place to go for deeply nested queries.
+The :doc:`user-guide` is the place to go to learn how to use the library.

An example based on publicly available IMDB data is documented in the repository's `examples/imdb` directory, with
a Jupyter notebook that showcases some of `pandagg`'s features: `here it is <https://gistpreview.github.io/?4cedcfe49660cd6757b94ba491abb95a>`_.
4 changes: 0 additions & 4 deletions docs/source/introduction.rst
@@ -2,10 +2,6 @@
Principles
##########

-.. note::
-
-    This is a work in progress. Some sections still need to be furnished.
-

This library focuses on two principles:

122 changes: 122 additions & 0 deletions docs/source/user-guide.aggs.rst
@@ -0,0 +1,122 @@
***********
Aggregation
***********

The :class:`~pandagg.tree.aggs.aggs.Aggs` class provides:

- multiple syntaxes to declare and update an aggregation
- aggregation clause validation
- ability to insert clauses at specific locations (and not just below last manipulated clause)


Declaration
===========

From native "dict" query
------------------------

Given the following aggregation:

>>> expected_aggs = {
>>>     "decade": {
>>>         "histogram": {"field": "year", "interval": 10},
>>>         "aggs": {
>>>             "genres": {
>>>                 "terms": {"field": "genres", "size": 3},
>>>                 "aggs": {
>>>                     "max_nb_roles": {
>>>                         "max": {"field": "nb_roles"}
>>>                     },
>>>                     "avg_rank": {
>>>                         "avg": {"field": "rank"}
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>     }
>>> }

To declare an :class:`~pandagg.tree.aggs.aggs.Aggs` instance, pass the "dict" query as the argument:

>>> from pandagg.aggs import Aggs
>>> a = Aggs(expected_aggs)

A visual representation of the query is available with :func:`~pandagg.tree.aggs.aggs.Aggs.show`:

>>> a.show()
<Aggregations>
decade <histogram, field="year", interval=10>
└── genres <terms, field="genres", size=3>
    ├── max_nb_roles <max, field="nb_roles">
    └── avg_rank <avg, field="rank">


Call :func:`~pandagg.tree.aggs.aggs.Aggs.to_dict` to convert it back to a native dict:

>>> a.to_dict() == expected_aggs
True
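
To make the structure of the native dict concrete, here is a minimal sketch in plain Python that walks it and prints one line per clause. The ``iter_clauses`` helper is hypothetical and is not part of pandagg. It assumes each clause holds exactly one aggregation-type key, plus an optional ``aggs`` (or ``aggregations``) key for its children:

```python
# Hypothetical helper, not part of pandagg: walk a native aggregation dict
# and yield (depth, name, type) for every clause it contains.
RESERVED_KEYS = ("aggs", "aggregations", "meta")


def iter_clauses(aggs, depth=0):
    for name, clause in aggs.items():
        # Assumption: the only non-reserved key is the aggregation type.
        agg_type = next(k for k in clause if k not in RESERVED_KEYS)
        yield depth, name, agg_type
        children = clause.get("aggs") or clause.get("aggregations") or {}
        yield from iter_clauses(children, depth + 1)


expected_aggs = {
    "decade": {
        "histogram": {"field": "year", "interval": 10},
        "aggs": {
            "genres": {
                "terms": {"field": "genres", "size": 3},
                "aggs": {
                    "max_nb_roles": {"max": {"field": "nb_roles"}},
                    "avg_rank": {"avg": {"field": "rank"}},
                },
            }
        },
    }
}

# Print an indented outline: one line per clause, children below parents.
for depth, name, agg_type in iter_clauses(expected_aggs):
    print("    " * depth + f"{name} <{agg_type}>")
```

The outline lists ``decade``, then ``genres`` nested under it, then the two metric clauses nested under ``genres``, which is the same hierarchy that ``show()`` displays.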

With DSL classes
----------------

Pandagg provides a DSL to declare this query in a similar fashion:

>>> from pandagg.aggs import Histogram, Terms, Max, Avg
>>>
>>> a = Histogram("decade", field='year', interval=10, aggs=[
>>>     Terms("genres", field="genres", size=3, aggs=[
>>>         Max("max_nb_roles", field="nb_roles"),
>>>         Avg("avg_rank", field="rank")
>>>     ]),
>>> ])

All these classes inherit from :class:`~pandagg.tree.aggs.aggs.Aggs` and thus provide the same interface.

>>> from pandagg.aggs import Aggs
>>> isinstance(a, Aggs)
True

With flattened syntax
---------------------

In the flattened syntax, the first argument is the aggregation name, the second is the aggregation type, and the
remaining keyword arguments define the aggregation body:

>>> from pandagg.aggs import Aggs
>>> a = Aggs('genres', 'terms', field='genres', size=3)
>>> a.to_dict()
{'genres': {'terms': {'field': 'genres', 'size': 3}}}
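
The mapping from flattened arguments to the native dict can be sketched in plain Python. The ``flat_to_dict`` function below is a hypothetical illustration, not pandagg's implementation. It only shows where the name, the type, and the keyword arguments end up in the resulting clause:

```python
def flat_to_dict(name, agg_type, **body):
    """Hypothetical helper (not pandagg's API): build a native clause.

    name     -> top-level key of the clause
    agg_type -> key holding the aggregation body
    body     -> keyword arguments, used as the aggregation body as-is
    """
    return {name: {agg_type: body}}


clause = flat_to_dict("genres", "terms", field="genres", size=3)
print(clause)
# {'genres': {'terms': {'field': 'genres', 'size': 3}}}
```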


Aggregations enrichment
=======================

Aggregations can be enriched using two methods:

- :func:`~pandagg.tree.aggs.aggs.Aggs.aggs`
- :func:`~pandagg.tree.aggs.aggs.Aggs.groupby`

Both methods return a new :class:`~pandagg.tree.aggs.aggs.Aggs` instance and leave the initial aggregation unchanged.

For instance:

>>> from pandagg.aggs import Aggs
>>> initial_a = Aggs()
>>> enriched_a = initial_a.aggs('genres_agg', 'terms', field='genres')

>>> initial_a.to_dict() is None
True

>>> enriched_a.to_dict()
{'genres_agg': {'terms': {'field': 'genres'}}}
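
The "return a new instance" behaviour described above is a copy-on-write pattern. The toy class below is a sketch for illustration only: ``ToyAggs`` is a hypothetical name, and this is not how pandagg implements :class:`~pandagg.tree.aggs.aggs.Aggs`. It only mirrors the two properties stated in this section, namely that enrichment does not modify the original and that an empty aggregation serializes to ``None``:

```python
import copy


class ToyAggs:
    """Toy stand-in for illustration only; not pandagg's Aggs class."""

    def __init__(self, body=None):
        self._body = body or {}

    def aggs(self, name, agg_type, **params):
        # Work on a deep copy so that the current instance is never mutated.
        new_body = copy.deepcopy(self._body)
        new_body[name] = {agg_type: params}
        return ToyAggs(new_body)

    def to_dict(self):
        # Mirrors the behaviour documented here: empty aggregation -> None.
        return copy.deepcopy(self._body) or None


initial = ToyAggs()
enriched = initial.aggs("genres_agg", "terms", field="genres")

print(initial.to_dict())   # None
print(enriched.to_dict())  # {'genres_agg': {'terms': {'field': 'genres'}}}
```

Because every call returns a fresh object, a base aggregation can be reused as the starting point for several variants without one variant leaking into another.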

.. note::

    Calling :func:`~pandagg.tree.aggs.aggs.Aggs.to_dict` on an empty aggregation returns `None`.

    >>> from pandagg.aggs import Aggs
    >>> Aggs().to_dict() is None
    True


TODO