
Commit

Add note to documentation of functions defining nunique()
piconti committed May 31, 2024
1 parent 5b2d3f9 commit e7e56e3
Showing 5 changed files with 19 additions and 7 deletions.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/versioning.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.

9 changes: 6 additions & 3 deletions docs/_build/html/versioning.html
@@ -1073,13 +1073,15 @@ <h1>Data Versioning<a class="headerlink" href="#data-versioning" title="Link to
<dl class="py function">
<dt class="sig sig-object py" id="impresso_commons.versioning.helpers.agg">
<span class="sig-prename descclassname"><span class="pre">impresso_commons.versioning.helpers.</span></span><span class="sig-name descname"><span class="pre">agg</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">s</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#impresso_commons.versioning.helpers.agg" title="Link to this definition"></a></dt>
<dd><p>The function which will aggregate the result from all the partitions (reduce).</p>
<dd><p>The function which will aggregate the result from all the partitions (reduce).
Part of the aggregating function(s) implementing nunique().</p>
</dd></dl>

<dl class="py function">
<dt class="sig sig-object py" id="impresso_commons.versioning.helpers.chunk">
<span class="sig-prename descclassname"><span class="pre">impresso_commons.versioning.helpers.</span></span><span class="sig-name descname"><span class="pre">chunk</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">s</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#impresso_commons.versioning.helpers.chunk" title="Link to this definition"></a></dt>
<dd><p>The function applied to the individual partition (map).</p>
<dd><p>The function applied to the individual partition (map).
Part of the aggregating function(s) implementing nunique().</p>
</dd></dl>

<dl class="py function">
@@ -1311,7 +1313,8 @@ <h1>Data Versioning<a class="headerlink" href="#data-versioning" title="Link to
<dl class="py function">
<dt class="sig sig-object py" id="impresso_commons.versioning.helpers.finalize">
<span class="sig-prename descclassname"><span class="pre">impresso_commons.versioning.helpers.</span></span><span class="sig-name descname"><span class="pre">finalize</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">s</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#impresso_commons.versioning.helpers.finalize" title="Link to this definition"></a></dt>
<dd><p>The optional function that will be applied to the result of the agg_tu functions.</p>
<dd><p>The optional function that will be applied to the result of the agg_tu functions.
Part of the aggregating function(s) implementing nunique().</p>
</dd></dl>

<dl class="py function">
15 changes: 12 additions & 3 deletions impresso_commons/versioning/helpers.py
@@ -799,25 +799,34 @@ def compute_stats_in_canonical_bag(


### DEFINITION of tunique ###


# define locally the nunique() aggregation function for dask
def chunk(s):
"""The function applied to the individual partition (map)."""
"""The function applied to the individual partition (map).
Part of the aggregating function(s) implementing nunique().
"""
return s.apply(lambda x: list(set(x)))


def agg(s):
"""The function which will aggregate the result from all the partitions (reduce)."""
"""The function which will aggregate the result from all the partitions (reduce).
Part of the aggregating function(s) implementing nunique().
"""
s = s._selected_obj
return s.groupby(level=list(range(s.index.nlevels))).sum()


def finalize(s):
"""The optional function that will be applied to the result of the agg_tu functions."""
"""The optional function that will be applied to the result of the agg_tu functions.
Part of the aggregating function(s) implementing nunique().
"""
return s.apply(lambda x: len(set(x)))


# aggregating function implementing nunique() (distinct count per group)
tunique = dd.Aggregation("tunique", chunk, agg, finalize)

### DEFINITION of tunique ###
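
The three functions in this hunk follow a map/reduce/finalize pattern for counting distinct values per group. As a rough illustration only, the sketch below mimics those three stages in plain Python, without dask or pandas. The dict-based data layout, the sample keys and values, and the function bodies are assumptions made for this example; they are stand-ins that reuse the names `chunk`, `agg` and `finalize`, not the code from `helpers.py`.

```python
from collections import defaultdict


def chunk(partition):
    """Map step: within one partition, keep the distinct values per key."""
    seen = defaultdict(set)
    for key, value in partition:
        seen[key].add(value)
    # Lists (not sets) so that the reduce step can simply concatenate them.
    return {key: list(values) for key, values in seen.items()}


def agg(chunk_results):
    """Reduce step: concatenate the per-partition lists of each key."""
    merged = defaultdict(list)
    for result in chunk_results:
        for key, values in result.items():
            # A value seen in two partitions appears twice here.
            merged[key] += values
    return dict(merged)


def finalize(merged):
    """Final step: de-duplicate across partitions and count."""
    return {key: len(set(values)) for key, values in merged.items()}


# Hypothetical sample data: two partitions of (key, value) pairs.
partitions = [
    [("title_a", "d1"), ("title_a", "d2"), ("title_b", "d1")],
    [("title_a", "d2"), ("title_b", "d3")],
]

counts = finalize(agg([chunk(p) for p in partitions]))
print(counts)
```

The final de-duplication is what makes the count correct: concatenating the lists in the reduce step keeps a value that occurs in several partitions more than once, so only the last step can report the true number of distinct values.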

