Skip to content

selik/dict-grouping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

PEP ???? -- dict.grouping

We currently have three reasonable techniques to create groups from a sequence or iterable:

  • itertools.groupby
  • collections.defaultdict
  • dict.setdefault

Unfortunately, both itertools.groupby and collections.defaultdict are error-prone, and dict.setdefault is homely (not beautiful).

The defaultdict is elegant for building a grouping, but many otherwise-expert programmers will accidentally insert new groups when they intended to raise a KeyError.

Elegant for creating groups:

>>> from collections import defaultdict
>>> groups = defaultdict(set)
>>> for x in range(7):
...     groups[x % 2].add(x)
...

Error-prone when using groups:

>>> groups
defaultdict(<class 'set'>, {0: {0, 2, 4, 6}, 1: {1, 3, 5}})
>>> len(groups[2])      # accidentally inserts a new group
0
>>> groups
defaultdict(<class 'set'>, {0: {0, 2, 4, 6}, 1: {1, 3, 5}, 2: set()})

Many users of itertools.groupby will forget to sort the data before grouping, accidentally creating two or more separate groups for the same key.

>>> from itertools import groupby
>>> mod_2 = lambda x: x % 2

Mistake:

>>> {k: set(group) for k, group in groupby(range(7), key=mod_2)}
{0: {6}, 1: {5}}

Correct:

>>> numbers = sorted(range(7), key=mod_2)
>>> {k: set(group) for k, group in groupby(numbers, key=mod_2)}
{0: {0, 2, 4, 6}, 1: {1, 3, 5}}

The dict.setdefault method is often the best choice for grouping, but suffers from a less-beautiful appearance. Secondarily, setdefault cannot easily create a grouping as an expression.

>>> groups = {}
>>> for x in range(7):
...     groups.setdefault(x % 2, set()).add(x)
...
>>> groups
{0: {0, 2, 4, 6}, 1: {1, 3, 5}}

Proposal

I propose a new dict classmethod, dict.grouping which will construct a new dictionary based on an iterable and a key-function.

>>> # grouping = dict.grouping
>>> mod_2 = lambda x: x % 2
>>> grouping(range(7), mod_2)
{0: [0, 2, 4, 6], 1: [1, 3, 5]}
>>> grouping('ababa')
{'a': ['a', 'a', 'a'], 'b': ['b', 'b']}
>>> grouping('aBAb', str.casefold)
{'a': ['a', 'A'], 'b': ['B', 'b']}
>>> grouping('aBAbaB', str.casefold)
{'a': ['a', 'A', 'a'], 'b': ['B', 'b', 'B']}

While dict.grouping creates a dict of lists, preserving the order that group members were encountered, it is often useful to create "equivalence classes" which are better modeled as a dictionary of sets.

>>> groups = grouping('aBAbaB', str.casefold)
>>> {k: sorted(set(g)) for k, g in groups.items()}
{'a': ['A', 'a'], 'b': ['B', 'b']}

If each group should be a multiset, where repetitions matter but order does not, then a dictionary of Counters is appropriate.

>>> from collections import Counter
>>> groups = grouping('aBAbaB', str.casefold)
>>> {k: Counter(g) for k, g in groups.items()}
{'a': Counter({'a': 2, 'A': 1}), 'b': Counter({'B': 2, 'b': 1})}

About

PEP: add a dict.grouping classmethod

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages