import cudf
dt = cudf.CategoricalDtype(["a", "b", "c"])
hash(dt) # TypeError
hash(dt.categories) # works, dt.categories is a `StringIndex`, so that's immutable, so hashing is ok.
hash(dt.categories._column) # works?! This seems bad, since a column is _not_ immutable
hash(cudf.StringIndex(["a", "b", "c"])) == hash(cudf.StringIndex(["a", "b", "c"])) # False, uh-oh
What is going on?
Recall that if you define no dunder ops on a class, then you get __eq__ and __hash__ from object, which uses id for equality and a hash of id for __hash__.
If you define a class that sets __eq__ but not __hash__, then the interpreter automatically sets __hash__ = None to indicate that the object is unhashable, since it can't automatically construct a hash function that satisfies the invariant x == y => hash(x) == hash(y). See https://docs.python.org/3/reference/datamodel.html#object.__hash__
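The two cases above can be reproduced without cudf. This is a minimal sketch in plain Python; the class names `Plain` and `EqOnly` are hypothetical and exist only for illustration.

```python
class Plain:
    """No dunder methods: __eq__ and __hash__ are inherited from object."""


class EqOnly:
    """Defines __eq__ in the class body, so Python sets __hash__ = None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


p = Plain()
plain_hash_is_stable = hash(p) == hash(p)   # identity-based hash, stable per object
plain_instances_equal = Plain() == Plain()  # identity-based equality: distinct objects differ

eq_only_hash_attr = EqOnly.__hash__         # None, set implicitly at class creation

try:
    hash(EqOnly(1))
    eq_only_hashable = True
except TypeError:
    # Raised because EqOnly.__hash__ is None
    eq_only_hashable = False
```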
Columns and Indexes have __eq__ defined, so why can we hash them? It turns out the __eq__ method is set programmatically through a mixin class, via __init_subclass__, and it appears that this is sufficiently dynamic that the interpreter doesn't spot the __eq__ method and set __hash__ = None (the implicit __hash__ = None is only applied when __eq__ appears in the class body at class-creation time).
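The effect can be shown in isolation with a minimal sketch. `EqMixin` and `FakeColumn` are hypothetical stand-ins, not the cudf classes, and the assumption is that cudf's mixin attaches `__eq__` in a broadly similar way.

```python
class EqMixin:
    """Hypothetical mixin that installs __eq__ on each subclass after it is created."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)

        def _eq(self, other):
            return type(self) is type(other) and self.values == other.values

        # Assigned after the class body has been processed, so the
        # implicit `__hash__ = None` step has already been skipped.
        setattr(cls, "__eq__", _eq)


class FakeColumn(EqMixin):
    def __init__(self, values):
        self.values = list(values)


a = FakeColumn(["a", "b", "c"])
b = FakeColumn(["a", "b", "c"])

columns_equal = a == b                                       # value-based equality from the mixin
inherits_object_hash = FakeColumn.__hash__ is object.__hash__  # still the identity-based hash
hash_a, hash_b = hash(a), hash(b)                            # no TypeError; generally different values
```

The result is a class whose instances compare equal by value but hash by identity, which breaks the x == y => hash(x) == hash(y) invariant.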
CategoricalDtype explicitly defines __eq__, hence we get __hash__ = None.
To hash this we need to compute a deterministic hash of the categories, and mix with the ordered flag (as you note). This requires, ideally, a libcudf implementation of a hash of a column of values: right now all of the hashes that are implemented produce a per-row hash.
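As a rough host-side illustration of "hash the categories and mix in the ordered flag", here is a minimal sketch in plain Python. `SketchCategoricalDtype` is a hypothetical stand-in, not the cudf class, and it sidesteps the libcudf question by assuming the categories fit in a Python tuple. Python's built-in string hashing is also randomized per process by default, so this is only consistent within one interpreter session.

```python
class SketchCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype; not the cudf implementation."""

    def __init__(self, categories, ordered=False):
        # Store the categories immutably so the hash cannot go stale.
        self.categories = tuple(categories)
        self.ordered = bool(ordered)

    def __eq__(self, other):
        if not isinstance(other, SketchCategoricalDtype):
            return NotImplemented
        return (
            self.categories == other.categories
            and self.ordered == other.ordered
        )

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, so that
        # x == y implies hash(x) == hash(y).
        return hash((self.categories, self.ordered))


d1 = SketchCategoricalDtype(["a", "b", "c"])
d2 = SketchCategoricalDtype(["a", "b", "c"])
d3 = SketchCategoricalDtype(["a", "b", "c"], ordered=True)

lookup = {d1: "unordered abc"}
found = lookup[d2]  # equal dtypes hash alike, so d2 finds d1's entry
```

A real implementation would presumably compute the categories hash on the device rather than copying values to the host, which is where the column-level hash mentioned above comes in.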
Is your feature request related to a problem? Please describe.
Would be nice if hash(CategoricalDtype(...)) was supported.
Describe the solution you'd like
Define __hash__ on CategoricalDtype.
Describe alternatives you've considered
A custom hash based on the ordered flag and the categories.