Extending the numeric tower with randomness and dice modifiers #18

bszonye · 2022-12-08T09:13:09Z

So as I mentioned in posita/dyce#10, I'm working on a game analysis library, currently focusing on Warhammer combat,¹ and I've been struggling with the numeric typing. Like, the Python numeric tower is enough of a headache on its own, and then I'm trying to extend it to handle ideas like "the Attacks characteristic is usually a number like 5 but sometimes the result of a die roll like 1d6" or "a weapon's Strength characteristic can be an integer, or it can be a modifier to the wielder's Strength characteristic, or it can multiply the wielder's Strength." Then I need to deal with the subtle differences between Warhammer Age of Sigmar and Warhammer 40,000. I'm managing so far with a tangle of dataclasses and subtypes and factory functions, but it's getting messy. So far it's been easier to handle the I/O than the actual data representation, ha.
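One way to picture the "number, die roll, or modifier" problem is a small tagged union of frozen dataclasses that all expose the same `resolve` method. This is a minimal sketch under my own assumptions. The class names `Fixed`, `Dice`, `AddModifier`, and `MulModifier` and the `resolve(base, rng)` signature are hypothetical, not the design used in bones. It also samples a single roll rather than computing a full distribution.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fixed:
    """A plain number, e.g. Attacks 5."""
    value: int

    def resolve(self, base: int, rng: random.Random) -> int:
        return self.value


@dataclass(frozen=True)
class Dice:
    """A die roll such as 1d6 (count=1, sides=6)."""
    count: int
    sides: int

    def resolve(self, base: int, rng: random.Random) -> int:
        return sum(rng.randint(1, self.sides) for _ in range(self.count))


@dataclass(frozen=True)
class AddModifier:
    """Relative to the wielder's characteristic, e.g. Strength +1."""
    delta: int

    def resolve(self, base: int, rng: random.Random) -> int:
        return base + self.delta


@dataclass(frozen=True)
class MulModifier:
    """Multiplies the wielder's characteristic, e.g. Strength x2."""
    factor: int

    def resolve(self, base: int, rng: random.Random) -> int:
        return base * self.factor


# Hypothetical alias: every characteristic is one of these variants.
Characteristic = Union[Fixed, Dice, AddModifier, MulModifier]


def effective(stat: Characteristic, base: int, rng: random.Random) -> int:
    # Type checkers accept this call because every union member has resolve().
    return stat.resolve(base, rng)


rng = random.Random(0)
print(effective(AddModifier(1), base=4, rng=rng))  # 5
print(effective(MulModifier(2), base=4, rng=rng))  # 8
print(effective(Dice(1, 6), base=4, rng=rng))      # some value from 1 to 6
```

`Fixed` and `Dice` ignore `base`. This shows one cost of forcing every variant through a single signature.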

I'm going to look through your work on dyce and numerary for whatever insight that brings me. Really happy to have stumbled onto your work. In retrospect it makes sense that I'd run across somebody working on similar applications while struggling with similar problems.

¹ https://github.com/bszonye/bones/pull/2

posita · 2022-12-18T16:50:24Z

Maintainer

As you likely surmised, I've struggled with shoehorning problems like this into Mypy quite a bit. I've spent a fair amount of casual time stewing on this, and my current thinking is that numerical hierarchies are frustrating our ability to make forward progress. I think a better area to focus on (which Mypy can't currently support) is some way to define algorithm output types as a function of their input types without having to enumerate all possibilities. Something like this (although I think there could be some hefty syntax improvements):

```python
# Hypothetical syntax: DerivedTypeVar and the derived_types= class keyword
# don't exist in typing; this only sketches the idea.
from fractions import Fraction
from typing import Callable, Protocol, TypeVar

TrueDivParamA = TypeVar("TrueDivParamA")
TrueDivParamB = TypeVar("TrueDivParamB")
TrueDivResult = DerivedTypeVar("TrueDivResult")  # <- sort of like a generic, but requires context to be useful

class TrueDiv(Protocol[TrueDivParamA, TrueDivParamB], derived_types=[TrueDivResult]):
    def __truediv__(self: TrueDivParamA, other: TrueDivParamB) -> TrueDivResult:  # <- this provides the context
        ...

# similarly define, e.g., MulParamA, MulParamB, MulResult, etc.

MyAlgoParamA = TypeVar("MyAlgoParamA")
MyAlgoParamB = TypeVar("MyAlgoParamB")

# The following says that the MyAlgoOutput type reflects the type of
# whatever you'd get if you called
# __mul__(__truediv__(obja: MyAlgoParamA, objb: MyAlgoParamB), objc: int)
MyAlgoOutput = MulResult[TrueDivResult[MyAlgoParamA, MyAlgoParamB], int]

# Then we could define a generic callable whose output type depends on
# its input type. Protocols would have to support something similar.
MyAlgo = Callable[[MyAlgoParamA, MyAlgoParamB], MyAlgoOutput]

# Ideally we could do things like this:
A = TypeVar("A")
B = TypeVar("B")

class Foo(MyAlgo[A, B]):
    def algo(self, obja: A, objb: B):
        return (obja / objb) * 1  # <- inferred return type of Foo.algo is MulResult[TrueDivResult[A, B], int]

reveal_type(Foo().algo(2, Fraction(8)))  # -> fractions.Fraction
reveal_type(Foo().algo(2.1, Fraction(8)))  # -> float
```
reveal_type(Foo().algo(2.1, Fraction(8))  # -> float

(Amidst various life stuff) I'm still wrestling with details (which is why I've been absent from that discussions.python.org thread). My intuition is that if we could express the above, we wouldn't need hierarchies like the numeric tower. They may actually just get in the way.
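Part of this can already be approximated today for a single operation. If a generic Protocol uses a type variable for its return type, the checker can read the result type off the argument's own `__truediv__`. This is a sketch that assumes a checker which solves type variables through protocol members, which I understand mypy and pyright both do. `SupportsTrueDiv` and `halve` are hypothetical names.

```python
from fractions import Fraction
from typing import Protocol, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)
T_co = TypeVar("T_co", covariant=True)
R = TypeVar("R")


class SupportsTrueDiv(Protocol[T_contra, T_co]):
    # "Something that can be divided by a T_contra, producing a T_co."
    def __truediv__(self, other: T_contra, /) -> T_co: ...


def halve(x: SupportsTrueDiv[int, R]) -> R:
    # R is never enumerated; it is inferred from x's own __truediv__.
    return x / 2


print(halve(3))            # 1.5 (a checker should infer float)
print(halve(Fraction(3)))  # 3/2 (a checker should infer Fraction)
```

This does not compose. Expressing something like `MulResult[TrueDivResult[A, B], int]` would require `R` itself to be bounded by another generic protocol. As far as I know, a TypeVar bound can't refer to other type variables, so each additional step has to be spelled out by hand.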

One issue is that TrueDivResult would have to account for cases where __truediv__(obja: MyAlgoParamA, objb: MyAlgoParamB) results in the underlying obja.__truediv__(objb) method returning NotImplemented (so that Python falls back to objb.__rtruediv__(obja)), or perhaps raising NotImplementedError, or both. That is currently a runtime decision. I'm pretty sure Mypy could extend its AST analysis to cover many useful cases.
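The runtime half of that decision can be modeled in a few lines. `simulate_truediv` below is a hypothetical, simplified helper that mimics how `a / b` is evaluated. It tries `type(a).__truediv__` first, and if that returns `NotImplemented`, it tries `type(b).__rtruediv__`. It deliberately ignores two special cases: the rule that gives a right operand of a subclass type first shot at its reflected method, and the rule that skips the reflected method when both operands have the same type.

```python
from fractions import Fraction
from typing import Any


def simulate_truediv(a: Any, b: Any) -> Any:
    """Simplified model of how Python evaluates a / b (hypothetical helper)."""
    forward = getattr(type(a), "__truediv__", None)
    if forward is not None:
        result = forward(a, b)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(b), "__rtruediv__", None)
    if reflected is not None:
        result = reflected(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for /: "
        f"{type(a).__name__!r} and {type(b).__name__!r}"
    )


# int doesn't know about Fraction, so the forward method declines at runtime...
print((2).__truediv__(Fraction(8)))        # NotImplemented
# ...and Fraction.__rtruediv__ ends up deciding the result type.
print(simulate_truediv(2, Fraction(8)))    # 1/4
print(simulate_truediv(2.1, Fraction(8)))  # 0.2625 (Fraction falls back to float)
```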

One other (unsolved) problem is how to reconcile such a thing with @runtime_checkable. It may not be feasible (or necessary), but I haven't thought much about that.
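Part of why @runtime_checkable is awkward here is that isinstance() against a runtime-checkable protocol only checks that the required methods exist. It does not check their parameter or return types, let alone whether they return NotImplemented. Here is a small illustration with the hypothetical names `HasTrueDiv` and `NeverDivides`:

```python
from fractions import Fraction
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class HasTrueDiv(Protocol):
    def __truediv__(self, other: Any, /) -> Any: ...


class NeverDivides:
    # Structurally matches the protocol, but refuses every operand.
    def __truediv__(self, other: Any) -> Any:
        return NotImplemented


print(isinstance(2, HasTrueDiv))               # True
print(isinstance(Fraction(1, 3), HasTrueDiv))  # True
print(isinstance("abc", HasTrueDiv))           # False: str defines no __truediv__
print(isinstance(NeverDivides(), HasTrueDiv))  # True, yet NeverDivides() / 2 raises TypeError
```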

Yours might be a good test case for whether my intuition holds for user-defined types?

bszonye · 2022-12-19T03:34:14Z

Author

Yeah, I can definitely relate. I've recently spent a bunch of time just figuring out how to implement a mapping type in a way that is both easy for humans to read and satisfactory to mypy & pyright in strict mode. I'm writing library code, so I don't want to create unnecessary gotchas for end users. That's surprisingly difficult to do for something as ubiquitous as the dict/mapping constructor interface (which accepts a mapping or iterable pairs).

As I'm modeling a probability mass function, the pairs/mappings are from discrete values to probabilities. I would like to offer shorthand notation for things like equal shares or dice ranges. For example, I'd like for all of these things to have the same meaning:

```python
d3mapping = {1: 1, 2: 1, 3: 1}
d3 = DicePMF(d3mapping)  # mapping
d3 = DicePMF(d3.items())  # iterable pairs
d3 = DicePMF(d3.keys())  # iterable values
d3 = DicePMF(len(d3))  # values from 1 to n
```

The first two are the standard dict-style interface. The third one accepts iterable values instead (like Counter), and the last one just wants a number of items (like range). That's easy enough to implement with Python structural pattern matching, but it's a huge headache to overload it all and still have the mypy & pyright output make sense. Just dealing with the quirks of the first two overloads is hard enough. I eventually gave up because it's easier to just use Counter and range explicitly, e.g.:

```python
from collections import Counter

d3 = DicePMF(Counter((1, 2, 3)))
```

This doesn't directly relate to the numeric tower stuff, but I think it's the same kind of thing you're struggling with around overloading.

Currently, I'm leaning toward making things simpler and more explicit. That is, I don't think it's necessarily a bad thing to use explicit adaptors instead of convenience overloads. Like, in the example above, I wanted the option to abbreviate (value, occurrences) to just value for the frequent case where occurrences is 1. However, in Python you quickly end up with the case where value is itself a sequence and so you have to disambiguate the two anyway.

For similar reasons, I have a bunch of messy Union[int, Fraction] in my code that I would love to generalize to a single Probability type, but the builtin abstractions are too leaky for that to actually work. Like, int / int is float rather than Fraction or Rational or Probability, and that fact leaks out very easily unless you're very careful about conversions, literals, and stdlib return values. Now that I think of it, that might explain why Decimal is so insular. The main use case is for financials, and they really don't want floats leaking out of their expressions.
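A few standard-library one-liners make the leak concrete. True division of two ints produces a float, and a single float operand turns a Fraction expression into a float. Some stdlib helpers, such as math.fsum, always return float. Decimal, by contrast, raises TypeError rather than mixing with float.

```python
import math
from decimal import Decimal
from fractions import Fraction

print(type(1 / 2).__name__)                            # float: int / int is true division
print(type(Fraction(1) / 2).__name__)                  # Fraction: exact once one operand is a Fraction
print(type(Fraction(1, 3) + 0.5).__name__)             # float: a single float literal wins
print(type(math.fsum([Fraction(1, 3)] * 3)).__name__)  # float: fsum always returns float
print(type(sum([Fraction(1, 3)] * 3)).__name__)        # Fraction: built-in sum stays exact

try:
    Decimal("0.1") + 0.5
except TypeError as exc:
    print("TypeError:", exc)  # Decimal won't implicitly combine with float
```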

So far I've found it easier to just accept the builtin quirks and assume that all probabilities are either int or a Fraction rather than trying to make a unified Probability type work. Unfortunately that adds to the existing inertia. 🤷‍♀️ Ultimately it's really tough to manage the existing complexity in a new builtin-like type, and so people tend not to do it unless they really need to, and then they end up creating isolated towers like what happened with Decimal.
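Here is a sketch of that "int or Fraction" convention. It assumes a hypothetical `Probability` alias and helper functions (`exact_ratio`, `normalize`) that route every division through Fraction, so int / int never gets a chance to produce a float.

```python
from collections.abc import Mapping
from fractions import Fraction
from typing import TypeVar, Union

K = TypeVar("K")

# Hypothetical alias: exact probabilities only, never float.
Probability = Union[int, Fraction]


def exact_ratio(numerator: int, denominator: int) -> Probability:
    """Divide exactly, collapsing whole-number results back to int."""
    q = Fraction(numerator, denominator)
    return q.numerator if q.denominator == 1 else q


def normalize(weights: Mapping[K, int]) -> dict[K, Probability]:
    total = sum(weights.values())
    return {value: exact_ratio(w, total) for value, w in weights.items()}


print(normalize({1: 1, 2: 1, 3: 1}))  # {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
print(normalize({"hit": 1}))          # {'hit': 1}
```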

2 replies

posita Dec 19, 2022
Maintainer

> Yeah, I can definitely relate. I've recently spent a bunch of time just figuring out how to implement a mapping type in a way that is both easy for humans to read and satisfactory to mypy & pyright in strict mode. I'm writing library code, so I don't want to create unnecessary gotchas for end users. That's surprisingly difficult to do for something as ubiquitous as the dict/mapping constructor interface (which accepts a mapping or iterable pairs).

I think generic containers that support operations are among the truest tests of any typing system. I really want to be able to do something like this (extending some pseudo-crap from my comment above):

```python
class TrueDivTuple(tuple[TrueDivParamA, ...], Generic[TrueDivParamA, TrueDivParamB], derived_types=[TrueDivResult]):
    def __truediv__(
        self,  # inferred type is tuple[TrueDivParamA, ...]
        other: Union[tuple[TrueDivParamB, ...], TrueDivParamB],
    ) -> tuple[TrueDivResult, ...]:
        ...
```

> As I'm modeling a probability mass function, the pairs/mappings are from discrete values to probabilities. I would like to offer shorthand notation for things like equal shares or dice ranges. For example, I'd like for all of these things to have the same meaning:
>
> ```python
> d3mapping = {1: 1, 2: 1, 3: 1}
> d3 = DicePMF(d3mapping)  # mapping
> d3 = DicePMF(d3.items())  # iterable pairs
> d3 = DicePMF(d3.keys())  # iterable values
> d3 = DicePMF(len(d3))  # values from 1 to n
> ```
>
> The first two are the standard dict-style interface. The third one accepts iterable values instead (like Counter), and the last one just wants a number of items (like range). That's easy enough to implement with Python structural pattern matching, but it's a huge headache to overload it all and still have the mypy & pyright output make sense.

Do you have a specific example? My experience with @overload is that it's finicky (especially when used in concert with Union parameters in the overloaded definitions), but can usually be ~~beaten~~ nudged into doing what you want. But you're right: I don't know of a better way to do that other than enumerating a bunch of cases. While Python tends to be a little verbose in this regard, I don't know of another language that solves this in a substantially more elegant way.
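To make the "enumerating a bunch of cases" point concrete, here is what spelling out a tiny slice of the int/Fraction division table with @overload looks like. `div` is a hypothetical helper. The overloads mirror what these divisions return at runtime, and every additional numeric type would multiply them.

```python
from fractions import Fraction
from typing import Union, overload


@overload
def div(a: int, b: int) -> float: ...


@overload
def div(a: int, b: Fraction) -> Fraction: ...


@overload
def div(a: Fraction, b: Union[int, Fraction]) -> Fraction: ...


def div(a: Union[int, Fraction], b: Union[int, Fraction]) -> Union[float, Fraction]:
    # One runtime implementation; the overloads above exist only for the checker.
    return a / b


print(div(1, 2))               # 0.5
print(div(1, Fraction(2)))     # 1/2
print(div(Fraction(1, 2), 2))  # 1/4
```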

> For similar reasons, I have a bunch of messy Union[int, Fraction] in my code that I would love to generalize to a single Probability type, but the builtin abstractions are too leaky for that to actually work. Like, int / int is float rather than Fraction or Rational or Probability, and that fact leaks out very easily unless you're very careful about conversions, literals, and stdlib return values. Now that I think of it, that might explain why Decimal is so insular. The main use case is for financials, and they really don't want floats leaking out of their expressions.

> So far I've found it easier to just accept the builtin quirks and assume that all probabilities are either int or a Fraction rather than trying to make a unified Probability type work. Unfortunately that adds to the existing inertia. 🤷‍♀️ Ultimately it's really tough to manage the existing complexity in a new builtin-like type, and so people tend not to do it unless they really need to, and then they end up creating isolated towers like what happened with Decimal.

Yeah, this is kind of the approach that many core Python developers naively recommend without really appreciating typing as an interface one wants to provide for one's customers. They think in terms of, "Just use the closest native type as a proxy, and then [some hand-wavy thing] at runtime when you pass something else." Yours (and mine) is an extension of that approach, just (as you mention) extending that proxy to (e.g.) fractions.Fraction. It's not generically applicable based on interface compliance. Type checkers will still complain when you hand it a rational from, e.g., sympy or sagemath, even if the interface matches.
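One partial workaround is to type against the interface instead of a proxy class. Below is a minimal sketch using a structural Protocol. `RationalLike`, `as_fraction`, and `MyRational` are hypothetical names. The checker accepts anything that exposes int-valued `numerator` and `denominator` attributes, whether or not it inherits from Fraction or is registered with numbers.Rational.

```python
from fractions import Fraction
from typing import Protocol


class RationalLike(Protocol):
    @property
    def numerator(self) -> int: ...

    @property
    def denominator(self) -> int: ...


def as_fraction(x: RationalLike) -> Fraction:
    # Accepts int, Fraction, or any structurally compatible third-party type.
    return Fraction(x.numerator, x.denominator)


class MyRational:
    """Stand-in for a third-party rational type; not registered with numbers.Rational."""

    def __init__(self, p: int, q: int) -> None:
        self._p = p
        self._q = q

    @property
    def numerator(self) -> int:
        return self._p

    @property
    def denominator(self) -> int:
        return self._q


print(as_fraction(3))                 # 3
print(as_fraction(Fraction(1, 3)))    # 1/3
print(as_fraction(MyRational(2, 4)))  # 1/2
```

This only captures attribute-level compatibility. It says nothing about what arithmetic between two such objects returns.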

bszonye Dec 19, 2022
Author

Adding more overloads to the PMF constructor wasn't too bad with mypy, but pyright was tougher to satisfy. It's quite eager to declare generic types "partially Unknown" in strict mode (which means that it can infer, e.g., Iterable but not Iterable[whatever]). It comes up a lot when you overload different generic types like Mapping and Iterable, because a runtime isinstance check can narrow to the generic type but not to its type parameters. The only way I found around this was to structure things simply enough that I could put in appropriate cast calls without worrying that I was lying to the type checker.

That's part of why I decided not to extend the dict-style constructor any further, and instead use adaptors or @classmethod factory functions to implement shorthand notation.
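Here is a rough sketch of that split. It uses hypothetical names (`PMF`, `from_values`, `from_range`) rather than the actual bones API. The constructor accepts only the two dict-style forms, narrows with isinstance, and uses cast() for the one fact the checker cannot verify: the mapping's type parameters. The shorthand forms live in explicitly named @classmethod factories, so a value that happens to be a tuple is never mistaken for a (value, weight) pair.

```python
from collections import Counter
from collections.abc import Iterable, Mapping
from fractions import Fraction
from typing import Generic, TypeVar, Union, cast

K = TypeVar("K")


class PMF(Generic[K]):
    """Hypothetical probability mass function over values of type K."""

    def __init__(self, data: Union[Mapping[K, int], Iterable[tuple[K, int]]]) -> None:
        pairs: Iterable[tuple[K, int]]
        if isinstance(data, Mapping):
            # isinstance() narrows to Mapping but can't check [K, int] at
            # runtime; the cast restates what the signature already promises.
            pairs = cast(Mapping[K, int], data).items()
        else:
            pairs = data
        weights: dict[K, int] = {}
        for value, weight in pairs:
            weights[value] = weights.get(value, 0) + weight
        total = sum(weights.values())
        if total <= 0:
            raise ValueError("total weight must be positive")
        self.probabilities: dict[K, Fraction] = {
            value: Fraction(weight, total) for value, weight in weights.items()
        }

    @classmethod
    def from_values(cls, values: Iterable[K]) -> "PMF[K]":
        """Equal shares: each occurrence of a value adds weight 1 (like Counter)."""
        return cls(Counter(values))

    @classmethod
    def from_range(cls, n: int) -> "PMF[int]":
        """Values 1..n with equal weight, e.g. from_range(6) for a d6."""
        return PMF.from_values(range(1, n + 1))


print(PMF.from_range(3).probabilities)  # {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
print(PMF.from_values([(1, 2), (1, 2)]).probabilities)  # {(1, 2): Fraction(1, 1)}
```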

bszonye · 2022-12-19T03:42:04Z

Author

As an aside, C++ really tried to do the generic algorithm thing too. I haven't followed the progress of C++ for a long time, but I recall there being similar stumbling blocks along the way. That language has very different design goals of course, with much more behavioral overloading on type, rather than Python's compromise between static type linting and runtime duck typing.

I think both languages ended up "good enough" for everyday use, but they're both conceptually ugly for library writers who want to extend or unify abstractions. In library code there's always the tension between wanting to handle more general cases but only having the resources to build and test the most common ones. That ends up reinforcing the ecosystem status quo, as only the common cases are actually safe to use in production code.

1 reply

posita Dec 19, 2022
Maintainer

I also don't know the current state of the C++ world, but my memory of templates is that you have the benefit of compile-time expansion of the template at the call sites which were then also type-checked at compile time. Typescript is similar conceptually (although many details differ). I don't know if this is how Mypy thinks of generics (i.e., as a conceptual "first pass" to inform the second on examined code).
