Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enum field #2017

Merged
merged 9 commits into from
Sep 4, 2022
Merged

Add Enum field #2017

merged 9 commits into from
Sep 4, 2022

Conversation

lafrech
Copy link
Member

@lafrech lafrech commented Jul 20, 2022

Replaces @orenc17's #1885.

This is a long standing feature request. See for instance discussions in #267.

The status quo is that people can use @justanr's marshmallow_enum.

Currently, this lib doesn't seem actively maintained. I don't blame the author. A few things are outdated, deprecated stuff. The changes since MA3 have been user contributed and some are still missing.

I'm thinking if we recommend this lib as the reference choice, why not integrate it in code? This is what I've been trying to do.

I started by importing the code and updating it for PY3/MA3.

I also removed the dump/load asymmetry (dump by value, load by name) that I found awkward and revamped the error message generation.

I was still not happy with the by_value/by_name mechanism. I serialize by name, so I could live with only this, but I understand the need for serializing by value (to get an int or a nicer string). However, I like the type to be well-defined so that we can inherit deserialization from basic fields, and to help documenting the type (e.g. in apispec).

My proposal is therefore an Enum field serializing by name, and typed enum fields StringEnum and IntegerEnum, to serialize by value with a given type. Users would be free to create new types but I believe those cover most cases.

Transition from marshmallow_enum should be relatively smooth for users who don't use the dump/load asymmetry and who don't rely on exact error messages.

This still lacks a bit of doc but I'd rather get feedback before polishing.

Note we can't use OneOf validator because validation occurs after deserialization and wrong values are caught at deserialization already.

Comment on lines +1906 to +1909
if value is None:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangent: This pattern is repeated in nearly every field. Is there any reason this isn't in Field.serialize? The complimentary None handling is implemented in Field.deserialize. Maybe something to tack onto #2012.

src/marshmallow/fields.py Show resolved Hide resolved
tests/base.py Show resolved Hide resolved
raise self.make_error("unknown", choices=self.choices) from exc


class StringEnum(TypedEnum, String):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I was doing everything all over again, I'd leave it at just the string representation of the enum and call it a day. The only value with using an underlying numeric is that when sending to some other languages they can successfully decode an unknown value into their enum type. But that just seems like asking for subtle bugs instead of being hit with "Cannot deserialize MyEnum.FooBar15" or whatever they'd spit out. I've been away from python long enough that I don't recall what Python does off hand but if I had to guess Python barfs when attempting that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than str(value), which would expose the enum name, I'd rather use the member name (SymbolEnum). Offering to serialize by value allows users to provide human-readable names as enum values, for instance, or at least decouple business logic and API layer a bit.

But thanks for the feedback. Your comment confirms my intuition that the dump/load asymmetry is too much API surface for the need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the symbol name is what I meant, just worded it kind of poorly. The asymmetry was a pain in the ass for tests and it tripped me up a few times. Pretending that they're just a union of string literals is the way to go imo

@justanr
Copy link
Contributor

justanr commented Jul 21, 2022

Funnily enough, I showed up here because I got notified I'm a maintainer of a critical package, which turns out is the one I question.

@lafrech
Copy link
Member Author

lafrech commented Jul 22, 2022

Renamed Enum -> EnumSymbol, TypedEnum -> EnumValue. I'll squash later on.

Docstrings still missing.

@lafrech
Copy link
Member Author

lafrech commented Aug 18, 2022

What about EnumValue(String) and EnumValue(Integer)? That is consistent with Nested/List API, improves readability, and minimizes the API surface area.

Just tried this and it worked a treat!

I'm happy with this implementation.

Still requires docstrings and perhaps testing with a DateTime field or some other field for which ser/deser is less trivial to assert ser/deser actually happens.

@lafrech lafrech force-pushed the enum branch 2 times, most recently from e96d92d to d77392a Compare August 19, 2022 07:20
@lafrech
Copy link
Member Author

lafrech commented Aug 19, 2022

I think I'm done.

@sloria @deckar01 @justanr I'll wait for a few days in case someone wants to review this and then I'll merge.

Copy link
Member Author

@lafrech lafrech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As PR author, I can't request changes.

Please don't merge until I fix.

def __init__(self, cls_or_instance: Field | type, enum: type[Enum], **kwargs):
super().__init__(**kwargs)
self.enum = enum
self.choices = ", ".join([str(m.value) for m in enum])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, we need to use the field to dump the value. str only work for trivial cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a change to let the field serialize the values.

Our API kinda sucks because serialize expects attr and obj parameters and we don't have those and actually I don't think they are ever used. There might be a nice opportunity for a refactor here. I opened #2039.

Copy link
Member Author

@lafrech lafrech Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.field._serialize(m.value, None, None)

This won't work for fields that need attr or obj (but is there any ? #2039) or fields that are not symmetric (don't dump what they loaded). I'm not talking about fields that don't round-trip, I mean if several equivalent representation are accepted at load time and only one is used at dump time, this one is the one that will be displayed here and that's fine, I believe. It would be more of a problem for custom field that would purposely have asymmetric load/dump methods, like using different time formats, for instance.

Since those must be quite rare, and using them in this enum field would be even more rare, I'm fine with this limitation.

We could remove the list of accepted values from the error message but I think it is useful in the "normal" case (99% of the time) and it will be needed for documentation purpose anyway (apispec).

Copy link
Member Author

@lafrech lafrech Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.field._serialize(m.value, None, None)

is already used in Mapping.

@lafrech
Copy link
Member Author

lafrech commented Aug 25, 2022

PR fixed. I'm not 100% happy with self.field._serialize(m.value, None, None) but that's our API.

@lafrech
Copy link
Member Author

lafrech commented Sep 4, 2022

Alright, moving on with this.

@lafrech
Copy link
Member Author

lafrech commented Sep 4, 2022

Just after merging this, as I was updating the changelog to release, I realized the naming didn't follow our habits of naming fields by object type.

I pushed a new PR to propose merge of both fields into an Enum field: #2044.

@lafrech lafrech mentioned this pull request Sep 15, 2022
@ThiefMaster
Copy link

Would you consider adding a short migration guide for marshmallow_enum users? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants