Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add dataclasses for data model #4

Closed
wants to merge 4 commits into from

Conversation

tangkong
Copy link
Contributor

@tangkong tangkong commented Mar 1, 2024

Description

  • Adds basic datacalsses for Entry, Parameter, Value, Collection, and Snapshot
  • Adds initial validation tests, and a sample database fixture

Motivation and Context

Continue fleshing out structure of this package's backend.

(same description as #3 that I'm leaving behind, since we should make filled objects not track their parents/origin, duplicating that data instead.)

Notes on validation

I've chosen apischema here since it is the beast we know, and has similar functionality to pydantic. apischema performs type validation happens on serialization/deserialization, but not outside of that. If we are to be constructing these dataclasses we'll want to validate them before attempting to save them to a backend (whether serialized or not).

To help with that, the .validate() method has been added. This calls the apischema validation routine. Adding routines can be done with the @validator decorator, quite easily. I will note that these routines get called on deserialization, meaning we do duplicate some of apischema's validation when we check type hints. I think this gives us hooks to write more validation routines, as long as those routines can also be run at deserialization. Maybe in the future we stop using apischema to gather the routines, but for now this makes sense to me.

In pursuit of lazy loading

This section is also the "Why hint Union[UUID, Entry] everywhere?" section

I expect lazy loading will become a requirement, especially as database size grows. I imagine a world where we have a huge root tree structure, with deep nesting of collections and snapshots. In this case, we wouldn't want to pull the whole database's contents with every app load.

Instead we might first grab the metadata information (name, description, creation time), then fill its children when their information is requested. Exactly how we accomplish this should be fleshed out in the client/backend implementations, but the data-model would have to allow "unfilled" references to Entry objects.

How Has This Been Tested?

Added a few tests, need to flesh them out a bit

Where Has This Been Documented?

This PR, soon pre-release notes

Pre-merge checklist

  • Code works interactively
  • Code follows the style guide
  • Code contains descriptive docstrings, including context and API
  • New/changed functions and methods are covered in the test suite where possible
  • Test suite passes locally
  • Test suite passes on GitHub Actions
  • Ran docs/pre-release-notes.sh and created a pre-release documentation page
  • Pre-release docs include context, functional descriptions, and contributors as appropriate

@tangkong tangkong requested review from ZLLentz and shilorigins March 1, 2024 17:50
@tangkong
Copy link
Contributor Author

tangkong commented Mar 1, 2024

I left the old PR up, but we should look at this one instead.

@tangkong tangkong mentioned this pull request Mar 1, 2024
8 tasks
Copy link
Collaborator

@shilorigins shilorigins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included some inline suggestions and a couple of questions in this review, but I'm primarily having trouble following what's happening in serialization.py, particularly as_tagged_union. Would you be able to give me a high-level summary of what it's for and how it works?

Func = TypeVar("Func", bound=Callable)


def alternative_constructor(func: Func) -> Func:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def alternative_constructor(func: Func) -> Func:
def register_alternative_constructor(func: Func) -> Func:

This naming is clearer to me, but not sure if that's the case for everyone.

Comment on lines +41 to +44
def _capitalized(name: str) -> str:
return name[0].upper() + name[1:]

return "".join((cls.__name__, *(_capitalized(arg.__name__) for arg in args)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _capitalized(name: str) -> str:
return name[0].upper() + name[1:]
return "".join((cls.__name__, *(_capitalized(arg.__name__) for arg in args)))
return "".join((cls.__name__, *(arg.__name__.capitalize() for arg in args)))

return "".join((cls.__name__, *(_capitalized(arg.__name__) for arg in args)))


generic_name = type_name(_get_generic_name_factory)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the docs and source for apischema, but I can't figure out what the purpose of type_name is. It associates types with a TypeNameFactory? And then how is that supposed to be used?

# attempt to cast
self.data_type = data_type(self.data)

if (self.data_type == 'unset'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (self.data_type == 'unset'):
if (self.data_type == UNSET):

class Value(Parameter):
"""An Entry that stores a PV name and data pair"""
data: AnyEpicsType = _unset_data
data_type: str = 'unset'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data_type: str = 'unset'
data_type: str = UNSET

Comment on lines +93 to +106
while origin: # Drill down through nested types, only Lists currently
if origin is Union:
break
elif origin in (list, Sequence):
hint = get_args(hint)[0] # Lists only have one type
origin = get_origin(hint)
is_list = True
else:
origin = get_origin(hint)
hint = get_args(hint)

# end condition
if origin is None:
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while origin: # Drill down through nested types, only Lists currently
if origin is Union:
break
elif origin in (list, Sequence):
hint = get_args(hint)[0] # Lists only have one type
origin = get_origin(hint)
is_list = True
else:
origin = get_origin(hint)
hint = get_args(hint)
# end condition
if origin is None:
break
while origin is not None and origin is not Union: # Drill down through nested types, only Lists currently
if origin in (list, Sequence):
hint = get_args(hint)[0] # Lists only have one type
origin = get_origin(hint)
is_list = True
else:
origin = get_origin(hint)
hint = get_args(hint)

This suggestion is logically equivalent to the current code, except that hint doesn't get reassigned to get_args(hint) in the case that origin starts out as None.

Although this suggestion is a little cleaner, I'm not sure this while loop is doing what it should. For a simple type hint (e.g. int), origin will be None and hint will be (), ultimately resulting in a ValidationError.

Comment on lines +119 to +120
f'improper type ({type(field_value)}) found in field '
f'(expecting {hint}])'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the error message reflect that an element in the list broke validation, rather than the list as a whole?

Comment on lines +110 to +113
if (origin is Union):
inner_hint = get_args(hint)
else:
inner_hint = hint
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (origin is Union):
inner_hint = get_args(hint)
else:
inner_hint = hint

The docs for isinstance say it supports Union as the classinfo parameter.

f'improper type ({type(field_value)}) found in field '
f'(expecting {hint}])'
)
elif not isinstance(field_value, inner_hint):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif not isinstance(field_value, inner_hint):
elif not isinstance(field_value, hint):

The docs for isinstance say it supports Union as the classinfo parameter.

inner_hint = hint

if is_list:
list_comp = (isinstance(it, inner_hint) for it in field_value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
list_comp = (isinstance(it, inner_hint) for it in field_value)
list_comp = (isinstance(it, hint) for it in field_value)

The docs for isinstance say it supports Union as the classinfo parameter.

@tangkong
Copy link
Contributor Author

Closing this after extensive discussions on the data model. This material is likely still useful, but should be superseded by future work. (That I hope will include similar tests)

@tangkong tangkong closed this Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants