Package : Utils : Reference
Nuwan Waidyanatha edited this page Nov 23, 2023 · 1 revision
The reference package handles domain-specific, system-wide static data; essentially taxonomical (or categorical) lookup data. It is designed to maintain reference data in any storage form, such as an RDBMS, a NoSQL store, or files.
Note - the current implementation maintains the reference data in a util_refer table in an RDBMS.
- Export/import reference data to and from structured or semi-structured data files (e.g. JSON, CSV)
- Read/write reference data for use in domain-specific functionality (e.g. lookup and assign categorical data)
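A minimal sketch of the export/import idea, assuming reference rows are held as plain Python dictionaries rather than a Spark dataframe. The helper names (`export_json`, `import_json`, `export_csv`, `import_csv`) are hypothetical and not part of the package; the column names are the data elements listed on this page.

```python
import csv
import io
import json

# Column names taken from the data elements listed on this page.
FIELDS = ["ref_pk", "realm", "category", "code", "value",
          "description", "source_uuid", "data_source", "data_owner"]


def export_json(rows):
    """Serialise reference rows (a list of dicts) to a JSON string."""
    return json.dumps(rows, indent=2)


def import_json(text):
    """Parse a JSON string back into a list of reference rows."""
    return json.loads(text)


def export_csv(rows):
    """Serialise reference rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_csv(text):
    """Parse CSV text into rows; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))
```

JSON preserves value types such as the integer ref_pk, while CSV returns every value as a string, so a CSV import would likely need a type conversion step before the rows are written back to storage.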
Set and get reference data as an Apache Spark dataframe. A typical dataframe holds the following data elements:
ref_pk # system generated integer upon insert
realm # schema table entity the realm is associated with
category # a category within the table the lookup is for
code # alternate code to the reference value
value # the used reference value for the specific category
description # description about the reference value
source_uuid # uuid or pk of the record from the data source
data_source # storage location the data was taken from, e.g. an S3 folder
data_owner # reference to the origin of the data
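The data elements above could be modelled as a single record type. The sketch below is illustrative only: the class name `ReferenceRecord`, the string types, and the choice of which fields are optional are all assumptions, since the page only states that ref_pk is a system-generated integer. The sample values are made up.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReferenceRecord:
    """One row of reference data; field types here are assumed."""
    realm: str                          # schema table entity
    category: str                       # category within the table
    value: str                          # the reference value in use
    code: Optional[str] = None          # alternate code for the value
    description: Optional[str] = None
    source_uuid: Optional[str] = None   # uuid or pk in the data source
    data_source: Optional[str] = None   # e.g. an S3 folder
    data_owner: Optional[str] = None
    ref_pk: Optional[int] = None        # generated on insert, unset before


# Made-up sample values, purely for illustration.
record = ReferenceRecord(realm="property", category="room_type",
                         value="double", code="DBL")
row = asdict(record)  # plain dict, ready to be loaded into a dataframe
```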
- The realm_list comprises all the realm values in the util_refer table
- When getting the realm_list, if it is empty, the property getter will retrieve all values
- The realm is domain specific.
- For example, an entity may have several categorical values defining the category, type, or other taxonomical static values.
- A realm value must be a value of the realm_list
- The property provides a realm specific filtered list of values
- The setter will add a value to the realm list, if it is unique
- Each realm can have multiple categories
- The property getter will validate the category against the category_list
- To support filtering the realm_list, a category value can be set
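The property behaviour described above (a lazily loaded realm_list, a realm validated against it, and a category validated on read) might look roughly like the sketch below. The class name `ReferenceProps`, the `rows` argument standing in for the util_refer table, and the choice of `ValueError` are assumptions, not the package's actual code.

```python
class ReferenceProps:
    """Hypothetical stand-in illustrating the realm and category properties."""

    def __init__(self, rows):
        self._rows = rows        # stand-in for the util_refer table
        self._realm_list = []
        self._realm = None
        self._category = None

    @property
    def realm_list(self):
        # When the list is empty, load every distinct realm value.
        if not self._realm_list:
            self._realm_list = sorted({r["realm"] for r in self._rows})
        return self._realm_list

    @property
    def realm(self):
        return self._realm

    @realm.setter
    def realm(self, value):
        # A realm must be one of the values in realm_list.
        if value not in self.realm_list:
            raise ValueError(f"unknown realm: {value!r}")
        self._realm = value

    @property
    def category_list(self):
        # Categories for the selected realm, or all of them if none is set.
        return sorted({r["category"] for r in self._rows
                       if self._realm is None or r["realm"] == self._realm})

    @property
    def category(self):
        # Validate against category_list when the value is read.
        if (self._category is not None
                and self._category not in self.category_list):
            raise ValueError(f"unknown category: {self._category!r}")
        return self._category

    @category.setter
    def category(self, value):
        self._category = value
```

Validating the category in the getter rather than the setter means a category set before the realm is only checked once both are known.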
A class function for retrieving realm- and category-specific reference data from storage.
get_reference(
    realm: str = None,
    category: str = None,
    **kwargs
)
- Returns a Spark dataframe (self._data is set to the newly filtered dataset)
- Retrieves all the reference data and then filters by the realm and category, if specified
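The "retrieve everything, then filter" behaviour can be sketched with plain Python rows in place of a Spark dataframe. This is not the package's implementation: the `rows` parameter is a hypothetical stand-in for the data read from storage, and `**kwargs` is left out because its options are not documented here.

```python
def get_reference(rows, realm=None, category=None):
    """Return the rows matching realm and category; None means no filter."""
    result = rows
    if realm is not None:
        # With a Spark dataframe this step would be df.filter(df.realm == realm)
        result = [r for r in result if r["realm"] == realm]
    if category is not None:
        result = [r for r in result if r["category"] == category]
    return result
```

Because each filter is applied only when its argument is given, calling the function with no arguments returns the full reference data set.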
Rezaware abstract BI augmented AI/ML entity framework © 2022 by Nuwan Waidyanatha is licensed under Creative Commons Attribution 4.0 International