#191 Milestone 2 focuses on adding functionality to the Ensemble that allows it to track and manage EnsembleFrame objects. For the purposes of this milestone, this would be added without worrying too much about removing current Ensemble functionality. The goals are fairly straightforward:
An Ensemble.frames property added to track the EnsembleFrames tied to the Ensemble
Ensemble.frames = {"source": SourceFrame, "object": ObjectFrame, "Result1": EnsembleFrame, "model_params": EnsembleFrame} # Where each value is an instance of the class
Ensemble.source and Ensemble.object should be shorthands to access the required SourceFrame and ObjectFrame objects
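As a rough illustration of the two points above, here is a minimal sketch of how the frames dictionary and the source/object shorthands could fit together. It uses plain stand-in classes rather than the real Dask-backed frames; the class bodies and the SOURCE_LABEL / OBJECT_LABEL constants are assumptions made for the example, not existing code.

```python
class EnsembleFrame:
    """Stand-in for the real Dask-backed EnsembleFrame (hypothetical)."""

    def __init__(self, label=None):
        self.label = label


class SourceFrame(EnsembleFrame):
    """Stand-in for the frame holding source data."""


class ObjectFrame(EnsembleFrame):
    """Stand-in for the frame holding object data."""


class Ensemble:
    # Hypothetical reserved labels for the two required frames.
    SOURCE_LABEL = "source"
    OBJECT_LABEL = "object"

    def __init__(self):
        # Every tracked frame lives in this dictionary, keyed by label.
        self.frames = {
            self.SOURCE_LABEL: SourceFrame(self.SOURCE_LABEL),
            self.OBJECT_LABEL: ObjectFrame(self.OBJECT_LABEL),
        }

    @property
    def source(self):
        # Shorthand for Ensemble.frames["source"].
        return self.frames[self.SOURCE_LABEL]

    @property
    def object(self):
        # Shorthand for Ensemble.frames["object"].
        return self.frames[self.OBJECT_LABEL]


ens = Ensemble()
ens.frames["Result1"] = EnsembleFrame("Result1")
```

Making source and object properties that read from frames keeps a single source of truth, so replacing an entry in the dictionary is immediately visible through the shorthand.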
The following functions would make up a minimal API:
Ensemble.select_frame(frame_label): Returns the associated EnsembleFrame object, which would allow a user to work with the EnsembleFrame directly using the Dask API. We’d need to make sure that the Ensemble itself is updated cleanly as the user works with their data.
Ensemble.frame_info(frames=None): Returns the information for a subset or all of the available frames, showing column information, memory usage, etc.
Ensemble.add_frame(dataframe, frame_label): Adds a dataframe to the Ensemble, useful for filtering an EnsembleFrame and adding the result to a new view.
Ensemble.update_frame(EnsembleFrame): Similar to add_frame, but uses the label to automatically update Ensemble.frames.
Ensemble.drop_frame(frame_label): Drops a frame from the Ensemble and closes it so that it doesn’t persist in memory.
Ensemble.from_parquet(file_path, col_mapper=None): Loader functions would be at the Ensemble level and would yield a new EnsembleFrame tied to the provided label. This may not be the best format for loading Object and Source data. [from_parquet already exists as an Ensemble function; it may also be fine to wait on implementing this until further milestones.]
Ensemble.objsor_from_parquet(source_file, object_file, column_mapper) (?): A more structured function that loads in the ObjectFrame and SourceFrame data, with associated column_mappings. I will admit the function name here leaves a lot to be desired; maybe there’s a better way to approach this.
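To make the frame-management part of the list above concrete, here is a minimal sketch of select_frame, frame_info, add_frame, update_frame and drop_frame. It is only one possible shape for the API: the stand-in EnsembleFrame holds a list of column names instead of Dask data, the column names are made up, frame_info reports only columns (no memory usage), and raising ValueError for an unlabeled frame is an assumption rather than a decided behavior. The loader functions are left out.

```python
class EnsembleFrame:
    """Stand-in for the real Dask-backed frame (hypothetical)."""

    def __init__(self, columns, label=None):
        self.columns = list(columns)
        self.label = label


class Ensemble:
    def __init__(self):
        self.frames = {}

    def add_frame(self, frame, frame_label):
        # Track the frame under an explicit label.
        frame.label = frame_label
        self.frames[frame_label] = frame
        return self

    def update_frame(self, frame):
        # Like add_frame, but the label comes from the frame itself.
        if frame.label is None:
            raise ValueError("frame has no label to update")  # assumed behavior
        self.frames[frame.label] = frame
        return self

    def select_frame(self, frame_label):
        # Hand back the tracked frame so the user can work on it directly.
        return self.frames[frame_label]

    def drop_frame(self, frame_label):
        # Only the Ensemble's reference is removed here; releasing the
        # underlying Dask data would need extra work in a real version.
        del self.frames[frame_label]
        return self

    def frame_info(self, frames=None):
        # Column names for the requested labels, or for all frames.
        labels = list(self.frames) if frames is None else list(frames)
        return {label: self.frames[label].columns for label in labels}


ens = Ensemble()
ens.add_frame(EnsembleFrame(["time", "flux"]), "source")
ens.add_frame(EnsembleFrame(["time", "flux"]), "Result1")
ens.update_frame(EnsembleFrame(["time", "flux", "err"], label="Result1"))
info = ens.frame_info()
ens.drop_frame("Result1")
```

Returning self from the mutating methods is a design choice for the sketch that allows chaining calls; returning the frame or None would work equally well.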
Finally, the above API may not be the optimal implementation. If there are thoughts on alternatives that may feel more intuitive to users, please feel free to explore them!