Move Data Processors to be internal to Data Services #1095

Open
Tracked by #832
jchate6 opened this issue Nov 1, 2024 · 0 comments
Labels
Data Services

Comments

jchate6 (Contributor) commented Nov 1, 2024

In the new DataServices regime, especially now that a schema exists for ReducedDatums, the way we handle DataProducts and DataProcessors needs to be refactored.

Current relationship between ReducedDatums, DataProducts, and Data Processors:

We need a way to translate data from a given source (currently often ingested as a DataProduct) from its ingested format into individual Reduced Datums that can be stored, displayed, and analyzed by the TOM. The current process for this is inconsistent, but one scenario is as follows:

  • A DataService query returns data
  • A DataProduct is created with a TYPE specific to the Data Service/query type
  • A data processor (specific to the DataProduct TYPE) translates this data into the format expected by the TOM
  • Users can override this processor
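
As a rough illustration of the scenario above, the current flow can be sketched as a lookup from DataProduct TYPE to a processor function. This is only a sketch, not the TOM Toolkit implementation: the `DataProduct` and `ReducedDatum` classes here are simplified stand-ins, and `PROCESSORS`, `default_photometry_processor`, and `run_processor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    # Simplified stand-in: TYPE is tied to the Data Service / query type.
    product_type: str
    payload: List[dict]  # raw rows in their ingested format


@dataclass
class ReducedDatum:
    # Simplified stand-in for the format expected by the TOM.
    data_type: str
    value: dict


Processor = Callable[[DataProduct], List[ReducedDatum]]


def default_photometry_processor(product: DataProduct) -> List[ReducedDatum]:
    # Hypothetical default: one ReducedDatum per ingested row.
    return [ReducedDatum("photometry", dict(row)) for row in product.payload]


# Hypothetical registry: one processor per DataProduct TYPE.
PROCESSORS: Dict[str, Processor] = {"photometry": default_photometry_processor}


def run_processor(product: DataProduct) -> List[ReducedDatum]:
    try:
        processor = PROCESSORS[product.product_type]
    except KeyError:
        raise ValueError(f"No processor for type {product.product_type!r}")
    return processor(product)


product = DataProduct("photometry", [{"mjd": 60000.5, "mag": 18.2}])
datums = run_processor(product)
```

In this sketch a user replaces the processor by assigning a different callable to `PROCESSORS["photometry"]`, which is what makes the translation step depend on the DataProduct TYPE rather than on the Data Service itself.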

Proposed refactor:

We propose that in v3.0 we introduce a DEFAULT Data Schema that can serve as an intermediate data state between Data Services and User defined Reduced Datums. Fundamentally, the idea is that a DataService would contain all of the data processing functions required to convert a query output or DataProduct upload into the Default Schema, and then the User defined processors could perform any reductions or analysis required and produce the final Reduced Datum with a potentially different schema.
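
One way to picture the proposed two-stage flow is sketched below. Nothing here is a settled design: `DefaultDatum`, its fields, `ExampleDataService`, `to_default_schema`, and `user_processor` are all hypothetical names chosen for illustration, and the actual Default Schema is not defined in this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DefaultDatum:
    """Hypothetical default schema: the intermediate state between a
    Data Service and user-defined Reduced Datums."""
    source: str
    timestamp: float
    data_type: str
    value: Dict[str, Optional[float]]


class ExampleDataService:
    """Hypothetical Data Service that owns its own processing step."""
    name = "example_service"

    def to_default_schema(self, raw_rows: List[dict]) -> List[DefaultDatum]:
        # Service-specific knowledge (column names, units) stays in here.
        return [
            DefaultDatum(
                source=self.name,
                timestamp=row["mjd"],
                data_type="photometry",
                value={"magnitude": row.get("mag")},
            )
            for row in raw_rows
        ]


def user_processor(datums: List[DefaultDatum]) -> List[dict]:
    """Hypothetical user-defined step: drop rows without a magnitude and
    emit a final Reduced Datum with a different schema."""
    return [
        {"time": d.timestamp, "mag": d.value["magnitude"], "origin": d.source}
        for d in datums
        if d.value.get("magnitude") is not None
    ]


raw = [{"mjd": 60000.5, "mag": 18.2}, {"mjd": 60001.5, "mag": None}]
default_datums = ExampleDataService().to_default_schema(raw)
reduced = user_processor(default_datums)
```

The point of the split in this sketch is that the user-defined step only ever sees `DefaultDatum` objects, so it would not need to change when a different Data Service is used.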

github-project-automation bot moved this to Triage in TOM Toolkit Nov 1, 2024
jchate6 added the Data Services label Nov 8, 2024
jchate6 moved this from Triage to Backlog in TOM Toolkit Nov 8, 2024