Improvements to Table Syncing #245
Comments
Additional points from meeting discussion:
Overall, it seems like we're leaning towards minimizing the extra work TAPE adds to workflows, and instead making sure that users have the tooling to do things more explicitly. Any fluff/flourish we add on the backend can really pile up for users working with large datasets.
I'm going to take this and implement a lean sync (mentioned above) as a replacement for the current sync. We can still think about adding more sync options/generalization, but for now the lean mode seems to be the correct baseline behavior.
As of #254, we are now syncing just the IDs against one another ("lean" sync as described above).
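For anyone skimming the thread, here is a rough sketch of what "syncing just the IDs against one another" could look like. This is not the code from #254; it uses plain pandas rather than TAPE's own tables, and the function name `lean_sync`, the column names, and the assumption that both tables are indexed by the object ID are all made up for illustration.

```python
import pandas as pd


def lean_sync(object_df, source_df):
    """Hypothetical lean sync: keep only rows whose ID (the index)
    is present in both tables. No per-object statistics are computed."""
    shared_ids = object_df.index.unique().intersection(source_df.index.unique())
    synced_object = object_df.loc[object_df.index.isin(shared_ids)]
    synced_source = source_df.loc[source_df.index.isin(shared_ids)]
    return synced_object, synced_source


# Toy tables: object 3 has no sources, and source ID 4 has no object row.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0, 30.0]}, index=pd.Index([1, 2, 3], name="id")
)
source_df = pd.DataFrame(
    {"flux": [1.0, 1.1, 2.0, 4.0]}, index=pd.Index([1, 1, 2, 4], name="id")
)

synced_object, synced_source = lean_sync(object_df, source_df)
```

After the call, only IDs 1 and 2 remain in both tables. The point of the lean mode is that nothing beyond this ID matching runs unless the user asks for it.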
Related to #243
Table Syncs are a key component of the TAPE Ensemble design, where Object and Source are kept up to date with one another as the user operates on one or both tables. Currently with Table Syncs, we do the following:
With the Source to Object Sync, we are doing much more work to perform the sync, because we assume that these nobs columns will be useful to users. However, from discussion and some real workflows, it's clear that some users will not need these nobs columns for their work. In these cases, we are introducing needless operations to the workflow.
Given this, we should introduce more options for ensemble syncing.
Beyond this, nobs information is likely not the only information that a user may want to sync. As @hombit mentioned, duration (per band), mean mag, variability index, etc. are all examples of potentially trackable metrics. There is potential to abstract syncs so that users can set up their own sync criteria: given a set of input columns from one table, a function, and a set of output columns, TAPE would be instructed to run that function whenever the dirty flag is set. We should make sure that any implementation has use cases demonstrating a clear upside over the alternative, where users just manually run functions to update columns when they need them for the next analysis step.
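To make the "input columns, function, output columns" idea concrete, here is a minimal sketch in plain pandas. None of these names exist in TAPE: `SyncRule`, `MiniEnsemble`, `add_sync_rule`, and `filter_source` are hypothetical, and the sketch assumes both tables are indexed by object ID and that each rule's function returns a frame indexed by that ID.

```python
from dataclasses import dataclass
from typing import Callable, List

import pandas as pd


@dataclass
class SyncRule:
    """Hypothetical user-defined sync: source input columns -> object output columns."""
    input_cols: List[str]
    func: Callable[[pd.DataFrame], pd.DataFrame]
    output_cols: List[str]


class MiniEnsemble:
    """Toy stand-in for an ensemble holding an object table and a source table."""

    def __init__(self, object_df, source_df):
        self.object = object_df.copy()
        self.source = source_df.copy()
        self.rules = []
        self.source_dirty = False

    def add_sync_rule(self, rule):
        self.rules.append(rule)
        self.source_dirty = True  # outputs of the new rule do not exist yet

    def filter_source(self, keep):
        # keep is a boolean array with one entry per source row
        self.source = self.source[keep]
        self.source_dirty = True  # object-level columns may now be stale

    def sync(self):
        if not self.source_dirty:
            return  # nothing changed, so no rule is re-run
        for rule in self.rules:
            result = rule.func(self.source[rule.input_cols])
            for col in rule.output_cols:
                self.object[col] = result[col].reindex(self.object.index)
        self.source_dirty = False


object_df = pd.DataFrame({"ra": [10.0, 20.0]}, index=pd.Index([1, 2], name="id"))
source_df = pd.DataFrame(
    {"mag": [10.0, 12.0, 15.0]}, index=pd.Index([1, 1, 2], name="id")
)

ens = MiniEnsemble(object_df, source_df)
ens.add_sync_rule(
    SyncRule(
        input_cols=["mag"],
        func=lambda src: src.groupby(level=0).agg(mean_mag=("mag", "mean")),
        output_cols=["mean_mag"],
    )
)
ens.sync()
first_means = ens.object["mean_mag"].tolist()

# Filtering the source table sets the dirty flag; the next sync re-runs the rule.
ens.filter_source((ens.source["mag"] > 11.0).to_numpy())
ens.sync()
second_means = ens.object["mean_mag"].tolist()
```

The manual alternative is a single `groupby` call made by the user right before the step that needs `mean_mag`, so the registry only pays off if the same columns have to stay fresh across many operations.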