Configuration of normalising data pipelines #37

Open
ghost opened this issue Jun 25, 2020 · 4 comments

ghost commented Jun 25, 2020

Data Enhancement
Which enhancement options available in the pipeline for Stage 2 processing need to be configurable by the user?

This app, like the last one, uses a system of pipelines that perform certain actions on the data.

Is the work here to enable configuration options so it's easy to turn certain pipes on and off?

Are there other use cases you're looking to meet here?

@ghost ghost assigned nickevansuk and thill-odi Jun 25, 2020
robredpath (Collaborator) commented

@rhiaro could you give a quick outline of what a particular pipe might encompass? Is it, for example, carrying out a particular normalisation, or "the geo stuff"?

Do we know if there are any dependencies between pipes, such that disabling particular ones would leave others useless or unreliable?

rhiaro (Collaborator) commented Jun 29, 2020

There will be pipes for "the geo stuff", "the activity tag stuff" and "the organisation stuff", aka the enhancement pipes.

There will be pipes for particular normalisations, but in several cases these are functionally the same, so are merged into one pipe.

Some object types will need to pass through more than one pipe to be completed, e.g. an EventSeries with subEvents that are Events. Instead of duplicating the Event normalisation in an EventSeries pipe, we pass the object through the EventSeries pipe to slurp the necessary data out of the parent object, then it goes through the normal Event pipe for the rest (or vice versa). At least this is how some worked last time, but it could be rearchitected if pipe dependencies are going to be a problem.
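
To make the chaining concrete, here is a minimal Python sketch of that two-step flow. It is only an illustration under assumptions: the class names, the `run` method, the `INHERITED_FIELDS` list and the "trim the name" normalisation are all hypothetical and not taken from the actual codebase.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


class EventSeriesPipe:
    """Hypothetical pipe: copies data from a parent EventSeries onto its subEvents."""

    # Assumed list of fields a subEvent inherits from its parent (illustrative only).
    INHERITED_FIELDS = ("name", "organizer")

    def run(self, items: List[Item]) -> List[Item]:
        out: List[Item] = []
        for item in items:
            if item.get("type") != "EventSeries":
                out.append(item)  # not ours, pass through untouched
                continue
            for sub in item.get("subEvent", []):
                event = dict(sub)
                for field in self.INHERITED_FIELDS:
                    # Only fill in what the subEvent does not already define.
                    if field in item and field not in event:
                        event[field] = item[field]
                out.append(event)
        return out


class EventPipe:
    """Hypothetical pipe: stands in for the normal Event normalisation."""

    def run(self, items: List[Item]) -> List[Item]:
        out: List[Item] = []
        for item in items:
            if item.get("type") == "Event" and isinstance(item.get("name"), str):
                # Placeholder normalisation: trim whitespace from the name.
                item = {**item, "name": item["name"].strip()}
            out.append(item)
        return out


def run_pipeline(pipes: List[Any], items: List[Item]) -> List[Item]:
    """Feed the output of each pipe into the next one, in order."""
    for pipe in pipes:
        items = pipe.run(items)
    return items
```

In this sketch the Event pipe depends on the EventSeries pipe having run first, which is the kind of ordering dependency that would need noting if pipes could be disabled.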

Today I've been thinking about breaking it up a bit so there are pipes for things that are common to all or several pipes, e.g. dealing with the presence of invalid fields. However, these could instead be reorganised as methods on the parent Pipe that all the other pipes can call. It would be helpful to know exactly what sorts of things will need turning on and off to architect this better.
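
The "methods on the parent Pipe" option could look roughly like the Python sketch below. Everything named here (`BasePipe`, `drop_invalid_fields`, `allowed_fields`, the two subclasses and their field lists) is an assumption made for illustration, not the project's real API.

```python
from typing import Any, Dict, FrozenSet

Item = Dict[str, Any]


class BasePipe:
    """Hypothetical parent pipe holding behaviour shared by all pipes."""

    # Each subclass lists the fields it treats as valid (illustrative values only).
    allowed_fields: FrozenSet[str] = frozenset()

    def drop_invalid_fields(self, item: Item) -> Item:
        """Shared helper: return a copy of the item without unrecognised fields."""
        return {k: v for k, v in item.items() if k in self.allowed_fields}

    def run(self, item: Item) -> Item:
        raise NotImplementedError


class EventNormalisationPipe(BasePipe):
    allowed_fields = frozenset({"type", "name", "startDate"})

    def run(self, item: Item) -> Item:
        # Reuse the shared helper instead of a separate "invalid fields" pipe.
        return self.drop_invalid_fields(item)


class PlaceNormalisationPipe(BasePipe):
    allowed_fields = frozenset({"type", "name", "address"})

    def run(self, item: Item) -> Item:
        return self.drop_invalid_fields(item)
```

With this shape there is no standalone "invalid fields" pipe to switch on or off, so it adds no extra dependency between pipes.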

rhiaro (Collaborator) commented Jul 1, 2020

The requirement is the ability to turn the enhancement pipes off at runtime; normalisation pipes don't need this. Dependencies between normalisation pipes should be noted, in case someone alters the code to disable some, but that's not something we need to explicitly support at the CLI.
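
One possible shape for that, sketched in Python with the standard `argparse` module. The `--disable-enhancement` flag, the pipe names and the dependency table are all hypothetical; they only illustrate the split between enhancement pipes that can be switched off at runtime and normalisation pipes that always run.

```python
import argparse
from typing import List, Sequence

# Hypothetical pipe names, loosely based on the three enhancement areas above.
ENHANCEMENT_PIPES = ("geo", "activity", "organisation")
NORMALISATION_PIPES = ("event_series", "event")

# Assumed example of a noted dependency: documentation for anyone editing
# the code to disable normalisation pipes. It is not enforced at the CLI.
NORMALISATION_DEPENDENCIES = {"event": ("event_series",)}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the Stage 2 pipeline (sketch).")
    parser.add_argument(
        "--disable-enhancement",
        action="append",
        choices=ENHANCEMENT_PIPES,
        default=[],
        metavar="PIPE",
        help="enhancement pipe to skip; may be given more than once",
    )
    return parser


def select_pipes(argv: Sequence[str]) -> List[str]:
    """Return the ordered pipe names to run for the given CLI arguments."""
    args = build_parser().parse_args(argv)
    enabled = [p for p in ENHANCEMENT_PIPES if p not in args.disable_enhancement]
    # Normalisation pipes always run; only enhancement pipes are optional.
    return list(NORMALISATION_PIPES) + enabled
```

For example, `select_pipes(["--disable-enhancement", "geo"])` would keep both normalisation pipes and drop only the geo enhancement.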
