Ensure that different data pipelines are logically separate #375
The problem Ian stated is valid: the data relay was not designed to be "multi-tasking"; more specifically, it can only execute its planned tasks sequentially. Therefore, if the plan changes, as in the backfill situation where we need to go back in history and redo the relay, the current data relay must either completely cancel the existing plan and focus on the backfill, or continue executing the existing plan to completion before any other task can run. The root cause is the single-queue design. To keep the last task from waiting too long, I changed the design to use multiple task queues, one per pipeline.
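A minimal sketch of what such a multi-task-queue design could look like (hypothetical Python; the pipeline names, `run_task`, and the threading model are illustrative assumptions, not the actual data relay code):

```python
# Sketch: one queue and one worker per pipeline, so a long backfill queued in
# one pipeline never blocks tasks waiting in another.
import queue
import threading

PIPELINES = ["raw_30s", "config_tables", "backfill"]  # assumed names

def run_task(task: str) -> None:
    # Placeholder for the real relay work (download, transform, upload).
    print(f"running {task}")

def worker(q: "queue.Queue[str]") -> None:
    # Each pipeline drains its own queue independently of the others.
    while True:
        task = q.get()
        run_task(task)
        q.task_done()

queues = {name: queue.Ueue() if False else queue.Queue() for name in PIPELINES}
for q in queues.values():
    threading.Thread(target=worker, args=(q,), daemon=True).start()

# A backfill only lands in its own queue; config uploads are unaffected.
queues["backfill"].put("redo 2023-06-01..2023-06-14")
queues["config_tables"].put("upload station metadata")

for q in queues.values():
    q.join()
```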
Thanks for the thoughtful discussion.
The next step is to execute the plan.
Pingping successfully implemented the changes outlined above. As a next step before closing this issue, Pingping would like to implement a dashboard to track performance of the different queues. Pingping will meet with @ian-r-rose when he returns next week for input on the KPIs to use.
The next step on this task is to document the code. Pingping will create a separate issue for creating a dashboard to track performance.
Per @pingpingxiu-DOT-ca-gov, there is no need to add a new dashboard; existing dashboards should capture any issues. The next step on this issue is for Pingping and @ian-r-rose to meet and review the code.
Per @pingpingxiu-DOT-ca-gov this is waiting on the virtual environments PR to be completed. Then Pingping will submit a PR for this to be further reviewed before completion. |
At one point the 30-second raw data pipeline was tightly coupled with the config table uploads. This meant that an incident in one pipeline could affect the others. As an example, in June there was an incident where the data relay server was down for a couple of weeks. It took almost a week of data crawling to recover, and the config table uploads were scheduled behind the 30-second data uploads. Because of the tight coupling, it took a long time to update the config tables (the scripts for which can run in under a minute), even though they are logically separate.
Going forward, the data relay server should be able to schedule the different parts of the pipeline independently so that incidents (or incident recovery) in one of them do not affect the others.
Note: there has been some refactoring of the upload scripts since the above incident, so the coupling may not be the same now.
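As a rough illustration of the requirement above, independent per-pipeline scheduling might look like the following sketch (hypothetical Python; the pipeline names and intervals are assumptions, not the actual relay configuration):

```python
# Sketch: each pipeline runs on its own schedule, so recovery work in one
# (e.g. re-crawling raw data) does not delay the others.
import threading
import time

def run_pipeline(name: str, interval_s: float) -> None:
    # Placeholder for the real upload job; each pipeline loops on its own timer.
    while True:
        print(f"{name}: running scheduled upload")
        time.sleep(interval_s)

for name, interval in [("raw_30s", 30.0), ("config_tables", 3600.0)]:
    threading.Thread(target=run_pipeline, args=(name, interval), daemon=True).start()

time.sleep(5)  # keep the demo alive briefly so the first runs are visible
```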