Shibuya distributed mode #19
Because of these goroutines, the controller is currently not stateless. This is the path we can follow: [1], if all of the stateful logic can be moved to workers. If we need [2], then we need to consider what happens during leader election, for example during a release or when the leader goes down. Another challenge is that once the controller has replicas, Prom (Prometheus) can no longer get the metrics.
This is required to continue reading the metrics after the controller process is restarted. [1]
This is for raw metrics streaming. Currently the metrics are collected in heap memory. We need the workers to report the metrics to a broker (Redis is a good candidate) and let the controller be the consumer. [1]
We track the progress of each running plan and stop (GC) everything when its duration is reached. Currently we fetch all the running plans. It seems pretty difficult to move this logic into the worker. [2]
This is for showing the engine metric usage on the executor side. Currently we fetch all the engines by
This is to clean Prom data. Easy to move. [1]
This is the GC process to clean idle engines. We use
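To make the broker idea for raw metrics streaming more concrete, here is a minimal sketch of workers publishing metrics and the controller consuming them. Everything in it is hypothetical: the `Metric` type, the `Broker` interface, and the in-memory `memBroker` are illustrations only and not existing Shibuya code. The in-memory channel stands in for a real broker; a Redis-backed implementation could presumably satisfy the same interface, but that is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// Metric is a hypothetical raw sample reported by a worker.
type Metric struct {
	EngineID string
	Label    string
	Value    float64
}

// Broker is a hypothetical abstraction over the message broker.
// Workers publish, the controller subscribes.
type Broker interface {
	Publish(m Metric)
	Subscribe() <-chan Metric
	Close()
}

// memBroker is an in-memory stand-in for a real broker such as Redis.
type memBroker struct {
	ch chan Metric
}

func newMemBroker(size int) *memBroker {
	return &memBroker{ch: make(chan Metric, size)}
}

func (b *memBroker) Publish(m Metric)         { b.ch <- m }
func (b *memBroker) Subscribe() <-chan Metric { return b.ch }
func (b *memBroker) Close()                   { close(b.ch) }

// reportFromWorkers starts one goroutine per engine, publishes a fixed
// number of samples from each, and closes the broker once all are done.
func reportFromWorkers(b Broker, engines []string, samples int) {
	var wg sync.WaitGroup
	for _, id := range engines {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			for i := 0; i < samples; i++ {
				b.Publish(Metric{EngineID: id, Label: "latency_ms", Value: float64(i)})
			}
		}(id)
	}
	wg.Wait()
	b.Close()
}

// consume is the controller side: it drains the broker until it is closed
// and counts the samples received per engine.
func consume(b Broker) map[string]int {
	counts := make(map[string]int)
	for m := range b.Subscribe() {
		counts[m.EngineID]++
	}
	return counts
}

func main() {
	b := newMemBroker(16)
	go reportFromWorkers(b, []string{"engine-0", "engine-1"}, 3)
	counts := consume(b)
	fmt.Println(counts["engine-0"], counts["engine-1"])
}
```

With this split the controller no longer holds the metric readers in its own heap, so a restarted controller can simply subscribe again.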
Before going into details, there are also some items that need to be done.
A big one. Let me break it into smaller tasks:
Local development environments. p0
Controller should have a leader, as only one process should do the GC/progress check related work. (Maybe we can move the logic into the worker as well?) p1
Allow Shibuya to be deployed in central or distributed mode. optional
Current engine metric reading related logic should be extracted and built as a standalone container. This is essentially the worker. p0
Communication between the controller and the worker. p0
Collect the metrics read by worker. p0
Worker release steps