For a test deployment you can start very small; even a single-node Kubernetes
deployment on one machine of the m2.xlarge type (8 CPU, 16 GB RAM) should be
more than enough.
However, if you are already thinking about creating a small multi-user,
multi-node cluster that could easily grow over time, then I'd recommend
starting with at least 5 such nodes, with the following responsibilities:
- 1 node labelled `reana.io/system=infrastructure` that will run the REANA
frontend and backend infrastructure services;
- 1 node labelled `reana.io/system=infrastructuredb` that will run the PostgreSQL
database service (unless you have an already-existing database service running
outside the cluster that could be reused, so that you do not have to host the
database yourself);
- 1 node labelled `reana.io/system=infrastructuremq` that will run the RabbitMQ
service;
- 1 node labelled `reana.io/system=runtimebatch` that will run the user runtime
batch workflow orchestration pods (CWL/Serial/Snakemake/Yadage);
- 1 node labelled `reana.io/system=runtimejobs` that will run the user runtime
job pods (generated by those workflow orchestration pods).
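Assuming a standard Kubernetes cluster, the labels above can be applied with
`kubectl`; the node names below are hypothetical placeholders, so substitute
your actual node names (see `kubectl get nodes`):

```shell
# Hypothetical node names; replace with your cluster's actual node names.
kubectl label node node-1 reana.io/system=infrastructure
kubectl label node node-2 reana.io/system=infrastructuredb
kubectl label node node-3 reana.io/system=infrastructuremq
kubectl label node node-4 reana.io/system=runtimebatch
kubectl label node node-5 reana.io/system=runtimejobs
```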
With such a setup, you can keep the 3 infrastructure nodes fixed and scale the
2 runtime nodes (1 batch, 1 jobs) up to, say, 50 runtime nodes (10 batch,
40 jobs) as your needs grow:
For example, 1 runtime batch node can comfortably run 8 concurrent user
workflows at full speed (since 1 node has 8 cores). So, if you need 80 users
running at full speed, then 10 such runtime batch nodes may be needed.
For example, if your physics workflows are typically such that 1 workflow
generates 4 parallel n-tupling jobs, then you may want 4x more runtime job
nodes than runtime batch nodes in the system, so that everything can run
optimally at a sustainable full speed. (Provided the memory is enough; if not,
then machine types with more RAM may be necessary.)
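The sizing arithmetic above can be sketched as a quick back-of-the-envelope
calculation; the 8 cores per node, 80 concurrent users, and 4 jobs per
workflow are the assumed figures from the examples above and should be
adjusted to your own workloads:

```shell
# Assumptions taken from the examples above; tune for your own workloads.
cores_per_node=8      # CPU cores per runtime node
concurrent_users=80   # users running workflows at full speed
jobs_per_workflow=4   # parallel jobs typically spawned per workflow

# One workflow orchestration pod per core on the batch nodes.
batch_nodes=$(( concurrent_users / cores_per_node ))
# Roughly jobs_per_workflow times as many job nodes as batch nodes.
job_nodes=$(( batch_nodes * jobs_per_workflow ))

echo "runtime batch nodes: ${batch_nodes}"
echo "runtime job nodes: ${job_nodes}"
```

With the figures above this yields 10 batch nodes and 40 job nodes, matching
the 50-runtime-node example earlier.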
We have tried the above three-infrastructure-node setup for clusters of O(1k)
cores and everything scaled very nicely.
If you'd like to aim even higher, say 5k cores, then using 8 CPU / 16 GB nodes
is not optimal: in our scalability tests we saw slowdowns and huge loads on
the Kubernetes master node. Larger machine type flavours (32 CPU or more)
would be preferable here. But I guess these types of considerations can wait
for now.
Take the notes below by @tiborsimko and use them to extend our "Deployment at scale" page.