Improve app deployment time by starting time-consuming process first #644

leochr · 2024-10-24T21:19:39Z

Improve the initial reconcile time by kicking off the time-consuming steps (certificate requests, Semeru Cloud Compiler setup, LTPA Job) all at once. Then check their status toward the end of the reconciliation, instead of checking after each process. This should make the application ready sooner, with fewer reconciles.

leochr · 2024-10-29T19:03:37Z

Meeting Tue, October 29, 2024:

An in-memory LTPA cache (instead of reading YAML) and concurrent reconciles both reduce the time it takes for new instances to become ready.
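For illustration, an in-memory LTPA cache could look roughly like the sketch below. It assumes the parsed LTPA data is keyed by namespace. The `ltpaConfig` fields, the `load` callback (standing in for the slow YAML read) and `Invalidate` are all hypothetical names, not the operator's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// ltpaConfig is a hypothetical stand-in for parsed LTPA key/config data.
type ltpaConfig struct {
	KeysFileName string
	Password     string
}

// ltpaCache keeps parsed LTPA configs in operator memory, keyed by namespace.
type ltpaCache struct {
	mu      sync.RWMutex
	entries map[string]ltpaConfig
	load    func(namespace string) (ltpaConfig, error) // slow path, e.g. reading YAML
}

func newLTPACache(load func(string) (ltpaConfig, error)) *ltpaCache {
	return &ltpaCache{entries: map[string]ltpaConfig{}, load: load}
}

// Get returns the cached config and falls back to the slow loader on a miss.
// Two concurrent misses may both call load; the last write wins, which is
// harmless here because both callers load the same data.
func (c *ltpaCache) Get(namespace string) (ltpaConfig, error) {
	c.mu.RLock()
	cfg, ok := c.entries[namespace]
	c.mu.RUnlock()
	if ok {
		return cfg, nil
	}
	cfg, err := c.load(namespace)
	if err != nil {
		return ltpaConfig{}, err
	}
	c.mu.Lock()
	c.entries[namespace] = cfg
	c.mu.Unlock()
	return cfg, nil
}

// Invalidate drops a namespace's entry, e.g. when its LTPA Secret changes.
func (c *ltpaCache) Invalidate(namespace string) {
	c.mu.Lock()
	delete(c.entries, namespace)
	c.mu.Unlock()
}

func main() {
	loads := 0
	cache := newLTPACache(func(ns string) (ltpaConfig, error) {
		loads++
		return ltpaConfig{KeysFileName: "ltpa.keys", Password: "pw-" + ns}, nil
	})
	cache.Get("app-ns")
	cache.Get("app-ns")
	fmt.Println("loads:", loads) // loads: 1
}
```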
Next steps:
- Start the service certificate request and the Job that generates the LTPA key in parallel, since both take a long time.
- Measure the time it takes for new instances (e.g. 5) to become ready when the cluster already has wlapp instances (e.g. 50).

kabicin · 2024-11-12T16:12:42Z

On average, LTPA caching improves performance by a few seconds when testing 25 existing instances plus 1 new instance, so it is worth including.

Running the time-consuming processes concurrently first gives no performance improvement with a high number of instances (100 existing + 5 new). I suspect this is because the time to cycle through the 100 instances, checking each one's reconcile ready condition, exceeds the time spent waiting for the 5 new instances' resources. Running the time-consuming processes first may still help, just not under busy conditions where many already-ready instances are being reconciled.

| Max concurrent reconciles | Service certificate + LTPA Job run concurrently first | Time until 100 + 5 instances are up |
|---|---|---|
| 1 | Yes | 4m8s |
| 1 | No | 4m8s |
| 4 | Yes | 4m7s |
| 4 | No | 4m7s |
| 8 | Yes | 4m8s |
| 8 | No | 4m8s |
| 16 | Yes | 4m52s |
| 16 | No | 4m5s |

In another test, I used an in-memory cache in the operator to apply the following heuristic:

If one or more CR instances are erroring, then (reconciled CR instances) AND (CR instances that depend on erroring CR instances) yield to (erroring CR instances).
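The heuristic can be expressed as a small predicate. The sketch below is a hypothetical illustration: the `instance` fields and `shouldYield` are invented names, not the operator's code. How an instance actually "yields" is left open; one option would be returning early and requeueing it.

```go
package main

import "fmt"

// instance is a hypothetical in-memory view of a CR instance.
type instance struct {
	Name       string
	Reconciled bool     // already reconciled and ready
	Erroring   bool     // last reconcile ended in an error
	DependsOn  []string // names of CR instances this one depends on
}

// shouldYield reports whether inst should be deferred in favour of erroring
// instances. If at least one instance is erroring, reconciled instances and
// instances that depend on an erroring instance yield. Erroring instances
// never yield.
func shouldYield(inst instance, all map[string]instance) bool {
	if inst.Erroring {
		return false
	}
	anyErroring := false
	for _, other := range all {
		if other.Erroring {
			anyErroring = true
			break
		}
	}
	if !anyErroring {
		return false
	}
	if inst.Reconciled {
		return true
	}
	for _, dep := range inst.DependsOn {
		if d, ok := all[dep]; ok && d.Erroring {
			return true
		}
	}
	return false
}

func main() {
	all := map[string]instance{
		"a": {Name: "a", Erroring: true},
		"b": {Name: "b", Reconciled: true},
		"c": {Name: "c", DependsOn: []string{"a"}},
		"d": {Name: "d"},
	}
	for _, n := range []string{"a", "b", "c", "d"} {
		fmt.Println(n, shouldYield(all[n], all))
	}
	// a false
	// b true
	// c true
	// d false
}
```

Instance `d` (neither reconciled nor dependent on an erroring instance, e.g. a new instance still coming up) keeps its turn. This matches the heuristic as stated, which only names reconciled and dependent instances as the ones that yield.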

This technique decreases the wrap-around time for the 100 existing instances. However, it does not show a clear performance improvement from running the time-consuming processes concurrently first. The numbers are likely affected by when the Kubernetes Job scheduler picks up the LTPA Job.

| Max concurrent reconciles | Service certificate + LTPA Job run concurrently first | Time until 5 instances are up |
|---|---|---|
| 1 | Yes | 70s |
| 1 | No | 54s |
| 8 | Yes | 62s |
| 8 | No | 80s |

Next steps: The above test creates only 1 LTPA key/config Job pair. I need to run more tests, such as across multiple namespaces that each request LTPA, to get a better idea of whether prioritizing the LTPA Job concurrently helps.

leochr added the zenhub-dev label Oct 24, 2024

leochr assigned kabicin Oct 29, 2024

leochr added vNext and removed vNext labels Nov 13, 2024
