Improve app deployment time by starting time-consuming process first #644

Open
leochr opened this issue Oct 24, 2024 · 2 comments
@leochr
Member

leochr commented Oct 24, 2024

Improve the initial reconcile time by kicking off the time-consuming steps (certificate requests, Semeru cloud compiler setup, LTPA Job) all at once, then checking their status towards the end of the reconciliation rather than after each step. This should result in the application becoming ready sooner, with fewer reconciles.
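A minimal sketch of the "start everything, then check" ordering described above. This is not the operator's actual code: `step`, `reconcileOnce`, and `fakeStep` are hypothetical names, and the sketch assumes each `start` call is idempotent and returns immediately after submitting its request.

```go
package main

import "fmt"

// step is a hypothetical stand-in for one slow task in a reconcile
// (for example a certificate request or the LTPA Job).
type step struct {
	name  string
	start func() error // submits the request and returns immediately
	ready func() bool  // cheap status check
}

// reconcileOnce starts every step before checking any of them and
// returns the names of the steps that are still pending.
func reconcileOnce(steps []step) ([]string, error) {
	for _, s := range steps {
		if err := s.start(); err != nil {
			return nil, fmt.Errorf("starting %s: %w", s.name, err)
		}
	}
	var pending []string
	for _, s := range steps {
		if !s.ready() {
			pending = append(pending, s.name)
		}
	}
	return pending, nil
}

// fakeStep builds a step with fixed readiness that records when it is started.
func fakeStep(name string, isReady bool, started *[]string) step {
	return step{
		name:  name,
		start: func() error { *started = append(*started, name); return nil },
		ready: func() bool { return isReady },
	}
}

func main() {
	var started []string
	steps := []step{
		fakeStep("service-certificate", true, &started),
		fakeStep("semeru-cloud-compiler", false, &started),
		fakeStep("ltpa-job", false, &started),
	}
	pending, err := reconcileOnce(steps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("started:", started)
	fmt.Println("requeue while pending:", pending)
}
```

In a real reconciler, a non-empty pending list would translate into a requeue; the point of the ordering is that all slow work is already in flight by the time the first status check runs.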

@leochr
Member Author

leochr commented Oct 29, 2024

Meeting Tue, October 29, 2024:

  • The LTPA cache (in memory, as opposed to reading YAML) and concurrent reconciles help reduce the time it takes for new instances to become ready.

Next steps:

  • Start the service certificate request and the Job that generates the LTPA key in parallel, since both take a long time.
  • Measure the time it takes for new instances (e.g. 5) to become ready when the cluster already has wlapp instances (e.g. 50).
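The in-memory LTPA cache mentioned in the first bullet could look roughly like the sketch below. It is illustrative only: `ltpaCache`, `getOrLoad`, and the string payload are assumptions, not the operator's real types, and the `load` callback stands in for whatever slow read the cache replaces.

```go
package main

import (
	"fmt"
	"sync"
)

// ltpaCache is a hypothetical in-memory cache keyed by namespace.
type ltpaCache struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newLTPACache() *ltpaCache {
	return &ltpaCache{entries: make(map[string]string)}
}

// getOrLoad returns the cached value for key, calling load only on a miss.
func (c *ltpaCache) getOrLoad(key string, load func() (string, error)) (string, error) {
	c.mu.RLock()
	v, ok := c.entries[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check under the write lock: another reconcile may have filled it.
	if cached, found := c.entries[key]; found {
		return cached, nil
	}
	v, err := load()
	if err != nil {
		return "", err
	}
	c.entries[key] = v
	return v, nil
}

func main() {
	cache := newLTPACache()
	loads := 0
	load := func() (string, error) {
		loads++ // stands in for a slow read of the LTPA metadata
		return "ltpa-metadata", nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.getOrLoad("ns-a", load); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("slow loads performed:", loads)
}
```

The read lock keeps concurrent reconciles from serializing on cache hits, which matters once the maximum number of concurrent reconciles is raised above 1.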

@kabicin
Member

kabicin commented Nov 12, 2024

On average, LTPA caching improves performance by a few seconds when testing with 25 existing instances plus 1 new instance, so it is worth including.

There is no performance improvement from running the time-consuming processes concurrently first when the number of instances is high (100 existing + 5 new). I suspect this is because the time to cycle through the 100 existing instances and check each one's reconcile ready condition exceeds the time spent waiting for the 5 new instances' resources. Running the time-consuming processes first may still help, just not under busy conditions where many ready instances are being reconciled.

| Max Concurrent Reconciles | Svc Certificate + LTPA Job Ran Concurrently First | Time until 100+5 instances are up |
|---|---|---|
| 1 | Yes | 4m8s |
| 1 | No | 4m8s |
| 4 | Yes | 4m7s |
| 4 | No | 4m7s |
| 8 | Yes | 4m8s |
| 8 | No | 4m8s |
| 16 | Yes | 4m52s |
| 16 | No | 4m5s |

In another test, I cache within operator memory using the following heuristic:

  • (reconciled CR instances) AND (CR instances dependent on erroring CR instances) yield to (erroring CR instances) if there exists one or more erroring CR instances.
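One way to read the heuristic above, as a hedged sketch: the `instance` type, its fields, and `shouldYield` are hypothetical and only encode the rule as stated, not the operator's actual cache.

```go
package main

import "fmt"

// instance is a hypothetical summary of one CR tracked in operator memory.
type instance struct {
	name       string
	reconciled bool     // already reconciled successfully
	erroring   bool     // currently in an error state
	dependsOn  []string // names of other CR instances it depends on
}

// shouldYield reports whether inst should step aside this round so that
// erroring instances are handled first.
func shouldYield(inst instance, all map[string]instance) bool {
	anyErroring := false
	for _, other := range all {
		if other.erroring {
			anyErroring = true
			break
		}
	}
	// Nobody yields if nothing is erroring, and erroring instances never yield.
	if !anyErroring || inst.erroring {
		return false
	}
	if inst.reconciled {
		return true
	}
	for _, dep := range inst.dependsOn {
		if d, ok := all[dep]; ok && d.erroring {
			return true
		}
	}
	return false
}

func main() {
	all := map[string]instance{
		"ready-app":   {name: "ready-app", reconciled: true},
		"broken-app":  {name: "broken-app", erroring: true},
		"waiting-app": {name: "waiting-app", dependsOn: []string{"broken-app"}},
		"new-app":     {name: "new-app"},
	}
	for _, name := range []string{"ready-app", "broken-app", "waiting-app", "new-app"} {
		fmt.Printf("%s yields: %v\n", name, shouldYield(all[name], all))
	}
}
```

Under this reading, a brand-new instance that is neither reconciled nor dependent on an erroring instance keeps its turn, so only already-ready or blocked instances are skipped.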

This technique reduces the wrap-around time for the 100 existing instances, but it does not show a clear performance improvement from running the time-consuming processes concurrently first. The numbers are likely affected by when the k8s Job scheduler decides to pick up the LTPA Job.

| Max Concurrent Reconciles | Svc Certificate + LTPA Job Ran Concurrently First | Time until 5 instances are up |
|---|---|---|
| 1 | Yes | 70s |
| 1 | No | 54s |
| 8 | Yes | 62s |
| 8 | No | 80s |

Next steps: The above test only creates 1 LTPA key/config Job pair, so I need to run more tests, such as across multiple namespaces that each request LTPA, to get a better idea of whether concurrent LTPA Job prioritization helps.

@leochr leochr added vNext and removed vNext labels Nov 13, 2024