LnD: investigation into restart submission report generation #7551

Open
wants to merge 3 commits into
base: main

colinbruce
Contributor

@colinbruce colinbruce commented Dec 20, 2024

What

When CCMS submissions were turned back on on the morning of 20 Dec, the paused submissions restarted but their reports did not generate as expected.

This PR investigates an issue that arises after CCMS submissions have been paused during an evening downtime. Previously we had only seen 2 or 3 paused submissions at most.

On this occasion there were 10 applications to restart. This turned out to be too many!

I recreated the situation with 10 paused applications and turned submissions back on. The first 5 (each Sidekiq worker has 5 threads) requested a CCMS case reference and then tried to generate reports at the same time; the worker container ran out of memory and crashed. When a new pod was spawned, the next 5 started. This left all 10 applications trying to build reports and failing.
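To illustrate why the thread count matters here, the following is a minimal, standard-library-only sketch that models worker threads pulling jobs from a shared queue and records how many run at once. It is not the application's code: `peak_concurrency`, its arguments, and the `sleep` standing in for report generation are all hypothetical.

```ruby
# Illustrative model only: worker threads drain a shared queue of jobs.
# peak_concurrency and its keyword arguments are hypothetical names.
def peak_concurrency(job_count:, threads:)
  jobs = Queue.new
  job_count.times { |i| jobs << i }
  jobs.close # once drained, pop returns nil instead of blocking

  lock = Mutex.new
  running = 0
  peak = 0

  workers = Array.new(threads) do
    Thread.new do
      until jobs.pop.nil?
        lock.synchronize do
          running += 1
          peak = [peak, running].max
        end
        sleep 0.01 # stand-in for a memory-hungry report build
        lock.synchronize { running -= 1 }
      end
    end
  end

  workers.each(&:join)
  peak
end

# With 10 queued jobs, the number running at once never exceeds the pool size.
puts peak_concurrency(job_count: 10, threads: 5)
puts peak_concurrency(job_count: 10, threads: 2)
```

The point of the sketch is only that the pool size is an upper bound on simultaneous jobs; if each job needs a large amount of memory, a smaller pool lowers the peak memory the container must hold.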

Working as a team, we identified that Sidekiq Capsules should work for us. I restored a backup of the test branch with 10 paused applications and created a new report_creator queue with two threads. This allowed two applications to build reports simultaneously and successfully cleared the paused submissions.
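For readers unfamiliar with capsules, a configuration along these lines would give the report_creator queue its own pool of two threads. This is a sketch, not the diff in this PR: it assumes Sidekiq 7 or later, and the file location and the capsule name are assumptions (only the queue name and the concurrency of 2 come from the description above).

```ruby
# config/initializers/sidekiq.rb (location assumed for illustration)
require "sidekiq"

Sidekiq.configure_server do |config|
  # A capsule is a separate set of threads with its own queues.
  # The capsule name below is an assumption; the queue name and
  # concurrency match the description in this PR.
  config.capsule("report_creator") do |cap|
    cap.concurrency = 2
    cap.queues = %w[report_creator]
  end
end
```

Jobs that build reports would also need to be routed to that queue, for example with `sidekiq_options queue: :report_creator` in the relevant job class, so that they are picked up by the capsule rather than the default 5-thread pool.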

Checklist

Before you ask people to review this PR:

  • Tests and rubocop should be passing: bundle exec rake
  • GitHub should not be reporting conflicts; you should have recently run git rebase main.
  • The standards in the Git Workflow document on Confluence should be followed
  • There should be no unnecessary whitespace changes. These make diffs harder to read and conflicts more likely.
  • The PR description should say what you changed and why, with a link to the JIRA story.
  • You should have looked at the diff against main and ensured that nothing unexpected is included in your changes.
  • You should have checked that the commit messages say why the change was made.

@colinbruce colinbruce force-pushed the lnd/restart-submissions-test branch 3 times, most recently from ba220af to 26e98e2 Compare December 30, 2024 12:21
@colinbruce colinbruce added the ready for review Please review label Dec 30, 2024
@colinbruce colinbruce marked this pull request as ready for review December 30, 2024 14:39
@colinbruce colinbruce requested a review from a team as a code owner December 30, 2024 14:39
This allows us to check flow using open_search if an error
occurs during state transitions.  Rather than taking up DB
space, we can check logs and record output if needed.
This sets up a new capsule with a concurrency of 2,
which allows two applications to have their reports
created at the same time.  The current handling allows
5, and we saw crashes after a pause in CCMS submissions.
@colinbruce colinbruce force-pushed the lnd/restart-submissions-test branch 3 times, most recently from 4019fa1 to fdda8fe Compare January 15, 2025 13:06
@colinbruce colinbruce force-pushed the lnd/restart-submissions-test branch from fdda8fe to 26916a2 Compare January 15, 2025 13:08