#2205 - Comp and Pen Batches #2206

Merged
merged 17 commits into from
Dec 27, 2024

Conversation

k-macmillan
Member

@k-macmillan k-macmillan commented Dec 26, 2024

Description

Test turned full implementation. Rather than processing all records from the table with Celery Beat, we now run AWS Glue as an ETL job. It pulls all records from DynamoDB, transforms them into the required format, and sends them to SQS in "batches" (homebrew batches, not SQS batches). The run completed in under a minute on prod with 65,580 records, so DelaySeconds was added to slow it down.
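For readers without access to the Glue script, here is a minimal sketch of the homebrew batching described above. It is not the actual script: the batch size of 25 is inferred from the arithmetic later in this description, the `enqueue_batches` and `chunked` helpers are hypothetical, the staggered delay scheme is an assumption (the PR only states that DelaySeconds was added), and the message body here is plain JSON rather than whatever envelope the Celery broker expects.

```python
import json
from itertools import islice
from typing import Any, Iterable, Iterator

BATCH_SIZE = 25          # assumption: inferred from 65,580 / 25 in this PR
MAX_DELAY_SECONDS = 900  # SQS caps DelaySeconds at 900 (15 minutes)


def chunked(records: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield consecutive lists of at most `size` records."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def enqueue_batches(
    sqs_client: Any,
    queue_url: str,
    records: Iterable[dict[str, str]],
    delay_step_seconds: int = 1,  # hypothetical: one extra second per batch
) -> int:
    """Send one SQS message per homebrew batch and return the message count."""
    sent = 0
    for index, batch in enumerate(chunked(records, BATCH_SIZE)):
        # Stagger delivery so consumers are not handed everything at once.
        delay = min(index * delay_step_seconds, MAX_DELAY_SECONDS)
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps(batch),  # assumption: plain JSON body
            DelaySeconds=delay,
        )
        sent += 1
    return sent
```

In a real run `sqs_client` would be something like `boto3.client('sqs')`; it is passed in here so the sketch can be exercised with a stub. Per-message DelaySeconds applies to standard queues only.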

Removed Comp and Pen feature flag. Removed notification_type from the bypass route because it was redundant.

issue #2205

How Has This Been Tested?

Deployed the notification-api and then ran the Glue script. It ingested all the records as expected and processed them correctly. The celery broker picked up the work and sent it to Celery tasks for processing.

The prod-bip-consumer-dead-letter-queue was purged, and all prod records were then sent there so they can be inspected, though they contain an extra field that I have since deleted.
65,580 / 25 = 2623.2, which rounds up to 2624 send_message calls (the last call carries the remainder):
image
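The expected message count can be double-checked with a ceiling division. This is only the arithmetic from the line above, with the batch size of 25 taken from that calculation:

```python
import math

TOTAL_RECORDS = 65_580
BATCH_SIZE = 25

# 2623 full batches plus one partial batch holding the remaining 5 records.
full_batches, remainder = divmod(TOTAL_RECORDS, BATCH_SIZE)
expected_calls = math.ceil(TOTAL_RECORDS / BATCH_SIZE)

print(full_batches, remainder, expected_calls)  # 2623 5 2624
```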

It managed to process and enqueue all of those in under a minute:
image

Unit tests cover 100% of the new code:
image

All table entries being processed:
image

Checklist

  • I have assigned myself to this PR
  • PR has an appropriate title: #9999 - What the thing does
  • PR has a detailed description, including links to specific documentation
  • I have added the appropriate labels to the PR.
  • I did not remove any parts of the template, such as checkboxes even if they are not used
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to any documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works. Testing guidelines
  • I have ensured the latest main is merged into my branch and all checks are green prior to review
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • The ticket was moved into the DEV test column when I began testing this change

@k-macmillan k-macmillan self-assigned this Dec 26, 2024
@k-macmillan k-macmillan marked this pull request as ready for review December 26, 2024 22:01
@k-macmillan k-macmillan requested a review from a team as a code owner December 26, 2024 22:01
template,
sms_sender_id,
reply_to_text,
[DynamoRecord(**item) for item in records],

nitty and no fix required.

Double iteration: we iterate over all the records to convert them into DynamoRecord objects, and then in _send_comp_and_pen_sms we iterate over all the records again. Could we do both in the same iteration?

Member Author

That's a good callout. I went back and forth on creating the dataclass at each iteration vs. in a single call. I opted for this route to keep the downstream method simple, since the other method acts as a "pre-processor".
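To make the trade-off concrete, here is a small sketch of the two options. The `DynamoRecord` fields and the `send_all` function are hypothetical stand-ins, not the real dataclass or `_send_comp_and_pen_sms`. Passing a generator expression converts each item as the downstream loop consumes it, so the records are walked once, while the list version keeps a fully converted list available up front.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DynamoRecord:
    # Hypothetical fields for illustration only.
    participant_id: str
    payment_amount: str


def send_all(records: Iterable[DynamoRecord]) -> list[str]:
    """Stand-in for the downstream sender; returns what it would send."""
    return [f'{record.participant_id}:{record.payment_amount}' for record in records]


items = [
    {'participant_id': '1', 'payment_amount': '10.00'},
    {'participant_id': '2', 'payment_amount': '25.50'},
]

# Option A (this PR): convert everything first, then hand over a list.
eager_result = send_all([DynamoRecord(**item) for item in items])

# Option B (reviewer's suggestion): convert lazily during the single pass.
lazy_result = send_all(DynamoRecord(**item) for item in items)

assert eager_result == lazy_result
```

With a batch of roughly 25 records the difference is negligible, which matches the "no fix required" framing of the comment.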

@MackHalliday MackHalliday left a comment

Approved: Reviewed the new code and compared it to the old logic. Reviewed the new tests.
Reviewed the Glue script. @k-macmillan says he is working with Corey to add the script to vanotify-commons.

Left one comment about a double iteration which is not a big deal because our batch size is relatively small.

We may need follow-up work to return the is_processed attribute from DynamoDB, as we're no longer using that attribute to determine batches.


@notify_celery.task(name='comp-and-pen-batch-process')
@statsd(namespace='tasks')
def comp_and_pen_batch_process(records: list[dict[str, str]]) -> None:

Did we want to leave a comment about this being triggered by a Glue Script?

Member Author

We don't do that with other code. I don't believe that's necessary here. I do need to update some of our other docs though.

@k-macmillan k-macmillan merged commit 0b61019 into main Dec 27, 2024
13 checks passed
@k-macmillan k-macmillan deleted the 2205-comp-and-pen-batches branch December 27, 2024 17:22