Create SQS wrapper and services for polling messages #5

Open
jcadam14 opened this issue Dec 16, 2024 · 4 comments · May be fixed by #6
@jcadam14
Collaborator

Create the SQS version of the lambda work Le did in #4

@jcadam14 jcadam14 self-assigned this Dec 16, 2024
@jcadam14
Collaborator Author

jcadam14 commented Dec 19, 2024

Things I had to do:

  • Switched service account cfpb-ci-sa-sqs role cfpb-dev-regtech-jadam-devpub-sqs-access policy to the v1.28 identity
  • Created a Role and RoleBinding in the cluster to allow for job creation via the Kubernetes API
  • Created 2 new queues cfpb-regtech-dev-res-aggregate and cfpb-regtech-dev-pqs-validate
  • Created 2 new S3 events on the upload/2024/1234364890REGTECH002/ bucket to watch for .done_pqs and .done_res files; each sends a message to its respective queue
  • Reorganized the repo since there is some common code used by the lambda and sqs. Reorganize again as desired if the names, structure, or anything else isn’t liked by anyone but me (it’s not the best but seems a good start?)
  • Flow is as follows:
    • .csv file drops into S3 location upload/2024/1234364890REGTECH002
    • message gets put on cfpb-regtech-dev-s3-queue-test queue (I didn’t update this name, already existed from previous sqs testing. I know, lazy)
    • sqs_listener.py in sqs_csv_to_parquet pulls down message and calls csv_to_parquet code
    • csv_to_parquet code writes out parquets to _pqs folder and when successfully completed, creates a .done_pqs file
    • S3 event watching for .done_pqs creates a message in the cfpb-regtech-dev-pqs-validate queue
    • sqs_listener.py in sqs_parquet_validation pulls message and creates a validation job in cluster
    • Validation job runs image with validation_job.py which scans in each parquet in the _pqs folder and validates
    • Job writes out _res folder with validation dataframe parquet files
    • If successful, writes out a .done_res file
    • S3 event watching for .done_res creates a message in the cfpb-regtech-dev-res-aggregate queue
    • sqs_listener.py in sqs_validation_aggregator pulls message from queue and calls code in results_aggregator
    • Loops over the parquet files in the _res folder, concats the lazy frames and builds the report csv and json, stores in submission object
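
For anyone following the flow above, here is a rough sketch of what each listener has to do with a message body once it is pulled off a queue: unpack the S3 event notification and decide which stage the object belongs to. This is not the code in the PR. The function names and the suffix-to-stage mapping are made up for illustration; only the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields come from the standard S3 event notification format.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import unquote_plus

# Hypothetical mapping from file suffix to pipeline stage (names are illustrative).
STAGE_BY_SUFFIX = {
    ".csv": "csv_to_parquet",
    ".done_pqs": "validate",
    ".done_res": "aggregate",
}


def parse_s3_event(body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification message body."""
    event = json.loads(body)
    pairs = []
    # Messages without "Records" (for example S3 test events) yield no pairs.
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications, so decode them.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def next_stage(key: str) -> Optional[str]:
    """Pick the pipeline stage for an object key, or None if it is not ours."""
    for suffix, stage in STAGE_BY_SUFFIX.items():
        if key.endswith(suffix):
            return stage
    return None
```

In the real setup each queue only receives one kind of event, so a listener would not need the mapping; it is here only to show the three hand-offs in one place.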

Going to draw this up in a diagram, also, since I think that might be easier to visualize.

Wiki pages: https://github.com/cfpb/sbl-filing-api-validations/wiki

@jcadam14
Collaborator Author

Extra stuff done to aggregator:

  • Fixed aggregator code that would fail on validations that resulted in no errors or warnings
  • Fixed aggregator code that was getting error/warning counts after initial max_error truncation (we want the actual original value of total errors/warnings before truncation)
  • Fixed the max_errors + 1 thing, the + 1 isn't necessary (see comments in code)
  • Simplified that code a little
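
To illustrate the ordering issue in the fixes above, here is a simplified sketch in plain Python (the actual aggregator works on lazy frames, and this is not its code). The `severity` field, the `is_truncated` flag, and the function name are assumptions for the example. The point is just that totals are taken from the full result set before it is cut down to `max_errors`, and that an empty result set is handled without special casing.

```python
from typing import Dict, List


def summarize_findings(findings: List[Dict[str, str]], max_errors: int) -> Dict[str, object]:
    """Count errors/warnings on the full list, then truncate to max_errors."""
    # Count first so the totals reflect the original, pre-truncation values.
    total_errors = sum(1 for f in findings if f["severity"] == "error")
    total_warnings = sum(1 for f in findings if f["severity"] == "warning")

    # Slice to exactly max_errors; an empty list simply stays empty.
    truncated = findings[:max_errors]

    return {
        "total_errors": total_errors,
        "total_warnings": total_warnings,
        # Assumption: truncation is detected by comparing the full length to the limit.
        "is_truncated": len(findings) > max_errors,
        "findings": truncated,
    }
```

Counting after the slice would cap the reported totals at `max_errors`, which is the behavior the second fix removes.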

@jcadam14
Collaborator Author

The wiki here documents which container is running what code. The sequence diagram explains the interaction of all that. Also, the do_sqs.sh is a good place to see what is getting built where and with what image. And the files in the helm directory in the PR (eventually moved to the regtech-deployments-internal repo once we decide on a direction) are also good to see what image/container is talking to what.

@jcadam14
Collaborator Author

jcadam14 commented Dec 20, 2024

Finally, everything is deployed and working in devpub under the release sqs-test. The sqs-poller containers I deployed individually with the helm values and the standard charts in regtech-deployment. I set up the cfpb-devpub-regtech-sbl-filing-main bucket at the upload level so any LEI will trigger the sqs stuff, and updated the configmap in the filing-api sqs-test release to use that bucket instead.

Can access it via https://sqs-test-sbl-frontend-eks.dev-public.aws.cfpb.gov/

@jcadam14 jcadam14 linked a pull request Dec 20, 2024 that will close this issue