
Jap/simple forms/s3 spike #18338

Closed

wants to merge 38 commits into from

Conversation

@pennja (Contributor) commented Sep 5, 2024

Note: Delete the description statements and complete each step. None are optional, but you may justify why a step cannot be completed as written. Provide known gaps in testing that may raise the risk of merging to production.

Summary

  • SubmissionArchiver
    • Handles archiving a single submission's PDFs and all related data/attachments/remediation docs within S3
  • SubmissionArchiveBuilder
    • Builds the archive bundle, complete with all remediation documents
    • If the submitted form/attachments are no longer present locally, they are rebuilt from saved form submission data
  • SubmissionArchiveHandler
    • Takes in a collection of benefits_intake_uuids and calls SubmissionArchiver on each, wrapping each call in logging and error handling (see the sketch after this list)
  • SubmissionArchiveHandlerJob
    • An asynchronous Sidekiq job which calls the handler. (Our primary process doesn't call for this to happen; it exists mainly for remediation.)
  • ArchiveUploader
    • A SharePoint client which takes a file path and uploads the file found there to SharePoint (largely unfinished until Tricia/Matthew have their OFO meeting)
  • SharePoint::Client
    • Contains all the base logic to connect to and interact with SharePoint (hopefully)
  • ArchiveUploaderJob
    • An asynchronous job which handles downloading the submission folder from S3, zipping it up, and calling the ArchiveUploader to drop it into SharePoint
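
As a rough illustration of how the pieces above fit together, here is a minimal sketch of the handler/archiver relationship. The class names follow the summary; the constructor and method signatures are assumptions for illustration, not the actual implementation:

module SimpleFormsApi
  module S3
    class SubmissionArchiveHandler
      def initialize(benefits_intake_uuids:)
        @benefits_intake_uuids = benefits_intake_uuids
      end

      # Archive each submission, wrapping every call in logging and error handling
      def run
        @benefits_intake_uuids.each do |uuid|
          SubmissionArchiver.new(benefits_intake_uuid: uuid).upload
          Rails.logger.info("SubmissionArchiveHandler: archived #{uuid}")
        rescue => e
          Rails.logger.error("SubmissionArchiveHandler: failed for #{uuid}: #{e.message}")
        end
      end
    end
  end
end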

Related issue(s)

  • Link to ticket created in va.gov-team repo OR screenshot of Jira ticket if your team uses Jira
  • Link to previous change of the code/bug (if applicable)
  • Link to epic if not included in ticket

Testing done

  • New code is covered by unit tests
  • Describe what the old behavior was prior to the change
  • Describe the steps required to verify your changes are working as expected. Exclusively stating 'Specs run' is NOT acceptable as appropriate testing
  • If this work is behind a flipper:
    • Tests need to be written for both the flipper on and flipper off scenarios. Docs.
    • What is the testing plan for rolling out the feature?

Screenshots

Note: Optional

What areas of the site does it impact?

(Describe what parts of the site are impacted and if code touched other areas)

Acceptance criteria

  • I fixed|updated|added unit tests and integration tests for each feature (if applicable).
  • No errors or warnings in the console.
  • Events are being sent to the appropriate logging solution
  • Documentation has been updated (link to documentation)
  • No sensitive information (i.e. PII/credentials/internal URLs/etc.) is captured in logging, hardcoded, or specs
  • Feature/bug has a monitor built into Datadog (if applicable)
  • If the impacted app requires authentication, did you log in to a local build and verify all authenticated routes work as expected?
  • I added a screenshot of the developed feature

Requested Feedback

(OPTIONAL) What should the reviewers know in addition to the above? Is there anything specific you wish the reviewer to assist with? Do you have any concerns with this PR, and why?

github-actions bot commented Sep 5, 2024

1 Error
🚫 This PR changes 632 LoC (not counting whitespace/newlines).

In order to ensure each PR receives the proper attention it deserves, those exceeding 500 will not be reviewed, nor will they be allowed to merge. Please break this PR up into smaller ones.

If you have reason to believe that this PR should be granted an exception, please see the Submitting pull requests for approval - FAQ.

File Summary

Files

  • modules/simple_forms_api/app/controllers/simple_forms_api/v1/uploads_controller.rb (+8/-3)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/jobs/archive_uploader_job.rb (+35/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/jobs/submission_archive_handler_job.rb (+27/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/submission_archive_builder.rb (+125/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/submission_archive_handler.rb (+45/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/submission_archiver.rb (+92/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/s3/utils.rb (+29/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/share_point/archive_uploader.rb (+60/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/share_point/client.rb (+87/-0)

  • modules/simple_forms_api/app/services/simple_forms_api/share_point/jobs/build_archive_and_upload_job.rb (+35/-0)

  • modules/simple_forms_api/spec/services/s3/submission_archive_builder_spec.rb (+35/-0)

  • modules/simple_forms_api/spec/services/s3/submission_archiver_spec.rb (+51/-0)

    Note: We exclude files matching the following when considering PR size:

    *.csv, *.json, *.tsv, *.txt, *.md, Gemfile.lock, app/swagger, modules/mobile/docs, spec/fixtures/, spec/support/vcr_cassettes/, modules/mobile/spec/support/vcr_cassettes/, db/seeds, modules/vaos/app/docs, modules/meb_api/app/docs, modules/appeals_api/app/swagger/, *.bru, *.pdf
    

Big PRs are difficult to review, often become stale, and cause delays.

Generated by 🚫 Danger

@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 5, 2024 21:56
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 5, 2024 23:39
@pennja force-pushed the jap/simple-forms/s3-spike branch from ef6ec33 to fa3ab67 September 6, 2024 15:00
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 15:04
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 15:24
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 15:32
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 16:33
@pennja force-pushed the jap/simple-forms/s3-spike branch from 983445a to 9543280 September 6, 2024 21:17
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 21:26
@va-vfs-bot temporarily deployed to jap/simple-forms/s3-spike/main/main September 6, 2024 21:37
end

def sign_s3_file_url(pdf)
signed_url = pdf.presigned_url(:get, expires_in: 1.year.to_i)
@kylesoskin (Contributor) commented Sep 9, 2024:

Using pre-signed URLs for long durations is, from what I think I have heard, not recommended for security reasons.

In fact, I think 7 days is the longest S3 supports, with a recommended length of 1 minute to 12 hours.

Curious what you are doing with the pre-signed URLs, to see if there is something else that could be done instead.

Usually, I think, for S3 objects we would store the path to the file(s) in the DB and assume that whatever needs to use it is itself authenticated to pull it down.

@ndsprinkle (Contributor) commented:

Yeah I think the plan is to generate a signed URL that lasts for like 30 minutes so that the user can download their submitted PDF on the confirmation page. Then we can regenerate signed URLs in other locations as necessary. @kylesoskin Do you know what we could store in our database that lets us reference and generate signed URLs for the item in the S3 bucket later? Some sort of S3 ID? Or maybe a path?

@pennja (Contributor, author) commented:

Ahhh, yeah that all makes sense guys. I think I was misunderstanding how they work. I can do some more research and change the supporting logic accordingly.

@pennja (Contributor, author) commented:

@ndsprinkle I'll bet all we'd need is the path to the file in S3, I'll look into that.

A contributor replied, quoting the above:

@ndsprinkle I'll bet all we'd need is the path to the file in S3, I'll look into that.

Yea, same way you're plopping stuff in there: using that same S3 obj you should be able to pull stuff down. There are examples in vets-api somewhere if you grep for the S3 class.
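
For reference, pulling a file back down with the same resource used for the upload is a one-liner in the aws-sdk-s3 gem (bucket_name, s3_key, and local_path here are illustrative names, not values from this PR):

s3_resource.bucket(bucket_name).object(s3_key).get(response_target: local_path)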

@kylesoskin (Contributor) commented Sep 9, 2024:

But... you gotta make sure that is super locked down if it is passing back to the user. If that is what you are doing, providing the doc back to an end user, I almost like the signed URL better... That is scoped only to that one document. Allowing a user to use our backend to make calls to pull arbitrary files down from S3 makes me nervous that it could be exploited, i.e. user 1 changes some parameters and pulls down someone else's medical records. Since our backend can pull anything, a presigned, short-lived, document-scoped URL MAY actually be best here... something to discuss for sure.

@kylesoskin (Contributor) commented Sep 9, 2024:

Maybe some combo of both, us storing the S3 details but then generating a presigned URL whenever the user wants to download it, gets the best of both worlds? But if this is a one-time-only thing, then what you have here seems good. If it were something that the user could come back to at any time, and you want to be able to provide it, then storing the S3 path and generating a short-lived URL each time they want it might be better. That way you aren't storing things that are going to expire, you can still use short-lived URLs, and we can auth the user and confirm ownership of the file before generating that presigned URL. Idk.

edit: nvm, I see this is what @ndsprinkle said in his initial comment
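
A minimal sketch of that combo approach: persist only the S3 key, then mint a short-lived presigned URL on each authorized request. The settings namespace and bucket key are assumptions for illustration (the real namespace depends on which secrets the team lands on):

# Assumes the aws-sdk-s3 gem; Settings.form526_export.aws.* keys are hypothetical
def presigned_download_url(s3_key)
  resource = Aws::S3::Resource.new(region: Settings.form526_export.aws.region)
  object = resource.bucket(Settings.form526_export.aws.bucket).object(s3_key)
  # Short-lived, document-scoped URL; well under the 7-day presigned cap
  object.presigned_url(:get, expires_in: 30.minutes.to_i)
end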

end

def write_as_text_archive
save_file_to_s3("#{output_directory_path}/form_text_archive.txt", form_text_archive.to_json)
A contributor commented:

Previously, I think, when we did this we would pull everything down, zip it, and then upload only 1 zip file, with the name of the file being the va.gov internal submission ID, so that you had 1 thing to pass along and knew 1 file essentially represented 1 single claim. So less work to pass/move everything around, and also less chance of passing around only partial claims. It also eliminated partial uploads: if the zip was there, it worked in its entirety; if the zip wasn't there, something didn't work. With this, it is possible for some portions of a claim to upload and someone to think it is complete just looking from the S3 side, when in fact components of it may have failed. This is probably for @SamStuckey, as I think he changed this from zip to not-zip.

@pennja (Contributor, author) commented:

Ahh, good callout. I'll look at the original implementation again and determine the best approach to zipping up the files.

A contributor commented:

Well, @SamStuckey moved away from that. There might have been a good reason, I'm just not sure.

A contributor commented:

Do we want to zip anything? I thought we'd want a .pdf file so we can supply it directly to the end user. Or are we expecting ourselves (or the user) to unzip it upon download?

@pennja (Contributor, author) commented:

Hmm, it might depend on whether there are attachments. If there are attachments, then zip; otherwise maybe just provide the PDF?
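
If the zip-then-upload approach from the earlier implementation wins out, the zipping step might look roughly like this sketch using the rubyzip gem (directory layout, method name, and naming convention are assumptions, not the PR's actual code):

require 'zip'

# Bundle everything under dir into a single <submission_id>.zip so the archive
# is all-or-nothing: the zip's presence means the whole claim made it to S3.
def zip_directory(dir, submission_id)
  zip_path = File.join(dir, "#{submission_id}.zip")
  entries = Dir.glob(File.join(dir, '**', '*')).reject { |f| File.directory?(f) || f == zip_path }
  Zip::File.open(zip_path, Zip::File::CREATE) do |zipfile|
    # Store each file under its path relative to dir
    entries.each { |file| zipfile.add(file.delete_prefix("#{dir}/"), file) }
  end
  zip_path
end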

Comment on lines 60 to 62
quiet_pdf_failures: true, # skip PDF generation silently
quiet_upload_failures: true, # skip problematic uploads silently
run_quiet: true, # silence but record errors, logged at the end
A contributor commented:

With all the stuff over the last year with the code yellow and silent failures, I think VA people would blow a gasket if they saw these lol, especially defaulted to true.

@pennja (Contributor, author) commented:

Do you think it'd be better to just get rid of the option to silence failures, or just change the defaults? I could also add a "dry_run" option if that seems more appropriate.

A contributor commented:

I'm not a huge fan of hard-coding stuff to support testing into places where it could potentially get used in prod, but at the very least I would default these to false, so that anyone who uses this without knowing about them is not silently failing by default.

If I needed to do things to test, I would mock out the calls in the rspec, or just let it do the thing. There are mocked S3 responses in the mockdata repo that allow the code to be written as if it were actually uploading to S3; the testing framework/mock-data repo just intercepts the request and knows to return something that looks like an S3 response.

For example:

VCR.use_cassette("s3/object/put/#{form_attachment_guid}/doctors-note.jpg", vcr_options) do
  # exercise the upload here; the cassette intercepts the HTTP call to S3
end

end

def s3_resource
@s3_resource ||= Reports::Uploader.new_s3_resource
A contributor commented:

@kylesoskin @pennja It looks like we're using Reports::Uploader here (which relies on Settings.reports.aws) rather than something that relies on Settings.form526_backup.aws or Settings.form526_export.aws. Do we think it matters which secrets we use? I guess we're using Reports::Uploader because that is what form526.rake uses? Is that right?
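
If the team did want to point this at a different settings namespace instead of Reports::Uploader, a minimal sketch might look like the following (the exact keys under Settings.form526_export.aws are assumptions inferred from the namespaces named above, not confirmed config):

def s3_resource
  # Build the resource directly from a specific settings namespace (hypothetical keys)
  @s3_resource ||= Aws::S3::Resource.new(
    region: Settings.form526_export.aws.region,
    access_key_id: Settings.form526_export.aws.access_key_id,
    secret_access_key: Settings.form526_export.aws.secret_access_key
  )
end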
