-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jap/simple forms/s3 spike #18338
Jap/simple forms/s3 spike #18338
Conversation
2f24200
to
40c1642
Compare
Generated by 🚫 Danger |
ef6ec33
to
fa3ab67
Compare
983445a
to
9543280
Compare
end | ||
|
||
def sign_s3_file_url(pdf) | ||
signed_url = pdf.presigned_url(:get, expires_in: 1.year.to_i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using pre-signed urls for long durations, from what I think I have heard, is not recommended, for security reasons.
In fact, I think 7 days is the longest it supports, with a recommended length of 1min to 12hrs
Curious what you are doing with the pre-signed urls to see if there is something else that could be done instead of using pre-signed url.
Usually I think for the S3 objects we would store the path to the file(s) in the DB and assume whatever needs to use it is authenticated itself to pull it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the plan is to generate a signed URL that lasts for like 30 minutes so that the user can download their submitted PDF on the confirmation page. Then we can regenerate signed URLs in other locations as necessary. @kylesoskin Do you know what we could store in our database that lets us reference and generate signed URLs for the item in the S3 bucket later? Some sort of S3 ID? Or maybe a path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, yeah that all makes sense guys. I think I was misunderstanding how they work. I can do some more research and change the supporting logic accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndsprinkle I'll bet all we'd need is the path to the file in S3, I'll look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndsprinkle I'll bet all we'd need is the path to the file in S3, I'll look into that.
Yea, same way your plopping stuff in there, using that same S3 obj you should be able to pull stuff down. There is examples in vets-api somewhere if you grep for the S3 class thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But... you gotta make sure that is super locked down if it is passing back to the user. If that is what youa re doing, providing back the doc to an end user, I almost like the signed url better.... That is scoped only to that 1 document. Allowing a user to use our backend to make calls to pull arbitrary files down from S3 makes me nervous that it could be exploited, IE user 1 changes some parameters and pulls down someone elses medical records. Since our backend can pull anything, presigned shortlived document scoped url MAY actually be best here... something to discuss for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some combo of both, us storing the s3 details, but then generating a presigned url whenever the user wants to download it, gets the best of both worlds? But if this is a one time only thing then what you have here seems good. If it were something that the user could come back to at any time, and you want to be able to provide it, then maybe store s3 path, and generate short lived url each time they want to might be better. That way you arent storing things that are going to expire, can still use short-lived urls, and we can auth the user and confirm ownership of the file before generating that presigned url. Idk.
edit: nvm I see this is what @ndsprinkle said in his initial comment
modules/simple_forms_api/app/services/simple_forms_api/s3_service/archive_submission_to_pdf.rb
Outdated
Show resolved
Hide resolved
end | ||
|
||
def write_as_text_archive | ||
save_file_to_s3("#{output_directory_path}/form_text_archive.txt", form_text_archive.to_json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I think when we did this we would pull everything down, and then zip it, and then only upload 1 zip file, with the name of the file being the va.gov internal submission ID, so that you had 1 thing to pass along and knew 1 file essentially represented 1 single claim. So less work to pass/move all around, and also less chance of only passing around partial amounts of claims and such. It also eliminated partial uploads, if the zip was there, it worked in its entirety, if the zip wasn't there, something didn't work. With this it is possible for some portions of a claim to upload, and someone think it is complete just looking from the S3 side, when in fact components of it may have failed. This is probably for @SamStuckey as I think he changed this from zip to not-zip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, good callout. I'll look at the original implementation again and determine the best approach to zipping up the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, @SamStuckey moved away from that. There might have been a good reason, Im just not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to zip anything? I thought we'd want a .pdf
file so we can supply it directly to the end user. Or are we expecting ourselves (or the user) to unzip it upon download?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it might depend on if there are attachments. If there are attachments, then zip but otherwise maybe just provide the pdf?
modules/simple_forms_api/app/services/simple_forms_api/s3_service/archive_submission_to_pdf.rb
Outdated
Show resolved
Hide resolved
modules/simple_forms_api/app/controllers/simple_forms_api/v1/uploads_controller.rb
Show resolved
Hide resolved
quiet_pdf_failures: true, # skip PDF generation silently | ||
quiet_upload_failures: true, # skip problematic uploads silently | ||
run_quiet: true, # silence but record errors, logged at the end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all the stuff over the last year with the code yellow and silent failures, I think VA people blow a gasket if they saw these lol, especially defaulted to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it'd be better just to get rid of the option to silence failures or just change the defaults? I could also add a "dry_run" option if that seems more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not a huge fan of hard-coding stuff to support testing into where it could potentially get used in prod, but at the very least would default it false so that if anyone uses this without knowing about it, they are not by default in a state of silently failing.
If I needed to do things to test I would mock of the calls in the rspec, or just let it do the thing. There is mocked S3 returns in the mockdata repo that allow it to be coded like it was actually uploading to S3, and the testing framework/mock-data repo just intercepts the request and knows to return a thing that looks like an S3 thing.
For example:
VCR.use_cassette("s3/object/put/#{form_attachment_guid}/doctors-note.jpg", vcr_options) do |
...simple_forms_api/app/services/simple_forms_api/s3_service/user_submission_archive_handler.rb
Outdated
Show resolved
Hide resolved
end | ||
|
||
def s3_resource | ||
@s3_resource ||= Reports::Uploader.new_s3_resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kylesoskin @pennja It looks like we're using Reports::Uploader
here (which relies on Settings.reports.aws
) rather than something that relies on Settings.form526_backup.aws
or Settings.form526_export.aws
. Do we think it matters which secrets we use? I guess we're using Reports::Uploader
because that is what form526.rake
uses? Is that right?
0804f91
to
929380e
Compare
Note: Delete the description statements, complete each step. None are optional, but can be justified as to why they cannot be completed as written. Provide known gaps to testing that may raise the risk of merging to production.
Summary
Related issue(s)
Testing done
Screenshots
Note: Optional
What areas of the site does it impact?
(Describe what parts of the site are impacted andifcode touched other areas)
Acceptance criteria
Requested Feedback
(OPTIONAL)What should the reviewers know in addition to the above. Is there anything specific you wish the reviewer to assist with. Do you have any concerns with this PR, why?