Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getAttachments to PDFDocument #80

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sverch
Copy link

@sverch sverch commented Nov 28, 2024

What?

I got some requests to move this PR from Hopding#1242, since that one isn't being maintained currently.

This adds the getAttachments method to get all the attachments from a pdf document.

Why?

This is basically implementing the suggestion in Hopding#534 (comment), to allow for getting attachments.

How?

It's been a while since I wrote this, but I remember pretty much "following the rails" and filling out the implementation based on what was in the PDF and what the code around it did/expected.

Testing?

There's some interesting behavior here, as the attachments don't show up until after save is called. That's tested in the test and described in the README, just so it's clear that's the behavior.

I don't know how to do all that great jsfiddle setup, so for the README I just copied the basic example from that issue for now.

New Dependencies?

Screenshots

Suggested Reading?

It appears that the link is broken now unfortunately...

Anything Else?

Checklist

  • I read CONTRIBUTING.md.
  • I read MAINTAINERSHIP.md#pull-requests.
  • I added/updated unit tests for my changes.
  • I added/updated integration tests for my changes.
    • I haven't, but I see "They also require manual inspection of the PDFs they create to make sure nothing is broken", which makes me wonder about adding something there for this change which is about reading files rather than writing them.
  • I ran the integration tests.
  • I tested my changes in Node, Deno, and the browser.
    • See above about this being read only.
  • I viewed documents produced with my changes in Adobe Acrobat, Foxit Reader, Firefox, and Chrome.
    • See above about this being read only.
  • I added/updated doc comments for any new/modified public APIs.
    • I added a bit of documentation, open to suggestions if there's more that someone would want to know though.
  • My changes work for both new and existing PDF files.
    • This is where the weird save behavior comes in, so I'm not sure.
  • I ran the linter on my changes.

This adds the getAttachments method to get all the attachments from a
pdf document.

This is basically implementing the suggestion in
Hopding#534 (comment)

There's some interesting behavior here, as the attachments don't show up
until after save is called. That's tested in the test and described in
the README, just so it's clear that's the behavior.

I don't know how to do all that great jsfiddle setup, so for the README
Ijust copied the basic example from that issue for now.
@thecodingshrimp
Copy link

🥳

@Sharcoux
Copy link
Collaborator

Thanks. We'll study this and try to merge it.

fs.writeFileSync(csv.name, csv.data)
```

> NOTE: If you are building a pdf file with this library, any attachments you've
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason why not? This might be confusing for users...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the attachments being stored before they are serialized? We can probably include that location when returning the attachments, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be better. It's been a while since I've looked at this code, but I vaguely remember that being difficult. I see a comment on PDFEmbeddedFile.embed that says something to the effect of it not being embedded until save is called. I also don't see how to get this information out of PDFEmbeddedFile.

I won't be able to spend a lot of time on this soon, but I'm planning to come back at some point and give it another shot to see if what you're saying is possible.

const rawAttachments = this.getRawAttachments();
return rawAttachments.map(({ fileName, fileSpec }) => {
const stream = fileSpec
.lookup(PDFName.of('EF'), PDFDict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make sure "EF" and "F" key exist?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. These lookup and of functions don't ever return undefined in their signatures, and I'm not sure what the default value is if they don't find anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants