Strange MediaBox and CropBox of PDF page #59

Closed
2 tasks done
pedromdev opened this issue Aug 2, 2024 · 7 comments
Labels
bug Something isn't working needs-triage

Comments

@pedromdev

What were you trying to do?

I'm trying to get the page box information so that I can center an image on the page.

How did you attempt to do it?

I called the getMediaBox() and getCropBox() methods on the PDFPage object to calculate the page's center position.

What actually happened?

I got strange information about the page boxes. In some cases, the height or width of a page is a negative value.

What did you expect to happen?

Get the correct information about the page boxes.

How can we reproduce the issue?

I added a comment in an old issue about the MediaBox and CropBox. I added the PDF that I tested and a piece of code.

I have a PDF whose first page has different box information than the other pages. However, when I retrieve this information using pdfinfo, I get values that differ from the ones pdf-lib gives me.


I drew some circles using the CropBox information, and this is how the 2 tests turned out. The first printout used the pdfinfo values. The second printout used the values that pdf-lib returns from the getCropBox() method.

[Images: printouts of the tests]

How is this MediaBox and CropBox information obtained in pdf-lib?

The example PDF is below:

input2.pdf

Note: I understood later that height and width are used to calculate the xEnd and yEnd of the PDF page, but even when I calculate the end point I don't get the same values as pdfinfo.
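
For readers following the arithmetic in the note above, here is a minimal sketch of how the end points and the center can be derived from a box of the shape `{ x, y, width, height }`, which is what `getMediaBox()` and `getCropBox()` return. The helper names `boxCenter` and `centeredImageOrigin` and the sample box values are hypothetical and are not part of pdf-lib. The sketch assumes only that `x + width` and `y + height` give the opposite corner, in which case the center comes out the same even when a size is negative.

```typescript
// Shape of the object returned by getMediaBox() / getCropBox().
interface PageBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Center of the box. Because (x + width, y + height) is the opposite
// corner, the midpoint is correct even if width or height is negative.
function boxCenter(box: PageBox): { x: number; y: number } {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

// Lower-left origin at which an image of the given size would have to be
// drawn so that it ends up centered in the box (hypothetical helper).
function centeredImageOrigin(
  box: PageBox,
  imageWidth: number,
  imageHeight: number,
): { x: number; y: number } {
  const center = boxCenter(box);
  return { x: center.x - imageWidth / 2, y: center.y - imageHeight / 2 };
}

// Illustrative values only: the same 612 x 792 area described from two
// different corners. The first form has a negative height.
const flipped: PageBox = { x: 0, y: 792, width: 612, height: -792 };
const upright: PageBox = { x: 0, y: 0, width: 612, height: 792 };

console.log(boxCenter(flipped)); // { x: 306, y: 396 }
console.log(boxCenter(upright)); // { x: 306, y: 396 }
```

The returned origin could then be passed as the `x` and `y` of a draw call. Whether the box values themselves match pdfinfo is a separate question that depends on how the rectangle array was read.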

Version

2.2.0

What environment are you running pdf-lib in?

Node

Checklist

  • My report includes a Short, Self Contained, Correct (Compilable) Example.
  • I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

I tested both versions and got the same results:

  • pdf-lib: 1.17.1
  • @cantoo/pdf-lib: 2.2.0
@pedromdev pedromdev added bug Something isn't working needs-triage labels Aug 2, 2024
@Sharcoux
Collaborator

Sharcoux commented Aug 5, 2024

The pdf seems malformed, but we can update pdf-lib to handle this malformation. How did this pdf get generated?

@Sharcoux
Collaborator

Sharcoux commented Aug 5, 2024

Solved in @cantoo/pdf-lib: 2.2.0

@Sharcoux
Collaborator

Sharcoux commented Aug 5, 2024

I would still like to know how the pdf has been generated, though.

@pedromdev
Author

Hi @Sharcoux.

The PDF I attached here is a partial PDF that I created from another one just to reproduce the behavior. According to pdfinfo, the original PDF was created in Adobe InDesign CS6.


@Sharcoux
Collaborator

Sharcoux commented Aug 5, 2024

According to the specs,

The MediaBox is defined as an array of four numbers, typically in the format [llx lly urx ury], where:

    llx: The lower-left x-coordinate.
    lly: The lower-left y-coordinate.
    urx: The upper-right x-coordinate.
    ury: The upper-right y-coordinate.

In the PDF you provided, the MediaBox has its two y coordinates inverted, which leads to the wrong result. I don't know which tool is the culprit during file generation, but the file is definitely malformed.

@RippleRurigaki

I have seen this issue and have looked into it.

In the PDF spec, section 7.7.3.3 (Page Objects) gives the MediaBox type as "rectangle".

Section 7.9.5 (Rectangles) says:

Rectangles are used to describe locations on a page and bounding boxes for a variety of objects.
A rectangle shall be written as an array of four numbers giving the coordinates of a pair of diagonally opposite corners.

NOTE Although rectangles are conventionally specified by their lower-left and upper-right corners,
it is acceptable to specify any two diagonally opposite corners.

As I understand it, the array does not have to be [lower left, upper right], although other orderings are uncommon.

I thought it would be easy to modify the values obtained,
but I am concerned about whether that would affect other placement coordinates.

I noticed this but have not been able to confirm it yet.
I do not have the time right now.
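
The point from 7.9.5 can be illustrated with a minimal sketch that accepts any two diagonally opposite corners and returns a box with non-negative size. This is not the fix that was made in @cantoo/pdf-lib; the names `normalizeRectangle` and `readAsLowerLeftUpperRight` are hypothetical, and the sample array is made up for illustration rather than taken from input2.pdf.

```typescript
// Four numbers giving two diagonally opposite corners: [x1, y1, x2, y2].
type PdfRectangle = [number, number, number, number];

interface NormalizedBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Reading that assumes the conventional [llx lly urx ury] order.
// If the corners are given in another order, a size becomes negative.
function readAsLowerLeftUpperRight([llx, lly, urx, ury]: PdfRectangle): NormalizedBox {
  return { x: llx, y: lly, width: urx - llx, height: ury - lly };
}

// Order-independent reading: take the smaller coordinate on each axis as
// the lower-left corner and the absolute difference as the size.
function normalizeRectangle([x1, y1, x2, y2]: PdfRectangle): NormalizedBox {
  return {
    x: Math.min(x1, x2),
    y: Math.min(y1, y2),
    width: Math.abs(x2 - x1),
    height: Math.abs(y2 - y1),
  };
}

// Illustrative rectangle given as [upper-left, lower-right].
const sample: PdfRectangle = [0, 792, 612, 0];

console.log(readAsLowerLeftUpperRight(sample)); // { x: 0, y: 792, width: 612, height: -792 }
console.log(normalizeRectangle(sample)); // { x: 0, y: 0, width: 612, height: 792 }
```

Normalizing only changes how the same rectangle is described, so coordinates computed relative to the lower-left corner would shift if other code relied on the un-normalized `x` and `y`. That is the concern raised above about other placement coordinates.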

@Sharcoux
Collaborator

Sharcoux commented Aug 6, 2024

Ok. Well, anyway, from version 2.2.1, both will be supported, so I think I'll just close this. Thanks for the clarification.

@Sharcoux Sharcoux closed this as completed Aug 6, 2024