PDF-Text Blog: Supplementary Material

This repository contains supplementary material for my five-part blog series on understanding the representation of text in PDF.

What's Here

The posts use sample PDF files. For each one, this repository provides the actual PDF file along with a text file that, when opened in a text editor, shows the PDF file's code. GitHub refuses to display binary files, so the text files are copies of the PDF files with binary streams redacted and other characters encoded in UTF-8 for display. If you download the original PDF and open it in a text editor that handles binary data gracefully (like emacs), you can ignore the txt version. The posts link directly to lines in the txt file, but in every case referenced in the blog, the line numbers match up with the PDF.
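For readers curious how a viewable text copy of a PDF could be produced, here is a minimal Python sketch. It is not the process used to create the txt files in this repository; the function names, the placeholder text, and the choice of Latin-1 decoding are all assumptions made for illustration. It replaces the body of any stream that contains non-printable bytes with a short placeholder, then decodes the remaining bytes so they can be saved as UTF-8.

```python
import re

# Matches one stream: the "stream" keyword and its end-of-line, the data,
# and the "endstream" keyword. The lookbehind skips the "stream" inside
# "endstream"; the non-greedy body stops at the first "endstream".
STREAM_RE = re.compile(rb"((?<!end)stream\r?\n)(.*?)(\r?\n?endstream)", re.DOTALL)


def is_binary(data: bytes) -> bool:
    """Return True if data has bytes other than printable ASCII, tab, CR, or LF."""
    return any(b not in b"\t\n\r" and not 32 <= b < 127 for b in data)


def redact_streams(pdf_bytes: bytes) -> str:
    """Return the PDF as text, with binary stream bodies replaced by a placeholder.

    Assumption: bytes outside redacted streams are decoded as Latin-1, which
    maps every byte to exactly one character and therefore never fails.
    """

    def replace(match: re.Match) -> bytes:
        head, body, tail = match.groups()
        if is_binary(body):
            # Hypothetical placeholder text, chosen for this sketch only.
            body = b"[binary stream redacted: %d bytes]" % len(body)
        return head + body + tail

    return STREAM_RE.sub(replace, pdf_bytes).decode("latin-1")
```

To try it, read a file with `pathlib.Path("basic.pdf").read_bytes()`, pass the result to `redact_streams`, and write the returned string with `encoding="utf-8"`. Note that this sketch makes no attempt to preserve line numbering: a redacted stream that contained newline bytes will shift the lines that follow it.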

For Part 2:

basic.pdf -- A PDF file with simple text using built-in fonts
basic.pdf.txt -- A viewable text version

For Parts 3 through 5:

advanced.pdf -- A PDF file with non-Latin characters, emoji, and other features
advanced.pdf.txt -- A viewable text version

If you want to follow along with the blog posts or see PDF fragments in context, you can either download the PDF and open it in a text/binary editor, or follow along with the text file right from GitHub. The blog posts are self-contained and include the referenced fragments as embedded GitHub gists.

External References

Blog post: The Structure of a PDF File
My earlier post: Examining a PDF File with qpdf
From the PDF Association: PDF Operators Cheat Sheet
The Wikipedia article on Unicode

jberkenbilt/pdf-text-blog

Folders and files

Latest commit

History

Repository files navigation

PDF-Text Blog: Supplementary Material

What's Here

External References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages