Question regarding the Font, Widths and FontDescriptor #7

Open
caspervanpomeren opened this issue Sep 20, 2019 · 4 comments

@caspervanpomeren

I am (still) working through the PDF spec (ISO 32000-2), and these examples are among the few I could find that actually try to explain and showcase the new PDF 2.0 spec. So once again, thank you very much for these examples.

The topic I am currently looking into is fonts, and I noticed some things in the pdf20examples. For example, "Simple PDF 2.0 file.pdf" contains the following code:

5 0 obj
  <</Type /Font
    /Subtype /Type1
    /BaseFont /Helvetica
    /FirstChar 33
    /LastChar 126
    /Widths 6 0 R
    /FontDescriptor 7 0 R
  >>
endobj

6 0 obj
[ 278 355 556 556 889 667 222 333 333 389 584 278 333 278 278 556
  556 556 556 556 556 556 556 556 556 278 278 584 584 584 556 1015
  667 667 722 722 667 611 778 722 278 500 667 556 833 722 778 667
  778 722 667 611 722 667 944 667 667 611 278 278 278 469 556 222
  556 556 500 556 556 278 556 556 222 222 500 222 833 556 556 556
  556 333 500 278 556 500 722 500 500 500 334 260 334 584 ]
endobj

% This FontDescriptor contains only the required entries for PDF 2.0
% for unembedded standard 14 fonts that contain Latin characters
7 0 obj
  <</Type /FontDescriptor
    /FontName /Helvetica
    /Flags 32
    /FontBBox [ -166 -225 1000 931 ]
    /ItalicAngle 0
    /Ascent 718
    /Descent -207
    /CapHeight 718
    /StemV 88
    /MissingWidth 0  
  >>
endobj

My questions:

  1. Why is /FirstChar 33 and not 32? If I look at the PDF core font information of Helvetica (found here: link), I see these CharMetrics:
StartCharMetrics 315
C 32 ; WX 278 ; N space ; B 0 0 0 0 ;
C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ;
C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ;
C 35 ; WX 556 ; N numbersign ; B 28 0 529 688 ;
C 36 ; WX 556 ; N dollar ; B 32 -115 520 775 ;
C 37 ; WX 889 ; N percent ; B 39 -19 850 703 ;
C 38 ; WX 667 ; N ampersand ; B 44 -15 645 718 ;
C 39 ; WX 222 ; N quoteright ; B 53 463 157 718 ;
C 40 ; WX 333 ; N parenleft ; B 68 -207 299 733 ;
C 41 ; WX 333 ; N parenright ; B 34 -207 265 733 ;
C 42 ; WX 389 ; N asterisk ; B 39 431 349 718 ;
C 43 ; WX 584 ; N plus ; B 39 0 545 505 ;
C 44 ; WX 278 ; N comma ; B 87 -147 191 106 ;
C 45 ; WX 333 ; N hyphen ; B 44 232 289 322 ;
C 46 ; WX 278 ; N period ; B 87 0 191 106 ;
C 47 ; WX 278 ; N slash ; B -17 -19 295 737 ;
C 48 ; WX 556 ; N zero ; B 37 -19 519 703 ;
C 49 ; WX 556 ; N one ; B 101 0 359 703 ;
C 50 ; WX 556 ; N two ; B 26 0 507 703 ;
C 51 ; WX 556 ; N three ; B 34 -19 522 703 ;
C 52 ; WX 556 ; N four ; B 25 0 523 703 ;
C 53 ; WX 556 ; N five ; B 32 -19 514 688 ;
C 54 ; WX 556 ; N six ; B 38 -19 518 703 ;
C 55 ; WX 556 ; N seven ; B 37 0 523 688 ;
C 56 ; WX 556 ; N eight ; B 38 -19 517 703 ;
C 57 ; WX 556 ; N nine ; B 42 -19 514 703 ;
C 58 ; WX 278 ; N colon ; B 87 0 191 516 ;
C 59 ; WX 278 ; N semicolon ; B 87 -147 191 516 ;
C 60 ; WX 584 ; N less ; B 48 11 536 495 ;
C 61 ; WX 584 ; N equal ; B 39 115 545 390 ;
C 62 ; WX 584 ; N greater ; B 48 11 536 495 ;
C 63 ; WX 556 ; N question ; B 56 0 492 727 ;
C 64 ; WX 1015 ; N at ; B 147 -19 868 737 ;
C 65 ; WX 667 ; N A ; B 14 0 654 718 ;
C 66 ; WX 667 ; N B ; B 74 0 627 718 ;
C 67 ; WX 722 ; N C ; B 44 -19 681 737 ;
C 68 ; WX 722 ; N D ; B 81 0 674 718 ;
C 69 ; WX 667 ; N E ; B 86 0 616 718 ;
C 70 ; WX 611 ; N F ; B 86 0 583 718 ;
C 71 ; WX 778 ; N G ; B 48 -19 704 737 ;
C 72 ; WX 722 ; N H ; B 77 0 646 718 ;
C 73 ; WX 278 ; N I ; B 91 0 188 718 ;
C 74 ; WX 500 ; N J ; B 17 -19 428 718 ;
C 75 ; WX 667 ; N K ; B 76 0 663 718 ;
C 76 ; WX 556 ; N L ; B 76 0 537 718 ;
C 77 ; WX 833 ; N M ; B 73 0 761 718 ;
C 78 ; WX 722 ; N N ; B 76 0 646 718 ;
C 79 ; WX 778 ; N O ; B 39 -19 739 737 ;
C 80 ; WX 667 ; N P ; B 86 0 622 718 ;
C 81 ; WX 778 ; N Q ; B 39 -56 739 737 ;
C 82 ; WX 722 ; N R ; B 88 0 684 718 ;
C 83 ; WX 667 ; N S ; B 49 -19 620 737 ;
C 84 ; WX 611 ; N T ; B 14 0 597 718 ;
C 85 ; WX 722 ; N U ; B 79 -19 644 718 ;
C 86 ; WX 667 ; N V ; B 20 0 647 718 ;
C 87 ; WX 944 ; N W ; B 16 0 928 718 ;
C 88 ; WX 667 ; N X ; B 19 0 648 718 ;
C 89 ; WX 667 ; N Y ; B 14 0 653 718 ;
C 90 ; WX 611 ; N Z ; B 23 0 588 718 ;
C 91 ; WX 278 ; N bracketleft ; B 63 -196 250 722 ;
C 92 ; WX 278 ; N backslash ; B -17 -19 295 737 ;
C 93 ; WX 278 ; N bracketright ; B 28 -196 215 722 ;
C 94 ; WX 469 ; N asciicircum ; B -14 264 483 688 ;
C 95 ; WX 556 ; N underscore ; B 0 -125 556 -75 ;
C 96 ; WX 222 ; N quoteleft ; B 65 470 169 725 ;
C 97 ; WX 556 ; N a ; B 36 -15 530 538 ;
C 98 ; WX 556 ; N b ; B 58 -15 517 718 ;
C 99 ; WX 500 ; N c ; B 30 -15 477 538 ;
C 100 ; WX 556 ; N d ; B 35 -15 499 718 ;
C 101 ; WX 556 ; N e ; B 40 -15 516 538 ;
C 102 ; WX 278 ; N f ; B 14 0 262 728 ; L i fi ; L l fl ;
C 103 ; WX 556 ; N g ; B 40 -220 499 538 ;
C 104 ; WX 556 ; N h ; B 65 0 491 718 ;
C 105 ; WX 222 ; N i ; B 67 0 155 718 ;
C 106 ; WX 222 ; N j ; B -16 -210 155 718 ;
C 107 ; WX 500 ; N k ; B 67 0 501 718 ;
C 108 ; WX 222 ; N l ; B 67 0 155 718 ;
C 109 ; WX 833 ; N m ; B 65 0 769 538 ;
C 110 ; WX 556 ; N n ; B 65 0 491 538 ;
C 111 ; WX 556 ; N o ; B 35 -14 521 538 ;
C 112 ; WX 556 ; N p ; B 58 -207 517 538 ;
C 113 ; WX 556 ; N q ; B 35 -207 494 538 ;
C 114 ; WX 333 ; N r ; B 77 0 332 538 ;
C 115 ; WX 500 ; N s ; B 32 -15 464 538 ;
C 116 ; WX 278 ; N t ; B 14 -7 257 669 ;
C 117 ; WX 556 ; N u ; B 68 -15 489 523 ;
C 118 ; WX 500 ; N v ; B 8 0 492 523 ;
C 119 ; WX 722 ; N w ; B 14 0 709 523 ;
C 120 ; WX 500 ; N x ; B 11 0 490 523 ;
C 121 ; WX 500 ; N y ; B 11 -214 489 523 ;
C 122 ; WX 500 ; N z ; B 31 0 469 523 ;
C 123 ; WX 334 ; N braceleft ; B 42 -196 292 722 ;
C 124 ; WX 260 ; N bar ; B 94 -225 167 775 ;
C 125 ; WX 334 ; N braceright ; B 42 -196 292 722 ;
C 126 ; WX 584 ; N asciitilde ; B 61 180 523 326 ;
C 161 ; WX 333 ; N exclamdown ; B 118 -195 215 523 ;
C 162 ; WX 556 ; N cent ; B 51 -115 513 623 ;
C 163 ; WX 556 ; N sterling ; B 33 -16 539 718 ;
C 164 ; WX 167 ; N fraction ; B -166 -19 333 703 ;
C 165 ; WX 556 ; N yen ; B 3 0 553 688 ;
C 166 ; WX 556 ; N florin ; B -11 -207 501 737 ;
C 167 ; WX 556 ; N section ; B 43 -191 512 737 ;
C 168 ; WX 556 ; N currency ; B 28 99 528 603 ;
C 169 ; WX 191 ; N quotesingle ; B 59 463 132 718 ;
C 170 ; WX 333 ; N quotedblleft ; B 38 470 307 725 ;
C 171 ; WX 556 ; N guillemotleft ; B 97 108 459 446 ;
C 172 ; WX 333 ; N guilsinglleft ; B 88 108 245 446 ;
C 173 ; WX 333 ; N guilsinglright ; B 88 108 245 446 ;
C 174 ; WX 500 ; N fi ; B 14 0 434 728 ;
C 175 ; WX 500 ; N fl ; B 14 0 432 728 ;
C 177 ; WX 556 ; N endash ; B 0 240 556 313 ;
C 178 ; WX 556 ; N dagger ; B 43 -159 514 718 ;
C 179 ; WX 556 ; N daggerdbl ; B 43 -159 514 718 ;
C 180 ; WX 278 ; N periodcentered ; B 77 190 202 315 ;
C 182 ; WX 537 ; N paragraph ; B 18 -173 497 718 ;
C 183 ; WX 350 ; N bullet ; B 18 202 333 517 ;
C 184 ; WX 222 ; N quotesinglbase ; B 53 -149 157 106 ;
C 185 ; WX 333 ; N quotedblbase ; B 26 -149 295 106 ;
C 186 ; WX 333 ; N quotedblright ; B 26 463 295 718 ;
C 187 ; WX 556 ; N guillemotright ; B 97 108 459 446 ;
C 188 ; WX 1000 ; N ellipsis ; B 115 0 885 106 ;
C 189 ; WX 1000 ; N perthousand ; B 7 -19 994 703 ;
C 191 ; WX 611 ; N questiondown ; B 91 -201 527 525 ;
C 193 ; WX 333 ; N grave ; B 14 593 211 734 ;
C 194 ; WX 333 ; N acute ; B 122 593 319 734 ;
C 195 ; WX 333 ; N circumflex ; B 21 593 312 734 ;
C 196 ; WX 333 ; N tilde ; B -4 606 337 722 ;
C 197 ; WX 333 ; N macron ; B 10 627 323 684 ;
C 198 ; WX 333 ; N breve ; B 13 595 321 731 ;
C 199 ; WX 333 ; N dotaccent ; B 121 604 212 706 ;
C 200 ; WX 333 ; N dieresis ; B 40 604 293 706 ;
C 202 ; WX 333 ; N ring ; B 75 572 259 756 ;
C 203 ; WX 333 ; N cedilla ; B 45 -225 259 0 ;
C 205 ; WX 333 ; N hungarumlaut ; B 31 593 409 734 ;
C 206 ; WX 333 ; N ogonek ; B 73 -225 287 0 ;
C 207 ; WX 333 ; N caron ; B 21 593 312 734 ;
C 208 ; WX 1000 ; N emdash ; B 0 240 1000 313 ;
C 225 ; WX 1000 ; N AE ; B 8 0 951 718 ;
C 227 ; WX 370 ; N ordfeminine ; B 24 405 346 737 ;
C 232 ; WX 556 ; N Lslash ; B -20 0 537 718 ;
C 233 ; WX 778 ; N Oslash ; B 39 -19 740 737 ;
C 234 ; WX 1000 ; N OE ; B 36 -19 965 737 ;
C 235 ; WX 365 ; N ordmasculine ; B 25 405 341 737 ;
C 241 ; WX 889 ; N ae ; B 36 -15 847 538 ;
C 245 ; WX 278 ; N dotlessi ; B 95 0 183 523 ;
C 248 ; WX 222 ; N lslash ; B -20 0 242 718 ;
C 249 ; WX 611 ; N oslash ; B 28 -22 537 545 ;
C 250 ; WX 944 ; N oe ; B 35 -15 902 538 ;
C 251 ; WX 611 ; N germandbls ; B 67 -15 571 728 ;
C -1 ; WX 278 ; N Idieresis ; B 13 0 266 901 ;
C -1 ; WX 556 ; N eacute ; B 40 -15 516 734 ;
C -1 ; WX 556 ; N abreve ; B 36 -15 530 731 ;
C -1 ; WX 556 ; N uhungarumlaut ; B 68 -15 521 734 ;
C -1 ; WX 556 ; N ecaron ; B 40 -15 516 734 ;
C -1 ; WX 667 ; N Ydieresis ; B 14 0 653 901 ;
C -1 ; WX 584 ; N divide ; B 39 -19 545 524 ;
C -1 ; WX 667 ; N Yacute ; B 14 0 653 929 ;
C -1 ; WX 667 ; N Acircumflex ; B 14 0 654 929 ;
C -1 ; WX 556 ; N aacute ; B 36 -15 530 734 ;
C -1 ; WX 722 ; N Ucircumflex ; B 79 -19 644 929 ;
C -1 ; WX 500 ; N yacute ; B 11 -214 489 734 ;
C -1 ; WX 500 ; N scommaaccent ; B 32 -225 464 538 ;
C -1 ; WX 556 ; N ecircumflex ; B 40 -15 516 734 ;
C -1 ; WX 722 ; N Uring ; B 79 -19 644 931 ;
C -1 ; WX 722 ; N Udieresis ; B 79 -19 644 901 ;
C -1 ; WX 556 ; N aogonek ; B 36 -220 547 538 ;
C -1 ; WX 722 ; N Uacute ; B 79 -19 644 929 ;
C -1 ; WX 556 ; N uogonek ; B 68 -225 519 523 ;
C -1 ; WX 667 ; N Edieresis ; B 86 0 616 901 ;
C -1 ; WX 722 ; N Dcroat ; B 0 0 674 718 ;
C -1 ; WX 250 ; N commaaccent ; B 87 -225 181 -40 ;
C -1 ; WX 737 ; N copyright ; B -14 -19 752 737 ;
C -1 ; WX 667 ; N Emacron ; B 86 0 616 879 ;
C -1 ; WX 500 ; N ccaron ; B 30 -15 477 734 ;
C -1 ; WX 556 ; N aring ; B 36 -15 530 756 ;
C -1 ; WX 722 ; N Ncommaaccent ; B 76 -225 646 718 ;
C -1 ; WX 222 ; N lacute ; B 67 0 264 929 ;
C -1 ; WX 556 ; N agrave ; B 36 -15 530 734 ;
C -1 ; WX 611 ; N Tcommaaccent ; B 14 -225 597 718 ;
C -1 ; WX 722 ; N Cacute ; B 44 -19 681 929 ;
C -1 ; WX 556 ; N atilde ; B 36 -15 530 722 ;
C -1 ; WX 667 ; N Edotaccent ; B 86 0 616 901 ;
C -1 ; WX 500 ; N scaron ; B 32 -15 464 734 ;
C -1 ; WX 500 ; N scedilla ; B 32 -225 464 538 ;
C -1 ; WX 278 ; N iacute ; B 95 0 292 734 ;
C -1 ; WX 471 ; N lozenge ; B 10 0 462 728 ;
C -1 ; WX 722 ; N Rcaron ; B 88 0 684 929 ;
C -1 ; WX 778 ; N Gcommaaccent ; B 48 -225 704 737 ;
C -1 ; WX 556 ; N ucircumflex ; B 68 -15 489 734 ;
C -1 ; WX 556 ; N acircumflex ; B 36 -15 530 734 ;
C -1 ; WX 667 ; N Amacron ; B 14 0 654 879 ;
C -1 ; WX 333 ; N rcaron ; B 61 0 352 734 ;
C -1 ; WX 500 ; N ccedilla ; B 30 -225 477 538 ;
C -1 ; WX 611 ; N Zdotaccent ; B 23 0 588 901 ;
C -1 ; WX 667 ; N Thorn ; B 86 0 622 718 ;
C -1 ; WX 778 ; N Omacron ; B 39 -19 739 879 ;
C -1 ; WX 722 ; N Racute ; B 88 0 684 929 ;
C -1 ; WX 667 ; N Sacute ; B 49 -19 620 929 ;
C -1 ; WX 643 ; N dcaron ; B 35 -15 655 718 ;
C -1 ; WX 722 ; N Umacron ; B 79 -19 644 879 ;
C -1 ; WX 556 ; N uring ; B 68 -15 489 756 ;
C -1 ; WX 333 ; N threesuperior ; B 5 270 325 703 ;
C -1 ; WX 778 ; N Ograve ; B 39 -19 739 929 ;
C -1 ; WX 667 ; N Agrave ; B 14 0 654 929 ;
C -1 ; WX 667 ; N Abreve ; B 14 0 654 926 ;
C -1 ; WX 584 ; N multiply ; B 39 0 545 506 ;
C -1 ; WX 556 ; N uacute ; B 68 -15 489 734 ;
C -1 ; WX 611 ; N Tcaron ; B 14 0 597 929 ;
C -1 ; WX 476 ; N partialdiff ; B 13 -38 463 714 ;
C -1 ; WX 500 ; N ydieresis ; B 11 -214 489 706 ;
C -1 ; WX 722 ; N Nacute ; B 76 0 646 929 ;
C -1 ; WX 278 ; N icircumflex ; B -6 0 285 734 ;
C -1 ; WX 667 ; N Ecircumflex ; B 86 0 616 929 ;
C -1 ; WX 556 ; N adieresis ; B 36 -15 530 706 ;
C -1 ; WX 556 ; N edieresis ; B 40 -15 516 706 ;
C -1 ; WX 500 ; N cacute ; B 30 -15 477 734 ;
C -1 ; WX 556 ; N nacute ; B 65 0 491 734 ;
C -1 ; WX 556 ; N umacron ; B 68 -15 489 684 ;
C -1 ; WX 722 ; N Ncaron ; B 76 0 646 929 ;
C -1 ; WX 278 ; N Iacute ; B 91 0 292 929 ;
C -1 ; WX 584 ; N plusminus ; B 39 0 545 506 ;
C -1 ; WX 260 ; N brokenbar ; B 94 -150 167 700 ;
C -1 ; WX 737 ; N registered ; B -14 -19 752 737 ;
C -1 ; WX 778 ; N Gbreve ; B 48 -19 704 926 ;
C -1 ; WX 278 ; N Idotaccent ; B 91 0 188 901 ;
C -1 ; WX 600 ; N summation ; B 15 -10 586 706 ;
C -1 ; WX 667 ; N Egrave ; B 86 0 616 929 ;
C -1 ; WX 333 ; N racute ; B 77 0 332 734 ;
C -1 ; WX 556 ; N omacron ; B 35 -14 521 684 ;
C -1 ; WX 611 ; N Zacute ; B 23 0 588 929 ;
C -1 ; WX 611 ; N Zcaron ; B 23 0 588 929 ;
C -1 ; WX 549 ; N greaterequal ; B 26 0 523 674 ;
C -1 ; WX 722 ; N Eth ; B 0 0 674 718 ;
C -1 ; WX 722 ; N Ccedilla ; B 44 -225 681 737 ;
C -1 ; WX 222 ; N lcommaaccent ; B 67 -225 167 718 ;
C -1 ; WX 317 ; N tcaron ; B 14 -7 329 808 ;
C -1 ; WX 556 ; N eogonek ; B 40 -225 516 538 ;
C -1 ; WX 722 ; N Uogonek ; B 79 -225 644 718 ;
C -1 ; WX 667 ; N Aacute ; B 14 0 654 929 ;
C -1 ; WX 667 ; N Adieresis ; B 14 0 654 901 ;
C -1 ; WX 556 ; N egrave ; B 40 -15 516 734 ;
C -1 ; WX 500 ; N zacute ; B 31 0 469 734 ;
C -1 ; WX 222 ; N iogonek ; B -31 -225 183 718 ;
C -1 ; WX 778 ; N Oacute ; B 39 -19 739 929 ;
C -1 ; WX 556 ; N oacute ; B 35 -14 521 734 ;
C -1 ; WX 556 ; N amacron ; B 36 -15 530 684 ;
C -1 ; WX 500 ; N sacute ; B 32 -15 464 734 ;
C -1 ; WX 278 ; N idieresis ; B 13 0 266 706 ;
C -1 ; WX 778 ; N Ocircumflex ; B 39 -19 739 929 ;
C -1 ; WX 722 ; N Ugrave ; B 79 -19 644 929 ;
C -1 ; WX 612 ; N Delta ; B 6 0 608 688 ;
C -1 ; WX 556 ; N thorn ; B 58 -207 517 718 ;
C -1 ; WX 333 ; N twosuperior ; B 4 281 323 703 ;
C -1 ; WX 778 ; N Odieresis ; B 39 -19 739 901 ;
C -1 ; WX 556 ; N mu ; B 68 -207 489 523 ;
C -1 ; WX 278 ; N igrave ; B -13 0 184 734 ;
C -1 ; WX 556 ; N ohungarumlaut ; B 35 -14 521 734 ;
C -1 ; WX 667 ; N Eogonek ; B 86 -220 633 718 ;
C -1 ; WX 556 ; N dcroat ; B 35 -15 550 718 ;
C -1 ; WX 834 ; N threequarters ; B 45 -19 810 703 ;
C -1 ; WX 667 ; N Scedilla ; B 49 -225 620 737 ;
C -1 ; WX 299 ; N lcaron ; B 67 0 311 718 ;
C -1 ; WX 667 ; N Kcommaaccent ; B 76 -225 663 718 ;
C -1 ; WX 556 ; N Lacute ; B 76 0 537 929 ;
C -1 ; WX 1000 ; N trademark ; B 46 306 903 718 ;
C -1 ; WX 556 ; N edotaccent ; B 40 -15 516 706 ;
C -1 ; WX 278 ; N Igrave ; B -13 0 188 929 ;
C -1 ; WX 278 ; N Imacron ; B -17 0 296 879 ;
C -1 ; WX 556 ; N Lcaron ; B 76 0 537 718 ;
C -1 ; WX 834 ; N onehalf ; B 43 -19 773 703 ;
C -1 ; WX 549 ; N lessequal ; B 26 0 523 674 ;
C -1 ; WX 556 ; N ocircumflex ; B 35 -14 521 734 ;
C -1 ; WX 556 ; N ntilde ; B 65 0 491 722 ;
C -1 ; WX 722 ; N Uhungarumlaut ; B 79 -19 644 929 ;
C -1 ; WX 667 ; N Eacute ; B 86 0 616 929 ;
C -1 ; WX 556 ; N emacron ; B 40 -15 516 684 ;
C -1 ; WX 556 ; N gbreve ; B 40 -220 499 731 ;
C -1 ; WX 834 ; N onequarter ; B 73 -19 756 703 ;
C -1 ; WX 667 ; N Scaron ; B 49 -19 620 929 ;
C -1 ; WX 667 ; N Scommaaccent ; B 49 -225 620 737 ;
C -1 ; WX 778 ; N Ohungarumlaut ; B 39 -19 739 929 ;
C -1 ; WX 400 ; N degree ; B 54 411 346 703 ;
C -1 ; WX 556 ; N ograve ; B 35 -14 521 734 ;
C -1 ; WX 722 ; N Ccaron ; B 44 -19 681 929 ;
C -1 ; WX 556 ; N ugrave ; B 68 -15 489 734 ;
C -1 ; WX 453 ; N radical ; B -4 -80 458 762 ;
C -1 ; WX 722 ; N Dcaron ; B 81 0 674 929 ;
C -1 ; WX 333 ; N rcommaaccent ; B 77 -225 332 538 ;
C -1 ; WX 722 ; N Ntilde ; B 76 0 646 917 ;
C -1 ; WX 556 ; N otilde ; B 35 -14 521 722 ;
C -1 ; WX 722 ; N Rcommaaccent ; B 88 -225 684 718 ;
C -1 ; WX 556 ; N Lcommaaccent ; B 76 -225 537 718 ;
C -1 ; WX 667 ; N Atilde ; B 14 0 654 917 ;
C -1 ; WX 667 ; N Aogonek ; B 14 -225 654 718 ;
C -1 ; WX 667 ; N Aring ; B 14 0 654 931 ;
C -1 ; WX 778 ; N Otilde ; B 39 -19 739 917 ;
C -1 ; WX 500 ; N zdotaccent ; B 31 0 469 706 ;
C -1 ; WX 667 ; N Ecaron ; B 86 0 616 929 ;
C -1 ; WX 278 ; N Iogonek ; B -3 -225 211 718 ;
C -1 ; WX 500 ; N kcommaaccent ; B 67 -225 501 718 ;
C -1 ; WX 584 ; N minus ; B 39 216 545 289 ;
C -1 ; WX 278 ; N Icircumflex ; B -6 0 285 929 ;
C -1 ; WX 556 ; N ncaron ; B 65 0 491 734 ;
C -1 ; WX 278 ; N tcommaaccent ; B 14 -225 257 669 ;
C -1 ; WX 584 ; N logicalnot ; B 39 108 545 390 ;
C -1 ; WX 556 ; N odieresis ; B 35 -14 521 706 ;
C -1 ; WX 556 ; N udieresis ; B 68 -15 489 706 ;
C -1 ; WX 549 ; N notequal ; B 12 -35 537 551 ;
C -1 ; WX 556 ; N gcommaaccent ; B 40 -220 499 822 ;
C -1 ; WX 556 ; N eth ; B 35 -15 522 737 ;
C -1 ; WX 500 ; N zcaron ; B 31 0 469 734 ;
C -1 ; WX 556 ; N ncommaaccent ; B 65 -225 491 538 ;
C -1 ; WX 333 ; N onesuperior ; B 43 281 222 703 ;
C -1 ; WX 278 ; N imacron ; B 5 0 272 684 ;
C -1 ; WX 556 ; N Euro ; B 0 0 0 0 ;
EndCharMetrics

Which clearly starts at character 32, so why was character 33 used as FirstChar?

  2. Similar to the first question, why is /LastChar 126 and not 251 (or -1)? I tried -1 myself and filled the Widths array, but it doesn't work. I found no way to give the characters listed with code -1 a width, and as a result these characters don't show up correctly (for example, € doesn't show correctly). How would I give all 315 characters the correct width?

  3. In the FontDescriptor I see:
    /Flags 32
    What is the logic behind the value 32? The spec doesn't help me either, because it gives examples like:
    /Flags 262178 %Bits 2, 6, and 19
    How do you get to these values? I would expect a value like 2619 or something, but the values just don't make any sense to me. Same thing with the 32: I would have expected 23 as a combination of flags 2 and 3, but apparently it's bit position 32, and that means high-order? That bit position doesn't even exist in the table of font flags in the PDF spec. I also tried every unsigned 32-bit integer calculator I could find to make sense of these numbers, but nothing worked.

  4. In the FontDescriptor I also see:
    /MissingWidth 0
    Shouldn't this be used, since we don't supply the width for all the character codes? Strangely enough, a non-zero value is used in the "PDF 2.0 with page level output intent.pdf" example, see:
    /MissingWidth 278
    And then I am wondering: how do you get that value of 278? Is it random, or can I actually find it in the font information somewhere?

Hopefully someone can help me with these questions or point me in the right direction.

Thanks in advance

@pdfa-mattk
Contributor

Hello @caspervanpomeren,

For characters that are outside the bounds of the Widths array as given by the FirstChar and LastChar entries, the value of the MissingWidth key is used for the character's width. In this example, because the space character is outside the bounds, it uses the '0' value from MissingWidth - that's why there's no visual spacing between the two words "Hello" and "World" when viewing the PDF file. To give the space character an actual width, you would change the FirstChar value to '32', and add the desired width (likely '278' based on the metrics for the font) as the first entry of the Widths array.
Per Table 109, the FirstChar and LastChar entries give the expected size of the Widths array and therefore need to be consistent with its actual size. '-1' is not a valid value here. '251' could be a valid value if the appropriate entries were added to the Widths array to make it the correct size.
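
For future readers, here is a minimal JavaScript sketch of how a Widths array of the right size could be derived from the AFM CharMetrics lines quoted in the question. The function names `parseCharMetrics` and `buildWidths` are made up for this illustration, and the sketch assumes integer widths and the exact `C ... ; WX ... ; N ... ;` layout shown above:

```javascript
// Parse AFM "C <code> ; WX <width> ; N <name> ;" lines into { code, width, name } records.
function parseCharMetrics(afmText) {
  const metrics = [];
  for (const line of afmText.split("\n")) {
    const match = /^C\s+(-?\d+)\s*;\s*WX\s+(-?\d+)\s*;\s*N\s+(\S+)\s*;/.exec(line.trim());
    if (match) {
      metrics.push({ code: Number(match[1]), width: Number(match[2]), name: match[3] });
    }
  }
  return metrics;
}

// Build a Widths array covering firstChar..lastChar inclusive.
// Its length is always lastChar - firstChar + 1; codes without a glyph keep width 0.
function buildWidths(metrics, firstChar, lastChar) {
  const widths = new Array(lastChar - firstChar + 1).fill(0);
  for (const { code, width } of metrics) {
    if (code >= firstChar && code <= lastChar) {
      widths[code - firstChar] = width;
    }
  }
  return widths;
}

const sample = [
  "C 32 ; WX 278 ; N space ; B 0 0 0 0 ;",
  "C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ;",
  "C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ;",
  "C -1 ; WX 556 ; N Euro ; B 0 0 0 0 ;", // unencoded glyph, skipped by buildWidths
].join("\n");

const widths = buildWidths(parseCharMetrics(sample), 32, 34);
console.log(widths); // [ 278, 278, 355 ]
```

Glyphs listed with code -1 never land in the array this way, which is why they need an explicit Encoding entry before they can be given a width.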

The encoding of this font, as a Type 1 with no specified encoding and with the font Flags value having the 6th bit set (with a value of '32', only the 6th bit is set in this binary bit flag), is the StandardEncoding detailed in Annex D and for which the character encoding is given in Table D.2. A Type 1 font is a single-byte font and therefore cannot encode more than 255 characters [note: do not encode character 0 in a font, it will confuse many processors]. Character encodings are how a writer describes what content stream data corresponds to what font characters. To use characters in Helvetica that are not contained in StandardEncoding (or are not in one of the other pre-defined encodings, which you could use by supplying the name for the Encoding value in the font dictionary), you need to make an encoding dictionary as described in section 9.6.5 and add the characters you'd like to encode. You'd also want to adjust the Widths array suitably.

Note: these examples use unembedded Standard 14 fonts primarily for the sake of compactness. In most cases, I highly recommend embedding (and subsetting if desired) fonts that are used in PDF files.

Hope this helps!

@caspervanpomeren
Author

Hi @pdfa-mattk ,

Thanks for the quick and detailed response. It took a while to digest all the information and respond (had lots of reading/testing to do and was a bit sick), but here it is.

For characters that are outside the bounds of the Widths array as given by the FirstChar and LastChar entries, the value of the MissingWidth key is used for the character's width. In this example, because the space character is outside the bounds, it uses the '0' value from MissingWidth - that's why there's no visual spacing between the two words "Hello" and "World" when viewing the PDF file. To give the space character an actual width, you would change the FirstChar value to '32', and add the desired width (likely '278' based on the metrics for the font) as the first entry of the Widths array.
Per Table 109, the FirstChar and LastChar entries give the expected size of the Widths array and therefore need to be consistent with its actual size. '-1' is not a valid value here. '251' could be a valid value if the appropriate entries were added to the Widths array to make it the correct size.

I understand, MissingWidth is basically the fallback for any character code that falls outside the Widths array.
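
A minimal sketch of that fallback rule in JavaScript. The `font` object is a hypothetical plain-object mirror of the font dictionary (not any real library's API), and the array holds only the first four widths from the example file:

```javascript
// Look up the width of a character code in a simple font.
// Codes outside FirstChar..LastChar fall back to MissingWidth.
function glyphWidth(font, code) {
  const inRange = code >= font.firstChar && code <= font.lastChar;
  return inRange ? font.widths[code - font.firstChar] : font.missingWidth;
}

const helveticaSubset = {
  firstChar: 33,
  lastChar: 36,
  widths: [278, 355, 556, 556], // exclam, quotedbl, numbersign, dollar
  missingWidth: 0,
};

console.log(glyphWidth(helveticaSubset, 32)); // 0: space is outside FirstChar..LastChar
console.log(glyphWidth(helveticaSubset, 36)); // 556
```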

(with a value of '32', only the 6th bit is set in this binary bit flag)

I finally understand the logic behind this! See the following JavaScript code:

(32).toString(2)
--> Returns: "100000"
From the right, the 6th bit is set (to 1)

And the other example I gave that I didn't understand (for future readers):

/Flags 262178  %Bits 2, 6, and 19
(262178).toString(2)
--> Returns: "1000000000000100010"
From the right, the 2nd, the 6th and the 19th bits are set (to 1)

And if I want to set certain flags, for example bits 2, 6 and 19, I can use this logic to get the correct value:

parseInt("1000000000000100010", 2)
--> Returns: 262178

It took me a while to understand the whole bits concept, but it finally clicked.
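
The same conversion can also be done with bit operations instead of binary strings. This is only a sketch, and the helper names `flagsFromBits` and `bitsFromFlags` are made up for illustration; bit positions are 1-based, counted from the low-order bit, as in the spec's font flags table:

```javascript
// Convert 1-based bit positions into a /Flags integer.
function flagsFromBits(bitPositions) {
  const flags = bitPositions.reduce((acc, bit) => acc | (1 << (bit - 1)), 0);
  return flags >>> 0; // keep the result unsigned even if bit 32 is set
}

// List the 1-based bit positions that are set in a /Flags integer.
function bitsFromFlags(flags) {
  const bits = [];
  for (let bit = 1; bit <= 32; bit++) {
    if ((flags >>> (bit - 1)) & 1) bits.push(bit);
  }
  return bits;
}

console.log(flagsFromBits([6]));        // 32
console.log(flagsFromBits([2, 6, 19])); // 262178
console.log(bitsFromFlags(262178));     // [ 2, 6, 19 ]
```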

A Type 1 font is a single-byte font and therefore cannot encode more than 255 characters

So if I understand correctly, the Helvetica font has 315 characters but I can only encode 255 characters when using a Type 1 font? This means I will not be able to encode all the 315 characters of the Helvetica font?

To use characters in Helvetica that are not contained in StandardEncoding (or are not in one of the other pre-defined encodings, which you could use by supplying the name for the Encoding value in the font dictionary), you need to make an encoding dictionary as described in section 9.6.5 and add the characters you'd like to encode. You'd also want to adjust the Widths array suitably.

So I tried playing around with this and basically tried four scenarios:

  1. Used StandardEncoding and tried using all the characters it contains;
  2. Used StandardEncoding + Differences array to get to different characters not usually in StandardEncoding;
  3. Used WinAnsiEncoding and tried using all the characters it contains;
  4. Tried using PDFDocEncoding, but later read that this is used for: "Encoding for text strings in a PDF document outside the document's content streams". Unfortunately I started with this one and made a complete Widths array for all its characters... At least I now know how the octal number system works.

Here are my findings/questions based on these scenarios:

  1. Scenario: Used StandardEncoding and the characters it contains.
    Here is my (relevant) code:
4 0 obj
  <</Length 154
  >>
stream
BT
  /F1 24 Tf
  72 696 Td
  (Hel¡lo ) Tj
  /F1 24 Tf
  (32000-2) Tj
  /F1 24 Tf
  156.1 0 Td
  (wor) Tj
  (ld) Tj
ET

133.3 694.2 m
221.4 694.2 l
1.2 w
S
endstream
endobj

5 0 obj
  <</Type /Font
    /Subtype /Type1
    /BaseFont /Helvetica
    /FirstChar 32
    /LastChar 251
    /Widths 6 0 R
    /FontDescriptor 7 0 R
  >>
endobj

6 0 obj
[ 278 278 355 556 556 889 667 222 333 333
  389 584 278 333 278 278 556 556 556 556
  556 556 556 556 556 556 278 278 584 584
  584 556 1015 667 667 722 722 667 611 778
  722 278 500 667 556 833 722 778 667 778
  722 667 611 722 667 944 667 667 611 278
  278 278 469 556 222 556 556 500 556 556
  278 556 556 222 222 500 222 833 556 556
  556 556 333 500 278 556 500 722 500 500
  500 334 260 334 584 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 333
  556 556 167 556 556 556 556 191 333 556
  333 333 500 500 0 556 556 556 278 0
  537 350 222 333 333 556 1000 1000 0
  611 0 333 333 333 333 333 333 333 333
  0 333 333 0 333 333 333 1000 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 0 1000 0 370 0 0 0
  0 556 778 1000 365 0 0 0 0 0
  889 0 0 0 278 0 0 222 611 944
  611 ]
endobj

7 0 obj
  <</Type /FontDescriptor
    /FontName /Helvetica
    /Flags 32
    /FontBBox [ -166 -225 1000 931 ]
    /ItalicAngle 0
    /Ascent 718
    /Descent -207
    /CapHeight 718
    /StemV 88
    /MissingWidth 0  
  >>
endobj

The problem here is, the code returns the following text: "Hel´¡lo 32000-2world". The text I was expecting was: "Hel¡lo 32000-2world".

What causes this extra ´?

This happens with all kinds of characters, for example if I try to insert: "HelØlo 32000-2world", I get "Helˆ�lo 32000-2world"

What am I doing wrong? I think I correctly filled in the widths array and these characters are included in the StandardEncoding if I look at Annex D.

The only thing I could think of was that I filled the widths array with widths of zero to compensate for the holes between the available character codes that were available in the encoding, but this was the only logical thing that actually made the widths array correctly work for me.

  2. Scenario: Used StandardEncoding + Differences array.
    Here is my (relevant) code:
4 0 obj
  <</Length 154
  >>
stream
BT
  /F1 24 Tf
  72 696 Td
  (Hel$lo ) Tj
  /F1 24 Tf
  (32000-2) Tj
  /F1 24 Tf
  156.1 0 Td
  (wor) Tj
  (ld) Tj
ET

133.3 694.2 m
221.4 694.2 l
1.2 w
S
endstream
endobj

5 0 obj
  <</Type /Font
    /Subtype /Type1
    /BaseFont /Helvetica
    /Encoding 6 0 R
    /FirstChar 33
    /LastChar 126
    /Widths 7 0 R
    /FontDescriptor 8 0 R
  >>
endobj

6 0 obj
  <</Type /Encoding
    /Differences [36 /Euro]
  >>
endobj

7 0 obj
[ 278 355 556 556 889 667 222 333 333 389 584 278 333 278 278 556
  556 556 556 556 556 556 556 556 556 278 278 584 584 584 556 1015
  667 667 722 722 667 611 778 722 278 500 667 556 833 722 778 667
  778 722 667 611 722 667 944 667 667 611 278 278 278 469 556 222
  556 556 500 556 556 278 556 556 222 222 500 222 833 556 556 556
  556 333 500 278 556 500 722 500 500 500 334 260 334 584 ]
endobj

8 0 obj
  <</Type /FontDescriptor
    /FontName /Helvetica
    /Flags 32
    /FontBBox [ -166 -225 1000 931 ]
    /ItalicAngle 0
    /Ascent 718
    /Descent -207
    /CapHeight 718
    /StemV 88
    /MissingWidth 0  
  >>
endobj

This returns: "Hel€lo 32000-2world" as expected. The only thing I noticed was that I can't use character names like sterling in the Differences array, since sterling already has a character code in StandardEncoding. Is it not allowed to have two character codes for the same glyph, or am I missing something?

  3. Scenario: Used WinAnsiEncoding and the characters it contains.
    Here is my (relevant) code:
4 0 obj
  <</Length 154
  >>
stream
BT
  /F1 24 Tf
  72 696 Td
  (Hel¡lo ) Tj
  /F1 24 Tf
  (32000-2) Tj
  /F1 24 Tf
  156.1 0 Td
  (wor) Tj
  (ld) Tj
ET

133.3 694.2 m
221.4 694.2 l
1.2 w
S
endstream
endobj

5 0 obj
  <</Type /Font
    /Subtype /Type1
    /BaseFont /Helvetica
    /Encoding /WinAnsiEncoding
    /FirstChar 32
    /LastChar 255
    /Widths 6 0 R
    /FontDescriptor 7 0 R
  >>
endobj

6 0 obj
[ 278 278 355 556 556 889 667 191 333 333 
  389 584 278 333 278 278 556 556 556 556 
  556 556 556 556 556 556 278 278 584 584 
  584 556 1015 667 667 722 722 667 611 778 
  722 278 500 667 556 833 722 778 667 778 
  722 667 611 722 667 944 667 667 611 278
  278 278 469 556 333 556 556 500 556 556 
  278 556 556 222 222 500 222 833 556 556 
  556 556 333 500 278 556 500 722 500 500 
  500 334 260 334 584 0 556 0 222 556 
  333 1000 556 556 333 1000 667 333 1000 0 
  611 0 0 222 222 333 333 350 556 1000
  333 1000 500 333 944 0 500 667 0 333 
  556 556 556 556 260 556 333 737 370 556
  584 0 737 333 400 584 333 333 333 556 
  537 278 333 333 365 556 834 834 834 611
  667 667 667 667 667 667 1000 722 667 667 
  667 667 278 278 278 278 722 722 778 778 
  778 778 778 584 778 722 722 722 722 667 
  667 611 556 556 556 556 556 556 889 500 
  556 556 556 556 278 278 278 278 556 556 
  556 556 556 556 556 584 611 556 556 556 
  556 500 556 500 ]
endobj

7 0 obj
  <</Type /FontDescriptor
    /FontName /Helvetica
    /Flags 32
    /FontBBox [ -166 -225 1000 931 ]
    /ItalicAngle 0
    /Ascent 718
    /Descent -207
    /CapHeight 718
    /StemV 88
    /MissingWidth 0  
  >>
endobj

The problem here is, the code returns the following text: "HelÂ¡lo 32000-2 world". The text I was expecting was: "Hel¡lo 32000-2world".

What causes this extra Â?

This is basically the same problem as in scenario 1.

Another problem, when I try using this text: "Hel€lo 32000-2world", I get this: "Hel€lo 32000w-2orld". Even though WinAnsiEncoding does include the character Euro.

Note: these examples use unembedded Standard 14 fonts primarily for the sake of compactness. In most cases, I highly recommend embedding (and subsetting if desired) fonts that are used in PDF files.

I understand, but I am currently trying to understand the entire spec and make examples that explain everything from the spec. In these examples I also want to benefit from the compactness of unembedded standard fonts. I plan on making a pull request with all my examples and you can decide if you want them added.

When I put my knowledge of the spec to use, I will certainly embed and subset fonts. Do you know if there are any simple examples of embedding/subsetting fonts? This is basically the next step I am going to work on.

To conclude, a more generic question: How do people generally learn to write pdf by hand? Do they just read the spec and go from there? Or are there certain resources that are recommended? Or are there certain communities on IRC/Discord etc? Because while I have found some tutorials/knowledge online, it isn't a huge amount. It's especially hard since lots of information isn't based on the latest spec and the term "pdf" is used so much in relation to other things that search engines don't really return what I am looking for.

Some extra background information: my end goal is to create a JavaScript library that can automatically write PDF that complies with the latest PDF spec and supports almost every aspect of it. So I am starting by doing everything by hand and understanding how everything works, and then I am going to translate that knowledge to JavaScript code.

Thanks again for the help. I also understand that this is quite some text, so please take your time and even if you can only answer one thing I would really appreciate it.

If I need to clarify anything please let me know.

Casper

@pdfa-mattk
Contributor

Hi @caspervanpomeren, let me see if I can answer some of your questions here:

So if I understand correctly, the Helvetica font has 315 characters but I can only encode 255 characters when using a Type 1 font? This means I will not be able to encode all the 315 characters of the Helvetica font?

As a Type1 font, this is correct. This is a general characteristic of Type1 fonts as defined in PDF. Because they use single byte values in the content stream to reference characters in their encodings, and a single byte can only hold up to 256 different values, this sets the limit on the number of different characters that can be encoded in a given instance of a Type1 font.
You could make two different instances of Helvetica, with two different font dictionaries that use two different Encodings that have different Differences arrays, and between the two encode all 315 characters. Then, you just need to use the correct instance to set the appropriate text strings. They can even both be named "Helvetica". The PDF will have two different fonts, but these will both be instances of Helvetica.
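
As a rough illustration of the two-instance idea, here is a JavaScript sketch that splits a glyph list into groups that each fit in one single-byte encoding. The function name `splitIntoEncodings` and the placeholder glyph names are assumptions made for this example. It naively assigns codes 1 to 255 in order; a real writer would more likely keep the ASCII range at its usual codes and only move the extra glyphs:

```javascript
// Split glyph names into groups of at most 255, one group per font instance.
// Code 0 is left unused, so each instance assigns codes 1..255.
function splitIntoEncodings(glyphNames, maxCodes = 255) {
  const encodings = [];
  for (let start = 0; start < glyphNames.length; start += maxCodes) {
    const chunk = glyphNames.slice(start, start + maxCodes);
    // A Differences array is a starting code followed by consecutive glyph names.
    encodings.push({ differences: [1, ...chunk.map((name) => "/" + name)] });
  }
  return encodings;
}

// Placeholder names standing in for the 315 glyphs listed in the AFM file.
const names = Array.from({ length: 315 }, (_, i) => "glyph" + i);
const encodings = splitIntoEncodings(names);

console.log(encodings.length); // 2
console.log(encodings.map((e) => e.differences.length - 1)); // [ 255, 60 ]
```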

The problem here is, the code returns the following text: "Hel´¡lo 32000-2world". The text I was expecting was: "Hel¡lo 32000-2world".
What causes this extra ´?

For Type1 fonts, strings in content streams are read as individual bytes. I suspect that the text you're putting into the content stream might be encoded in UTF-8 - this would cause the ¡ (inverted exclamation mark, U+00A1) to be expressed as two bytes: 0xC2 0xA1. The "extra" character is likely that extra 0xC2 that I suspect you're putting into the content stream. The same concept looks like it explains the other odd and extra characters you're seeing when trying to put other characters in.
The bytes of the content stream are interpreted as individual bytes, and those single bytes are used to look up characters in the font's Encoding. The content stream bytes are not in UTF-8, they're presumed to be expressed specifically in the font's Encoding.
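
A small JavaScript sketch of the difference. `TextEncoder` is the standard UTF-8 encoder available in browsers and Node.js; `toPdfLiteral` and the `charToCode` map are hypothetical helpers written for this example, not part of any library:

```javascript
// UTF-8 turns U+00A1 into two bytes, which a simple font reads as two character codes.
const utf8Bytes = Array.from(new TextEncoder().encode("\u00A1"));
console.log(utf8Bytes.map((b) => b.toString(16))); // [ 'c2', 'a1' ]

// Write text as a PDF literal string with exactly one byte per character.
// `charToCode` maps characters to codes in the font's encoding; characters
// missing from the map are assumed to use their code point as the code.
function toPdfLiteral(text, charToCode) {
  let out = "(";
  for (const ch of text) {
    const code = charToCode.has(ch) ? charToCode.get(ch) : ch.codePointAt(0);
    if (code > 255) throw new Error("No single-byte code for " + ch);
    const byteChar = String.fromCharCode(code);
    if (code < 32 || code > 126) {
      out += "\\" + code.toString(8).padStart(3, "0"); // octal escape, e.g. \241
    } else if (byteChar === "(" || byteChar === ")" || byteChar === "\\") {
      out += "\\" + byteChar; // escape PDF string delimiters
    } else {
      out += byteChar;
    }
  }
  return out + ")";
}

// exclamdown sits at code 161 (octal 241) in both StandardEncoding and WinAnsiEncoding.
const standardSubset = new Map([["\u00A1", 161]]);
console.log(toPdfLiteral("Hel\u00A1lo ", standardSubset)); // (Hel\241lo )
```

Writing non-ASCII codes as octal escapes keeps the content stream plain ASCII, so the text editor's file encoding can no longer slip extra bytes into the string.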

The only thing I noticed was I can't use character names like sterling in the differences array since it already exists as a character code in the StandardEncoding. Is it not allowed to have two character codes for the same thing or am I missing something?

I don't know of any restriction on doing this. You should be able to use any name in any position in the Differences array, and as long as the font used for display has that character it should work. Could you tell me what error or behavior you were seeing when you tried this?
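
To illustrate how a Differences array is read, here is a hedged JavaScript sketch. `applyDifferences` is a made-up helper, glyph names are plain strings without the leading slash, and the base map is only a two-entry slice of StandardEncoding:

```javascript
// Apply a Differences array to a base encoding (code -> glyph name).
// A number sets the next code; each name that follows takes consecutive codes.
function applyDifferences(baseEncoding, differences) {
  const encoding = new Map(baseEncoding);
  let code = 0;
  for (const entry of differences) {
    if (typeof entry === "number") {
      code = entry;
    } else {
      encoding.set(code, entry);
      code += 1;
    }
  }
  return encoding;
}

const base = new Map([[36, "dollar"], [163, "sterling"]]); // tiny slice of StandardEncoding
const encoding = applyDifferences(base, [36, "Euro", "sterling"]);

console.log(encoding.get(36), encoding.get(37), encoding.get(163)); // Euro sterling sterling
```

In this sketch sterling ends up reachable at both code 37 and code 163, and nothing in the data structure prevents that.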

a more generic question: How do people generally learn to write pdf by hand? Do they just read the spec and go from there?

Most everything in PDF 2.0 is shared with earlier versions of PDF, so your best resources are mostly going to be written for earlier versions of PDF. Many people learn by reading the spec, examining PDFs that are generated from other libraries or programs, and experimenting.
Two books I can recommend are Leonard Rosenthal's "Developing with PDF: Dive Into the Portable Document Format" and John Whitington's "PDF Explained: The ISO Standard for Document Exchange". https://brendanzagaeski.appspot.com/0004.html has some good introduction and starting points as well.
Most everything you learn about making PDF files from versions before PDF 2.0 will apply to PDF 2.0 as well, so don't worry about looking for PDF 2.0 - specific tutorials.
