Character markup in ToC results in invalid XML in OSIS output #102

Open
mmartin9684-sil opened this issue Jan 14, 2020 · 4 comments

@mmartin9684-sil

When a Table of Contents line in the source USFM file includes character markup (e.g., italics), the resulting XML in the OSIS output file is not well formed and results in a parsing error.

  • Source USFM Line:

\toc2 \it (Lista de lectura)\it*

  • OSIS Output File:

36225:<milestone type="x-usfm-toc2" n="(Lista de lectura)" />

  • Error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 36225, column 33
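
One way to narrow this down is to parse the reported line on its own, next to a variant in which the italics were converted to an inline element inside the `n` attribute. The sketch below is only a guess at the cause: the `<hi type="italic">` element in `suspected` is an assumption, not something visible in the output quoted above, and `parse_error` is a hypothetical helper. In the `suspected` string the stray `<` sits at zero-based offset 33, which would match the reported column if the parser points at that character.

```python
import xml.etree.ElementTree as ET


def parse_error(fragment):
    """Return the ParseError message for an XML fragment, or None if it parses."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        return str(err)
    return None


# The line exactly as displayed in this report (without the "36225:" prefix).
displayed = '<milestone type="x-usfm-toc2" n="(Lista de lectura)" />'

# Hypothetical variant: inline markup left inside the attribute value.
# The element name and attribute are assumptions made for illustration.
suspected = (
    '<milestone type="x-usfm-toc2" '
    'n="<hi type="italic">(Lista de lectura)</hi>" />'
)

print(parse_error(displayed))   # parses cleanly, so no message
print(parse_error(suspected))   # a raw "<" is not allowed in an attribute value
```

If the real line 36225 looks like `suspected`, that would explain why the line as pasted here appears valid while the file does not parse.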

@adyeths
Owner

adyeths commented Jan 15, 2020

The OSIS line you shared here is valid, so the problem must be on a different line. I would have to see more of the output to know what is actually going on. (It may very well be the character markup; I didn't anticipate that in this location.)

Is the USFM character markup even valid here?

@mmartin9684-sil
Author

mmartin9684-sil commented Jan 15, 2020 via email

@adyeths
Owner

adyeths commented Jan 15, 2020

I can't find anything in the USFM documentation to suggest that \it tags are valid in this location, and the default Paratext stylesheet doesn't indicate that they are valid either. If they are, I'll need to see evidence of that before doing something about this particular issue.
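
Whether or not the markup turns out to be valid, a converter could defend against it by dropping character markers from the ToC text before writing it into an attribute. The following is a minimal sketch under that assumption, not the converter's actual code: `CHAR_MARKER` and `toc_milestone` are hypothetical names, and the regular expression is a rough approximation of USFM character markers rather than a full implementation of the standard.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Rough approximation of a USFM character marker: a backslash, an optional
# "+" (nested marker), a short name, then either "*" (closing marker) or an
# optional space (opening marker).
CHAR_MARKER = re.compile(r"\\\+?[a-z0-9]+(?:\*| ?)")


def toc_milestone(level, text):
    """Build a ToC milestone whose n attribute holds plain, escaped text."""
    plain = CHAR_MARKER.sub("", text).strip()
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    return '<milestone type="x-usfm-toc{}" n={} />'.format(level, quoteattr(plain))


line = toc_milestone(2, r"\it (Lista de lectura)\it*")
ET.fromstring(line)  # raises ParseError if the result is not well formed
print(line)
```

Stripping loses the italics, which matters for the rare case where the markup is meaningful; the alternative would be to reject such lines with a clear error message.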

@DavidHaslam

It’s not likely to be valid USFM to have character level markup within the ToC strings.

Aside: I did once come across a language in which the alphabet included an italicised letter that was pronounced differently to the normal letter. I suppose there’s a remote possibility that a book name in that language might contain one or more such letters.

I suggest putting the validity question to the UBS ICAP team.
