Character markup in ToC results in invalid XML in OSIS output #102
The OSIS line you shared here is valid, so the problem must be on a different line. I would need to see more to tell what is actually going on. (It may very well be the character markup; I didn't anticipate it in this location.) Is USFM character markup even valid here?
Ryan, apologies: the GitHub issue tracker seems to have interpreted some of the OSIS markup as formatting for the issue itself. I didn't notice that when posting the issue. Here's the original USFM markup:

\id XXB - Aguaruna (Awajun) NT -Peru 2009 (DBL-2013)
\toc1 A'na pi'i marëáchin ya'ipi Yosë quiricanën nontahua'
\toc2 \it (Lista de lectura)\it*
\mt1 Lista de lectura para leer este volumen en un año

Here is the OSIS markup that's being produced:

<div type="x-other"><milestone type="x-usfm-toc1" n="A'na pi'i marëáchin ya'ipi Yosë quiricanën nontahua'" /><milestone type="x-usfm-toc2" n="<hi type="italic">(Lista de lectura)</hi>" /><!-- mt1 --><title level="1" type="main">Lista de lectura para leer este volumen en un año</title><div type="introduction">

I suspect the issue has to do with the double quotes (") both surrounding and embedded in the value of n. I've tried processing this source file with Paratext, and it does appear to recognize italics markup in a ToC entry.

Again, my apologies for not catching the problem with the way the GitHub issue tracker was stripping out the OSIS markup.

Regards,
Michael A. Martin
SIL
M: +1.908.432.8677

From: Ryan
Sent: Tuesday, January 14, 2020 9:55 PM
To: adyeths/u2o
Cc: Michael A. Martin; Author
Subject: Re: [adyeths/u2o] Character markup in ToC results in invalid XML in OSIS output (#102)
I can't find anything in the USFM documentation to suggest that \it tags are valid in this location, and the default Paratext stylesheet doesn't indicate that they are either. If they are, I'll need to see something that says so before I can do anything about this particular issue.
It's not likely to be valid USFM to have character-level markup within the ToC strings. Aside: I did once come across a language whose alphabet included an italicised letter that was pronounced differently from the normal letter, so there's a remote possibility that a book name in that language might contain one or more such letters. I suggest putting the validity question to the UBS ICAP team.
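If character markup turns out to be invalid in `\toc` lines, one option a converter might take, rather than failing, is to drop the markers and keep only the enclosed text before writing the milestone. The sketch below is hypothetical (`strip_char_markup` is not a u2o function) and uses a simple regular expression that only handles opening markers followed by a space, closing markers ending in `*`, and the nested `\+` form. Character attributes such as `|lemma="..."` on `\w` would survive as text and need extra handling.

```python
import re

# Opening markers like "\it " (marker plus its separating space) and closing
# markers like "\it*"; the optional "+" covers nested forms such as "\+bd".
_CHAR_MARKER = re.compile(r"\\\+?[A-Za-z][A-Za-z0-9]*(?:\*| )")


def strip_char_markup(text: str) -> str:
    """Hypothetical helper: remove USFM character markers, keep enclosed text."""
    return _CHAR_MARKER.sub("", text).strip()


# Value of the \toc2 line from the issue, after the \toc2 marker itself.
print(strip_char_markup(r"\it (Lista de lectura)\it*"))
```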
When a Table of Contents line in the source USFM file includes character markup (e.g., italics), the resulting XML in the OSIS output file is not well-formed, which causes a parsing error:
\toc2 \it (Lista de lectura)\it*
36225:<milestone type="x-usfm-toc2" n="(Lista de lectura)" />
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 36225, column 33
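The parse error can be reproduced with the toc2 milestone exactly as Michael's reply shows it being emitted. A minimal sketch, assuming Python's `xml.etree.ElementTree`: the reported column is a 0-based offset, and column 33 lands on the `<` that opens the nested `<hi>` element. XML does not allow a literal `<` inside an attribute value, so the parser stops there, before it reaches the embedded double quote. The markup-free form of the line parses cleanly.

```python
import xml.etree.ElementTree as ET


def parse_error_position(xml_text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # expat: 1-based line, 0-based column
    return None


# toc2 milestone as emitted, with the <hi> markup left inside the attribute.
emitted = '<milestone type="x-usfm-toc2" n="<hi type="italic">(Lista de lectura)</hi>" />'
# The same milestone with the character markup removed.
stripped = '<milestone type="x-usfm-toc2" n="(Lista de lectura)" />'

line, column = parse_error_position(emitted)
print(line, column, repr(emitted[column]))
print(parse_error_position(stripped))
```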