Trailing space (&/or terminating period) in a reference element? #38

Open
DavidHaslam opened this issue Dec 8, 2017 · 3 comments

@DavidHaslam

Reference elements generated by u2o.py

Here's an example of a reference element containing a trailing space.

<note placement="foot"><reference type="annotateRef">8:27 </reference><catchWord>افود: </catchWord>عام طور پر عبرانی میں افود کا مطلب امامِ اعظم کا بالاپوش تھا (دیکھئے خروج <seg type="x-nested"><reference>28:4</reference></seg>)، لیکن یہاں اِس سے مراد بُت پرستی کی کوئی چیز ہے۔ </note>

Do you think it might be advisable to move the trailing space to after the </reference> ?

i.e. Such that this example becomes:

<note placement="foot"><reference type="annotateRef">8:27</reference> <catchWord>افود: </catchWord>عام طور پر عبرانی میں افود کا مطلب امامِ اعظم کا بالاپوش تھا (دیکھئے خروج <seg type="x-nested"><reference>28:4</reference></seg>)، لیکن یہاں اِس سے مراد بُت پرستی کی کوئی چیز ہے۔ </note>

cf. Some Bible translators add a terminating period (full stop) after the last reference in a cross-reference marker. Likewise, the terminating period should be shifted to after the </reference>, shouldn't it?
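
A minimal sketch of the adjustment proposed here, written as a post-processing pass over the OSIS text. `shift_trailing` is a hypothetical helper, not something u2o.py provides, and it assumes that reference elements contain plain text only (no nested elements).

```python
import re

# Hypothetical helper (not part of u2o.py).
# Assumption: the content of each <reference> is plain text with no child elements.
# Groups: 1 = opening tag, 2 = the reference proper,
#         3 = trailing whitespace and/or periods, 4 = closing tag.
_TRAILING = re.compile(r"(<reference\b[^>]*>)([^<]*?)([\s.]+)(</reference>)")


def shift_trailing(osis: str) -> str:
    """Move trailing spaces/periods from inside a reference element to just after it."""
    return _TRAILING.sub(r"\1\2\4\3", osis)


before = '<reference type="annotateRef">8:27 </reference><catchWord>'
print(shift_trailing(before))
```

References that have no trailing space or period, such as the nested `<reference>28:4</reference>`, are left untouched.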

DavidHaslam changed the title from "Trailing spaces in a reference element?" to "Trailing space (&/or terminating period) in a reference element?" on Dec 8, 2017
@adyeths
Owner

adyeths commented Dec 8, 2017

Does the original usfm contain the trailing space or period in the reference markup? If yes, then that's why it's in the osis.

@DavidHaslam
Author

DavidHaslam commented Dec 8, 2017

First, we should recognise that Bible translators are rarely as rigorous about markup as we programmers would like them to be.

Let's address the terminating period first.

That's not part of a valid reference, though it is part of a cross-reference note.

So after converting \x + ...\x* to OSIS, each reference (if there's more than one) goes into a reference element, and the separating punctuation (together with any space) goes between the reference elements.

The terminating period is just like these separating punctuation marks, except that it:

  1. Happens to be at the end.
  2. Often happens to be a different character (not so with Polish).
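
To illustrate the idea that separators and the terminating period sit between or after the reference elements, here is a rough sketch. It is not how u2o.py works; `wrap_references` is a hypothetical name, and it assumes references are separated by `;` and that any final period or trailing space is not part of a reference.

```python
import re

# Assumptions for illustration only: ';' separates references, and a final
# period and/or trailing whitespace is punctuation rather than reference text.
_SEP = re.compile(r"(\s*;\s*|\s*\.\s*$|\s+$)")


def wrap_references(xt: str) -> str:
    """Wrap each reference in <reference>, leaving punctuation and spaces outside."""
    out = []
    # With a capturing group, re.split alternates: text, separator, text, ...
    for i, part in enumerate(_SEP.split(xt)):
        if not part:
            continue
        if i % 2 == 0:
            out.append(f"<reference>{part}</reference>")
        else:
            out.append(part)
    return "".join(out)


print(wrap_references("Exo 28:4; Lev 8:7."))
```

A period inside a reference (for example an abbreviated book name) is kept, because only a period at the very end of the text is treated as terminating.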

The fact that the translator may have included a space just before \+xt* is usually just due to not realising that such an extra space isn't really required. There's usually a space just after \+xt*.

IMHO, that trailing space (superfluous as it might be) is better placed after </reference>, so that the text wrapped by the reference element is a pure reference rather than a reference plus a space.

Likewise, the translator may have included a superfluous space just before \fk, as happened in my example. That space isn't really part of the argument of \fr, which should be a pure reference, so it would be better treated as something that goes between the annotateRef reference and the catchWord.

If you like, this is a reasonable adjustment that makes good sense now that the text is in XML.

@adyeths
Owner

adyeths commented Dec 8, 2017

These issues should really be fixed in the usfm source. u2o is a converter not a corrector. It was not designed (and was never intended) to fix problems that are present in the usfm markup. It's only designed (and intended) to convert usfm markup to osis.
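
For anyone who prefers to fix the source, a small pre-processing pass over the USFM is one option. The sketch below is only an illustration and is not part of u2o; `tidy_xt` is a hypothetical name, and it handles just the case mentioned above of a space before a closing `\xt*` or `\+xt*` marker that is followed by whitespace, punctuation, or the end of the text.

```python
import re

# Hypothetical clean-up for USFM source (not part of u2o).
# Removes spaces/tabs before a closing \xt* or \+xt* marker, but only when the
# marker is followed by whitespace, common punctuation, or the end of the text,
# so that adjacent words are never joined together.
_SPACE_BEFORE_XT_END = re.compile(r"[ \t]+(\\\+?xt\*)(?=[\s.,;:)]|$)")


def tidy_xt(usfm: str) -> str:
    """Drop the superfluous space before a closing xt marker."""
    return _SPACE_BEFORE_XT_END.sub(r"\1", usfm)


print(tidy_xt(r"see \+xt Exo 28:4 \+xt* here"))
```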
