
Parsing chapterless references ? #36

Open
DavidHaslam opened this issue Dec 8, 2017 · 5 comments

@DavidHaslam

DavidHaslam commented Dec 8, 2017

The implementation in ParaTExt 8 for evaluating \xt treats a reference without a book abbreviation as referring to the same book. It treats a number without a chapter/verse separator as a chapter in the current book (a verseless reference).

There is no support for a chapterless reference. It would be ambiguous with a verseless reference unless there were a regular syntax for marking a verse with a character or abbreviation, such as "v13", which there currently is not.
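As a rough illustration of the two rules above, here is a minimal Python sketch. The `Ref` type, `REF_PATTERN` and `resolve_reference` are hypothetical names, and the `:` separator and uppercase USFM-style book codes are assumptions; this is not how ParaTExt itself is implemented.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    book: str
    chapter: int
    verse: Optional[int]  # None means a verseless (whole-chapter) reference


# Hypothetical pattern: optional book code, a chapter number, and an optional
# ":verse" part. Real projects configure the separator per language.
REF_PATTERN = re.compile(
    r"^(?:(?P<book>[1-3]?[A-Z]{2,3})\s+)?(?P<ch>\d+)(?::(?P<vs>\d+))?$"
)


def resolve_reference(text: str, current_book: str) -> Ref:
    """Resolve a reference using the two rules described above:
    - no book abbreviation -> same book as the current one;
    - no chapter/verse separator -> the number is a chapter (verseless).
    """
    m = REF_PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized reference: {text!r}")
    book = m.group("book") or current_book
    verse = int(m.group("vs")) if m.group("vs") else None
    return Ref(book, int(m.group("ch")), verse)
```

Because a bare `13` always resolves to chapter 13 here, there is nowhere for a chapterless "verse 13" to go, which is exactly the ambiguity described above.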

Some examples of ParaTExt 8 parsing for reference texts (including \io and \r and \xt) are shown in the attached image.

[Image: examples of ParaTExt 8 parsing of reference texts in \io, \r and \xt]

@DavidHaslam
Author

That image illustrates the usage and the potential problems very well.

This reminds me of something I observed earlier this year while processing references for a Polish Bible translation.

Chapterless verse references were prefixed with either "w. " for a single verse or "ww. " for a verse range or sequence, for example:

    \x + \xt ww. 13-22.\x*
    \x + \xt w. 1.\x*
    \x + \xt ww. 13.19.28; Ps 50,15; Oz 5,15. \x*

This underscores the fact that any syntactic verse prefix is language-specific, and should therefore be specified in the locale for the translation language.

Likewise, whether the translators use a period and a space after these two Polish abbreviations (equivalent in meaning to "verse" and "verses") is also a matter of choice!

Aside: the fact that a sequence of verses could be punctuated with periods rather than the usual commas was also a challenge! How Scripture references are punctuated varies from language to language, and sometimes even between Bible versions in the same language!
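Here is a minimal Python sketch of how such prefixes and separators might be driven by locale settings rather than hard-coded. `LocaleSettings`, `POLISH` and `parse_chapterless` are hypothetical (not part of orefs.py or Paratext). The caller must supply the current book and chapter, and a trailing period is assumed to be sentence punctuation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LocaleSettings:
    """Hypothetical per-language reference settings (not Paratext's schema)."""
    verse_prefixes: Tuple[str, ...]  # e.g. "w" (verse) and "ww" (verses)
    sequence_sep: str                # separator between listed verses
    range_sep: str = "-"


POLISH = LocaleSettings(verse_prefixes=("w", "ww"), sequence_sep=".")


def parse_chapterless(text: str, book: str, chapter: int,
                      locale: LocaleSettings) -> Optional[List[Tuple[str, int, int]]]:
    """Expand a chapterless verse reference such as 'ww. 13-22.' into
    (book, chapter, verse) triples in the current chapter.
    Returns None if the text does not start with a verse prefix."""
    stripped = text.strip()
    # Longest prefix first, so 'ww' is tried before 'w'. The period and the
    # space after the abbreviation are both optional; a digit must follow.
    alts = "|".join(re.escape(p) for p in
                    sorted(locale.verse_prefixes, key=len, reverse=True))
    m = re.match(rf"(?:{alts})\.?\s*(?=\d)", stripped)
    if m is None:
        return None
    body = stripped[m.end():].rstrip(".")  # drop sentence-final period
    verses: List[int] = []
    for part in body.split(locale.sequence_sep):
        part = part.strip()
        if locale.range_sep in part:
            start, end = (int(n) for n in part.split(locale.range_sep))
            verses.extend(range(start, end + 1))
        else:
            verses.append(int(part))
    return [(book, chapter, v) for v in verses]
```

For instance, with an arbitrarily chosen current context of Hosea 6, `parse_chapterless("w. 1.", "HOS", 6, POLISH)` returns `[("HOS", 6, 1)]`, and `"ww. 13.19.28"` expands to verses 13, 19 and 28. Splitting a line like `ww. 13.19.28; Ps 50,15` on `;` is left to the caller.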

@DavidHaslam
Author

DavidHaslam commented Dec 8, 2017

It is necessary to capture quite a lot of language-level and project-level detail about syntax, book names, etc.

ParaTExt does provide for this through a Scripture Reference Settings interface, as in the following example.

[Image: example of the Paratext Scripture Reference Settings dialog]

There is also the Book Names tab (where Abbreviation, Short and Complete book names are specified), and a place to configure how xref origins (\xo) are expressed in the text.

All of this is input into:

  1. validating references, and
  2. generating a consistent, machine-readable inline form of every vernacular reference when the text is exported to USX (XML).

The USFM specification does not explicitly define "rules" for what references mean (they vary so widely), but Paratext implements a specification so that they can be tested, and then exported to a standardized form.
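To make the two purposes concrete, here is a minimal Python sketch that validates a vernacular reference against a Book Names table and emits it in a language-neutral `BOOK C:V` form. The two-entry table (Polish "Ps" for Psalms and "Oz" for Hosea, taken from the examples earlier in this thread), the `normalize` function and the output format are assumptions for illustration. This is not Paratext's exporter, and the exact USX representation may differ.

```python
from typing import Dict

# Hypothetical excerpt of a Polish project's Book Names table:
# vernacular abbreviation -> USFM book code.
BOOK_ABBREVIATIONS: Dict[str, str] = {"Ps": "PSA", "Oz": "HOS"}


def normalize(ref: str, books: Dict[str, str], cv_sep: str = ",") -> str:
    """Validate a vernacular 'Book C<sep>V' reference and return it in a
    language-neutral 'BOOK C:V' form. Raises ValueError if it is invalid.
    Assumes abbreviations contain no spaces (e.g. not '1 Kor')."""
    text = ref.strip().rstrip(".")  # drop sentence-final period
    try:
        abbrev, cv = text.split(maxsplit=1)
    except ValueError:
        raise ValueError(f"expected 'Book chapter{cv_sep}verse': {ref!r}") from None
    if abbrev not in books:
        raise ValueError(f"unknown book abbreviation: {abbrev!r}")
    chapter, sep, verse = cv.partition(cv_sep)
    if not (sep and chapter.isdigit() and verse.isdigit()):
        raise ValueError(f"malformed chapter/verse in {ref!r}")
    return f"{books[abbrev]} {int(chapter)}:{int(verse)}"
```

For example, `normalize("Ps 50,15", BOOK_ABBREVIATIONS)` returns `"PSA 50:15"`, while an unknown abbreviation or a wrong chapter/verse separator is rejected, which covers the validation step.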

@DavidHaslam
Copy link
Author

I'm indebted to Jeff Klassen of UBS ICAP for providing answers to some of my questions.

I trust that the above information may help @adyeths to further develop orefs.py to cover these points.

@DavidHaslam
Author

It's not yet clear how USFM might be enhanced to support chapterless verse references.

Nevertheless, these are items that we've encountered in the real world, especially in translations that were edited outside the ParaTExt software environment.

I have added a suitable comment in issue 34 for USFM.

@DavidHaslam
Author

See also #43
