Adjusting the OSIS to avoid orphaned verse tags in SWORD? #82

Open
DavidHaslam opened this issue Feb 9, 2019 · 9 comments
@DavidHaslam

Orphaned verse tags can occur in a number of contexts, some simple to describe and some rather more complex. This issue will initially focus on the simpler contexts in Bibles that are paragraphed.

To make the examples easier to follow, the XML layout in the snippets has been re-arranged, but this is only the kind of cosmetic change that you can obtain using xmllint.

An orphaned verse tag occurs when the displayed verse tag sits alone on one line and the verse text is displayed on the next line. It looks sloppy when the module is viewed in a SWORD front-end, especially when there are many of them in the same module.

Example:

					<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they immediately left the ship and their father, and followed him.
					<verse eID="Matt.4.22" />
				</p>
				<p>
					<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee, teaching in their synagogues, and preaching the gospel of the kingdom, and curing every disease, and every kind of sickness among the people.
					<verse eID="Matt.4.23" />

Observe that the verse eID milestone for verse 22 comes before the paragraph break, so the break falls between the two verses. This causes an orphaned verse tag for verse 23. Here's the fix:

					<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they immediately left the ship and their father, and followed him.
				</p>
				<p>
					<verse eID="Matt.4.22" />
					<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee, teaching in their synagogues, and preaching the gospel of the kingdom, and curing every disease, and every kind of sickness among the people.
					<verse eID="Matt.4.23" />

Here, the verse eID milestone has been moved into the next paragraph. This eliminates the display problem in SWORD front-ends.
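To get a feel for how many such locations a file contains before changing anything, they can be listed first. The following is only a minimal sketch in Python: the function name `find_orphan_candidates` is hypothetical, the sample text is an abbreviated copy of the snippet above, and the pattern assumes the attribute layout shown in these examples.

```python
import re

# A verse eID milestone, then a paragraph break, then the next verse sID.
# Assumes sID is the first attribute of the verse start tag, as in the snippets above.
ORPHAN_PATTERN = re.compile(
    r'<verse eID="[^"]+"\s?/>\s*</p>\s*<p>\s*<verse sID="([^"]+)"'
)

def find_orphan_candidates(osis_text):
    """Return the sID of every verse that starts right after such a break."""
    return ORPHAN_PATTERN.findall(osis_text)

# Abbreviated sample, not the full verse text.
sample = (
    '<p><verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they followed him.\n'
    '<verse eID="Matt.4.22" />\n'
    '</p>\n'
    '<p>\n'
    '<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee.\n'
    '<verse eID="Matt.4.23" />\n'
    '</p>'
)

print(find_orphan_candidates(sample))  # ['Matt.4.23']
```

This only reports the simple paragraph case. It says nothing about titles, poetry, lists or tables.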

Workaround:

For this simple scenario, it's feasible to fix all the similar locations with a PCRE search and replace.
Here's a tab-delimited, single-line replace list that does that.

(<verse eID="\S+"\s?/>\s*)(</p>\s*<p>\s*)(<verse sID)	$2$$1$$3
  • My workaround was implemented using TextPipe.
  • Non-greedy matching is set outside the list, in the filter UI.
  • The above search pattern is insensitive to XML layout (whitespace and line breaks between the tags).
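For anyone without TextPipe, roughly the same replacement can be sketched with Python's `re` module. This is an approximation under stated assumptions, not the TextPipe filter itself: the function name `move_eid_past_paragraph_break` is hypothetical, the sample text is abbreviated, and the lazy quantifier stands in for the non-greedy setting made in the filter UI.

```python
import re

# Same three groups as the replace list above. Python's re module has no
# global "ungreedy" switch, so \S+ is written as the lazy \S+? here.
EID_BEFORE_BREAK = re.compile(
    r'(<verse eID="\S+?"\s?/>\s*)(</p>\s*<p>\s*)(<verse sID)'
)

def move_eid_past_paragraph_break(osis_text):
    """Reorder each match as: paragraph break, verse eID, verse sID."""
    return EID_BEFORE_BREAK.sub(r'\2\1\3', osis_text)

# Abbreviated sample, not the full verse text.
before = (
    '<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they followed him.\n'
    '<verse eID="Matt.4.22" />\n'
    '</p>\n'
    '<p>\n'
    '<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee.\n'
    '<verse eID="Matt.4.23" />\n'
)

after = move_eid_past_paragraph_break(before)
print(after)
```

In the printed result, the eID milestone for Matt.4.22 sits inside the second paragraph, directly before the sID milestone for Matt.4.23.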

Harder cases:

The more complex contexts that give rise to orphans are harder to deal with, but the principle is the same. Here's a partial list of possible scenarios.

  • eID milestone before section title, etc.
  • eID milestone before other paragraph types.
  • eID milestone in poetry, lists, tables, etc.

Conclusion:

My view is that tackling the simplest and most common context first should:

  • Provide some insight into the nature of the issue.
  • Separate the wheat from the chaff by fixing the major cause of the issue.
  • Leave us free to research the more difficult cases afterwards, with the simple ones out of the way.
@DavidHaslam
Author

DavidHaslam commented Feb 9, 2019

Matters arising:

  • Should there be any cause for u2o.py to drop a verse eID milestone other than encountering either a verse sID milestone or a chapter eID milestone?
  • Is there any aspect of the OSIS transform that osis2mod does "under the hood" that may have a bearing on this issue?
  • We should be curious as to why SWORD behaves in this way.
  • Does the same display problem also occur with JSword based front-ends?
  • Is there anything misleading in either the OSIS Manual or in the SWORD developers' wiki that relates to the issue?

@adyeths
Owner

adyeths commented Feb 9, 2019

This actually sounds more like a frontend issue to me than an OSIS formatting problem. One of the reasons I formatted the OSIS in this manner was to ensure that tags were properly nested so as to avoid display issues with orphaned verses.

Before I tried ensuring that there was proper nesting of tags, I had issues with how Bibles were displayed in some SWORD frontends. Adjusting the tags to ensure proper nesting eliminated the display issues for me in all of the frontends that I use.

@DavidHaslam
Author

Is Xiphos one of the front-ends that you use for module testing?

  • I have certainly observed orphaned verse tags when a module is displayed using Xiphos for Windows.
  • I have often needed to apply such a workaround as described above in order to eliminate them.

How do you understand "nesting" in the context of the milestone versions of verse and chapter elements?

@DavidHaslam
Author

I should add that my workaround does not cause osis2mod to report any NESTING errors.

Nor does it cause the OSIS to fail to validate to the .DTD schema.

@adyeths
Owner

adyeths commented Feb 9, 2019

I test using Xiphos, BibleTime, and AndBible. They are the only frontends that I can use at this time. BibleTime never had problems, but I think it does its own formatting.

I understand nesting with milestone tags the same as with other tags. The only difference is that, being milestones, they can cross element boundaries.

@DavidHaslam
Author

DavidHaslam commented Feb 9, 2019

FWIW, I've just developed a new experimental TextPipe filter that seeks to fix all possible contexts, i.e. as a more comprehensive workaround.

This uses a pattern that does not make use of the paragraph p element.

Here's the pseudo-code for my new method:

  • Insert a tilde ~ just before each verse sID milestone element
  • Restrict to (text) between verse sID milestone and tilde - Send variable 1 to subfilter
    • Restrict to not including a chapter boundary
      • Move the verse eID milestone down as far as it can go
  • Remove all tilde

Aside: Using a tilde simply ensures that the processing does not miss 50% of the verses.
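For readers who don't use TextPipe, the pseudo-code above might be approximated in Python along these lines. This is a sketch of my reading of those steps, not a port of the actual filter: the name `push_eids_down` is hypothetical, the sample is abbreviated, and splitting the text in front of each verse sID plays the role of the tilde markers. It assumes Python 3.7 or later, where `re.split` accepts zero-width matches.

```python
import re

# A verse eID milestone plus any whitespace that follows it.
VERSE_EID = re.compile(r'(<verse eID="[^"]+"\s?/>)(\s*)')
# Any chapter tag, milestone or container form.
CHAPTER_TAG = re.compile(r'</?chapter\b')

def push_eids_down(osis_text):
    """Move each verse eID to just before the next verse sID in the same chapter."""
    # Zero-width split: every segment after the first starts with a verse sID.
    segments = re.split(r'(?=<verse sID=")', osis_text)
    last = len(segments) - 1
    fixed = []
    for i, seg in enumerate(segments):
        m = VERSE_EID.search(seg)
        # Leave alone: no eID, no following sID, or a chapter boundary in the segment.
        if m is None or i == last or CHAPTER_TAG.search(seg):
            fixed.append(seg)
            continue
        rest = seg[m.end():]
        if not rest:
            # Only whitespace followed the eID, so it is already as far down as it can go.
            fixed.append(seg)
            continue
        # Lift the eID out and re-append it at the end of the segment.
        fixed.append(seg[:m.start()] + rest + m.group(1) + m.group(2))
    return ''.join(fixed)

# Abbreviated sample, not the full verse text.
sample = (
    '<chapter sID="Matt.4" osisID="Matt.4" />\n'
    '<p>\n'
    '<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they followed him.\n'
    '<verse eID="Matt.4.22" />\n'
    '</p>\n'
    '<p>\n'
    '<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee.\n'
    '<verse eID="Matt.4.23" />\n'
    '</p>\n'
    '<chapter eID="Matt.4" />\n'
)

print(push_eids_down(sample))
```

Because the pattern never mentions the p element, the eID is moved past whatever markup lies between it and the next verse sID. The last verse of a chapter is left untouched, since its segment contains the chapter eID milestone.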

Observations:

  • It fixed all the previous locations for the same OSIS file
  • It fixed the 3 extra places I'd fixed by hand in Romans 1
  • It fixed 4 more places that I'd failed to spot earlier

The latter indicates that I could edit the SFM file to remove some spurious extra \p tags.

Joy, pure joy!

TextPipe is a superb tool for trying out new algorithms. I wouldn't be without it.

@DavidHaslam
Author

My workaround is based on the notion that there should not be any content between a verse eID milestone and the next verse sID milestone (in the same chapter).

  • Pre-verse content is specially handled by osis2mod.
  • It becomes a div element with type="x-preverse".

This ensures that all module content can be referenced by SWORD.
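The notion stated above can also be turned into a quick check on an OSIS file. Here is a minimal sketch, assuming milestone verses as in the earlier snippets; the name `content_after_eid` is hypothetical and this is not something osis2mod or u2o.py provides.

```python
import re

# Everything between a verse eID milestone and the next verse sID milestone.
GAP = re.compile(r'<verse eID="([^"]+)"\s?/>(.*?)<verse sID="', re.DOTALL)
CHAPTER_TAG = re.compile(r'</?chapter\b')

def content_after_eid(osis_text):
    """List (eID, content) pairs where markup or text follows a verse eID."""
    found = []
    for m in GAP.finditer(osis_text):
        gap = m.group(2)
        if CHAPTER_TAG.search(gap):
            continue  # crosses a chapter boundary, which is out of scope here
        if gap.strip():
            found.append((m.group(1), gap.strip()))
    return found

# Abbreviated sample, not the full verse text.
sample = (
    '<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they followed him.\n'
    '<verse eID="Matt.4.22" />\n'
    '</p>\n'
    '<p>\n'
    '<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee.\n'
    '<verse eID="Matt.4.23" />\n'
)

print(content_after_eid(sample))  # [('Matt.4.22', '</p>\n<p>')]
```

An empty list means every verse eID is followed directly, apart from whitespace, by the next verse sID or by a chapter boundary.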

Nesting

  • Milestone nesting can happen in a badly prepared OSIS file and is detected by osis2mod.
  • Milestone nesting is not detected during OSIS validation.

@DavidHaslam
Author

FIO: Here is a clipboard copy of my new TextPipe filter.

Clipboard copy of Fix OSIS verse eID milestones.txt

  • This should be understandable by skilled programmers, even those who don't use TextPipe.
  • I can supply the .fll file upon request, should anyone be interested.

@DavidHaslam
Author

Any further thoughts on this issue?
