CollateX, bug, now with e-mail #85

Open
kkynde opened this issue Jul 9, 2023 · 2 comments

kkynde commented Jul 9, 2023

I have found what I believe is a bug in CollateX, which I would like to present and ask how to deal with. Is there an e-mail address I can write to? I would like to include a (short) example.

Karsten Kynde
[email protected]

@rhdekker
Member

Hi Karsten,

Gregor Middell forwarded your example data files to me. I ran the collation and looked at the internal alignment result, which is represented as a variant graph and is independent of the chosen output format. It looks as follows:

[Image: alignment_result_colx1_colx2]

This means that CollateX finds only two points of variation: a ":" (W1) replaced by a "," (W2), and "night" (W1) replaced by "knight" (W2). This seems correct to me.

If you agree with this then the question becomes how that internal result should be represented in the requested output format.

In the TEI output there is a <app><rdg wit="w1">Now, It was a dark and stormy</rdg><rdg wit="w2">Now, it was a dark and stormy</rdg></app><app> reading, which is what I suspect your report is about. CollateX doesn't find a meaningful semantic difference here, but it notices a difference in casing: "it" versus "It". Changes in upper- and lowercasing are by default ignored during alignment, but we have to put them somewhere in the TEI to be able to reconstruct the original witnesses from the output. I think this is what causes the confusion, or in other words the difference in expectations.
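
To make the distinction concrete, here is a minimal Python sketch. It is not CollateX's actual code: the `Token` class, its `t` (original text) and `n` (normalized form) fields, and the lowercasing rule are assumptions chosen for illustration. It shows how two witnesses can be equal for the purpose of alignment while their original forms still differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    t: str  # original text as written in the witness
    n: str  # normalized form, used only for comparison (assumed: lowercase)


def tokenize(text: str) -> List[Token]:
    # Hypothetical tokenizer: split on whitespace, normalize by lowercasing.
    return [Token(t=word, n=word.lower()) for word in text.split()]


w1 = tokenize("Now, It was a dark and stormy")
w2 = tokenize("Now, it was a dark and stormy")

# Compared by normalized form, the two witnesses are identical ...
same_for_alignment = [tok.n for tok in w1] == [tok.n for tok in w2]
# ... but the original forms differ and must be kept for the output.
same_as_written = [tok.t for tok in w1] == [tok.t for tok in w2]

print(same_for_alignment, same_as_written)  # True False
```

Under these assumptions the aligner sees no variation, while any output format that promises to reproduce the witnesses still has to carry both "It" and "it".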

Before we discuss possible solutions: am I thinking in the right direction so far or is there something else that you wanted to bring to our attention with the example in your report?

Best,
Ronald

kkynde commented Jul 20, 2023

Dear Ronald Dekker

Thank you very much for your rapid reply. You are indeed thinking in the right direction.

It confirms my suspicion that a change of case is in some respects treated as a difference and in others not.

Your graph is correct in the sense that the difference in case does not constitute a 'semantic difference'. Nevertheless, you have saved the 'not semantically different' version (It) somewhere. It is not represented in the graph (nor in the --format graphml output), but it does appear in the TEI output as two separate elements.

My problem is that the non-semantic difference (it vs. It) is thereby mixed up with the truly invariant text surrounding it, which may be very extensive. I would have expected either (it and It are different)

Now, <app><rdg wit="w1">It</rdg><rdg wit="w2">it</rdg></app> was a dark and stormy

or (it and It are not different, consistent with the graph)

Now, it was a dark and stormy

I take your point that the latter would prevent you from reconstructing the original witnesses. I also think the former is preferable (the element could be given the attribute type="notSemanticalDifference"), but you would not be able to construct it from your graph unless you run a recursive collation on the readings.
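
As a rough illustration of such a second pass over the readings, the following Python sketch takes two readings that match token for token when case is ignored and wraps only the tokens that differ. It is not part of CollateX: the function `split_reading` and its parameters are hypothetical, and it assumes simple whitespace tokenization, so differences in spacing are lost.

```python
from xml.sax.saxutils import escape


def split_reading(r1: str, r2: str, wit1: str = "w1", wit2: str = "w2") -> str:
    """Emit shared tokens as plain text and case-only variants as <app> elements.

    Hypothetical helper: assumes both readings have the same tokens once
    lowercased, and raises ValueError otherwise.
    """
    t1, t2 = r1.split(), r2.split()
    if [tok.lower() for tok in t1] != [tok.lower() for tok in t2]:
        raise ValueError("readings differ by more than case")
    parts = []
    for a, b in zip(t1, t2):
        if a == b:
            # Truly invariant token: no apparatus entry needed.
            parts.append(escape(a))
        else:
            # Case-only difference: one minimal <app> per token.
            parts.append(
                f'<app><rdg wit="{wit1}">{escape(a)}</rdg>'
                f'<rdg wit="{wit2}">{escape(b)}</rdg></app>'
            )
    return " ".join(parts)


result = split_reading(
    "Now, It was a dark and stormy",
    "Now, it was a dark and stormy",
)
print(result)
# Now, <app><rdg wit="w1">It</rdg><rdg wit="w2">it</rdg></app> was a dark and stormy
```

Both witnesses can still be reconstructed from this output, since every token is either shared or present in its own reading.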

Have I missed something in the documentation stating that changes in upper- and lowercasing are ignored by default during alignment (by the way, the same goes for changes in spacing)? And does 'by default' mean that I can change this, as the documentation suggests, with the --script option? If so, how do I do this (back to the first question)?

Yours,
Karsten Kynde
