Misalignment #58

Open
djbpitt opened this issue Aug 16, 2018 · 2 comments

djbpitt commented Aug 16, 2018

In CollateX Python 2.1.3rc2, the input:

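from collatex import *

# Three witnesses: relative to A, B inserts "old," before "gray" and drops "fuzzy".
collation = Collation()
collation.add_plain_witness("A", "The big, gray, fuzzy koala.")
collation.add_plain_witness("B", "The big, old, gray koala:")
collation.add_plain_witness("C", "The big, gray, fuzzy wombat.")
# The resulting table is the same with near_match=True or near_match=False.
table = collate(collation, segmentation=False, near_match=True)
print(table)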

produces (with or without near matching):

+---+-----+-----+---+------+---+-------+--------+---+
| A | The | big | , | gray | , | fuzzy | koala  | . |
| B | The | big | , | old  | , | gray  | koala  | : |
| C | The | big | , | gray | , | fuzzy | wombat | . |
+---+-----+-----+---+------+---+-------+--------+---+

This fails to align “gray”, which matches exactly in all witnesses. The desired alignment is:

+---+-----+-----+---+------+---+------+---+-------+--------+---+
| A | The | big | , |      |   | gray | , | fuzzy | koala  | . |
| B | The | big | , | old  | , | gray |   |       | koala  | : |
| C | The | big | , |      |   | gray | , | fuzzy | wombat | . |
+---+-----+-----+---+------+---+------+---+-------+--------+---+

This appears to be a transposition situation, where CollateX aligns the commas in preference to the words. A philologist would prioritize aligning the words.
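
To make the difference between the two tables concrete, here is a minimal sketch that records the column in which a token lands for each witness. It uses only the rows copied from the tables above and does not call CollateX; the helper names `columns_of` and `is_aligned` are hypothetical, chosen for illustration.

```python
# Rows copied from the two tables above; "" marks an empty cell (a gap).
actual = {
    "A": ["The", "big", ",", "gray", ",", "fuzzy", "koala", "."],
    "B": ["The", "big", ",", "old", ",", "gray", "koala", ":"],
    "C": ["The", "big", ",", "gray", ",", "fuzzy", "wombat", "."],
}
desired = {
    "A": ["The", "big", ",", "", "", "gray", ",", "fuzzy", "koala", "."],
    "B": ["The", "big", ",", "old", ",", "gray", "", "", "koala", ":"],
    "C": ["The", "big", ",", "", "", "gray", ",", "fuzzy", "wombat", "."],
}


def columns_of(table, token):
    """Map each witness to the 0-based column indexes where `token` appears."""
    return {
        witness: [i for i, cell in enumerate(row) if cell == token]
        for witness, row in table.items()
    }


def is_aligned(table, token):
    """True if `token` sits in exactly one column, the same one in every witness."""
    positions = list(columns_of(table, token).values())
    return all(len(p) == 1 and p == positions[0] for p in positions)


print(columns_of(actual, "gray"))   # {'A': [3], 'B': [5], 'C': [3]}
print(columns_of(desired, "gray"))  # {'A': [5], 'B': [5], 'C': [5]}
print(is_aligned(actual, "gray"), is_aligned(desired, "gray"))  # False True
```

In the actual output, “gray” in B sits two columns to the right of “gray” in A and C, while both commas share columns across all three witnesses. The desired table trades one of those comma matches for the “gray” match.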

@rhdekker

There are no token types in the Gothenburg model. The aligner does not know what punctuation is or, for that matter, what words are. If this feature is desired, the model would need to be extended and the implementation(s) updated.
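
As a rough illustration of what a token type could look like, the sketch below tags each token as a word or as punctuation. It is not part of CollateX or of the Gothenburg model: `TypedToken`, `tokenize_typed`, and the `kind` field are hypothetical names, and the regular expression is only one possible way to draw the line between words and punctuation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedToken:
    """Hypothetical token carrying a type alongside its text."""
    text: str
    kind: str  # "word" or "punct"


# A run of word characters is a word; any other single
# non-space character is treated as punctuation.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s])")


def tokenize_typed(text):
    """Split `text` into TypedToken objects; whitespace is discarded."""
    return [TypedToken(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


tokens = tokenize_typed("The big, old, gray koala:")
words = [t.text for t in tokens if t.kind == "word"]
print(words)  # ['The', 'big', 'old', 'gray', 'koala']
```

An aligner that had access to such a type could, for example, weight word matches above punctuation matches. How that weighting would fit into the existing algorithm is left open here.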

tla commented Nov 17, 2018

This is a great use case for the suggestion I just made in #69 :)
