Format: include :l if identical to :correct? #80

Open
nschneid opened this issue Apr 1, 2023 · 0 comments
nschneid commented Apr 1, 2023

Lemmas are specified if distinct from the word form, but what if there is a correction and the lemma is not distinct from the correction? The trees do not seem to be consistent on this point. align_tokens.py currently adds lemmas from UD even when the lemma is identical to the correction.

Consider insertions and deletions, too. Is it odd to specify an empty lemma for a deleted word?

Perhaps the most intuitive policy is that if there is a :correct field, that value takes precedence when deciding whether the lemma needs to be explicit (and any deletion, i.e. :correct "", is assumed to have no lemma).
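
To make the proposed policy concrete, here is a rough sketch in Python. It is not taken from align_tokens.py: the function name `needs_explicit_lemma` and its parameters are hypothetical, and it assumes exact, case-sensitive string comparison, with `correct=None` meaning there is no :correct field at all.

```python
from typing import Optional


def needs_explicit_lemma(form: str, lemma: Optional[str],
                         correct: Optional[str] = None) -> bool:
    """Decide whether :l should be written out under the proposed policy.

    Hypothetical helper: names and comparison rules are assumptions,
    not the project's actual implementation.
    """
    # A deletion (:correct "") is assumed to have no lemma at all.
    if correct == "":
        return False
    # If a correction exists, it takes precedence over the word form.
    reference = form if correct is None else correct
    # The lemma is explicit only when it differs from the reference string.
    return lemma is not None and lemma != reference


# No correction: lemma differs from the form, so it is explicit.
print(needs_explicit_lemma("dogs", "dog"))
# Correction identical to the lemma: :l would be omitted.
print(needs_explicit_lemma("teh", "the", correct="the"))
# Deletion: no lemma regardless of what UD supplies.
print(needs_explicit_lemma("the", "the", correct=""))
```

Under this sketch an insertion would be handled like any other correction, with the lemma compared against the inserted :correct value rather than the (empty) form.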
