FONEM Rule V-10 #175
Comments
Hello @gewy. Would you have some time to open a PR on the subject along with a unit test?
Hello @gewy. I just pushed a commit fixing rule V-10. I had to interpret some details of the paper to make this work because the way the algorithm is described is not completely sound. What do you think of the solution?
Hi, my implementation in Java: BTW, I will check, but I am not sure that C-27 and C-28 are correct either.
Unfortunately JavaScript does not support lookbehind assertions in regex (at least not all engines, since lookbehinds were added recently to the specs).
Fair enough. Tell me when you know and I'll make the required changes on my side.
new Rule("V-10", "(^|[^aeiouy])y|y([^aeiouy]|$)", "$1I$2");
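For reference, here is a minimal JavaScript sketch of the same V-10 pattern. It avoids lookbehind by capturing the neighbouring character and writing it back. The name `applyV10` is hypothetical and is not taken from the library discussed here. The input is lowercased so that the uppercase I marks the substitution, as in the Java rule above.

```javascript
// Hypothetical helper, not taken from the library discussed in this issue.
// V-10: replace Y by I unless Y sits between two vowels (Y counts as a vowel).
function applyV10(word) {
  // Either a non-vowel (or the start) precedes the y,
  // or a non-vowel (or the end) follows it.
  // The neighbour is captured and written back, so no lookbehind is needed.
  const pattern = /(^|[^aeiouy])y|y([^aeiouy]|$)/g;
  // In JavaScript an unmatched group expands to an empty string in the replacement.
  return word.toLowerCase().replace(pattern, "$1I$2");
}

console.log(applyV10("TYOU"));    // tIou
console.log(applyV10("YOU"));     // Iou
console.log(applyV10("POUYEZ"));  // pouyez (y is between two vowels)
console.log(applyV10("RAYMOND")); // raImond
```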
C-27: the document says Z with vowels BEFORE, and your regex is Z(?=${V})
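To illustrate the difference, here is a hedged JavaScript sketch of C-27 read as "Z preceded by a vowel becomes S", with a capture group standing in for a lookbehind. The name `applyC27` is hypothetical, and this reading of the paper is the one stated in this comment, not a confirmed specification.

```javascript
// Hypothetical helper; assumes C-27 means "Z with a vowel BEFORE it becomes S".
function applyC27(word) {
  // Capture the preceding vowel and write it back:
  // a lookbehind-free stand-in for (?<=[aeiouy])z.
  return word.toLowerCase().replace(/([aeiouy])z/g, "$1S");
}

console.log(applyC27("OZOUADE")); // oSouade
console.log(applyC27("POUYEZ"));  // pouyeS
console.log(applyC27("ZOE"));     // zoe (no vowel before the z)
```

Assuming `${V}` is a vowel class, the lookahead form Z(?=${V}) would still match the Z of OZOUADE but not the final Z of POUYEZ, so the two readings differ on word-final Z.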
C-28: exclude SS between vowels; your regex checks the right side only (cf. V-10)
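I have simplified the V-10 rule as per your suggestion. Concerning C-27, I have an interpretation question: should OZOUADE finally be OSWADE then (I am fine with this)? But should POUYEZ become POUYES as per C-27 (I am less fine with this)?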
I have updated rule C-28.
Same feeling about the rules. Anyway, I was using uppercase and lowercase to
easily identify the applied rules in my testing. Then I added the
CASE_INSENSITIVE property to the matcher object.
On Wed, 2 Sept 2020 at 17:22, Guillaume Plique wrote:
… Also, I rely on some weird ad hoc rule ordering because the paper's rules
were not finely thought out, but you seem to rely on an uppercase/lowercase
trick to do the same. Do you find it easier that way?
So what did you choose regarding C-27? Do you get POUYES?
Well, for every phonetic algorithm on family names that I have tested, I
have found counterexamples.
If you change a rule to handle one case, you will probably trigger other
weird cases.
On Wed, 2 Sept 2020 at 17:21, Guillaume Plique wrote:
… I have simplified V-10 rule as per your suggestion. Concerning C-27, I
have an interpretation question: should OZOUADE finally be OSWADE then (I
am fine with this). But should POUYEZ become POUYES as per C-27 (I am
less fine with this).
Yes, I try to apply the rules strictly as they are in the document (or as I
understand them...)
Anyway, I am more disturbed by these cases:
MAINARD -> MINAR
MENNAR -> MENAR
MEINNART -> MEINAR
RAIMOND -> RINON
RAYMOND -> RAIMON
This may be linked to the rule order.
(V-18)[rINond](C-29)[rINon]RAIMOND -> RINON
(V-10)[raImond](C-29)[raImon]RAYMOND -> RAIMON
If I put V-10 before V-18:
(V-18)[rINond](C-29)[rINon]RAIMOND -> RINON
(V-10)[raImond](V-18)[rINond](C-29)[rINon]RAYMOND -> RINON
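The ordering effect in these traces can be reproduced with a small JavaScript sketch. Only the V-10 pattern comes from this thread. The V-18 and C-29 patterns, the `RULES` table and the `encode` function are hypothetical stand-ins chosen to reproduce the traces above; they are not the paper's definitions.

```javascript
// Illustrative stand-ins only: V-10 is the pattern quoted in this thread;
// V-18 and C-29 are simplified guesses that merely reproduce the traces above.
const RULES = {
  "V-10": [/(^|[^aeiouy])y|y([^aeiouy]|$)/gi, "$1I$2"],
  "V-18": [/ai[mn]/gi, "IN"], // assumption: AIM/AIN become a nasal IN
  "C-29": [/[dt]$/i, ""]      // assumption: drop a final D or T
};

// Apply the named rules in the given order and record which ones fired.
function encode(name, order) {
  let code = name.toLowerCase();
  const trace = [];
  for (const id of order) {
    const [pattern, replacement] = RULES[id];
    const next = code.replace(pattern, replacement);
    if (next !== code) trace.push(`(${id})[${next}]`);
    code = next;
  }
  return { code: code.toUpperCase(), trace: trace.join("") };
}

// V-18 first: the Y is rewritten too late for V-18 to see an I.
console.log(encode("RAYMOND", ["V-18", "V-10", "C-29"]));
// { code: 'RAIMON', trace: '(V-10)[raImond](C-29)[raImon]' }

// V-10 first: RAYMOND now converges with RAIMOND.
console.log(encode("RAYMOND", ["V-10", "V-18", "C-29"]));
// { code: 'RINON', trace: '(V-10)[raImond](V-18)[rINond](C-29)[rINon]' }
```

The case-insensitive flag lets a later rule match the uppercase letters written by an earlier one, which is what the uppercase/lowercase tracing trick relies on.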
Anyway:
REIMON -> REIMON
(C-28)[remont](C-29)[remon]REMMONT -> REMON
REMON -> REMON
Anyway, I still have not validated the choice to use this algorithm.
On Thu, 3 Sept 2020 at 10:53, Guillaume Plique wrote:
… So what did you choose regarding C-27? Do you get POUYES?
Yes, this algorithm is not very good outside of its original goal of matching names from Saguenay, etc. I am working on a personal algorithm for French that is way better but is geared to keep vocalization.
Hi,
Rule V-10 seems to be incorrect.
The paper says: "Replace Y by I except if Y is between two vowels".
TYOU and YOU should give TIOU and IOU rather than being left unchanged.
Regards