Deinflect Json overhaul #433
View Playwright Report (note: open the "playwright-report" artifact) |
I'll leave 古語 (classical Japanese) to a later PR, as this one is quite large on its own.
@toasted-nutbread about the CI test fail, am I just supposed to run |
One more concern I have about this PR is how noisy the added deinflection rules could be, but I guess we will have to see after some user testing.
You can run |
CodSpeed Performance Report: merging #433 will degrade performance by 26.28%.
LGTM but honestly I don't quite understand how these rules work yet so I'll leave the final review to toasted-nutbread.
Are there formal names for these word forms in Japanese that you are aware of?
The kansai-ben transformations are not really word forms per se, but rather sound changes. See the ウ音便 section in the Wikipedia page. There's probably a name for the other transformations I have added, though, other than |
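As a rough illustration of the sound change being discussed, the sketch below maps a few ウ音便 endings back to the dictionary form of an i-adjective. The rule table and the `deinflect_u_onbin` name are hypothetical and are not taken from this PR's JSON; they only show the suffix-replacement idea.

```python
# Hypothetical rule table: (inflected suffix, dictionary-form suffix).
# These three pairs are illustrative assumptions, not the PR's actual rules.
U_ONBIN_RULES = [
    ("とう", "たい"),    # ありがとう -> ありがたい
    ("よう", "やい"),    # はよう -> はやい
    ("しゅう", "しい"),  # うれしゅう -> うれしい
]


def deinflect_u_onbin(word: str) -> list[str]:
    """Return every dictionary-form candidate produced by the table above."""
    candidates = []
    for inflected, dictionary in U_ONBIN_RULES:
        # Require a non-empty stem so a bare suffix is never rewritten.
        if word.endswith(inflected) and len(word) > len(inflected):
            candidates.append(word[: -len(inflected)] + dictionary)
    return candidates
```

Because the rules are purely suffix-based, they produce a candidate for any word with a matching ending, whether or not that candidate is a real adjective. A dictionary lookup afterwards would be needed to discard the invalid ones.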
I think their i18n could be added together with the other transforms in another PR.
Yeah I'm fine with that, I was just curious. I'm probably not going to be super helpful at being able to add all of those myself, so hoping others are more knowledgeable than me on that front when it comes to adding those.
Actually, I noticed that this causes ありがたい to show up before ありがとう when you hover over ありがとう. Is there some way we can make that not happen? I.e., prefer a direct match over a deinflected word. Also, in the case of ありがとう, it's not really accurate that it's "kansai-ben" tbh... Same with おはようございます and so on. Maybe we should just call it u-onbin. Though that's less understandable for the cases that are actually kansai-ben... 🤔
Weird, I thought direct matches always took precedence over deinflections. Maybe something broke with the recent deinflector changes. Or maybe it was never the case to begin with.
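For reference, the ordering being asked for can be sketched as a stable sort on the number of applied transforms, so a direct match (no transforms) comes first. This is only an assumption about one way to express the preference; the `Match` type and `order_matches` function are made up for the example and do not describe how the project actually ranks results.

```python
from dataclasses import dataclass


@dataclass
class Match:
    term: str                    # dictionary headword that was found
    transforms: tuple[str, ...]  # names of the deinflection rules applied


def order_matches(matches: list[Match]) -> list[Match]:
    # sorted() is stable, so matches with equally long chains keep their
    # original relative order; direct matches (empty chain) move to the front.
    return sorted(matches, key=lambda m: len(m.transforms))


# Hovering over ありがとう: the deinflected hit was listed first.
found = [
    Match("ありがたい", ("u-onbin",)),
    Match("ありがとう", ()),
]
ordered = order_matches(found)
```

With this ordering, `ordered[0].term` is ありがとう, and ありがたい is still listed after it rather than dropped.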
I think the job of the language transformer is just to list every possible transform, so it is unavoidable that there's noisy information, given that it's making transformations based on a prescriptive ruleset. The closest we can get is probably a blacklist for certain exceptions.
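A blacklist of that kind could look roughly like the sketch below: the transformer still lists every candidate, and a separate filter drops known-bad pairs. The `BLOCKED` set, its single entry, and `filter_candidates` are hypothetical and purely illustrative; nothing like this exists in the PR.

```python
# Hypothetical exceptions: (text as written, deinflected candidate) pairs
# that should be suppressed even though a rule produces them.
BLOCKED = {
    ("ありがとう", "ありがたい"),
}


def filter_candidates(source: str, candidates: list[str]) -> list[str]:
    """Drop candidates that are blacklisted for this exact source text."""
    return [c for c in candidates if (source, c) not in BLOCKED]
```

Keying the blacklist on the pair rather than on the candidate alone means ありがたい would still be reachable from other inflected forms.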
Possibly related: FooSoft#2082. I think the issue here could be addressed separately.
I replicated the problem with JMDict but cannot replicate it with Jitendex. It is probably some quirky dictionary-specific problem.
OK, let's make sure to try and fix FooSoft#2082 or whatever the issue is here.
See #432