Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deinflect Json overhaul #433

Merged
merged 18 commits into from
Feb 4, 2024
Merged

Conversation

Casheeew
Copy link
Member

See #432

Copy link

github-actions bot commented Dec 23, 2023

⚠️ Visual differences introduced by this PR; please validate if they are desirable.

View Playwright Report (note: open the "playwright-report" artifact)

@Casheeew
Copy link
Member Author

Casheeew commented Dec 28, 2023

@Casheeew
Copy link
Member Author

Casheeew commented Feb 2, 2024

ill leave 古語 to a later pr as this one is quite large on its own

@Casheeew
Copy link
Member Author

Casheeew commented Feb 2, 2024

@toasted-nutbread about the CI test fail, am I just supposed to run npm run test-code-write?

@Casheeew
Copy link
Member Author

Casheeew commented Feb 2, 2024

One more concern I have about this PR is how noisy the added deinflection rules could potentially be, but I guess we will have to see after some user testing.

@toasted-nutbread
Copy link

You can run npm run test-code-write at any time to find out if that's the issue, if it's not the cause of the problem, the file just won't change. At worst, it will write erroneous test result data into the files, but that would be caught upon pull review.

@Casheeew Casheeew marked this pull request as ready for review February 3, 2024 02:11
@Casheeew Casheeew requested a review from a team as a code owner February 3, 2024 02:11
Copy link

codspeed-hq bot commented Feb 3, 2024

CodSpeed Performance Report

Merging #433 will degrade performances by 26.28%

⚠️ No base runs were found

Falling back to comparing Scrub1492:deinflect-overhaul (df75839) with master (f7da8a9)

Summary

❌ 1 (👁 1) regressions

Benchmarks breakdown

Benchmark master Scrub1492:deinflect-overhaul Change
👁 transformations 235.4 ms 319.3 ms -26.28%

Copy link
Collaborator

@djahandarie djahandarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but honestly I don't quite understand how these rules work yet so I'll leave the final review to toasted-nutbread.

@toasted-nutbread
Copy link

Are there formal names for these word forms in Japanese that you are aware of?

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

Are there formal names for these word forms in Japanese that you are aware of?

The kansai-ben transformations are not really word forms per se but more sound changes. See ウ音便 section in the wikipedia page

Theres probably a name for the other transformations I have added though other than slang

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

I think their i18n could be added together with the other transforms in another pr

@toasted-nutbread
Copy link

Yeah I'm fine with that, I was just curious. I'm probably not going to be super helpful at being able to add all of those myself, so hoping others are more knowledgeable than me on that front when it comes to adding those.

@djahandarie
Copy link
Collaborator

Actually, I noticed that this causes ありがたい to show up before ありがとう when you hover over ありがとう. Is there some way we can make that not happen? I.e., prefer a direct match over a deinflected word.

Also, in the case of ありがとう, it's not really accurate that it's "kansai-ben" tbh... Same with おはようございます and so on. Maybe we should just call it u-onbin. Though that's less understandable for the cases that are actually kansai-ben... 🤔

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

weird, I thought that it was always the case that direct matches were precedented over deinflections. Maybe something broke with the recent deinflector changes. Or maybe it was never the case to begin with.

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

Also, in the case of ありがとう, it's not really accurate that it's "kansai-ben" tbh... Same with おはようございます and so on. Maybe we should just call it u-onbin. Though that's less understandable for the cases that are actually kansai-ben... 🤔

I think the job of the language transformer is just to list every possible transform, so it is unavoidable that theres noisy information, provided that its making transformations based on a prescriptive ruleset. The closest we can get is probably a blacklist for certain exceptions.
Which was also what this comment was about #433 (comment)

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

Actually, I noticed that this causes ありがたい to show up before ありがとう when you hover over ありがとう. Is there some way we can make that not happen? I.e., prefer a direct match over a deinflected word.

ありがとう shows up to me before ありがたい normally:
image

Can you check again?

@toasted-nutbread
Copy link

I see the same thing.

image

@MarvNC
Copy link
Member

MarvNC commented Feb 4, 2024

Possibly related: FooSoft#2082

I think the issue here could be addressed separately.

@Casheeew
Copy link
Member Author

Casheeew commented Feb 4, 2024

I replicated the problem with JMDict but cannot replicate with Jitendex. It is probably some quirky dictionary-specific problem.
It is probably an issue to be addressed separately

@djahandarie
Copy link
Collaborator

OK, let's make sure to try and fix FooSoft#2082 or whatever the issue is here.

@djahandarie djahandarie added this pull request to the merge queue Feb 4, 2024
Merged via the queue into yomidevs:master with commit 63a3817 Feb 4, 2024
7 checks passed
@djahandarie djahandarie added the kind/meta The issue or PR is meta label Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/meta The issue or PR is meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants