Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix deinflection bug #547

Merged

Conversation

StefanVukovic99
Copy link
Collaborator

With #503, more than one inflection rule chain is now shown. Seems the deinflector is producing an incorrect chain for -te form of ichidan verbs:
image

Not sure if the two subcategories of ichidan verbs in this PR are aptly named, but this seems to fix the problem.

@StefanVukovic99 StefanVukovic99 requested a review from a team as a code owner January 20, 2024 16:12
Copy link

github-actions bot commented Jan 20, 2024

✔️ No visual differences introduced by this PR.

View Playwright Report (note: open the "playwright-report" artifact)

@djahandarie
Copy link
Collaborator

I honestly don't quite understand what's going on in the deinflector logic to determine if this is right (and whether there's a better naming option) so hopefully toasted-nutbread can provide some insight.

@Casheeew
Copy link
Member

lgtm

@toasted-nutbread
Copy link

I believe the reason this works is because of some shenanigans with the bitflags and the way that the Translator class uses Deinflector.rulesMatch. Specifically, this code:

const definitionRules = Deinflector.rulesToRuleFlags(databaseEntry.rules);
for (const deinflection of uniqueDeinflectionArrays[databaseEntry.index]) {
    const deinflectionRules = deinflection.rules;
    if (!partsOfSpeechFilter || Deinflector.rulesMatch(deinflectionRules, definitionRules)) {
        deinflection.databaseEntries.push(databaseEntry);
    }
}

Deinflector.rulesMatch is only checking to see if any of the bits match between the two rules passed into it, so v1d and v1p will both be matched by terms with the v1 flag.

This is seems to be why these rules are set up this way:

        ['v1',    /** @type {import('translation-internal').DeinflectionRuleFlags} */ (0b000000011)], // Verb ichidan
        ['v1d',   /** @type {import('translation-internal').DeinflectionRuleFlags} */ (0b000000010)], // Verb ichidan dictionary form
        ['v1p',   /** @type {import('translation-internal').DeinflectionRuleFlags} */ (0b000000001)], // Verb ichidan progressive or perfect

This functions as a way to just prevent this from happening:

食べて
  => ('て' => 'てる' via 'masu stem'; rules='v1')
食べてる
  => ('てる' => 'て' via 'progressive or perfect'; rules='iru')
食べて
  => ('て' => 'る' via '-te'; rules='v1')
食べる

With the new rules:

食べて
  => ('て' => 'てる' via 'masu stem'; rules='v1d')
食べてる
  => attempts: ('てる' => 'て' via 'progressive or perfect')
     However, rulesIn='v1p', which doesn't match 'v1d', so this doesn't work
<stop>

@StefanVukovic99
Copy link
Collaborator Author

Exactly.

@toasted-nutbread
Copy link

I think the overall idea is that v1d is used as a sort of termination indicator to prevent anything that has been converted from masu stem to have a ending, that can no longer be used for the progressive or perfect conversion, since that is effectively undoing the addition of the .

Generally I don't see a huge issue with it being done this way, I can't think of any immediate edge cases where this breaks down. The only thing is that it makes the subtleties of the logic a bit more confusing to understand I suppose, due to the idiosyncrasies of the bitflags test I mentioned above, but that's fine. I think that issue can be addressed by adjusting the structure of the deinflect.json file, which I think would also benefit other language support as well. I'll try looking into that at some point.

This was referenced Jan 21, 2024
Copy link
Collaborator

@djahandarie djahandarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for breaking it down. I think I understand, and approve of the change. That said, this introduces even more complexity into the already confusing logic here, so I definitely feel a need for some rework of the deinflector overall, as well as deinflection rule names to be a little less opaque, with more clarity of intent of each part in the naming and structure.

@djahandarie djahandarie added this pull request to the merge queue Jan 27, 2024
Merged via the queue into yomidevs:master with commit a3224f6 Jan 27, 2024
5 checks passed
@djahandarie djahandarie added the kind/bug The issue or PR is regarding a bug label Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug The issue or PR is regarding a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants