I'm messing around with the idea of updating the structure of the deinflection file to support a few things:
Better clarity - changes like "fix deinflection bug #547" could be defined explicitly rather than hand-written with bitflag magic.
More generalized - should be more generalized for other languages and use less Japanese-specific naming.
Internationalization - names/descriptions can have different variations provided for other languages.
Extensibility - Internationalization features and new rules can eventually be imported into a single deinflector. This will require changes to the deinflector code obviously, but the intent is to make the source data format more conducive for this.
Cleaner code - less manual definition of bitflags will be needed; the bitflags can be automatically generated from the input file(s).
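A minimal sketch of how the bitflags might be generated from the input file rather than written by hand. It assumes that a rule with subRules is represented as the bitwise OR of its sub-rules' flags and that sub-rules are not nested; neither detail is settled above. `RuleDefinition` and `generateRuleFlags` are hypothetical names.

```typescript
// Hypothetical shape of a rule entry; only `subRules` is taken from the proposal.
interface RuleDefinition {
    subRules?: string[];
}

// Gives every rule without sub-rules its own bit, then defines each rule that
// has sub-rules as the bitwise OR of those sub-rules' flags.
function generateRuleFlags(rules: Record<string, RuleDefinition>): Map<string, number> {
    const leafFlags = new Map<string, number>();
    let nextBit = 0;
    for (const [name, rule] of Object.entries(rules)) {
        const subRules = rule.subRules ?? [];
        if (subRules.length > 0) { continue; }
        // Stay within the bits that are safe for signed 32-bit bitwise operations.
        if (nextBit >= 31) { throw new Error('Too many rules for a 32-bit flag set'); }
        leafFlags.set(name, 1 << nextBit);
        nextBit += 1;
    }

    const flags = new Map<string, number>(leafFlags);
    for (const [name, rule] of Object.entries(rules)) {
        const subRules = rule.subRules ?? [];
        if (subRules.length === 0) { continue; }
        let combined = 0;
        for (const subName of subRules) {
            const subFlag = leafFlags.get(subName);
            if (subFlag === undefined) {
                throw new Error(`Rule "${name}" references unknown or nested sub-rule "${subName}"`);
            }
            combined |= subFlag;
        }
        flags.set(name, combined);
    }
    return flags;
}

const ruleFlags = generateRuleFlags({
    v1: {subRules: ['v1d', 'v1p']},
    v1d: {},
    v1p: {},
    v5: {},
});
// v1d -> 0b001, v1p -> 0b010, v5 -> 0b100, v1 -> 0b011
```

With flags built this way, a test like `(flags & ruleFlags.get('v1')) !== 0` matches either sub-rule without anyone having to write the combined value manually.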
So here's somewhat of a preview of what might work:
The rules.*.partsOfSpeech corresponds to the [3] field of each source definition in dictionaries. In general, these correlate with the name of the rule, but do not need to match.
In the rules.v1 entry, note the "subRules": ["v1d", "v1p"] declaration. This is intended to map to the changes in fix deinflection bug #547.
The i18n will be used to provide language translations, which are optional. Dictionary or localization providers should also be able to eventually provide these separately if desired.
As mentioned in this comment, I renamed the deinflection list to transforms for generalization.
The individual declarations themselves are just listed as variants.
kana(In|Out) has been renamed to suffix(In|Out).
(I'm not that good at technical Japanese to say that all of my i18n's are correct, so feel free to correct anything that's wrong/missing if we proceed with this.)
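As a rough illustration of why suffix(In|Out) generalizes better than kana(In|Out), here is a sketch of applying a single variant to a term. It is not the actual deinflector code: `SuffixVariant` and `applyVariant` are hypothetical names, and the sketch assumes suffixIn is the ending of the inflected form and suffixOut the ending it is replaced with.

```typescript
// Hypothetical variant shape; only suffixIn/suffixOut are named in the proposal.
interface SuffixVariant {
    suffixIn: string;  // ending of the inflected form (assumed direction)
    suffixOut: string; // ending that replaces it
}

// Returns the term with suffixIn swapped for suffixOut, or null when the
// term does not end with suffixIn.
function applyVariant(term: string, variant: SuffixVariant): string | null {
    if (!term.endsWith(variant.suffixIn)) { return null; }
    const stem = term.slice(0, term.length - variant.suffixIn.length);
    return stem + variant.suffixOut;
}

const japanesePast: SuffixVariant = {suffixIn: 'た', suffixOut: 'る'};
const englishPast: SuffixVariant = {suffixIn: 'ed', suffixOut: ''};

const taberu = applyVariant('食べた', japanesePast);  // '食べる'
const walk = applyVariant('walked', englishPast);     // 'walk'
const noMatch = applyVariant('食べる', japanesePast); // null
```

The same function handles both examples, which is the point of dropping the kana-specific name.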
Thoughts on naming:
Overall, I'm not sure what the best naming strategy is for everything in here, so I'm open to suggestion. Primarily, I'm not sure if "rule" is a good name for how it's being used here. Similar for "variants", but I couldn't immediately think of anything that is more clear. I tried to avoid having both "rule" and "reason" since I think the two can be easily confused. So some of the current types I'm looking at for the raw JSON file would be something like:
Transformation
TransformationVariant
TransformationRule
Again please provide any thoughts on alternate ways to name these.
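To make the naming question more concrete, here is one way these three types could look for the raw JSON file. This is only a sketch built from the notes above: `LocalizedText`, `DeinflectionData`, `localizedName`, the `name`/`language` fields, and all sample values are assumptions, not the actual file contents.

```typescript
// Hypothetical entry of an i18n list.
interface LocalizedText {
    language: string;
    name: string;
}

// One entry of `rules`, keyed by rule name such as "v1".
interface TransformationRule {
    partsOfSpeech: string[];
    subRules?: string[];
    i18n?: LocalizedText[];
}

// One entry of a transform's `variants` list.
interface TransformationVariant {
    suffixIn: string;
    suffixOut: string;
}

// One entry of the `transforms` list.
interface Transformation {
    name: string;
    i18n?: LocalizedText[];
    variants: TransformationVariant[];
}

interface DeinflectionData {
    rules: Record<string, TransformationRule>;
    transforms: Transformation[];
}

// Picks the translated name for a language, falling back to the default
// name because translations are optional.
function localizedName(fallback: string, i18n: LocalizedText[] | undefined, language: string): string {
    const match = (i18n ?? []).find((entry) => entry.language === language);
    return match === undefined ? fallback : match.name;
}

const pastTransform: Transformation = {
    name: 'past',
    i18n: [{language: 'ja', name: '過去形'}],
    variants: [{suffixIn: 'た', suffixOut: 'る'}],
};

const sample: DeinflectionData = {
    rules: {
        v1: {partsOfSpeech: ['v1'], subRules: ['v1d', 'v1p'], i18n: [{language: 'ja', name: '一段動詞'}]},
        v1d: {partsOfSpeech: ['v1']}, // partsOfSpeech values here are guesses
        v1p: {partsOfSpeech: ['v1']},
    },
    transforms: [pastTransform],
};

const japaneseName = localizedName(pastTransform.name, pastTransform.i18n, 'ja');
const fallbackName = localizedName(pastTransform.name, pastTransform.i18n, 'fr');
```

Written out like this, Transformation is visibly the container and TransformationVariant the individual case inside it, which may help when weighing alternative names.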
Overall, I'm not sure what the best naming strategy is for everything in here, so I'm open to suggestion. Primarily, I'm not sure if "rule" is a good name for how it's being used here. Similar for "variants", but I couldn't immediately think of anything that is more clear. I tried to avoid having both "rule" and "reason" since I think the two can be easily confused. So some of the current types I'm looking at for the raw JSON file would be something like:
Transformation
TransformationVariant
TransformationRule
Again please provide any thoughts on alternate ways to name these.
I think a good name should express that Transformation is the bigger unit and TransformationVariant the smaller one. I was pretty confused as to what TransformationVariant was supposed to be when I first read the deinflector code (I didn't think of it as a subclass of transformations).
Some names I can think of:
Transformation and AtomicTransformation (inspired by Rust and C++; "atomic" also signals that this is the smallest possible case, so the relationship can be easily inferred from Transformation, TransformationChain, etc.)
TransformationGroup and Transformation
Transformation and TransformationCase