On the subject of the schema for frequency meta tags #1594

Closed
precondition opened this issue Nov 21, 2024 · 4 comments
Labels
kind/support The issue is a support request / question

Comments

@precondition
precondition commented Nov 21, 2024

I'm trying to make sense of https://github.com/yomidevs/yomitan/tree/master/ext/data/schemas/dictionary-term-meta-bank-v3-schema.json in order to better support frequency tags in Memento. To clarify, I am not looking to create a Yomitan dictionary, I am looking to implement the Yomitan dictionary format in an application that should be able to read and parse any Yomitan dictionary thrown its way.

The JSON schema syntax is very hard to read, so could you double-check whether my reasoning is correct?

There are 3+ scenarios for the format of a frequency tag in a term_meta_bank_###.json file:

  1. First scenario:
[
     "<term>","freq",{"reading":"<reading>","frequency":<number>}
 ]

Coming from:

{},
{"const": "freq"},
{
    "oneOf": [
          [...snipped irrelevant option...]
        {
            "type": "object",
            "required": [
                "reading",
                "frequency"
            ],
            "additionalProperties": false,
            "properties": {
                "reading": {
                    "type": "string",
                    "description": "Reading for the term."
                },
                "frequency": {
                    "type": "number", // expanded from one of the options in "$ref": "#/definitions/frequency"
                    "description": "Frequency information for the term."
                }
            }
        }
    ]
}
  2. Second scenario:
[
     "<term>","freq",{"reading":"<reading>","frequency":"<frequency string>"}
 ]

Coming from:

{},
{"const": "freq"},
{
    "oneOf": [
          [...snipped irrelevant option...]
        {
            "type": "object",
            "required": [
                "reading",
                "frequency"
            ],
            "additionalProperties": false,
            "properties": {
                "reading": {
                    "type": "string",
                    "description": "Reading for the term."
                },
                "frequency": {
                    "type": "string", // expanded from one of the options in "$ref": "#/definitions/frequency"
                    "description": "Frequency information for the term."
                }
            }
        }
    ]
}
  3. Third scenario:
[
     "<term>","freq",
     {"reading":"<reading>",
        "frequency": {"value": <number>, "displayValue": "<stylized frequency string>"}
     }
 ]

Coming from:

{},
{"const": "freq"},
{
    "oneOf": [
          [...snipped irrelevant option...]
        {
            "type": "object",
            "required": [
                "reading",
                "frequency"
            ],
            "additionalProperties": false,
            "properties": {
                "reading": {
                    "type": "string",
                    "description": "Reading for the term."
                },
                "frequency": {
// start expansion of second option in  "$ref": "#/definitions/frequency"
                    "type": "object",
                    "additionalProperties": false,
                    "required": [
                        "value"
                    ],
                    "properties": {
                        "value": {
                            "type": "number"
                        },
                        "displayValue": {
                            "type": "string"
                        }
                    },
// end expansion of second option in  "$ref": "#/definitions/frequency"
                    "description": "Frequency information for the term."
                }
            }
        }
    ]
}
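
The three scenarios above differ only in the shape of the `frequency` value: a number, a string, or an object with a required `value` and an optional `displayValue`. A minimal Python sketch of folding those shapes into one record follows. The names `Frequency` and `normalize_frequency` are hypothetical, and leaving the numeric value unset for a plain string is an assumption of this sketch, not something the schema prescribes.

```python
from typing import Any, NamedTuple, Optional


class Frequency(NamedTuple):
    value: Optional[float]  # numeric frequency, if one was given
    display: str            # text to show to the user


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def normalize_frequency(raw: Any) -> Frequency:
    """Fold number / string / {value, displayValue} into one record."""
    if _is_number(raw):
        return Frequency(raw, str(raw))
    if isinstance(raw, str):
        # Assumption of this sketch: keep the string for display only.
        return Frequency(None, raw)
    if isinstance(raw, dict) and _is_number(raw.get("value")):
        display = raw.get("displayValue")
        if not isinstance(display, str):
            # displayValue is optional, so fall back to the number itself.
            display = str(raw["value"])
        return Frequency(raw["value"], display)
    raise ValueError(f"unsupported frequency shape: {raw!r}")


print(normalize_frequency(61709))
print(normalize_frequency({"value": 61709, "displayValue": "61709㋕"}))
```

Raising on anything else is a deliberate choice here, so that an unexpected shape in a new dictionary surfaces immediately instead of being silently dropped.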

So far, I've been omitting the first oneOf option:

{
    "$ref": "#/definitions/frequency",
    "description": "Frequency information for the term."
},

so am I understanding correctly that the following scenarios are also possible?

  4. Fourth scenario:
[
     "<term>","freq", "<stylized frequency string>"
 ]

Coming from:

[image: schema screenshot]

  5. Fifth scenario:
[
     "<term>","freq", <number>
 ]

Coming from:

[image: schema screenshot]

  6. Sixth scenario:
[
    "<term>","freq",{"value": <number>, "displayValue": "<stylized frequency string>"}
]
  7. Seventh scenario:
[
    "<term>","freq",{"value": <number>}
]

Both coming from:

[image: schema screenshot]

given that value is mandatory but displayValue is optional.

I tried to follow the schema when extending Memento's term meta bank support in ripose-jp/Memento#237, but I did not realize that formats such as №6 were legal.
I realized this when I came across this entry from JPDB_v2.1_kana_2024-05-26, shared on the TMW server:

["アラフォー","freq",{"value":61709,"displayValue":"61709㋕"}]

Are there more formats that I am missing?
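
Scenarios one to three wrap the frequency in an object that also carries `reading`, while scenarios four to seven put the frequency directly in the third position. One way to tell the two groups apart, sketched in Python with a hypothetical helper `split_freq_entry`, is to check for the `reading` and `frequency` keys. This covers only the shapes listed in this post and makes no claim about how Yomitan itself does it.

```python
import json
from typing import Any, Optional, Tuple


def split_freq_entry(entry: list) -> Tuple[str, Optional[str], Any]:
    """Return (term, reading or None, raw frequency) for one "freq" entry."""
    term, mode, data = entry  # raises ValueError unless exactly 3 items
    if mode != "freq":
        raise ValueError(f"not a frequency entry: {mode!r}")
    if isinstance(data, dict) and "reading" in data and "frequency" in data:
        # Scenarios 1-3: the frequency is nested next to a reading.
        return term, data["reading"], data["frequency"]
    # Scenarios 4-7: the third element is the frequency itself.
    return term, None, data


entry = json.loads('["アラフォー","freq",{"value":61709,"displayValue":"61709㋕"}]')
term, reading, raw_frequency = split_freq_entry(entry)
print(term, reading, raw_frequency)
```

The raw frequency returned here can still be a number, a string, or an object, so it needs the same handling in both groups.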

@Kuuuube
Member

Kuuuube commented Nov 21, 2024

Rather than figuring out every possible combination that the schema accepts, I think it's better to consider what you need.

Here's an example of a freq dict term_meta_bank_1.json with two terms, each with both a value and a displayValue:

[["","freq",{"value":1,"displayValue":"1"}],["言う","freq",{"value":2,"displayValue":"2"}]]
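
A reader for such a bank might look like the following Python sketch. It assumes the file is a JSON array of three-element entries, as in the example above, and simply skips any entry whose second element is not `"freq"`. `iter_freq_entries` is a hypothetical name, and the bank text is copied verbatim from the example.

```python
import json
from typing import Any, Iterator, Tuple

BANK_TEXT = (
    '[["","freq",{"value":1,"displayValue":"1"}],'
    '["言う","freq",{"value":2,"displayValue":"2"}]]'
)


def iter_freq_entries(bank_text: str) -> Iterator[Tuple[str, Any]]:
    """Yield (term, third element) for every "freq" entry in a bank."""
    for entry in json.loads(bank_text):
        # Skip anything that is not a 3-element "freq" entry.
        if isinstance(entry, list) and len(entry) == 3 and entry[1] == "freq":
            yield entry[0], entry[2]


entries = list(iter_freq_entries(BANK_TEXT))
for term, data in entries:
    print(repr(term), data)
```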

@Kuuuube Kuuuube added the kind/support The issue is a support request / question label Nov 21, 2024
@precondition
Author

precondition commented Nov 21, 2024

This is already the second time I have come across a frequency tag format that requires adding upstream support, so I'd rather take everything into account this time than run into the same kind of issue again and again as I use more dictionaries from various sources.

@Kuuuube
Member

Kuuuube commented Nov 21, 2024

Ah, I misread. I thought you were creating a dictionary rather than implementing the Yomitan format.

How about these cases:

["<term>","freq",{"value":<number>}]
["<term>","freq",{"reading":"<reading>","frequency":{"value":<number>}}]
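
Both cases omit `displayValue`. Going by the schema fragment quoted in the third scenario, that is allowed because only `value` is required, while `additionalProperties: false` rules out any other key. A small Python check mirroring that fragment, under the hypothetical name `is_valid_frequency_object`, could look like this:

```python
from typing import Any

ALLOWED_KEYS = {"value", "displayValue"}


def is_valid_frequency_object(obj: Any) -> bool:
    """Mirror the quoted fragment: value required, displayValue optional."""
    if not isinstance(obj, dict):
        return False
    if not set(obj) <= ALLOWED_KEYS:
        return False  # additionalProperties: false
    value = obj.get("value")
    # A missing value yields None and fails here; bool is not a JSON number.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return "displayValue" not in obj or isinstance(obj["displayValue"], str)


print(is_valid_frequency_object({"value": 2}))
print(is_valid_frequency_object({"displayValue": "2"}))
```

A parser that accepts arbitrary dictionaries may prefer to be more lenient than this, for example by ignoring unknown keys instead of rejecting the entry.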

@Kuuuube
Member

Kuuuube commented Nov 25, 2024

Closing this as it should be finished.

@Kuuuube Kuuuube closed this as completed Nov 25, 2024