feat: fullwidth characters support #460

Open
weii41392 opened this issue Nov 17, 2023 · 2 comments · Fixed by #461

@weii41392
Contributor

Currently the parser recognizes halfwidth opening and closing parentheses and excludes a trailing closing parenthesis from the link when appropriate, but it does not do the same for fullwidth characters. See this example:

import { tokenize } from "linkifyjs";

const links = [
    "http://foo.com/blah_blah",
    "http://foo.com/blah_blah_(wikipedia)_(again)"
];

const texts = [
    `${links[0]} ${links[1]}`,
    `Link 1(${links[0]}) Link 2(${links[1]})`,      // halfwidth parentheses
    `Link 1（${links[0]}） Link 2（${links[1]}）`,   // fullwidth parentheses
];

for (const text of texts) {
    const tokens = tokenize(text);
    tokens.filter(token => token.isLink).forEach((token) => console.log(`"${token.v}"`));
}

// texts[0]: succeed without parentheses
// "http://foo.com/blah_blah"
// "http://foo.com/blah_blah_(wikipedia)_(again)"

// texts[1]: succeed with halfwidth parentheses
// "http://foo.com/blah_blah"
// "http://foo.com/blah_blah_(wikipedia)_(again)"

// texts[2]: fail to handle fullwidth parentheses
// "http://foo.com/blah_blah）"
// "http://foo.com/blah_blah_(wikipedia)_(again)）"

My proposal is to define fullwidth characters as tokens and add the corresponding behavior to the parser.
The logic should be fairly simple, since fullwidth brackets are semantically the same as their halfwidth counterparts.
(In our use case we care most about fullwidth parentheses （）, but in general this can apply to other fullwidth characters, e.g. 「」『』＜＞.)
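
To illustrate the idea that fullwidth brackets can be treated like their halfwidth counterparts, here is a minimal standalone sketch. It is not linkifyjs's implementation: `BRACKET_PAIRS` and `trimUnbalancedClosers` are hypothetical names introduced only for this example, and the balancing rule (drop a trailing closer only when the URL has more closers than openers of that kind) is an assumption about the desired behavior.

```javascript
// Hypothetical table, not part of linkifyjs: closing bracket -> opening bracket.
const BRACKET_PAIRS = new Map([
    [")", "("],
    ["]", "["],
    ["}", "{"],
    ["）", "（"], // fullwidth parentheses
    ["」", "「"],
    ["』", "『"],
    ["＞", "＜"],
]);

// Strip trailing closing brackets that have no matching opener inside the URL.
function trimUnbalancedClosers(url) {
    let end = url.length;
    while (end > 0) {
        const closer = url[end - 1];
        const opener = BRACKET_PAIRS.get(closer);
        if (opener === undefined) break; // last character is not a closing bracket
        const body = url.slice(0, end);
        const opens = body.split(opener).length - 1;
        const closes = body.split(closer).length - 1;
        if (closes <= opens) break; // balanced, so the bracket belongs to the URL
        end -= 1; // unbalanced closer: treat it as surrounding punctuation
    }
    return url.slice(0, end);
}

console.log(trimUnbalancedClosers("http://foo.com/blah_blah）"));
console.log(trimUnbalancedClosers("http://foo.com/blah_blah_(wikipedia)_(again)）"));
```

With this rule the balanced `(wikipedia)` and `(again)` groups stay in the link, while the stray fullwidth `）` that closes the surrounding text is dropped.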

@nfrasser
Owner

@weii41392 thanks for the report and the fix! This has been released in the latest linkifyjs v4.1.3

@weii41392
Contributor Author

> @weii41392 thanks for the report and the fix! This has been released in the latest linkifyjs v4.1.3

Thank you @nfrasser! With further testing, though, we found that the current logic doesn't work as expected in some cases. Is this intended, or can we modify this behavior as well?

Works as expected

Does not work as expected

Unlike in English, we don't add whitespace in Chinese (at least in formal writing). That's why http://foo.com/blah_blah） withWhitespace works under the English convention, but http://foo.com/blah_blah）withoutWhitespace doesn't work for Chinese.
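
One way to picture the requested behavior is a scanner that ends the URL at an unmatched fullwidth closing bracket whether or not whitespace follows it. The sketch below is only an illustration under that assumption, not how linkifyjs works internally; `FULLWIDTH_PAIRS` and `scanUrl` are hypothetical names, and the sketch assumes an unmatched fullwidth closer is never meant to be part of the URL.

```javascript
// Hypothetical table, not part of linkifyjs: fullwidth closer -> opener.
const FULLWIDTH_PAIRS = new Map([
    ["）", "（"],
    ["」", "「"],
    ["』", "『"],
    ["＞", "＜"],
]);
const FULLWIDTH_OPENERS = new Set(FULLWIDTH_PAIRS.values());

// Return the URL that begins at `start`. Scanning stops at whitespace or at the
// first fullwidth closing bracket whose opener was not seen inside the URL.
function scanUrl(text, start) {
    const openCounts = new Map(); // opener -> number of currently unclosed openers
    let end = start;
    for (; end < text.length; end++) {
        const ch = text[end];
        if (/\s/.test(ch)) break;
        if (FULLWIDTH_OPENERS.has(ch)) {
            openCounts.set(ch, (openCounts.get(ch) || 0) + 1);
        } else if (FULLWIDTH_PAIRS.has(ch)) {
            const opener = FULLWIDTH_PAIRS.get(ch);
            const depth = openCounts.get(opener) || 0;
            if (depth === 0) break; // unmatched closer: the URL ends here
            openCounts.set(opener, depth - 1);
        }
    }
    return text.slice(start, end);
}

const spaced = "Link 1（http://foo.com/blah_blah） withWhitespace";
const unspaced = "Link 1（http://foo.com/blah_blah）withoutWhitespace";
console.log(scanUrl(spaced, spaced.indexOf("http")));
console.log(scanUrl(unspaced, unspaced.indexOf("http")));
```

Both calls yield `http://foo.com/blah_blah`, because the decision depends on bracket balance rather than on what follows the closing bracket.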

@nfrasser reopened this Nov 22, 2023
@nfrasser self-assigned this Dec 4, 2024
@nfrasser added the "parsing" (Related to string parsing) label Dec 4, 2024