[question] How to get same behaviour of po4a in wrap? #140

Open
Freed-Wu opened this issue May 30, 2023 · 3 comments

@Freed-Wu

$ po4a --help
...
    --wrap-po no|newlines|number (default: 76)
        Specify how the po file should be wrapped. This gives the choice
        between either files that are nicely wrapped but could lead to git
        conflicts, or files that are easier to handle automatically, but
        harder to read for humans.

        Historically, the gettext suite has reformatted the po files at the
        77th column for cosmetics. This option specifies the behavior of
        po4a. If set to a numerical value, po4a will wrap the po file after
        this column and after newlines in the content. If set to newlines,
        po4a will only split the msgid and msgstr after newlines in the
        content. If set to no, po4a will not wrap the po file at all. The
        reference comments are always wrapped by the gettext tools that we
        use internally.

        Note that this option has no impact on how the msgid and msgstr are
        wrapped, ie on how newlines are added to the content of these
        strings.
...

https://polib.readthedocs.io/en/latest/api.html#polib.pofile

wrapwidth
integer, the wrap width, only useful when the -w option was passed to xgettext (optional, default: 78).

However, after running

import polib
po = polib.pofile("XXX.po", wrapwidth=76)
po.save()

we get:

#. type: Plain text
#: tmux-prefix/src/tmux/tmux.1:202
msgid ""
"Do not start the server even if the command would normally do so (for "
"example E<.Ic new-session> or E<.Ic start-server>)."
msgstr "即使命令通常会启动服务器(例如 E<.Ic new-session> 或 E<.Ic start-server>),也不要启动服务器。"

The msgstr is not wrapped at 76 columns. Why?

If I then run po4a, it rewraps the po file again:

#. type: Plain text
#: tmux-prefix/src/tmux/tmux.1:202
msgid ""
"Do not start the server even if the command would normally do so (for "
"example E<.Ic new-session> or E<.Ic start-server>)."
msgstr ""
"即使命令通常会启动服务器(例如 E<.Ic new-session> 或 E<.Ic start-server>),"
"也不要启动服务器。"

The line width of 即使命令通常会启动服务器(例如 E<.Ic new-session> 或 E<.Ic start-server>), is 76. That is what I need. So how can I get the same wrapping behaviour as po4a?

@izimobil
Owner

I have no idea what po4a is.
Please do your own research, starting from this issue (which is still valid): #96

@Freed-Wu
Author

Freed-Wu commented May 31, 2023

Oh, sorry, let me explain more clearly:

import polib
po = polib.pofile("XXX.po", wrapwidth=76)
po.save()

gives:

#. type: Plain text
#: tmux-prefix/src/tmux/tmux.1:202
msgid ""
"Do not start the server even if the command would normally do so (for "
"example E<.Ic new-session> or E<.Ic start-server>)."
msgstr "即使命令通常会启动服务器(例如 E<.Ic new-session> 或 E<.Ic start-server>),也不要启动服务器。"

Because

len("Do not start the server even if the command would normally do so (for ") < 76 < len("Do not start the server even if the command would normally do so (for example")

the English text is wrapped correctly. The Chinese text is different, however: each Chinese character occupies two columns.

Although

len("即使命令通常会启动服务器(例如 E<.Ic new-session> 或 E<.Ic start-server>),也不要启动服务器。") < 76

the true display width is > 76, so the line needs to be wrapped. In this situation, po4a behaves correctly.
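
To illustrate the gap between len() and the displayed width, here is a minimal sketch using only the standard library. The helper name display_width and the sample string are made up for illustration, and counting East Asian "W"/"F" characters as two columns is an approximation of what a terminal does, not what po4a or polib actually compute.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate column count: wide/fullwidth characters take 2 columns."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# Hypothetical sample: 5 Chinese characters followed by 20 ASCII characters.
sample = "启动服务器 E<.Ic start-server>"
print(len(sample), display_width(sample))  # prints: 25 30
```

So a string can pass a len()-based check and still be wider than the limit on screen.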

Python has a library, https://github.com/jquast/wcwidth, that can calculate the correct display width of Chinese characters.

@kthy
Contributor

kthy commented May 31, 2023

So … the real title for this issue should probably be

wrapwidth does not take into account double-width unicode characters

and the answer, like in #96, is that this is standard behaviour of the textwrap module in Python's stdlib. Please file a bug against that, if you would like this fixed at source.
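
A small demonstration of that textwrap behaviour, as I understand it: the width argument is compared against character counts, so a line of wide characters can be twice as wide on screen as requested. The sample text is made up for illustration.

```python
import textwrap

# Ten wide characters with no whitespace, wrapped at "width 6".
text = "服" * 10
lines = textwrap.wrap(text, width=6)

for line in lines:
    # len() is at most 6, but each character typically takes two columns.
    print(len(line), line)
```

The first line holds six characters, which a terminal that renders CJK characters at double width shows as twelve columns.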

Edit: this is probably the relevant issue to follow.
