Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utf8n_to_uvchr_msgs(): Macroïze a common paradigm #22762

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

khwilliamson
Copy link
Contributor

@khwilliamson khwilliamson commented Nov 18, 2024

Each of the cases in the loop had a somewhat obscure conditional that had a few variants. This commit creates a macro that puts the complication in one place.

It also fixes a bug found by more robust test cases, yet to be committed. The function can either raise a warning (if appropriate) or pass the needed information about the warning back to the caller when the function parameters say to. The data for each should be identical, but prior to this commit, they weren't in one unlikely case.

This happened when the input UTF-8 sequence represents a code point whose value doesn't fit in the platform's word size. This is viewed as a malformation, and, if enabled, a warning using the WARN_UTF8 category is raised. But if disabled, another way to look at it is that this is an attempt to use a code point that isn't legal Unicode. There is another warnings category for that, WARN_NON_UNICODE. And, so a warning is raised if that category is enabled.

Note that WARN_NON_UNICODE is a subcategory of WARN_UTF8, so the only way to get to this situation is

no warnings 'utf8'; use warnings 'non_unicode';

(those two statements could be separated by many lines)

Prior to this commit, if the caller asked for the warning information to be passed to it instead of raising the warnings, WARN_NON_UNICODE never was returned, making the two modes sometimes inconsistent.

With this commit, WARN_NON_UNICODE is passed to the caller if (and only if) a warning would otherwise have been generated using it.

utf8.c Outdated Show resolved Hide resolved
utf8.c Outdated Show resolved Hide resolved
utf8.c Outdated Show resolved Hide resolved
utf8.c Outdated
Comment on lines 1717 to 1733
: ((msgs) \
/* Here are to return a warning message. Choose the \
* highest priority enabled category, or 'warning' \
* if neither is enabled */ \
? ((ckWARN_d(warning)) \
? warning \
: ((extra_ckWARN(extra_category) \
? extra_category \
: warning))) \
/* Here are to raise a warning if either category is \
* enabled. Return the highest priority enabled \
* one, or 0 if neither is enabled */ \
: ((ckWARN_d(warning) \
? warning \
: ((extra_ckWARN(extra_category) \
? extra_category \
: 0))))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be simpler if you invert the order of checks (i.e. check msgs on the inside)?

             : ((ckWARN_d(warning))                                   \
                 ? warning                                            \
                 : ((extra_ckWARN(extra_category)                     \
                    ? extra_category                                  \
                    : (msgs) ? warning : 0))))                        \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I got fixated on checking msgs first because it is cheaper than the function calls, and lost sight of the fact that that is irrelevant in this case because there is no possible path through the code that doesn't also have a function call

Comment on lines +1787 to +1768
pack_warn = NEED_MESSAGE(WARN_UTF8,
ckWARN_d, WARN_NON_UNICODE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same logic. In the old version, if msgs was true, ckWARN_d(WARN_UTF8) false, ckWARN_d(WARN_NON_UNICODE) true, we still used WARN_UTF8. With the new macro, we now use WARN_NON_UNICODE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; it is a bug fix. I don't know why I thought it not necessary to point that out in the commit message. Now revised

@khwilliamson khwilliamson changed the title utf8.c: Macroize a common paradigm utf8n_to_uvchr_msgs(): Macroize a common paradigm Nov 19, 2024
@khwilliamson
Copy link
Contributor Author

I decided that the behavior change should be in a separate pull request, and so I created #22767

@khwilliamson khwilliamson changed the title utf8n_to_uvchr_msgs(): Macroize a common paradigm utf8n_to_uvchr_msgs(): Macroïze a common paradigm Nov 25, 2024
Each of the cases in the loop had a somewhat obscure conditional that
had a few variants.  This commit creates a macro that puts the
complication in one place.
@khwilliamson khwilliamson merged commit 80f266d into Perl:blead Nov 27, 2024
33 checks passed
@khwilliamson khwilliamson deleted the utf8_macroize branch November 27, 2024 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants