utf8n_to_uvchr_msgs(): Macroïze a common paradigm #22762

khwilliamson · 2024-11-18T20:58:30Z

Each of the cases in the loop had a somewhat obscure conditional that had a few variants. This commit creates a macro that puts the complication in one place.

It also fixes a bug found by more robust test cases, yet to be committed. The function can either raise a warning (if appropriate) or pass the needed information about the warning back to the caller when the function parameters say to. The data for each should be identical, but prior to this commit, they weren't in one unlikely case.

This happened when the input UTF-8 sequence represents a code point whose value doesn't fit in the platform's word size. This is viewed as a malformation, and a warning in the WARN_UTF8 category is raised if that category is enabled. If it is disabled, another way to look at the input is as an attempt to use a code point that isn't legal Unicode. There is a separate warnings category for that, WARN_NON_UNICODE, and so a warning is raised if that category is enabled.

Note that WARN_NON_UNICODE is a subcategory of WARN_UTF8, so the only way to get to this situation is

no warnings 'utf8'; use warnings 'non_unicode';

(those two statements could be separated by many lines)

Prior to this commit, if the caller asked for the warning information to be passed to it instead of having the warnings raised, WARN_NON_UNICODE was never returned, so the two modes were sometimes inconsistent.

With this commit, WARN_NON_UNICODE is passed to the caller if (and only if) a warning would otherwise have been generated using it.
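The difference between the two behaviours can be modelled in a few lines of C. This is only a sketch of the rule described above, not the code in utf8.c: the enum values and the functions `category_before` and `category_after` are hypothetical names invented for illustration, and the three flags stand in for `msgs` being non-NULL and for the two `ckWARN`-style checks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real packed warning categories. */
enum { NO_CATEGORY = 0, CAT_UTF8 = 1, CAT_NON_UNICODE = 2 };

/* Behaviour described as "prior to this commit" (an assumption based on
 * the description): when the caller asks for the information to be
 * passed back, WARN_NON_UNICODE is never reported. */
int category_before(bool msgs, bool utf8_on, bool non_unicode_on)
{
    if (msgs)
        return CAT_UTF8;
    return utf8_on        ? CAT_UTF8
         : non_unicode_on ? CAT_NON_UNICODE
         :                  NO_CATEGORY;
}

/* Behaviour described as "with this commit": report the category that
 * would have been used to raise the warning, falling back to the utf8
 * category only when the caller wants a message and neither is enabled. */
int category_after(bool msgs, bool utf8_on, bool non_unicode_on)
{
    if (utf8_on)
        return CAT_UTF8;
    if (non_unicode_on)
        return CAT_NON_UNICODE;
    return msgs ? CAT_UTF8 : NO_CATEGORY;
}
```

In this model the two functions differ only for `msgs` true, utf8 disabled, and non_unicode enabled, which is the `no warnings 'utf8'; use warnings 'non_unicode';` situation: `category_before` yields `CAT_UTF8` while `category_after` yields `CAT_NON_UNICODE`.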

mauke · 2024-11-18T21:10:04Z

utf8.c

+             : ((msgs)                                                  \
+                   /* Here are to return a warning message.  Choose the \
+                    * highest priority enabled category, or 'warning'   \
+                    * if neither is enabled */                          \
+                ? ((ckWARN_d(warning))                                  \
+                   ? warning                                            \
+                   : ((extra_ckWARN(extra_category)                     \
+                      ? extra_category                                  \
+                      : warning)))                                      \
+                   /* Here are to raise a warning if either category is \
+                    * enabled.  Return the highest priority enabled     \
+                    * one, or 0 if neither is enabled */                \
+                : ((ckWARN_d(warning)                                   \
+                   ? warning                                            \
+                   : ((extra_ckWARN(extra_category)                     \
+                      ? extra_category                                  \
+                      : 0))))))


Wouldn't this be simpler if you invert the order of checks (i.e. check msgs on the inside)?

    : ((ckWARN_d(warning))                 \
       ? warning                           \
       : ((extra_ckWARN(extra_category)    \
          ? extra_category                 \
          : (msgs) ? warning : 0))))       \

Yes. I got fixated on checking msgs first because it is cheaper than the function calls, and lost sight of the fact that this is irrelevant here, because there is no possible path through the code that doesn't also make a function call.
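That the two orderings select the same category can be checked exhaustively over the three boolean inputs. The sketch below is a stand-alone model, not the macro itself: the enum values and function names are hypothetical, and plain booleans replace `msgs`, `ckWARN_d(warning)` and `extra_ckWARN(extra_category)`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values only need to be distinct. */
enum { NO_WARNING = 0, WARNING = 1, EXTRA_CATEGORY = 2 };

/* Shape of the macro as first posted: msgs is tested on the outside. */
int msgs_checked_first(bool msgs, bool warning_on, bool extra_on)
{
    return msgs
         ? (warning_on ? WARNING : (extra_on ? EXTRA_CATEGORY : WARNING))
         : (warning_on ? WARNING : (extra_on ? EXTRA_CATEGORY : NO_WARNING));
}

/* Shape suggested in review: msgs is tested only in the last fallback. */
int msgs_checked_last(bool msgs, bool warning_on, bool extra_on)
{
    return warning_on ? WARNING
         : extra_on   ? EXTRA_CATEGORY
         : msgs       ? WARNING
         :              NO_WARNING;
}

/* Count the input combinations on which the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int bits = 0; bits < 8; bits++) {
        bool msgs       = (bits & 4) != 0;
        bool warning_on = (bits & 2) != 0;
        bool extra_on   = (bits & 1) != 0;
        if (msgs_checked_first(msgs, warning_on, extra_on)
            != msgs_checked_last(msgs, warning_on, extra_on))
            mismatches++;
    }
    return mismatches;
}
```

`count_mismatches()` returns 0 for this model: `msgs` only matters once both category checks have failed, so moving its test to the innermost position changes nothing.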

mauke · 2024-11-18T21:15:08Z

utf8.c

+                    pack_warn = NEED_MESSAGE(WARN_UTF8,
+                                             ckWARN_d, WARN_NON_UNICODE);


This is not the same logic. In the old version, if msgs was true, ckWARN_d(WARN_UTF8) false, ckWARN_d(WARN_NON_UNICODE) true, we still used WARN_UTF8. With the new macro, we now use WARN_NON_UNICODE.

Yes; it is a bug fix. I don't know why I thought it unnecessary to point that out in the commit message. It is now revised.

khwilliamson · 2024-11-20T04:34:51Z

I decided that the behavior change should be in a separate pull request, and so I created #22767

mauke reviewed Nov 18, 2024

khwilliamson force-pushed the utf8_macroize branch from 858acff to 45ab74e Compare November 19, 2024 15:40

khwilliamson changed the title ~~utf8.c: Macroize a common paradigm~~ utf8n_to_uvchr_msgs(): Macroize a common paradigm Nov 19, 2024

khwilliamson force-pushed the utf8_macroize branch from 45ab74e to da8e8fa Compare November 25, 2024 17:30

khwilliamson changed the title ~~utf8n_to_uvchr_msgs(): Macroize a common paradigm~~ utf8n_to_uvchr_msgs(): Macroïze a common paradigm Nov 25, 2024

khwilliamson force-pushed the utf8_macroize branch from da8e8fa to 5870272 Compare November 25, 2024 19:43

github-actions bot added the hasConflicts label Nov 26, 2024

khwilliamson force-pushed the utf8_macroize branch from 5870272 to 8f7d284 Compare November 26, 2024 17:57

utf8n_to_uvchr_msgs(): Macroize a common paradigm

dbf5e7f

Each of the cases in the loop had a somewhat obscure conditional that had a few variants. This commit creates a macro that puts the complication in one place.

khwilliamson force-pushed the utf8_macroize branch from 8f7d284 to dbf5e7f Compare November 27, 2024 01:27

github-actions bot removed the hasConflicts label Nov 27, 2024

khwilliamson merged commit 80f266d into Perl:blead Nov 27, 2024
33 checks passed

khwilliamson deleted the utf8_macroize branch November 27, 2024 02:50
