utf8n_to_uvchr_msgs(): Fix wrong overflow warn category

This function looks for various malformations in the input string that is being converted from UTF-8 to its equivalent code point ordinal value. When it finds an issue, it can either raise a warning (if appropriate) or, when the function parameters say to, pass the needed information about the warning back to the caller. The data for each should be identical, but prior to this commit, they weren't in one unlikely case.

This happened when the input UTF-8 sequence represents a code point whose value doesn't fit in the platform's word size. This is viewed as a malformation, and, if enabled, a warning using the WARN_UTF8 category is raised. But if that category is disabled, another way to look at it is that this is an attempt to use a code point that isn't legal Unicode. There is another warnings category for that, WARN_NON_UNICODE, and so a warning is raised if that category is enabled.

Note that WARN_NON_UNICODE is a subcategory of WARN_UTF8, so the only way to get to this situation is

    no warnings 'utf8';
    use warnings 'non_unicode';

(those two statements could be separated by many lines).

Prior to this commit, if the caller asked for the warning information to be passed to it instead of raising the warnings, WARN_NON_UNICODE was never returned, making the two modes sometimes inconsistent. With this commit, WARN_NON_UNICODE is passed to the caller if (and only if) a warning would otherwise have been generated using it.

This bug was found with tests that will be committed later.
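The category selection described above can be modelled outside of Perl as a rough sketch. Everything in it is a hypothetical stand-in, not the code in utf8.c: the `category` enum replaces the packed warning values, the two booleans replace the results of the `ckWARN_d()` checks (treated here as independent flags, which simplifies the real category hierarchy), and `want_msgs` replaces a non-NULL `msgs` parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for packWARN(WARN_UTF8) and friends. */
enum category { CAT_NONE = 0, CAT_UTF8, CAT_NON_UNICODE };

/* Selection as it was before the commit: a caller asking for messages
 * always satisfied the first test, so CAT_NON_UNICODE was unreachable
 * in that mode. */
static enum category pick_old(bool utf8_on, bool non_unicode_on, bool want_msgs)
{
    if (want_msgs || utf8_on)        return CAT_UTF8;
    if (want_msgs || non_unicode_on) return CAT_NON_UNICODE;
    return CAT_NONE;
}

/* Selection after the commit: the enabled categories decide first, and
 * want_msgs only supplies a fallback when neither is enabled. */
static enum category pick_new(bool utf8_on, bool non_unicode_on, bool want_msgs)
{
    if (utf8_on)        return CAT_UTF8;
    if (non_unicode_on) return CAT_NON_UNICODE;
    if (want_msgs)      return CAT_UTF8;
    return CAT_NONE;
}

typedef enum category (*picker)(bool, bool, bool);

/* True if, whenever some warning would be raised, the category handed
 * back to a caller matches the category that would have been raised. */
static bool modes_agree(picker pick)
{
    for (int u = 0; u <= 1; u++) {
        for (int n = 0; n <= 1; n++) {
            if (!u && !n) continue;   /* nothing raised, nothing to compare */
            if (pick(u, n, true) != pick(u, n, false)) return false;
        }
    }
    return true;
}

/* Illustrative self-check of the property the commit message describes. */
static void self_check(void)
{
    assert(!modes_agree(pick_old));   /* old logic disagrees in one case */
    assert(modes_agree(pick_new));    /* new logic is consistent */
}
```

In this model the only combination where the old selection disagrees with itself is `utf8_on == false` with `non_unicode_on == true`, which corresponds to the `no warnings 'utf8'; use warnings 'non_unicode';` situation in the message.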
Perl · Nov 26, 2024 · 6c47c37
1 parent 44641fd
Showing 1 changed file with 6 additions and 2 deletions.
diff --git a/utf8.c b/utf8.c
@@ -1727,12 +1727,16 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
                      * necessarily do so in the future.  We output (only) the
                      * most dire warning */
                     if (! (flags & UTF8_CHECK_ONLY)) {
-                        if (msgs || ckWARN_d(WARN_UTF8)) {
+                        if (ckWARN_d(WARN_UTF8)) {
                             pack_warn = packWARN(WARN_UTF8);
                         }
-                        else if (msgs || ckWARN_d(WARN_NON_UNICODE)) {
+                        else if (ckWARN_d(WARN_NON_UNICODE)) {
                             pack_warn = packWARN(WARN_NON_UNICODE);
                         }
+                        else if (msgs) {
+                            pack_warn = packWARN(WARN_UTF8);
+                        }
+
                         if (pack_warn) {
                             message = Perl_form(aTHX_ "%s: %s (overflows)",
                                             malformed_text,