CDRIVER-5859 Integer comparison macro #1845

vector-of-bool · 2025-01-29T21:34:03Z

This changeset does not include any functional changes, but adds a more concise way to perform integer comparison. The mcommon components provide a set of functions and macros for integer comparison and range checking, but they require diligence from the caller that the signedness of the operands match the signedness of the function parameters. The mlib_cmp and mlib_in_range macros support any combination of signed/unsigned and integer precision.

It is recommended to review by-commit, because one massive commit refactors almost all existing range checks in a single go.

Summary

Macro Utilities

This change adds several utility macros that were developed as part of amongoc. They are defined in mlib/config.h. The main non-trivial macro is MLIB_ARGC_PICK(P, ...), which allows us to branch on the number of arguments given to .... It substitutes with P ## _argc_ ## N (__VA_ARGS__), where N is the number of arguments in __VA_ARGS__ (it was a chore to get this to play nice with MSVC's old broken preprocessor).

Lossless Widening

The basis of the functionality is from the mlib_upsize_integer(I) macro, which does the following:

If the operand $I$ is smaller than intmax_t, it is cast to intmax_t, since such widening is always lossless, even if $I$ is unsigned.
Otherwise, if $I$ is signed¹, it is cast to intmax_t (it is probably already an intmax_t)
Otherwise, $I$ is unsigned, so it is cast to uintmax_t.
The cast value is stored in struct mlib_upsized_integer.
Whether it cast to the signed or unsigned value is stored in mlib_upsized_integer::is_signed

This effectively normalizes any integer into a value that we can operate on while also knowing whether to treat it as signed or unsigned, losslessly widening any integer.

Unified Comparison

The mlib_cmp macro will opaquely widen the operands before passing it to an implementation function (also called mlib_cmp) that operates on mlib_upsized_integer. Internally, it branches on whether the operands are signed/unsigned (eight branches). Since the decision on signedness and the comparison operation is a compile-time constant, it is trivial for the optimizer to simplify the call to mlib_cmp to the single relevant branch after inlining.

Shorthand Comparison

The return type of mlib_cmp is an enumerator with values -1 (less), 0 (equal), and +1 (greater), mimicking the return value of strcmp.

With this, we can do a neat trick. If we rewrite mlib_cmp(A, Op, B) as mlib_cmp(A, B) Op 0, then passing an operator token for Op will compare the result of mlib_cmp with zero, which will "do the right thing":

if (mlib_cmp(a, <, b)) {
  puts("a is less than b");
}

This allows us to omit separate compare functions for each binary relation we might want to use.

Min/Max of Type

mcommon range checks required a different function/macro for each target type, but I was able to simplify this to a single function and single macro that can do the work:

mlib_minof(T) takes an integer type specifier and yields the minimum value of type T. For unsigned types, this is always zero. ²
mlib_maxof(T) similarly yields the maximum value of T. ²
mlib_in_range(T, Value) can thus be written as mlib_cmp(mlib_minof(T), <=, V) && mlib_cmp(V, <=, mlib_maxof(T)) (almost, it is written a bit differently to avoid necessarily double-evaluating V)

The downside of this process is that it will necessitate double-evaluating the expression $I$. With _Generic, one could implement an "is-this-signed?" macro that doesn't evaluate the operand, but we don't have _Generic yet. Instead, the signcheck is implemented using ((0 & I) - 1 < 0), which will only return true if I is a signed type. Note that this doesn't work unless I is at least the size of int, because of integer promotion nonsense. ↩
mlib_minof and mlib_maxof assume two's-complement integer encoding. ↩ ↩²

This will lay the basis for more convenient checked arithmetic going forward.

These are intended to replace existing integer comparison functions/macros. Having different comparison functions for differing sign/size combinations is error-prone, resists refactoring, and requires programmer diligence at call sites. `mlib_cmp` is a single macro that works with all integer types, regardless of sign or size. `mlib_cmp` internally branches on the signedness properties of the operands. In unoptimized cases, this may be slower, but any inliner should trivially eliminate the dead code branches.

mdbmes

This is cool. I'll pick this up tomorrow, I just wanted to post the feedback I had so far. Overall I really like it. The less error-prone API is nice, and I only have vague concerns about portability.

mdbmes · 2025-01-31T01:25:41Z

src/common/src/mlib/intutil.h

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#pragma once


I'm sure this is fine in practice, but using a non-C99 feature here reminds me that I don't have a clear picture of our intended level of compiler support. I was under the impression that we may have best-effort support for platforms we don't automatically test, but I'm not aware of how extensive or not this may be in the wild.

mdbmes · 2025-01-31T02:16:01Z

src/common/src/mlib/config.h

+ *    -  then: _mlibHasComma(,)
+ *    -  then: 1
+ * Given any other aruments:
+ *    - first: _mlibHasComma(_bsonDSL_isEmpty_CASE_<somethingelse>)


"bsonDSL" left over in the comments

mdbmes · 2025-01-31T03:39:16Z

src/common/src/mlib/config.h

+ * list. Otherwise, expands to the third argument list. The unused argument list
+ * is not expanded and is discarded.
+ */
+#define MLIB_IF_ELSE(...) MLIB_NOTHING (#__VA_ARGS__) MLIB_PASTE (_mlibIfElseBranch_, MLIB_BOOLEAN (__VA_ARGS__))


What's the reason for MLIB_NOTHING (#__VA_ARGS__) here? I was curious and I started running evergreen without this, and I haven't found the failure yet.

I know what I was thinking when I added it (the #__VA_ARGS__ suppresses argument macro expansion), but now I'm thinking it's not necessary.

mdbmes · 2025-01-31T03:44:31Z

src/libbson/src/bson/bson-decimal128.c

@@ -298,7 +298,7 @@ bson_decimal128_to_string (const bson_decimal128_t *dec, /* IN  */
            *(str_out++) = '0';
         }

-         for (uint32_t i = 0; mcommon_cmp_greater_us (significand_digits - i, BSON_MAX (radix_position - 1, 0)) &&


I haven't worked out how this is happening yet but this seems to break NUL termination sometimes:

[2025/01/30 19:05:51.023] Assert Failure: [2025/01/30 19:05:51.023] "0.1" [2025/01/30 19:05:51.023] != [2025/01/30 19:05:51.023] "0.1��=//0��0000��/��0��/��1Z�Q��1" [2025/01/30 19:05:51.023] C:\data\mci\359e37b49d653f053af8ab5f62837867\mongoc\src\libbson\tests\test-decimal128.c:160 test_decimal128_to_string__regular()

https://evergreen.mongodb.com/task_log_raw/mongo_c_driver_sasl_matrix_winssl_sasl_cyrus_winssl_windows_2019_vs2017_x64_test_5.0_server_auth_patch_4b4aade5a07c62c917b1f3cd24dec7c34152608f_679c3595a815820007cb738a_25_01_31_02_30_26/0?type=T&text=true

https://spruce.mongodb.com/version/679c3595a815820007cb738a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

(This is from one of the evergreen runs I did to test that MLIB_NOTHING removal)

I'll take a closer look tomorrow, this'll just likely require setting up a local repro so I can look at the preprocessor and compiler outputs.

Pretty sure it's a compiler bug but I think the trigger is related to the large argument type of mlib_cmp(). For some reason it's not being inlined here, so the resulting binary is quite a mess.

I was alarmed by the possibility of stack damage across the mlib_cmp() call but it does look like the problem is limited to an incorrect comparison result. The NUL terminator is present at the very end of the buffer, and the garbage is coming from uninitialized data copied over from significand_read.

Is there perhaps a static test we can add that will ensure mlib_cmp is always inlined? My first inclination is to read the debug symbols but that'll be painful to implement for every platform.

Not sure if there's a way to check that at compile time. An always-inline may be useful on this one to force dead code elimination to trigger.

I'm unable to repro the failing decimal128 tests. May be an environmental issue?

It seems to be specific to 32-bit windows and vs2017. I haven't reproduced the failure from source locally yet but i've been able to repro the runtime failure using binary artifacts from evergreen, and debug that with windbg and ghidra.

Even if the generated code was correct, I'd argue that un-inlined output is not acceptable here. It takes what should be a single comparison and turns it into quite a lot of sign extension and branching and temporary stack space.

Oh, one easy trick that comes to mind: put a disallowed token in mlib_cmp() that should always be dead code after inlining. Something that'll cause a link failure, or worst-case a random string that can be searched in the binary.

vector-of-bool added 8 commits January 29, 2025 10:41

Integer type and range utilities.

7e44494

This will lay the basis for more convenient checked arithmetic going forward.

mlib/config.h header for common platform detection and utility macros

0936188

Mass update: Replace mcommon_cmp with mlib_cmp

67a1fe7

Workaround broken MSVC preprocessor

2e200fe

Rename variable to not fight with clang-format

f8f419c

Formatting

903ffc2

We don't need that flag

e8e67b6

vector-of-bool requested review from kevinAlbs and mdbmes January 30, 2025 18:50

vector-of-bool marked this pull request as ready for review January 30, 2025 19:02

mdbmes reviewed Jan 31, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CDRIVER-5859 Integer comparison macro #1845

CDRIVER-5859 Integer comparison macro #1845

vector-of-bool commented Jan 29, 2025 •

edited

Loading

mdbmes left a comment

mdbmes Jan 31, 2025

mdbmes Jan 31, 2025

mdbmes Jan 31, 2025

vector-of-bool Jan 31, 2025

mdbmes Jan 31, 2025

mdbmes Jan 31, 2025 •

edited

Loading

mdbmes Jan 31, 2025

mdbmes Jan 31, 2025

vector-of-bool Jan 31, 2025

mdbmes Jan 31, 2025 •

edited

Loading

mdbmes Jan 31, 2025

CDRIVER-5859 Integer comparison macro #1845

Are you sure you want to change the base?

CDRIVER-5859 Integer comparison macro #1845

Conversation

vector-of-bool commented Jan 29, 2025 • edited Loading

Summary

Macro Utilities

Lossless Widening

Unified Comparison

Shorthand Comparison

Min/Max of Type

Footnotes

mdbmes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdbmes Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdbmes Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vector-of-bool commented Jan 29, 2025 •

edited

Loading

mdbmes Jan 31, 2025 •

edited

Loading

mdbmes Jan 31, 2025 •

edited

Loading