-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDRIVER-5859 Integer comparison macro #1845
base: master
Are you sure you want to change the base?
CDRIVER-5859 Integer comparison macro #1845
Conversation
This will lay the basis for more convenient checked arithmetic going forward.
These are intended to replace existing integer comparison functions/macros. Having different comparison functions for differing sign/size combinations is error-prone, resists refactoring, and requires programmer diligence at call sites. `mlib_cmp` is a single macro that works with all integer types, regardless of sign or size. `mlib_cmp` internally branches on the signedness properties of the operands. In unoptimized cases, this may be slower, but any inliner should trivially eliminate the dead code branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool. I'll pick this up tomorrow, I just wanted to post the feedback I had so far. Overall I really like it. The less error-prone API is nice, and I only have vague concerns about portability.
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
#pragma once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure this is fine in practice, but using a non-C99 feature here reminds me that I don't have a clear picture of our intended level of compiler support. I was under the impression that we may have best-effort support for platforms we don't automatically test, but I'm not aware of how extensive or not this may be in the wild.
* - then: _mlibHasComma(,) | ||
* - then: 1 | ||
* Given any other aruments: | ||
* - first: _mlibHasComma(_bsonDSL_isEmpty_CASE_<somethingelse>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"bsonDSL" left over in the comments
* list. Otherwise, expands to the third argument list. The unused argument list | ||
* is not expanded and is discarded. | ||
*/ | ||
#define MLIB_IF_ELSE(...) MLIB_NOTHING (#__VA_ARGS__) MLIB_PASTE (_mlibIfElseBranch_, MLIB_BOOLEAN (__VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for MLIB_NOTHING (#__VA_ARGS__)
here? I was curious and I started running evergreen without this, and I haven't found the failure yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what I was thinking when I added it (the #__VA_ARGS__
suppresses argument macro expansion), but now I'm thinking it's not necessary.
@@ -298,7 +298,7 @@ bson_decimal128_to_string (const bson_decimal128_t *dec, /* IN */ | |||
*(str_out++) = '0'; | |||
} | |||
|
|||
for (uint32_t i = 0; mcommon_cmp_greater_us (significand_digits - i, BSON_MAX (radix_position - 1, 0)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't worked out how this is happening yet but this seems to break NUL termination sometimes:
[2025/01/30 19:05:51.023] Assert Failure:
[2025/01/30 19:05:51.023] "0.1"
[2025/01/30 19:05:51.023] !=
[2025/01/30 19:05:51.023] "0.1���=//0���0000���/���0���/���1Z�Q�����1"
[2025/01/30 19:05:51.023] C:\data\mci\359e37b49d653f053af8ab5f62837867\mongoc\src\libbson\tests\test-decimal128.c:160 test_decimal128_to_string__regular()
(This is from one of the evergreen runs I did to test that MLIB_NOTHING removal)
I'll take a closer look tomorrow, this'll just likely require setting up a local repro so I can look at the preprocessor and compiler outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure it's a compiler bug but I think the trigger is related to the large argument type of mlib_cmp(). For some reason it's not being inlined here, so the resulting binary is quite a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was alarmed by the possibility of stack damage across the mlib_cmp() call but it does look like the problem is limited to an incorrect comparison result. The NUL terminator is present at the very end of the buffer, and the garbage is coming from uninitialized data copied over from significand_read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there perhaps a static test we can add that will ensure mlib_cmp is always inlined? My first inclination is to read the debug symbols but that'll be painful to implement for every platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a way to check that at compile time. An always-inline
may be useful on this one to force dead code elimination to trigger.
I'm unable to repro the failing decimal128 tests. May be an environmental issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be specific to 32-bit windows and vs2017. I haven't reproduced the failure from source locally yet but i've been able to repro the runtime failure using binary artifacts from evergreen, and debug that with windbg and ghidra.
Even if the generated code was correct, I'd argue that un-inlined output is not acceptable here. It takes what should be a single comparison and turns it into quite a lot of sign extension and branching and temporary stack space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, one easy trick that comes to mind: put a disallowed token in mlib_cmp() that should always be dead code after inlining. Something that'll cause a link failure, or worst-case a random string that can be searched in the binary.
Refer: CDRIVER-5859
This changeset does not include any functional changes, but adds a more concise way to perform integer comparison. The
mcommon
components provide a set of functions and macros for integer comparison and range checking, but they require diligence from the caller that the signedness of the operands match the signedness of the function parameters. Themlib_cmp
andmlib_in_range
macros support any combination of signed/unsigned and integer precision.It is recommended to review by-commit, because one massive commit refactors almost all existing range checks in a single go.
Summary
Macro Utilities
This change adds several utility macros that were developed as part of
amongoc
. They are defined inmlib/config.h
. The main non-trivial macro isMLIB_ARGC_PICK(P, ...)
, which allows us to branch on the number of arguments given to...
. It substitutes withP ## _argc_ ## N (__VA_ARGS__)
, whereN
is the number of arguments in__VA_ARGS__
(it was a chore to get this to play nice with MSVC's old broken preprocessor).Lossless Widening
The basis of the functionality is from the
mlib_upsize_integer(I)
macro, which does the following:intmax_t
, it is cast tointmax_t
, since such widening is always lossless, even ifintmax_t
(it is probably already anintmax_t
)uintmax_t
.struct mlib_upsized_integer
.mlib_upsized_integer::is_signed
This effectively normalizes any integer into a value that we can operate on while also knowing whether to treat it as signed or unsigned, losslessly widening any integer.
Unified Comparison
The
mlib_cmp
macro will opaquely widen the operands before passing it to an implementation function (also calledmlib_cmp
) that operates onmlib_upsized_integer
. Internally, it branches on whether the operands are signed/unsigned (eight branches). Since the decision on signedness and the comparison operation is a compile-time constant, it is trivial for the optimizer to simplify the call tomlib_cmp
to the single relevant branch after inlining.Shorthand Comparison
The return type of
mlib_cmp
is an enumerator with values-1
(less),0
(equal), and+1
(greater), mimicking the return value ofstrcmp
.With this, we can do a neat trick. If we rewrite
mlib_cmp(A, Op, B)
asmlib_cmp(A, B) Op 0
, then passing an operator token forOp
will compare the result ofmlib_cmp
with zero, which will "do the right thing":This allows us to omit separate compare functions for each binary relation we might want to use.
Min/Max of Type
mcommon
range checks required a different function/macro for each target type, but I was able to simplify this to a single function and single macro that can do the work:mlib_minof(T)
takes an integer type specifier and yields the minimum value of typeT
. For unsigned types, this is always zero. 2mlib_maxof(T)
similarly yields the maximum value ofT
. 2mlib_in_range(T, Value)
can thus be written asmlib_cmp(mlib_minof(T), <=, V) && mlib_cmp(V, <=, mlib_maxof(T))
(almost, it is written a bit differently to avoid necessarily double-evaluatingV
)Footnotes
The downside of this process is that it will necessitate double-evaluating the expression $I$. With
_Generic
, one could implement an "is-this-signed?" macro that doesn't evaluate the operand, but we don't have_Generic
yet. Instead, the signcheck is implemented using((0 & I) - 1 < 0)
, which will only returntrue
ifI
is a signed type. Note that this doesn't work unlessI
is at least the size ofint
, because of integer promotion nonsense. ↩mlib_minof
andmlib_maxof
assume two's-complement integer encoding. ↩ ↩2