-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Power10 trait, with first method is_power_of_ten #15
base: master
Are you sure you want to change the base?
Conversation
I think I need help with fixing the u128 case. I can't seem to hide the static array from the compiler with a I don't understand why it works with 1.20, which doesn't have u128, but not older... |
Looks like I managed to hide u128 😏 If there is a better way, let me know. |
FYI, this is making things a bit difficult. |
I prefer one PR with all planned methods, so we can treat it as ready-to-publish once merged, without further breaking changes. Adding more methods on a trait is breaking unless there's a default impl. I don't care how many commits are in your PR, though I do prefer a clean working progression over seeing useless "fix it" kind of commits.
I believe the reason is that 1.20 has unstable support for u128, so it at least knows how to parse the big literals, where earlier versions didn't. Hiding it in a macro allowed those earlier versions to discard it without trying to tokenize the contents. An alternative workaround would be to use a const expression of smaller literals, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run cargo fmt
-- it's going to expand a lot of your one-liners, but I want consistent formatting.
src/power10.rs
Outdated
if v >= 100 { return v == 100; } | ||
if v >= 10 { return v == 10; } | ||
if v >= 1 { return v == 1; } | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these last two lines could be collapsed to just v == 1
.
(but LLVM might figure that out anyway...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yub.
src/power10.rs
Outdated
fn is_pow10_u64(v: u64) -> bool { | ||
let mut hash: u32 = v as u32; | ||
hash ^= hash >> 3; | ||
hash = hash.rotate_right(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since rust-lang/rust#56009 has attributed your performance regression to rotate_right
, I wonder why do you even need that here? That 1 MSB won't contribute to the actual hash index, so can this be a plain shr
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after further analyzing the issue, the better code is:
let mut hash: u32 = v as u32;
hash ^= hash >> 3;
hash = (hash >> 24) ^ (hash >> 1);
src/power10.rs
Outdated
#[inline] | ||
pub fn is_pow10_u128(v: $T) -> bool { | ||
let mut hash: u32 = v as u32 | (((v as u64) >> 32) as u32); | ||
hash = hash.wrapping_mul(1249991743).rotate_right(25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, I think those 25 MSBs won't matter, so shr
should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yub.
Hmm, actually it looks like I haven't done a crate-wide |
src/power10.rs
Outdated
x.is_power_of_ten() | ||
} | ||
|
||
// implementation note: the is_power_of_ten algorithm is based on a perfect hash setup with very simple hash functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use any particular tool to create these perfect hash functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some code that does that. Nothing clever. It just picks random values and produces a hash until it finds something.
So I need a little help/sounding board here. After looking more closely at the performance, I have two different 64-bit implementations (perfect_hash, lz) that behave like so: RUSTFLAGS="" RUSTFLAGS="-C target-cpu=native" I think perfect_hash vectorizes better. I can't make up my mind about which one is the right one for a general purpose library |
Those numbers look close enough that I'd lean toward whichever will be easier to maintain. The vectorization of your microbenchmark may or may not represent how well it can be optimized in real use, where you're more likely to have other computations involved too. So try not to extrapolate too much from these results. |
I agree that vectorization is not an important consideration. When I turn off vectorization, lz benefits from native, but perfect_hash does not. I was also worried about 32bit platforms, but a quick check shows that lz with target-cpu=native is better, but worse without native at least on i686. On balance, I think I'll go with lz for u64. Gotta run the numbers for u32. Thanks. |
This should be a good basis for a full review. I did a A few things:
|
@cuviper ping, just in case this got lost in the holiday shuffle 😄 |
Hi! This is still in my inbox to review, but I'll at least thrown opinions at your design questions. :)
I think the method names are OK. The floats give a precedent for
I would expect 0 to be a panic case, because You can also consider that Aside on naming - I'm inclined to just call this
|
fixed cargo test --release
Renamed floor_log10 to log10 and added checked/unchecked variants (wrapping didn't make sense?). BTW, interesting curiosity: |
I get And even that only makes from the positive side, |
Right, sorry, this got me confused: |
It seems we ought to rather have a general |
I suggested generic powers in #14 too, but I think it's still OK to specialize common bases. |
src/power10.rs
Outdated
fn is_power_of_ten(&self) -> bool; | ||
|
||
/// Returns the base 10 logarithm value, truncated down. | ||
/// Returns zero for zero in release mode, panics in debug mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather make zero an unconditional panic -- it's a domain error, just like divide-by-zero.
fn log10(&self) -> u32; //note: u32 return type to allow BigInt types to implement. 10^(2^32) is 4 billion digits | ||
|
||
/// Returns the base 10 logarithm value, truncated down. | ||
/// Panics for zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "checked" method usually implies returning an Option
, and again checked_div
is a good example.
src/power10.rs
Outdated
fn checked_log10(&self) -> u32; | ||
|
||
/// Returns the base 10 logarithm value, truncated down. | ||
/// Returns zero for zero. (This matches f64::log10()::floor() for zero.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this one. It only matches f64
given that you got -inf as u32 == 0
, but I think this is actually undefined behavior -- rust-lang/rust#10184
The last commit should address the latest review comments. Let me know if there is more to do (I'll squash the commits once the review is done). Oh, I should clarify one thing: the f64 stuff was mostly a joke. It's so hard to get that through in writing (sigh... 😓) |
The benchmarks are failing on nightly:
|
It was the zero panic. Pushed a fix. BTW, the always panic behavior has a 50% cost in performance for u32 (from 0.8 ns per call to 1.2 ns per call). |
Well, microbenchmarks are hard to extrapolate into real use cases, mixed with other computations. For instance, if the optimizer has enough range information from context to know the input can't be zero, then the check will probably be removed altogether. |
Here is the first method in the new trait. Just sending it across for a quick review before I start adding more.
Do you prefer