-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch std::string
-> hashes for label resolution.
#2215
Switch std::string
-> hashes for label resolution.
#2215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. It's great to be able to avoid a lot of the internal string handling!
For the high-level network feature, it might be best to expose hashes to the user. So it could be useful to have access to the hash function, for example to allow comparison of hashed labels for custom selection functions. It's probably an uncommon use-case, but what do think about making the hash function part of the arbor interface?
So, the user can see the hashes, they are in fact returned from |
I don't know if our simple hashing requirements justify adding another dependency. For the network generation feature, a change of hash function at a later stage would no longer be an issue. But I'd expect FNV-1a to be good enough for our purpose. |
Technically, we already have a hash function exposed as part of the interface in |
// that will be here until the end of time. | ||
|
||
template <typename T> | ||
inline constexpr std::size_t internal_hash(T&& data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for moving it here!
I just have one suggestion.
Calling internal_hash
and hash_value
would give different results for a single type due to the multiplication with the prime number in hash_value_combine
. How about merging the internal_hash
function into hash_value
?
So something like this:
template <typename T>
inline constexpr std::size_t hash_value(const T& t) {
if constexpr (std::is_convertible_v<T, std::string_view>) {
constexpr std::size_t prime = 0x100000001b3;
constexpr std::size_t offset_basis = 0xcbf29ce484222325;
std::size_t hash = offset_basis;
for (uint8_t byte: std::string_view{t}) {
hash = hash ^ byte;
hash = hash * prime;
}
return hash;
}
else { return std::hash<std::decay_t<T>>{}(t); }
}
template <typename T, typename... TAIL>
inline constexpr std::size_t hash_value(const T& t, const TAIL&... tail) {
constexpr std::size_t prime = 93481;
return prime * hash_value(t) + hash_value(tail...);
}
This way we would simply always use hash_value
throughout the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a comment mentioning the different behaviour for char*
and const char*
types would be good. For these, the content they point to is hashed. Any other type of pointer will have the hash of the pointer value itself.
Or we do not allow usage with other types of pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I decided to add a slightly more intricate logic layer:
- char*, const char*, char[] are hashed as strings
- integral types are hashed byte-wise, to avoid the STL int-hash-is-id 'feature'
- void* is hashed like an integral value
- other pointer types are denied
- everything else is hashed via STL
Idea being that hashes of pointer types are likely be unintended and must be opted-in to via casting to void*.
The deviation from STL is meant to spread out integer-keyed maps a bit better. I hope they do not use a similar
scheme for real values...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a decent solution for the problem of how to treat different pointer types.
The first comment is not yet addressed however. Calling hash_value
and internal_hash
for a single type will yield a different hash. As suggested above, we could just overload the hash_value
function, such that the current internal_hash
function will just become the hash_value
function for single types. This way we can also get rid of hash_value_combine
, which appears to be only used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I changed the hash_combine
function (below) s.t.
hash_combine(vs) = H(0, vs)
H(h, v:vs) = H(h * prime + internal_hash(v), vs)
shouldn't that address your comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I missed that change. I think just having hash_value
would still be overall cleaner then internal_hash
, hash_combine
and hash_value
. But from a functionality point of view I'm ok with it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the surface now just is hash_value
. I should adjust the usage accordingly and stuff internal_hash
and hash_combine
into a detail
namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there 😉
Just two small suggestions.
|
||
#include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <iostream> |
I think this is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear my old vice ;)
// that will be here until the end of time. | ||
|
||
template <typename T> | ||
inline constexpr std::size_t internal_hash(T&& data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a decent solution for the problem of how to treat different pointer types.
The first comment is not yet addressed however. Calling hash_value
and internal_hash
for a single type will yield a different hash. As suggested above, we could just overload the hash_value
function, such that the current internal_hash
function will just become the hash_value
function for single types. This way we can also get rid of hash_value_combine
, which appears to be only used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now 👍
Use the simple but well-known FNV-1a hash function to map
cell_tag_type
akastd::string
to anuint64_t
for label resolution. The former type has a size of 32B or more and the latter 8B, thus cutting the storage and bandwidth
requirements by 3/4.
The hash function is implemented from the reference given on the authors' website/wikipedia and is extremely
simple. If we ever experience issues, we might consider switching this to something of higher quality via an
external library, candidates are
xxHASH
andMurmur3
.https://github.com/Cyan4973/xxHash
Note: This should further relieve the memory pressure on larger scale simulation as formulated in #1969 and make
#2005 less urgent.
There is no performance impact (at laptop scale), but the memory savings are worth it.
TODO