Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use 0 as the hash of false #8

Open
no-longer-on-githu-b opened this issue Feb 10, 2019 · 7 comments
Open

Don't use 0 as the hash of false #8

no-longer-on-githu-b opened this issue Feb 10, 2019 · 7 comments

Comments

@no-longer-on-githu-b
Copy link

no-longer-on-githu-b commented Feb 10, 2019

The combining function is a * 31 + b, so combining with zero doesn't do anything to the hash. This means that the arrays [false, false, false, true], [false, false, true], [false, true] and [true] all have the same hash.

The solution would be not to use zero as the hash of false. Similar for Nothing.

@nukisman
Copy link

nukisman commented Feb 10, 2019

@rightfold

> hash [false, true]
962
> hash [false, false, true]
29792
> hash [false, false, false, true]
923522

@fehrenbach
Copy link
Owner

In arrays, we start with accumulator 1, so the array length contributes to the array hash even when all elements hash to 0, as @nukisman observed.

I made this mistake in the prelude PR however, so thanks for pointing it out!

Changing the hash of false, Nothing, ... would be an option too. However, the identity is a natural hash function for Int, and in general counting from 0 seems reasonable. Therefore, I'd rather fix the hash functions of containers if they don't work well with 0 (or any other value).

@hdgarrood
Copy link

Records and tuples still seem to have the same problem:

> hash (tuple2 true false)
32

> hash (tuple2 false true)
32

> hash {a: true, b: false}
31

> hash {b: true, a: false}
31

It seems quite easy to accidentally fall foul of this issue with the current design. Did you consider using something a little closer to the Haskell hashable package's design, where the Hashable class provides a hashWithSalt? Then we could provide a separate hash :: forall a. Hashable a => a -> Hash a which uses something non-zero as the default salt, and thereby helps avoid this issue.

@fehrenbach fehrenbach reopened this Feb 11, 2019
@fehrenbach
Copy link
Owner

Good point. I only thought about varying lengths, not ordering.

I did not like that Haskell's hashable has two methods in its Hashable class, especially since we do not have default implementations. I picked hash because it seemed like the lower-overhead option. However, hashWithSalt is more flexible.
IIRC Rust (and others) have an even more general approach to hashing which splits adding data to a hash builder from finalizing a hash computation into a single hash value. This way the hash function can have more internal state than external values.
I suppose one could go further and swap combining functions too or even the whole algorithm, if there are any that are not basically a fold. I think Guava is generic over the hashing algorithm.

To be honest, I'm not terribly concerned about the records and tuples thing. It's easy enough to fix. Putting the right thing into the prelude is important, though. I will think about it some more.

@JordanMartinez
Copy link

Would doing hashWithSalt be what we include in prelude for now with the option of using something more like Rust in a future breaking release?

@hdgarrood
Copy link

I would prefer not to do anything that sets us up for a future breaking release in Prelude. If there's an alternative API we should consider (I'm not aware of what Rust does), then let's discuss that before adding anything to Prelude. Personally I'd prefer to leave hashable out of 0.14, mostly because it's already been a really long time since we made a compiler release.

@fehrenbach
Copy link
Owner

Personally I'd prefer to leave hashable out of 0.14, mostly because it's already been a really long time since we made a compiler release.

FWIW, me too. I have somewhat recently (that is sometime this year, IIRC) rebased and improved the deriving implementation, but there's still a bit of work to do and I'm not going to have time to do it before the end of December. In any case, adding any of the hashable things is not (very) breaking, so there's absolutely no reason to delay 0.14.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants