Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simple fix for colorhash algo #216

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

saim61
Copy link

@saim61 saim61 commented Dec 4, 2024

I noticed a small problem in the color hashing algo for HSV. When the bin bits increase to size 46 and onwards it gives a warning of scalar multiplication and due to this we get a slightly wrong answer.

The warning:

/imagehash/__init__.py:441: RuntimeWarning: overflow encountered in scalar multiply
  values.append(min(maxvalue - 1, int(counts * maxvalue * 1. / c)))

Peppers image bin bit size 46 (with fix)


0ffffffffff07fffffffffe0ffffffffff40fffffffffd000ffffffffc07ffffffffc0001fffffff83fffffffffeffffffffffec01ffffffffd00000000000000000000000000000000000fffffffffff


Peppers image bin bit size 46 (without fix)

0ffffffffff07fffffffffe0ffffffffff40fffffffffd000ffffffffc07ffffffffc0001fffffff83fffffffffefffffffffff001ffffffffd00000000000000000000000000000000000fffffffffff

Key Difference:
At Position 108–119 (starting from index 0):
With Fix: feffffffffffec01
Without Fix: fffffffffff001

The fix: we just needed to round off the counts variable before performing other calculations.

@coveralls
Copy link

Coverage Status

coverage: 87.063%. remained the same
when pulling d6001f1 on saim61:colorhash_fix
into 07951cd on JohannesBuchner:master.

@JohannesBuchner
Copy link
Owner

Will this break backwards compatibility, for example, for someone who created a database of colorhashes?

@saim61
Copy link
Author

saim61 commented Dec 4, 2024

If they're using bin bits size greater than 46, then yes. It will affect the outputs. This doesn't change anything for people using bin bits 3 to 45. However, the change in this PR does give you the correct output which you should be getting.

@JohannesBuchner
Copy link
Owner

Just a small question: for what application would one use so many bits? If you hash, you want it to be lossy so that similar images have the same value.

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

It does seem unlikely that anyone would use such a large bin bits size. I was playing around with this library and was working on something that does exactly what this library does in Go. So while testing and comparing values for hashes i noticed this bug in the python repo.

@JohannesBuchner
Copy link
Owner

JohannesBuchner commented Dec 5, 2024

I think the idea of the line is that a value should fit between 0 and 2^bitsize, from another range. The line scales linearly and then discretizes to integer. The change in the PR applies the discretization to counts, which is already integer. With this PR, I think values can end up containing floats.

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

Actually i have done quite some research on this. If we dont make this change and print out the values array, you can see negative values in some of the indexes which cause incorrect answers. This is due to integer overflow.

@JohannesBuchner
Copy link
Owner

OK, but for that case,
min(maxvalue - 1, int(counts * maxvalue * 1. / c))
should be changed to
int(min(maxvalue - 1, counts * maxvalue * 1. / c))

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

Screenshot 2024-12-05 at 11 52 10 PM

I have tried this too when I was debugging things. It doesnt work.

This issue and the fix both itself are very strange 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants