Hash transformer is too slow #8
Comments
We will review the implementation, but I suspect the slowdown is due to the hash function. I think we should offer a choice of hash function and hash function parameters. We will try to resolve this in the next release. For now, I can suggest a temporary workaround: you can use a simple shell script to implement any hashing function. For instance:

    #!/bin/bash
    # Read one value per line; IFS= and -r keep whitespace and backslashes intact.
    while IFS= read -r line
    do
        # Print only the hex digest, without the trailing " -" from md5sum.
        printf "%s" "$line" | md5sum | awk '{print $1}'
    done

And the config can look like this:

    - schema: "humanresources"
      name: "employee"
      transformers:
        - name: "Cmd"
          params:
            driver:
              name: "text"
            expected_exit_code: -1
            skip_on_null_input: true
            executable: "/var/lib/playground/test.sh"
            columns:
              - name: "jobtitle"

Read about the Cmd transformer.
I see that the current hash implementation uses an encryption algorithm; a plain hash algorithm might be faster. I'm going to push this requirement back for now, until there is a built-in solution for it. Thanks.
Agreed. We will try to deliver it soon, but any contribution is appreciated. Thank you!
Another finding about the I have a table
Yeah, a collision occurred. I will rewrite the implementation so that the hash function can be chosen (md5, sha1, sha224/256/384/512). The expected release date is 14 February. Thank you so much for reporting.
Never mind, it was my fault, the
* The new `Hash` transformer uses the `sha1` hash by default.
* Added a `function` parameter that selects one of the available hash algorithms: `md5`, `sha1`, `sha256`, `sha512`.
* Added a `max_length` parameter that truncates the hash when it is longer than the provided length. The default value is `0`, meaning "do not truncate".
* Fixed metadata enrichment for validation warnings caused by `RawValueValidator`.

Additional changes:

* Added Error severity for the Cmd parameter validator.

Closes #8
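To make the `function` and `max_length` semantics above concrete, here is a minimal shell sketch in the spirit of the earlier `Cmd` workaround. It is not the transformer's actual implementation: `hash_value` is a hypothetical helper, and it assumes the digest is hex-encoded and that truncation keeps the first `max_length` characters of that hex string, neither of which is confirmed in this thread.

```shell
#!/bin/bash
# Illustrative sketch only; hash_value is a hypothetical helper, not part of the tool.
# Usage: hash_value FUNCTION MAX_LENGTH VALUE
# FUNCTION is one of md5, sha1, sha256, sha512; MAX_LENGTH=0 means "do not truncate".
hash_value() {
    local func="$1" max_length="$2" value="$3" digest
    case "$func" in
        md5|sha1|sha256|sha512) ;;
        *) echo "unsupported hash function: $func" >&2; return 1 ;;
    esac
    # The coreutils *sum tools print "<digest>  -" for stdin; keep only the digest.
    digest=$(printf '%s' "$value" | "${func}sum" | cut -d' ' -f1)
    # Assumption: truncation keeps the leading characters of the hex digest.
    if [ "$max_length" -gt 0 ]; then
        digest=${digest:0:$max_length}
    fi
    printf '%s\n' "$digest"
}

hash_value md5 0 abc     # 900150983cd24fb0d6963f7d28e17f72
hash_value sha1 8 abc    # a9993e36
```

The same input always yields the same output, which is the property needed to keep masked values consistent across dumps. Shorter `max_length` values raise the chance that two different inputs map to the same truncated hash.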
Fixed in v0.1.5
I gave it a try, the new
I'm currently using `RandomUuid` for most of the columns, but I was asked to hash the original values to maintain the same masked value. I've replaced `RandomUuid` with `Hash`, and what used to take less than a minute to dump/transform the data now takes 30 min. This is what the `transformation` config looks like for 6 tables: