-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: scrub substring values with their hash #44
Conversation
One missing piece here is the scrubberSQL part :D _assert(scrubber, `No SQL factory for ${scrubberCurrentField.scrubber}, used for ${fieldName}`) will throw. In order to avoid that, we want to implement a tl;dr: add this to macAndIdScrubber: undefinedScrubberSQL, |
I added the requested changes. I'm afraid it will effectively make every HardwareDevice and pairedDevice undefined. Not sure if that is going to be an issue or not. I also added a pair of tests to ensure that future devs have a failing test when they miss the SQL-scrubber. |
I defer to Florent here 🤞 |
It might indeed, but I don't think we use hardware device in analytics anyway |
I want to point out one thing. Look at this list of scrubbers: export const defaultScrubbers: ScrubbersMap = {
staticScrubber,
preserveOriginalScrubber,
isoDateStringScrubber,
unixTimestampScrubber,
undefinedScrubber,
charsFromRightScrubber,
randomScrubber,
randomEmailScrubber,
randomEmailInContentScrubber,
saltedHashScrubber,
saltedHashEmailScrubber,
bcryptStringScrubber,
macAndIdScrubber,
} Do you see an outlier here? (like a children's game - find what does not belong in a list) The original idea for the scrubber-lib project was that it's generic/open-source, not NC specific. All the scrubbers until this PR followed that. But 2 possible outcomes:
I suggest to start with 2 and see if there's a rather simple way to make it generic. And if we really really cannot - come back here and discuss/consider 1. My gut feeling is that 2 should be possible. Alternative is to define NC-specific scrubbers in NCShared. |
src/scrubbers.ts
Outdated
const substringToReplace = `REGEXP_SUBSTR(${sqlValueToReplace}, ${params.regex})` | ||
const hashedValue = `SHA2(${substringToReplace} || '${params.initializationVector}', 256)` | ||
const replacedValue = `REGEXP_REPLACE(${sqlValueToReplace}, ${params.regex}, ${hashedValue})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flo-monin Can you help me check if this actually works?
I checked that the functions are available on all major rdbms, but I am not sure how to test this properly.
src/scrubbers.ts
Outdated
const substringToReplace = `REGEXP_SUBSTR(${sqlValueToReplace}, ${params.regex})` | ||
const hashedValue = `SHA2(${substringToReplace} || '${params.initializationVector}', 256)` | ||
const replacedValue = `REGEXP_REPLACE(${sqlValueToReplace}, ${params.regex}, ${hashedValue})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const substringToReplace = `REGEXP_SUBSTR(${sqlValueToReplace}, ${params.regex})` | |
const hashedValue = `SHA2(${substringToReplace} || '${params.initializationVector}', 256)` | |
const replacedValue = `REGEXP_REPLACE(${sqlValueToReplace}, ${params.regex}, ${hashedValue})` | |
const substringToReplace = `NVL(REGEXP_SUBSTR(${sqlValueToReplace}, '${params.regex}'), '')` | |
const hashedValue = `SHA2(${substringToReplace} || '${params.initializationVector}', 256)` | |
const replacedValue = `REGEXP_REPLACE(${sqlValueToReplace}, '${params.regex}', ${hashedValue})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long story short, you need to coalesce the null value to an empty string to avoid returning null if the field doens't match the regex
and you also need to add sing quotes around the regex itself, for the SQL to compile :D
@kirillgroshkov I picked up my Generalizator gun, pointed at the code and pulled the trigger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just one comment on naming. You call the scrubber "Salted". But the parameter is called "initializationVector". Is it better/worse to make them called the same, e.g rename initializationVector to salt
?
There already is a I think if we change the name, both should be changed. |
No worries then, ok to keep. How about the other part of my question - to rename |
As we discussed on the meeting, this name comes from the lib as it is being used elsewhere too. |
🎉 This issue has been resolved in version 2.14.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
In this PR, I added a new scrubber named
saltedHashSubstringScrubber
which behaves similar tosaltedHashScrubber
- with the difference that the new scrubber replaces a substring with the hash rather than the entire value.The substring can be specified as a string or a regex.
The scrubber takes the substring and hashes it the same way as
saltedHashScrubber
.Finally it replaces the substring with the hash.
The underlying intention is to be able to replace multiple occurrences of substring with the same replacement value.