Initial Obfuscation feature #1

ahouck · 2019-02-21T05:09:00Z

Initial end-to-end implementation of the basic workflow:

Migrate Data -> Apply masking values -> Encrypt Data -> Overwrite masked data with decrypted restoration of migrated data

Includes base-condition tests and a rudimentary AES encryption module.
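For readers skimming the thread, the four steps above can be sketched in memory as follows. This is only an illustration under stated assumptions: every name here (Doc, Cipher, migrate, applyMask, encryptStore, restore) is hypothetical rather than taken from the PR, plain arrays and Maps stand in for MongoDB collections, and the cipher is a reversible placeholder, not the AES module the PR ships.

```typescript
// Hypothetical in-memory sketch of the workflow; not the PR's actual code.
type Doc = { _id: string; [field: string]: unknown };

interface Cipher {
  encrypt(plain: string): string;
  decrypt(secret: string): string;
}

// Placeholder "cipher" that just reverses the string. It is NOT encryption;
// a real implementation would plug an AES module in behind this interface.
const reversingCipher: Cipher = {
  encrypt: (plain) => plain.split("").reverse().join(""),
  decrypt: (secret) => secret.split("").reverse().join(""),
};

// Step 1, migrate: copy each document into a separate store keyed by _id.
function migrate(source: Doc[]): Map<string, Doc> {
  const store = new Map<string, Doc>();
  for (const doc of source) store.set(doc._id, { ...doc });
  return store;
}

// Step 2, mask: overwrite the sensitive fields of the live documents.
function applyMask(source: Doc[], mask: Record<string, unknown>): void {
  for (const doc of source) Object.assign(doc, mask);
}

// Step 3, encrypt: replace each migrated copy with its encrypted JSON form.
function encryptStore(store: Map<string, Doc>, cipher: Cipher): Map<string, string> {
  const encrypted = new Map<string, string>();
  store.forEach((doc, id) => encrypted.set(id, cipher.encrypt(JSON.stringify(doc))));
  return encrypted;
}

// Step 4, restore: decrypt the copies and overwrite the masked values.
function restore(source: Doc[], encrypted: Map<string, string>, cipher: Cipher): void {
  for (const doc of source) {
    const secret = encrypted.get(doc._id);
    if (secret !== undefined) Object.assign(doc, JSON.parse(cipher.decrypt(secret)));
  }
}
```

Because the migrated copy keeps the same _id as the live document, the restore step can match records without any extra bookkeeping.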

dimensia

Great work Adam!

I have a couple suggestions for work to be done later on (definitely don't need to do these as a part of this trial project!).

(1) Just use a single Tyr.Collection

There should be just one Tyr.Collection instance per domain.
A single Tyr.Collection instance can map to multiple MongoDB collections.

For example, the migrate function could change from:

migrate(..., from: Tyr.Collection, to: Tyr.Collection, ...)

to

migrate(..., collection: Tyr.Collection, obfuscationSuffix, ...)

Maybe the suffix could have a default value like "_obfuscated". Then, if User information is stored by default in the MongoDB collection "users", the obfuscated data would be stored in "users_obfuscated" by default.

(2) There is an existing module for Tyranid text sanitization which could be used for generating "randomized" text located at:

https://github.com/tyranid-org/tyranid/tree/master/packages/tyranid-sanitize

(3) The Tyranid repository is set up as a Mono Repo (https://en.wikipedia.org/wiki/Monorepo) using Lerna (https://lernajs.io/). (This is mostly done to ensure that tests for all subprojects are run when any project is updated to ensure that everything plays well with each other.) Later on we can set you up as a Tyranid committer and I can walk you through setting up your stuff as another Tyranid project.

(4) I think the obfuscator and de-obfuscator need to be able to run both in batch and on a per-document basis. I am guessing that in the CrossLead administration area there will someday be an option to "Obfuscate" or "Unobfuscate" a single user (or set of users) without affecting the existing obfuscations.

(5) I notice that you are using the same _id for the main record in the main collection and the obfuscated record in the obfuscated collection. This is great! -- definitely keep this.

(6) It would be great to add a "_obfuscated" field or something to the original document when it is obfuscated so code operating on that data knows it's essentially working with (ideally) read-only data at this point.

(7) Tyranid is first and foremost a metadata library, so it is a good use of Tyranid to define more refined sub-types of existing types. So, for example, rather than modeling firstName as { is: 'string', ... } we could define a new firstName type pretty easily. Then each type could have its own "sanitized"/"obfuscated" function. We already have some existing types, like "url" and "email", which can be used to generate higher-quality "sample data".
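As a rough illustration of suggestion (7), the idea of giving each refined type its own obfuscation function could look something like the sketch below. It is deliberately not written against Tyranid's real type API: the registry, the FieldDef shape, and obfuscateDoc are all hypothetical names invented for this example.

```typescript
// Hypothetical sketch only; this is NOT Tyranid's type API.
type Obfuscator = (value: string, index: number) => string;

// Fallback for plain strings: keep the length, hide the content.
const obfuscateString: Obfuscator = (value) => value.replace(/[\s\S]/g, "x");

// One obfuscator per refined type. The index keeps generated values unique,
// which matters when the database has unique constraints.
const obfuscators: Record<string, Obfuscator> = {
  firstName: (_value, index) => `First${index}`,
  email: (_value, index) => `user${index}@example.invalid`,
  url: (_value, index) => `https://example.invalid/${index}`,
};

// Field metadata in the spirit of { is: 'firstName' } rather than { is: 'string' }.
interface FieldDef {
  is: string;
}

function obfuscateDoc(
  doc: Record<string, string>,
  fields: Record<string, FieldDef>,
  index: number,
): Record<string, string> {
  const out: Record<string, string> = { ...doc };
  for (const name of Object.keys(fields)) {
    const def = fields[name];
    const value = doc[name];
    if (def === undefined || value === undefined) continue;
    // Use the type-specific obfuscator when one exists, else the fallback.
    const fn = obfuscators[def.is] ?? obfuscateString;
    out[name] = fn(value, index);
  }
  return out;
}
```

With this shape, adding a new refined type only means registering one more function, and fields not described by the metadata are left untouched.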

Anyway, all these suggestions are for the future. I'm perfectly happy with what you have already for the purposes of this trial project.

Great work again! Thanks.

dimensia · 2019-02-21T17:37:14Z

(8) We will also need to obfuscate the historical data that is currently being kept. (A lot of the collections are historical: when values change, we keep records of what the previous values were and who changed them, both for auditing and for showing visualizations of how organizations evolve over time.) This should not be too much work; it would probably be easiest to just pair program for a couple of hours some afternoon.

ahouck · 2019-02-21T17:59:21Z

@dimensia

(1) Just pushed a quick update that does this. I didn't put in the default case yet; will do that.

(2) This would work really well with the obfuscation plugin; they seem to have a similar underlying goal. Would it be worth merging the two eventually? For instance, sanitizing would be the same as obfuscating with generated values.

(3) Sounds good.

(4) Technically you can do a batch of one right now. As the code stands, if you wanted to preserve and encrypt the original value (which is optional), it would create a standalone collection for that one user. This is mainly due to the bug I ran into with the $out function.

Going forward from this proof of concept to production-ready functionality, I would prefer to refactor it to use the in-place obfuscation approach instead of generating a new collection. Therefore I probably wouldn't alter the current code to allow for single obfuscations, but would rather implement that as part of the redesign.

(5) Yes, I wanted the records in the derived collections to always be traceable back to the original data, even if they are exported and later imported.

(6) See #4; I would like to re-implement this on the in-place obfuscation idea.

(7) That would be good. When I started the project I did not have a great grasp of Tyranid and approached it more from a raw-data, working-with-Mongo direction.

yangchristian · 2019-02-21T21:14:35Z

package.json

+    "@types/mongodb": "^3.1.19",
+    "mongodb": "^3.1.10",
+    "tyranid": "^0.5.66"
+  },


We should probably convert these to peerDependencies in the future so that consuming apps don't run into version mismatches and/or multiple imported copies of these libs.

yangchristian · 2019-02-22T00:18:33Z

src/obfuscator.ts

+}
+
+const migrateData = async (targetCollection: Tyr.CollectionInstance, sourceCollection: Tyr.CollectionInstance, query?: Tyr.MongoQuery) => {
+  const q = query ? query : {};


Non-blocking: You can use default parameters if you like, with this syntax:

async (..., query: Tyr.MongoQuery = {})

Interesting, I just assumed the function signature had to match the interface.
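The exchange above can be sketched as follows. In TypeScript, a parameter with a default value is treated as optional by callers, so an implementation written with `query = {}` should still satisfy an interface that declares `query?`. The types below (MongoQuery, Doc, MigrateFn) are hypothetical stand-ins, not the Tyranid typings, and an in-memory array replaces the collections.

```typescript
// Hypothetical stand-ins for Tyr.MongoQuery and a collection document.
type MongoQuery = Record<string, unknown>;
type Doc = Record<string, unknown>;

// The "interface": query is declared optional.
type MigrateFn = (source: Doc[], query?: MongoQuery) => Promise<Doc[]>;

// The implementation uses a default parameter instead of `query ? query : {}`.
// Inside the body, `query` is always a MongoQuery, never undefined.
const migrateData: MigrateFn = async (source: Doc[], query: MongoQuery = {}) => {
  const keys = Object.keys(query);
  // Simplified exact-match filter; an empty query selects every document.
  return source.filter((doc) => keys.every((key) => doc[key] === query[key]));
};
```

Calling `migrateData(docs)` and `migrateData(docs, { team: "a" })` both type-check, and the first form behaves as if an empty query had been passed.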

yangchristian · 2019-02-22T00:27:15Z

src/typings/tyranid-extensions.d.ts

+             */
+            replacementValues?: object,
+
+        }


Non-blocking:
You could enforce either replacementValCollection or replacementValues (mutually exclusive) with a base type and type unions:

    interface OptsCommon {
      query: FilterQuery;
      ...
    }

    interface OptsWithCol extends OptsCommon {
      replacementValCollection?: Collection;
    }

    interface OptsWithVals extends OptsCommon {
      replacementValues?: object,
    }

    export type ObfuscateBatchOpts = OptsWithCol | OptsWithVals;
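One caveat, offered tentatively: because both properties are optional, a plain union may still accept an object literal that sets both of them, since each property exists on some member of the union. A stricter variant, sketched below, marks the opposite property as `never` on each branch. FilterQuery and Collection here are hypothetical stand-ins for the real types, and describeOpts is only an illustrative helper.

```typescript
// Hypothetical stand-ins for the types referenced in the review comment.
type FilterQuery = Record<string, unknown>;
interface Collection {
  name: string;
}

interface OptsCommon {
  query: FilterQuery;
}

interface OptsWithCol extends OptsCommon {
  replacementValCollection?: Collection;
  replacementValues?: never; // forbids the other option on this branch
}

interface OptsWithVals extends OptsCommon {
  replacementValues?: object;
  replacementValCollection?: never; // forbids the other option on this branch
}

type ObfuscateBatchOpts = OptsWithCol | OptsWithVals;

// Illustrative consumer: reports which source of replacement values was given.
function describeOpts(opts: ObfuscateBatchOpts): string {
  if (opts.replacementValCollection !== undefined) {
    return `collection:${opts.replacementValCollection.name}`;
  }
  if (opts.replacementValues !== undefined) return "values";
  return "none";
}

// Supplying both options should now be rejected at compile time:
// const bad: ObfuscateBatchOpts = {
//   query: {},
//   replacementValues: {},
//   replacementValCollection: { name: "masks" },
// };
```

The trade-off is a little duplication in the two branch interfaces in exchange for the compiler catching the conflicting combination.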

yangchristian · 2019-02-22T00:36:38Z

test/obfuscator-spec.ts

+
+  const records = await collection.findAll({ query: {} });
+  t.true(records.length === 10, 'All records still in collection');
+  t.notDeepEqual(JSON.stringify(records), ExpectedResults.CopiedObfuscateableData, 'Data not encrypted');


Future: This assert seems a little prone to false positives. For example, if anyone accidentally adds to the users dataset but not the expected results, the test would always pass. Compounded slightly with the stringification: if stringify ever returns a format slightly different from your string literal expected result, same issue. What was the purpose of using stringify by the way? Were you running into issues with a regular deep comparison?

Very true. I was trying to quickly replicate how DbUnit asserts database result batches. Also, I wasn't sure how to account for the unique generated values in a simple comparison.
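One possible way to address both concerns (false positives and unique generated values) is to compare records field by field instead of against a stringified snapshot. The sketch below is an assumption about how that might look, not the test the PR ends up with: checkObfuscation, Doc, and the field names are hypothetical, and values are limited to scalars to keep the comparison simple.

```typescript
// Hypothetical helper: returns a list of problems; an empty list means "pass".
type Scalar = string | number | boolean | null;
interface Doc {
  _id: string;
  [field: string]: Scalar;
}

function checkObfuscation(original: Doc[], obfuscated: Doc[], maskedFields: string[]): string[] {
  const problems: string[] = [];
  if (original.length !== obfuscated.length) {
    problems.push(`expected ${original.length} records, got ${obfuscated.length}`);
  }
  const byId = new Map<string, Doc>();
  obfuscated.forEach((doc) => byId.set(doc._id, doc));

  for (const before of original) {
    const after = byId.get(before._id);
    if (after === undefined) {
      problems.push(`missing record ${before._id}`);
      continue;
    }
    for (const field of Object.keys(before)) {
      if (field === "_id") continue;
      const masked = maskedFields.indexOf(field) !== -1;
      // Masked fields only need to differ from the original, so randomly
      // generated or unique replacement values do not break the assertion.
      if (masked && after[field] === before[field]) {
        problems.push(`${before._id}.${field} was not masked`);
      }
      // Everything else must be carried over unchanged.
      if (!masked && after[field] !== before[field]) {
        problems.push(`${before._id}.${field} changed unexpectedly`);
      }
    }
  }
  return problems;
}
```

In a test, the returned list could be asserted to be empty with a normal deep comparison (for example `t.deepEqual(problems, [])`), so a failure message names the exact record and field that went wrong.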

yangchristian

Some inline questions and minor feedback, but overall looks good.

Regarding point (2) from @dimensia about tyranid-sanitize: indeed, the ultimate goals are similar. That plugin was developed more for generating test data sets without PII from existing data. However, it wasn't quite configurable enough to satisfy either the test-data or the GDPR use case. You couldn't, for example, punt the mask value generation to the application level, so a database with unique constraints wouldn't validate. I'd basically imagine this plugin superseding tyranid-sanitize but maybe taking bits and pieces from it, like the data generation via faker.js.

Thanks again!

Adam Houck added 4 commits February 12, 2019 18:06

Adding initial files for troubleshooting

f39df14

Base test framework setup, I don't think tyranid config is correct

5e349ea

feat: can copy raw data and replace with static vals

b81f97e

feat: finished MVP workflow, still needs to be fleshed out in some areas

ee94912

ahouck requested a review from yangchristian February 21, 2019 05:09

Adam Houck added 3 commits February 21, 2019 09:35

feat: creating readMe file

63e0d20

fix: removed unused interface prop

bbfd1bd

fix: moving export to index

d8ffd76

dimensia approved these changes Feb 21, 2019

fix: metadata collection now generated with given suffix

b2ec14c

fix: restoration decryption now supports subquery

5501995

ahouck commented Feb 21, 2019 (src/obfuscator.ts, outdated)

yangchristian reviewed Feb 21, 2019 (README.md, outdated)

yangchristian reviewed Feb 21, 2019 (package.json, outdated)

yangchristian reviewed Feb 21, 2019 (src/obfuscator.ts, outdated)

yangchristian reviewed Feb 22, 2019

fix: small tweaks, putting back collection migration

c128829

yangchristian reviewed Feb 22, 2019

yangchristian approved these changes Feb 22, 2019
