Update README with Pandas example (#336)
mallport authored Feb 5, 2024
1 parent 52b07ff commit a879421
Showing 1 changed file with 22 additions and 3 deletions.
25 changes: 22 additions & 3 deletions README.md
@@ -46,7 +46,7 @@ result_df = (
Pseudonymize.from_polars(df) # Specify what dataframe to use
.on_fields("fornavn") # Select the field to pseudonymize
.with_default_encryption() # Select the pseudonymization algorithm to apply
.run() # Apply pseudonymization to the selected field
.to_polars() # Get the result as a polars dataframe
)

@@ -55,7 +55,7 @@ result_df = (
Pseudonymize.from_polars(df) # Specify what dataframe to use
.on_fields("fornavn", "etternavn") # Select multiple fields to pseudonymize
.with_default_encryption() # Select the pseudonymization algorithm to apply
.run() # Apply pseudonymization to the selected fields
.to_polars() # Get the result as a polars dataframe
)

@@ -64,7 +64,7 @@ result_df = (
Pseudonymize.from_polars(df) # Specify what dataframe to use
.on_fields("fnr") # Select the field to pseudonymize
.with_stable_id() # Map the selected field to stable id
    .run()                                   # Apply pseudonymization to the selected field
.to_polars() # Get the result as a polars dataframe
)
```
@@ -74,6 +74,25 @@ field is a valid Norwegian personal identification number (fnr, dnr), the recomm
the function `with_stable_id()` to convert the identification number to a stable ID (SID) prior to pseudonymization.
In that case, the pseudonymization algorithm is FPE (Format Preserving Encryption).

Note that you may also use a Pandas DataFrame as input or output by replacing `from_polars` with `from_pandas`
and `to_polars` with `to_pandas`. However, Pandas is much less performant than Polars, so take special care if your
dataset is large.

Example:

```python
# Example: Single field default encryption (DAEAD)
df_pandas = (
Pseudonymize.from_pandas(df) # Specify what dataframe to use
.on_fields("fornavn") # Select the field to pseudonymize
.with_default_encryption() # Select the pseudonymization algorithm to apply
.run() # Apply pseudonymization to the selected field
    .to_pandas()                             # Get the result as a pandas dataframe
)
```
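The examples assume an existing DataFrame `df`. As a hypothetical illustration of the input shape only, a small pandas DataFrame containing the `fornavn` column could be built as follows. The column values are dummy placeholders, not real data, and the `etternavn` column is included only to mirror the multi-field example.

```python
import pandas as pd

# Hypothetical input frame: dummy first and last names, not real personal data.
df = pd.DataFrame(
    {
        "fornavn": ["Ola", "Kari"],
        "etternavn": ["Nordmann", "Nordmann"],
    }
)

# This frame could then be passed to Pseudonymize.from_pandas(df) as in the example above.
print(df)
```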


### Validate SID mapping

```python