Skip to content
ominbruce edited this page Jul 26, 2022 · 8 revisions

Quick Installation and basic usage

python -m pip install --upgrade pyredactkit

Note : If you are using govtech version, you can use redactor from the powershell terminal instead. E.g:

redactor 'this is my ip:127.0.0.1. my email is [email protected]. secret link is github.com'

To redact a text glob from terminal.

prk 'this is my ip:127.0.0.1. my email is [email protected]. secret link is github.com'

This will create a redacted file and a hashshadow file which you can later use to unredact.

// .hashshadow.json
{
  "5f7aa522-86e5-4ca7-83ae-09fbb5a1044b": "[email protected]",
  "983b017a-98a5-4763-aa6d-a8ad69db20bc": "github.com",
  "a9581c73-05cb-428e-8c62-8bf1521a8aa1": "127.0.0.1"
}
<!-- redacted.txt -->
this is my ip:a9581c73-05cb-428e-8c62-8bf1521a8aa1. my email is 5f7aa522-86e5-4ca7-83ae-09fbb5a1044b. secret link is 983b017a-98a5-4763-aa6d-a8ad69db20bc

To redact a single file from terminal. PyRedactKit will automatically detect if it's a valid file and start processing.

prk test.txt 

To Unredact a redacted file.

prk redacted_test.txt -u .hashshadow_test.txt.json 

This will create an unredacted file which contains the original unmasked data.

<!-- unredacted.txt -->
this is my ip:127.0.0.1. my email is [email protected]. secret link is github.com

Advance usage

You can redact multiple files in a folder with sub directories and output the files into a newly created directory.

Consider the below folder containing multiple log files with sub directory.

tree foldertoredact 
foldertoredact
β”œβ”€β”€ cctest.txt
β”œβ”€β”€ ip_test 3.txt
β”œβ”€β”€ ip_test 4.txt
β”œβ”€β”€ ip_test.txt
β”œβ”€β”€ nric.txt
└── subdir
    └── ip_test copy.txt

1 directory, 6 files

To redact all of log files in the folder and place them in a new folder:

prk foldertoredact -d newfolder
tree newfolder
newfolder
β”œβ”€β”€ redacted_cctest.txt
β”œβ”€β”€ redacted_ip_test 3.txt
β”œβ”€β”€ redacted_ip_test 4.txt
β”œβ”€β”€ redacted_ip_test copy.txt
β”œβ”€β”€ redacted_ip_test.txt
└── redacted_nric.txt

0 directories, 6 files

Besides the core regex patterns of SG NRIC, domain names, emails, ip addresses, base64 strings and credit cards, you can also define custom regex patterns to redact them from your log files.

To redact using custom regex pattern, create a custom json file as per format below.

// customregex.json
[
    {
        "pattern": "^([a-zA-Z0-9_-]*:[a-zA-Z0-9_-][email protected]*)$",
        "type": [
            "API Keys",
            "Credentials",
            "Bug Bounty",
            "GitHub"
        ]
    },
    {
        "pattern": "(?i)^(arn:(?P<Partition>[^:\\n]*):(?P<Service>[^:\\n]*):(?P<Region>[^:\\n]*):(?P<AccountID>[^:\\n]*):(?P<Ignore>(?P<ResourceType>[^:\\/\\n]*)[:\\/])?(?P<Resource>.*))$",
        "type": [
            "Identifiers",
            "Networking",
            "AWS",
            "Bug Bounty"
        ]
    },
    {
        "pattern": "(?i)^((facebook|fb)(.{0,20})?['\\\"][0-9a]{32}['\\\"])$",
        "type": [
            "API Keys",
            "Bug Bounty",
            "Credentials",
            "Facebook"
        ]
    }
]
prk file -c customregex.json
Clone this wiki locally