Utility to redact/mask key parts of logs and other files that need to be shared, without breaking the original log structure. It can redact IPv4 and IPv6 addresses, hostnames, URLs, email addresses, phone numbers, names, and API keys. It can also redact custom patterns if interactive mode is enabled. The script reads from secrets.json and ignores.json to keep track of sensitive information and patterns to ignore.
The underlying redaction logic is implemented in both Python and Rust. The Python implementation is more feature-rich and supports redacting data from a variety of file types, including PDFs. The Rust implementation is faster and can redact data from tar, tar.gz, tgz, zip, and PDF files.
- The redaction output conforms to original data types (e.g., IP addresses are redacted to valid IP addresses) to ensure the entire log remains valid and usable
- Keeps track of redacted data in redacted-mapping.txt for future reference
- Redaction of sensitive data from a variety of file types, including PDFs
- Interactive mode to confirm redaction of sensitive data
- Support for custom patterns in secrets.json and ignores.json
- Support for simple glob patterns in secrets.json and ignores.json
- Support for redacting data from tar, tar.gz, tgz, zip, and PDF files
- Clone the repository:
  git clone https://github.com/hiranp/log-redactor.git
  cd log-redactor
- Install the required dependencies:
  pip install -r requirements.txt
- (Optional) Install PyMuPDF for PDF redaction:
  pip install pymupdf
- (Optional) Install Rust for faster redaction:
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- (Optional) Build the Rust implementation:
  cargo build --release
- (Optional) Run the Rust implementation:
  cargo run --release -- <path>
Guidelines for contributing to the project.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- GitHub repository: https://github.com/hiranp/log-redactor/
- Documentation: https://github.com/hiranp/log-redactor/blob/main/docs/index.md
Instructions on how to use the tool.
- Basic Redaction: Run python3 redactor.py <path>, where <path> is the file, directory, or archive (tar, gzip, or zip) you want to redact.
- Interactive Mode: Run python3 redactor.py <path> -i to redact interactively.
- PDF Redaction: Ensure PyMuPDF is installed, then run python3 redactor.py <path>, where <path> is a PDF file. [Note: PDF redaction is experimental and may not work as expected.]
The redacted file is saved as <original-filename>-redacted.<extension>.
- Basic Redaction: Run cargo run --release -- <path>, where <path> is the file (including PDF), directory, or archive (tar, tar.gz, tgz, or zip) you want to redact.
- Interactive Mode: Run cargo run --release -- <path> -i yes to redact interactively, then answer 'yes' or 'no' when prompted.
- Specify Secrets File: Use the -s or --secrets flag to specify the path to the secrets file. Example: cargo run --release -- <path> -s /path/to/secrets.json
- Specify Ignores File: Use the -g or --ignores flag to specify the path to the ignores file. Example: cargo run --release -- <path> -g /path/to/ignores.json
- Redact a directory:
  cargo run --release -- /path/to/directory
- Redact a file (the same command works for tar.gz, tgz, zip, and pdf files):
  cargo run --release -- /path/to/file.txt
- Redact interactively:
  cargo run --release -- /path/to/file.txt -i yes
- Redact a file with custom secrets and ignores:
  cargo run --release -- /path/to/file.txt -s /path/to/secrets.json -g /path/to/ignores.json
- More help:
  cargo run --release -- --help
The script uses a list of regular expressions to find sensitive data in the file. It then replaces each match with a redacted stand-in of the same type. For example, 102.23.5.1 becomes 240.0.0.1.
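A minimal Python sketch of this find-and-replace step for IPv4, assuming a simple digit-and-dot regex whose candidates are validated with the standard ipaddress module. The function name redact_ipv4 and the numbering rule (the n-th distinct address becomes 240.0.0.0 + n, so the first one is 240.0.0.1) are hypothetical and chosen only to match the example above; this is not the project's actual code. The mapping dict keeps replacements consistent, similar in spirit to the redacted-mapping.txt record.

```python
import ipaddress
import re

# Rough IPv4 candidate: four dot-separated runs of 1-3 digits.
# Candidates are validated with ipaddress before being replaced.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Base of the reserved 240.0.0.0/4 block used for stand-in addresses.
REDACTED_V4_BASE = ipaddress.IPv4Address("240.0.0.0")


def redact_ipv4(text: str, mapping: dict[str, str]) -> str:
    """Replace every valid IPv4 address in `text` with a 240.x.x.x stand-in.

    `mapping` stores original -> redacted pairs, so an address that appears
    several times always receives the same replacement.
    """

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        try:
            ipaddress.IPv4Address(original)  # skips strings like 999.1.1.1
        except ipaddress.AddressValueError:
            return original
        if original not in mapping:
            # Hypothetical numbering: the n-th distinct address gets 240.0.0.0 + n.
            mapping[original] = str(REDACTED_V4_BASE + len(mapping) + 1)
        return mapping[original]

    return IPV4_RE.sub(substitute, text)


mapping: dict[str, str] = {}
print(redact_ipv4("from 102.23.5.1 to 10.0.0.7, retry 102.23.5.1", mapping))
# from 240.0.0.1 to 240.0.0.2, retry 240.0.0.1
```

Because the replacement is itself a valid IPv4 address, tools that parse the redacted log still see well-formed addresses.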
Based on Wikipedia's Reserved IP addresses page, the script draws redacted addresses from reserved ranges: 240.0.0.0/4 for IPv4 addresses and 3fff::/20 for IPv6 addresses.
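One way to turn these reserved blocks into stand-in addresses is to offset from each network's base address with Python's ipaddress module. The helper nth_redacted_address and the convention that the index starts at 1 are assumptions for illustration; the sketch only shows that generated values stay inside 240.0.0.0/4 or 3fff::/20 and keep the IP version of the original.

```python
import ipaddress

# Reserved blocks named in the README, keyed by IP version.
REDACTION_NETWORKS = {
    4: ipaddress.ip_network("240.0.0.0/4"),
    6: ipaddress.ip_network("3fff::/20"),
}


def nth_redacted_address(original: str, n: int) -> str:
    """Return the n-th stand-in (n >= 1) from the reserved block that matches
    the IP version of `original`."""
    version = ipaddress.ip_address(original).version
    network = REDACTION_NETWORKS[version]
    # Refuse indices that would fall outside the reserved block.
    if not 1 <= n < network.num_addresses:
        raise ValueError(f"index {n} is outside {network}")
    return str(network.network_address + n)


print(nth_redacted_address("102.23.5.1", 1))    # 240.0.0.1
print(nth_redacted_address("2001:db8::42", 1))  # 3fff::1
```

The 240.0.0.0/4 block holds 2**28 addresses, so running out of stand-ins is not a practical concern for log files.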
For phone numbers, the script uses the (800) 555-0100 through (800) 555-0199 range. See https://en.wikipedia.org/wiki/555_(telephone_number) for more information.
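A hedged sketch of phone-number replacement using that 100-number range. The regex assumes North American formats such as 123-456-7890 or (123) 456-7890, and the sequential assignment with wrap-around is a hypothetical design; the tool's real matching rules may differ.

```python
import re

# Assumed input formats: 123-456-7890, 123.456.7890, (123) 456-7890.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-.]\d{4}\b")


def redact_phones(text: str, mapping: dict[str, str]) -> str:
    """Swap each phone number for one in (800) 555-0100 .. (800) 555-0199.

    The fictional range holds only 100 numbers, so after 100 distinct inputs
    the stand-ins wrap around and start to repeat.
    """

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in mapping:
            # 0 -> 0100, 1 -> 0101, ..., 99 -> 0199, 100 -> 0100 again.
            mapping[original] = f"(800) 555-{100 + len(mapping) % 100:04d}"
        return mapping[original]

    return PHONE_RE.sub(substitute, text)


print(redact_phones("call 123-456-7890 or (321) 654-0987", {}))
# call (800) 555-0100 or (800) 555-0101
```

The wrap-around keeps every replacement inside the fictional range at the cost of uniqueness; a mapping file is then the only way to tell two wrapped numbers apart.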
For email addresses, the script uses [email protected] as the redacted email address. See https://en.wikipedia.org/wiki/Example.com for more information.
The script reads from secrets.json and ignores.json to manage sensitive information that should be redacted or ignored during the redaction process.
Note: Values in secrets.json take precedence over values in ignores.json during redaction.
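The precedence rule can be expressed as a small decision function. This is a sketch under assumptions: should_redact is a hypothetical name, patterns are matched with Python's case-sensitive fnmatchcase, and values matching neither file are redacted, mirroring the non-interactive behavior described later in this README.

```python
from fnmatch import fnmatchcase


def should_redact(kind: str, value: str,
                  secrets: dict[str, list[str]],
                  ignores: dict[str, list[str]]) -> bool:
    """Decide whether a detected value of type `kind` gets redacted."""
    # 1. A secrets.json match always wins.
    if any(fnmatchcase(value, pat) for pat in secrets.get(kind, [])):
        return True
    # 2. Otherwise an ignores.json match leaves the value untouched.
    if any(fnmatchcase(value, pat) for pat in ignores.get(kind, [])):
        return False
    # 3. Anything else is redacted by default.
    return True


secrets = {"email": ["*@internal.com"]}
ignores = {"email": ["admin*@*"]}
print(should_redact("email", "admin@internal.com", secrets, ignores))  # True
print(should_redact("email", "admin@other.org", secrets, ignores))     # False
```

Here admin@internal.com matches both files and is still redacted, because the secrets check runs first.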
secrets.json contains patterns of sensitive information that should always be redacted. Each key in the file names a type of sensitive information (e.g., ipv4, email) and maps to a list of values to be redacted.
The secrets file can contain glob patterns like:
{
"hostname": ["special*", "*example.com", "test-*.local"],
"email": ["*@internal.com", "admin*@*"]
}
This implementation allows for simple wildcard patterns:
- * matches any number of characters
- ? matches exactly one character
- [abc] matches any character in the set
- [!abc] matches any character not in the set
Examples:
- special* matches anything starting with "special"
- *.example.com matches any subdomain of example.com
- test-*.local matches any test domain in .local
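Python's standard fnmatch module supports the same four wildcard forms, so it is a convenient way to check a pattern before adding it to secrets.json or ignores.json. Whether the project itself uses fnmatch is an assumption; the cases below are illustrative.

```python
from fnmatch import fnmatchcase

# (value, pattern) pairs; the trailing comment is what fnmatchcase returns.
cases = [
    ("special-host", "special*"),          # True
    ("api.example.com", "*.example.com"),  # True
    ("example.com", "*.example.com"),      # False: nothing before ".example.com"
    ("test-01.local", "test-*.local"),     # True
    ("host1", "host?"),                    # True
    ("host10", "host?"),                   # False: ? matches exactly one character
    ("db-a", "db-[abc]"),                  # True
    ("db-d", "db-[!abc]"),                 # True
]

for value, pattern in cases:
    print(f"{pattern:15} {value:16} {fnmatchcase(value, pattern)}")
```

Note that * also matches dots, so *.example.com covers nested subdomains such as a.b.example.com but not the bare example.com.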
Example:
{
"ipv4": ["192.168.1.1"],
"email": ["[email protected]", "*@internal.com"],
"phone": ["123-456-7890"],
"hostname": ["example.com", "special*", "test-*.local"],
"url": ["https://www.example.com"],
"api": ["apikey=1234567890abcdef"]
}
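Python's json module keeps only the last value when a key repeats, so a hand-edited secrets.json that lists the same type twice would silently lose patterns. A defensive loader could merge repeated keys through json's object_pairs_hook; load_patterns below is a hypothetical helper, not part of the project.

```python
import json


def load_patterns(text: str) -> dict[str, list[str]]:
    """Parse a secrets.json / ignores.json style document.

    Repeated keys are merged instead of overwritten, and every value must be
    a list of strings.
    """

    def merge(pairs: list[tuple[str, object]]) -> dict[str, list[str]]:
        merged: dict[str, list[str]] = {}
        for key, values in pairs:
            if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
                raise ValueError(f"{key!r} must map to a list of strings")
            merged.setdefault(key, []).extend(values)
        return merged

    # object_pairs_hook receives every (key, value) pair of a decoded object,
    # duplicates included, in document order.
    return json.loads(text, object_pairs_hook=merge)


doc = '{"email": ["a@example.com"], "hostname": ["special*"], "email": ["*@internal.com"]}'
print(json.loads(doc)["email"])     # ['*@internal.com']  (first list lost)
print(load_patterns(doc)["email"])  # ['a@example.com', '*@internal.com']
```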
ignores.json contains patterns of information that should be ignored during the redaction process. Each key in the file names a type of information (e.g., ipv4, email) and maps to a list of values to be ignored.
Example:
{
"ipv4": ["127.0.0.1"],
"email": ["[email protected]", "junk*@*"],
"phone": ["555-555-5555"],
"hostname": ["localhost"],
"url": ["http://localhost"],
"api": ["apikey=ignorethisapikey"]
}
In interactive mode, the script will ask you to confirm each redaction. You can choose to always redact that data, never redact that data, or redact/not redact just that instance of the data. If you are not in interactive mode, the script will always try to redact the data.
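A sketch of how interactive answers might be remembered so that "always" and "never" decisions are not asked again. The class name InteractiveDecider and the answer words yes/no/always/never are assumptions for illustration, not the script's actual prompt; the prompt function is injectable so input can be used interactively or a scripted function in tests.

```python
from typing import Callable


class InteractiveDecider:
    """Ask once per value unless the user chose 'always' or 'never'."""

    def __init__(self, ask: Callable[[str], str]) -> None:
        self.ask = ask                  # e.g. the built-in input
        self.always: set[str] = set()   # values to redact without asking
        self.never: set[str] = set()    # values to keep without asking

    def should_redact(self, value: str) -> bool:
        if value in self.always:
            return True
        if value in self.never:
            return False
        while True:  # re-prompt until a recognized answer is given
            answer = self.ask(f"Redact {value!r}? [yes/no/always/never] ").strip().lower()
            if answer == "always":
                self.always.add(value)
                return True
            if answer == "never":
                self.never.add(value)
                return False
            if answer in ("yes", "no"):
                return answer == "yes"  # this instance only


# decider = InteractiveDecider(input)  # prompts on the terminal
```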
- Complete the Rust implementation
- Review third-party libraries to validate strings before redacting:
  - garde
  - validators
  - phonenumbers, to validate phone numbers
- Add support for social security numbers
- Add support for incorporating custom patterns
- Improve redaction of PDF files
- Add support for incorporating ML models to redact data more accurately
- Add support for redacting data in multiple files at once
Inspired by PyRedactKit