This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

BitCurator Access Redaction

Kam Woods edited this page Apr 15, 2018 · 1 revision

The [//github.com/bitcurator/bitcurator-access-redaction bitcurator-access-redaction] project builds on existing disk image redaction and [//github.com/simsong/dfxml/tree/master/python Digital Forensics XML tools] to provide collecting institutions with software for redacting strings and byte sequences identified in disk images. The software also includes a Python API that allows institutions to build custom redaction workflows on top of tools such as [//strozfriedberg.github.io/liblightgrep/ lightgrep].

Developing comprehensive strategies for redacting born-digital materials is an important concern for many archives, libraries, and museums. Digital media acquisitions often contain data that may be classified as private, sensitive, or individually identifying, and the complexity and volume of the information being collected demand automation to minimize the risk of inadvertent disclosure.

Relatively few open source redaction tools currently address these needs. The [//github.com/bitcurator/bitcurator-access-redaction bitcurator-access-redaction] project targets several of them, including:

  • Redacting specific bitstreams from raw disk images
  • Creating redacted copies of forensically-packaged disk images
  • Redacting metadata from common file formats, including Office and PDF files
  • Redacting patterns from bitstreams using [//github.com/strozfriedberg/liblightgrep/tree/master/pylightgrep pylightgrep]
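
To make the first item above concrete, the following is a minimal sketch of redacting a fixed byte sequence from a copy of a raw disk image. It uses only the Python standard library and is not the bitcurator-access-redaction API; the function name redact_bytes, its parameters, and the file names are hypothetical and chosen for illustration. It assumes the image is a flat (raw) file and that overwriting matches with a fill byte of equal length is an acceptable form of redaction.

```python
import mmap
import shutil
import tempfile
from pathlib import Path


def redact_bytes(src, dst, needle, fill=b"\x00"):
    """Copy src to dst, then overwrite every occurrence of needle in dst.

    Hypothetical helper for illustration; not part of any project API.
    Returns the list of byte offsets that were overwritten.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")

    shutil.copyfile(src, dst)  # work on a copy; the original image is never modified
    offsets = []
    if Path(dst).stat().st_size == 0:
        return offsets  # an empty file cannot be memory-mapped

    with open(dst, "r+b") as fh, mmap.mmap(fh.fileno(), 0) as image:
        pos = image.find(needle)
        while pos != -1:
            # Same-length overwrite keeps the image size and all other offsets intact.
            image[pos:pos + len(needle)] = fill * len(needle)
            offsets.append(pos)
            pos = image.find(needle, pos + len(needle))
    return offsets


# Small demonstration on a synthetic "image" held in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "image.raw"
    dst = Path(tmp) / "image.redacted.raw"
    src.write_bytes(b"AAAA" + b"SECRET" + b"BBBB" + b"SECRET" + b"CC")
    offsets = redact_bytes(src, dst, b"SECRET")
    original = src.read_bytes()
    redacted = dst.read_bytes()

print(offsets)
```

Overwriting in place with a same-length fill keeps file system structures and the offsets of all other data unchanged, which is one reason to redact at the bitstream level. Forensically packaged images and pattern-based matching need format-aware tooling and are outside the scope of this sketch.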