create article to remove historic commits
1 parent 6dc980f, commit fa5e463
Showing 1 changed file with 179 additions and 0 deletions.

---
title: "Ultimate Guide to Removing Sensitive Data from Git History: Protect Your Codebase"
excerpt: "Discover the definitive methods to remove sensitive data like API keys from Git history. Learn how to safeguard your repository, rewrite commit history, and prevent future leaks using best practices."
coverImage: "/assets/blogs/removing-historic-commits/cover.webp"
date: "2024-12-08T18:13:00.000Z"
author:
  name: Umair Jibran
  picture: "/assets/authors/jibran.webp"
ogImage:
  url: "/assets/blogs/removing-historic-commits/cover.webp"
tags:
  - github
  - git
  - sensitive-data-removal
  - api-security
  - devops-best-practices
  - version-control
---

# Ultimate Guide to Removing Sensitive Data from Git History: Protect Your Codebase

> Unlike traditional credentials, secrets are meant to be distributed to developers, applications, and infrastructure systems. Adding more of these factors will inevitably make the number of secrets used in a development cycle increase, leading to a natural sprawling phenomenon: secrets start to appear hardcoded in source code. From an organization’s point of view, visibility and control over their distribution start to degrade. This is what secrets sprawl is all about – [GitGuardian](https://www.gitguardian.com/glossary/secret-sprawl-definition?ref=umairjibran.com)

## Steps to Remove Historic Commits with Sensitive Data

### 1. Understand the Problem

When sensitive data is committed to a Git repository:

- It remains in the commit history even if a later commit removes it.
- Anyone with access to the repository can recover it by inspecting the history.

Deleting the file in a new commit is therefore not enough. The fix is to rewrite the commit history so that no commit contains the sensitive data.
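
To see why a follow-up commit is not enough, the sketch below builds a throwaway repository in a temporary directory, commits a fake key, deletes it in a second commit, and then reads it back from history. The file name `config.env`, the key value, and the demo identity are made up for illustration.

```bash
#!/usr/bin/env bash
# Demo in a scratch repository: a deleted secret is still in history.
set -eu

repo="$(mktemp -d)"                      # scratch directory, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity for the demo commits
git config user.email "demo@example.com"

# Commit a file that contains a fake secret.
echo 'API_KEY=FAKE-KEY-12345' > config.env
git add config.env
git commit -q -m "add config"

# "Fix" the leak by deleting the file in a later commit.
git rm -q config.env
git commit -q -m "remove config"

# The file is gone from the working tree...
[ ! -e config.env ] && echo "config.env is no longer in the working tree"

# ...but the first commit still holds its contents.
first_commit="$(git rev-list --max-parents=0 HEAD)"
recovered="$(git show "$first_commit:config.env")"
echo "recovered from history: $recovered"
```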

---

### 2. Prerequisites

- **Backup:** Clone the repository and keep a separate backup copy before you start. A history rewrite is destructive, and the backup is the way back if something goes wrong.
- **Access:** Make sure you have permission to rewrite history and force-push. Branch protection rules on the remote can block force pushes.
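
One way to take the backup is a mirror clone, which copies every branch and tag rather than only the checked-out branch. The sketch below is an assumption about how you might do it, not a step from the original guide. It uses a small local repository as a stand-in for the real remote, so in practice `$src` would be your repository URL, and the directory names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: back up a repository with a mirror clone before rewriting history.
set -eu

work="$(mktemp -d)"
src="$work/project"                      # stand-in for the real remote URL

# Build a tiny source repository with one commit and one tag.
git init -q "$src"
git -C "$src" config user.name "Demo User"
git -C "$src" config user.email "demo@example.com"
echo 'hello' > "$src/README.md"
git -C "$src" add README.md
git -C "$src" commit -q -m "initial commit"
git -C "$src" tag v1.0

# The backup itself: a bare copy holding all refs.
git clone -q --mirror "$src" "$work/project-backup.git"

# Confirm that every branch and tag points at the same commit in both copies.
src_refs="$(git -C "$src" for-each-ref --format='%(refname) %(objectname)' refs/heads refs/tags)"
bak_refs="$(git -C "$work/project-backup.git" for-each-ref --format='%(refname) %(objectname)' refs/heads refs/tags)"
[ "$src_refs" = "$bak_refs" ] && echo "backup holds every branch and tag"
```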
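
---

### 3. Choose a Method for Removal

There are two primary tools for removing sensitive data from Git history:

#### A. **Git Filter-Branch (Deprecated, but Widely Used)**

The Git documentation itself warns against `git filter-branch` because it is slow and easy to misuse, and it points to `git filter-repo` as the preferred alternative. The command still ships with Git, so it works without installing anything:

```bash
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch path/to/your/file' \
  --prune-empty --tag-name-filter cat -- --all
```

- `--force`: Lets the command run even if a previous run left backup refs under `refs/original/`.
- `--index-filter`: Runs the quoted command against the index of every commit without checking out files; here it removes the file from each commit.
- `--ignore-unmatch`: Keeps `git rm` from failing on commits that do not contain the file.
- `--prune-empty`: Removes commits that become empty after file removal.
- `--tag-name-filter cat`: Updates tags to point at the rewritten commits while keeping their names.
- `-- --all`: Rewrites all branches and tags rather than only the current branch.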
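
The command above can be tried safely on a scratch repository first. The sketch below is a dry run under stated assumptions: the file names and demo identity are invented, and `FILTER_BRANCH_SQUELCH_WARNING=1` is set only to skip the warning pause that recent Git versions print. It also deletes the backup refs that `git filter-branch` leaves under `refs/original/`, since those still point at the old commits.

```bash
#!/usr/bin/env bash
# Sketch: run filter-branch on a scratch repository and check the result.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git versions

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity for the demo commits
git config user.email "demo@example.com"

# Two commits; the first one contains a fake secret file.
echo 'API_KEY=FAKE-KEY-12345' > secrets.env
echo 'hello' > README.md
git add .
git commit -q -m "initial commit"
echo 'more' >> README.md
git commit -q -am "update readme"

# Remove secrets.env from every commit on every ref.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secrets.env' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# filter-branch keeps the old commits reachable under refs/original/; drop them.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done

# No commit on any ref should touch secrets.env now.
remaining="$(git log --all --oneline -- secrets.env | wc -l | tr -d ' ')"
files_at_head="$(git ls-tree -r --name-only HEAD)"
echo "commits still touching secrets.env: $remaining"
echo "files at HEAD: $files_at_head"
```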

#### B. **BFG Repo-Cleaner (Recommended)**

BFG Repo-Cleaner is faster and simpler than `git filter-branch`. By default it leaves the contents of your latest commit untouched, so remove the sensitive data from the current files with a normal commit before running it.

1. Install BFG:

   ```bash
   brew install bfg
   ```

   Or download the [BFG JAR file](https://rtyley.github.io/bfg-repo-cleaner/) and run it with `java -jar` in place of the `bfg` command.

2. Use BFG to remove sensitive files. `--delete-files` matches on file name, not on the path within the repository:

   ```bash
   bfg --delete-files your-file-name
   ```

   Or to remove specific strings (like API keys):

   ```bash
   bfg --replace-text replace-patterns.txt
   ```

   Format for `replace-patterns.txt`, one expression per line:

   ```
   YOUR_SECRET_KEY==>
   ```

   The text after `==>` is the replacement, so this line replaces the key with nothing. A line without `==>` is replaced with `***REMOVED***`.

3. Clean up and repackage the repository:

   ```bash
   git reflog expire --expire=now --all
   git gc --prune=now --aggressive
   ```
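
The patterns file can hold several kinds of expression. The sketch below writes one of each form that BFG's usage text describes: a plain literal, a literal with an empty replacement, a literal with a custom replacement, and a `regex:` expression. All keys and values are fake, and the temporary file stands in for `replace-patterns.txt`. Since the real file contains the secrets themselves, keep it out of the repository and delete it once the cleanup is done.

```bash
#!/usr/bin/env bash
# Sketch: build a patterns file for `bfg --replace-text`.
set -eu

patterns="$(mktemp)"                     # stands in for replace-patterns.txt

# Line 1: literal, replaced with the default ***REMOVED***
# Line 2: literal, replaced with nothing
# Line 3: literal, replaced with the text after ==>
# Line 4: regular expression, replaced with the text after ==>
cat > "$patterns" <<'EOF'
FAKE-KEY-12345
OLD_API_KEY==>
hunter2==>REDACTED
regex:token=[A-Za-z0-9]+==>token=
EOF

pattern_count="$(wc -l < "$patterns" | tr -d ' ')"
echo "wrote $pattern_count patterns to $patterns"

# With BFG installed, the file would then be used like this:
#   bfg --replace-text "$patterns"
```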

---

### 4. Push the Changes

Force-push the rewritten history to the remote repository. This overwrites the remote branches and tags, so run it only after checking the cleaned repository:

```bash
git push origin --force --all
git push origin --force --tags
```
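
Before running the force push above, it can be worth confirming that the secret no longer appears in any reachable commit. The helper below, `history_contains`, is a hypothetical name and a minimal sketch. It uses `git log -S`, which reports commits that add or remove occurrences of a string, so it only covers commits reachable from a ref. The demo runs it against a scratch repository that still contains a fake key.

```bash
#!/usr/bin/env bash
# Sketch: check whether a string was ever added or removed on any ref.
set -eu

# history_contains REPO STRING
# Succeeds if some commit reachable from any ref changed the number of
# occurrences of STRING.
history_contains() {
  [ -n "$(git -C "$1" log --all --oneline -S"$2")" ]
}

# Scratch repository with a fake secret, for demonstration only.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
echo 'API_KEY=FAKE-KEY-12345' > "$repo/config.env"
git -C "$repo" add config.env
git -C "$repo" commit -q -m "add config"

if history_contains "$repo" 'FAKE-KEY-12345'; then found_secret=yes; else found_secret=no; fi
if history_contains "$repo" 'NOT-IN-HISTORY'; then found_other=yes; else found_other=no; fi

echo "FAKE-KEY-12345 in history: $found_secret"
echo "NOT-IN-HISTORY in history: $found_other"
```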

---

### 5. Notify Collaborators

- Let collaborators know that the history was rewritten.
- Advise them to re-clone the repository rather than pull. A clone made before the rewrite still contains the old commits, and merging or pushing from it can bring the sensitive data back.

---

### 6. Invalidate Leaked Credentials

- Rotate the exposed API keys or credentials immediately. Rewriting history does not undo the exposure: anyone who cloned or fetched the repository earlier may already have a copy.
- Use tools like [GitGuardian](https://www.gitguardian.com/) to scan for other leaked secrets.

---

## Use Case Example

Imagine you accidentally commit an API key in a file named `config.json`. Here’s how to remove it:

1. **Identify the File:** Check whether the file is part of your history.

   ```bash
   git log --all -- config.json
   ```

2. **Remove the File Using BFG:** Use this if the whole file should disappear from history.

   ```bash
   bfg --delete-files config.json
   ```

3. **Replace the Key in the File Content:** If the key also appears in other files, or if `config.json` should stay in history, replace the key itself. Create `replace-patterns.txt`:

   ```
   OLD_API_KEY==>
   ```

   Then run:

   ```bash
   bfg --replace-text replace-patterns.txt
   ```

4. **Repackage and Push:**

   ```bash
   git reflog expire --expire=now --all
   git gc --prune=now --aggressive
   git push origin --force --all
   git push origin --force --tags
   ```

5. **Rotate API Keys:** Immediately invalidate the old API key and issue a new one.

---

## Best Practices to Avoid Leaking Sensitive Data

1. **Use `.gitignore`:** List files that hold secrets, such as `.env`, so they are not added by accident.
2. **Git Hooks:** Use pre-commit hooks to scan staged changes for sensitive data before a commit is created.
3. **Environment Variables:** Store sensitive data outside the repository and read it from the environment at runtime.
4. **Secret Scanning:** Enable tools like [GitHub's secret scanning](https://docs.github.com/en/code-security/secret-scanning) or [GitGuardian](https://www.gitguardian.com/).
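
As an illustration of the pre-commit idea, here is a minimal sketch of a hook that checks the lines being added in the staged diff against a simple pattern. The pattern, the function names `has_secret` and `check_staged`, and the message are assumptions, and a dedicated scanner will catch far more than this. To use it as a real hook, the script would be saved as `.git/hooks/pre-commit`, made executable, and would end by calling `check_staged`. The last lines are a dry run on two hand-written diff lines.

```bash
#!/usr/bin/env bash
# Sketch of a pre-commit check for likely secrets in staged changes.

# Hypothetical pattern: common secret-like names followed by ':' or '='.
secret_pattern='(api[_-]?key|secret|token|password)[[:space:]]*[:=]'

# Reads a unified diff on stdin; succeeds if an added line matches the pattern.
# '^\+[^+]' keeps added lines and skips the '+++ b/file' headers.
has_secret() {
  grep -E '^\+[^+]' | grep -Eiq "$secret_pattern"
}

# What the hook would run: a non-zero return makes Git abort the commit.
check_staged() {
  if git diff --cached -U0 | has_secret; then
    echo "Possible secret in staged changes; commit blocked." >&2
    return 1
  fi
  return 0
}

# Dry run on two hand-written diff lines instead of a real staged diff.
if printf '%s\n' '+API_KEY=FAKE-KEY-12345' | has_secret; then leak_verdict=blocked; else leak_verdict=allowed; fi
if printf '%s\n' '+echo "hello world"' | has_secret; then clean_verdict=blocked; else clean_verdict=allowed; fi
echo "line with key: $leak_verdict, clean line: $clean_verdict"
```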

---

## References

- [Git Documentation on Filter-Branch](https://git-scm.com/docs/git-filter-branch)
- [BFG Repo-Cleaner Documentation](https://rtyley.github.io/bfg-repo-cleaner/)
- [GitHub Docs: Removing Sensitive Data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository)
- [GitGuardian for Secret Scanning](https://www.gitguardian.com/)
- [Rotating API Keys Best Practices](https://blog.gitguardian.com/api-key-rotation-best-practices/)