Document manual process for removing WARCs #4

Open
edsu opened this issue May 23, 2022 · 3 comments
Labels
blocked (prereqs for this ticket aren't done yet), web archiving (2022 web archiving work cycle)

Comments

@edsu

edsu commented May 23, 2022

@peterchanws has occasionally needed to remove WARC files from SWAP. This isn't a common occurrence, and there isn't an obvious place to trigger it in Argo, so this can be a manual process. We should document the process in DevOpsDocs.

@edsu added the web archiving (2022 web archiving work cycle) label May 23, 2022
@lwrubel

lwrubel commented May 31, 2022

I think we had the wrong GitHub handle for you, @peterchanws.

@aaron-collier

Should this be assigned to @peterchanws?

@ndushay added the blocked (prereqs for this ticket aren't done yet) label Jun 3, 2022
@lwrubel

lwrubel commented Jul 21, 2022

The use case is WARCs that contain PII. This is a future use case, not an existing need. We think the process involves removing the WARC from web-archiving-stacks and then either re-indexing all WARCs or removing the index lines that reference the problem WARC. Right now this is a hand-editing process, so a script to do it would be an improvement.

We'll wait for future improvements to pywb's access logic.
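For reference, the "remove lines from indexes" option could be sketched roughly as below. This is only an illustration, not an agreed process: it assumes the indexes are pywb-style CDXJ files, where each line is `<urlkey> <timestamp> <json>` and the JSON block carries a `filename` field naming the WARC. The function name `drop_warc_lines` is made up for this sketch and does not exist in any of our repos.

```python
import json


def drop_warc_lines(cdxj_lines, warc_name):
    """Split index lines into those to keep and a count of those dropped.

    Assumes CDXJ lines of the form "<urlkey> <timestamp> <json>", where the
    JSON block has a "filename" field (an assumption about the index format).
    Lines that do not parse are kept untouched rather than silently dropped.
    """
    kept = []
    removed = 0
    for line in cdxj_lines:
        # Split into at most three parts so spaces inside the JSON survive.
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) == 3:
            try:
                fields = json.loads(parts[2])
            except json.JSONDecodeError:
                fields = {}
            if isinstance(fields, dict):
                filename = str(fields.get("filename", ""))
                # Compare base names so a stored directory prefix is ignored.
                if filename.rsplit("/", 1)[-1] == warc_name:
                    removed += 1
                    continue
        kept.append(line)
    return kept, removed
```

If something like this were used, it would probably be safer to write the kept lines to a new file and swap it in than to edit the index in place. Removing the WARC itself from web-archiving-stacks would still be a separate step.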
