v3.7 - see CHANGELOG.md for details
xnl-h4ck3r committed Mar 14, 2024
1 parent 8e5da4c commit dc24248
Showing 4 changed files with 29 additions and 15 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,16 @@
## Changelog

+- v3.7
+
+- Changed
+
+- Fix a bug that can occur in some situations where the error `ERROR processResponses 1: [Errno 2] No such file or directory: 'testing/responses.tmp'` is shown. The required directories weren't being created correctly.
+- Remove a debug print line I left in!
+- Remove this script from downloaded responses that's now being included by archive.org:
+`<script>window.RufflePlayer=window.RufflePlayer||{};window.RufflePlayer.config={"autoplay":"on","unmuteOverlay":"hidden"};</script>`
+- Remove the comment `<!-- End Wayback Rewrite JS Include -->` from the downloaded responses.
+- Clarify that `-nlf` argument is only relevant to `mode U`.
+
- v3.6

- Changed
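
The two response clean-ups listed under v3.7 can be tried in isolation. What follows is a minimal sketch, not the project's own function: the helper name `strip_wayback_residue` and the sample HTML are assumptions made for illustration, while the two patterns are the ones this commit adds to `processArchiveUrl`.

```python
import re


def strip_wayback_residue(html: str) -> str:
    """Hypothetical helper: drop two archive.org additions from a saved response."""
    # Remove the first inline RufflePlayer config script injected by archive.org
    html = re.sub(
        r'\<script\>window\.RufflePlayer[^\<]*\<\/script\>',
        '',
        html,
        count=1,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Remove every leftover end-of-include comment, ignoring case
    html = re.sub(
        r'\<\!-- End Wayback Rewrite JS Include --\>',
        '',
        html,
        flags=re.IGNORECASE,
    )
    return html


# Illustrative input only; real responses are much larger
sample = (
    '<html><script>window.RufflePlayer=window.RufflePlayer||{};'
    'window.RufflePlayer.config={"autoplay":"on","unmuteOverlay":"hidden"};</script>'
    '<!-- End Wayback Rewrite JS Include --><p>hi</p></html>'
)
print(strip_wayback_residue(sample))  # → <html><p>hi</p></html>
```

`count` and `flags` are passed by keyword here because the fourth positional parameter of `re.sub` is `count`, so a flag passed positionally would be read as a replacement limit.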
4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
<center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>

-## About - v3.6
+## About - v3.7

The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.

@@ -77,7 +77,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
| -ko | --keywords-only | Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. If you provide the flag with no value, Keywords are taken from the comma separated list in the `config.yml` file (typically in `~/.config/waymore/`) with the `FILTER_KEYWORDS` key, otherwise you can pass a specific Regex value to use, e.g. `-ko "admin"` to only get links containing the word `admin`, or `-ko "\.js(\?\|$)"` to only get JS files. The Regex check is NOT case sensitive. |
| -lr | --limit-requests | Limit the number of requests that will be made when getting links from a source (this doesn\'t apply to Common Crawl). Some targets can return a huge amount of requests needed that are just not feasible to get, so this can be used to manage that situation. This defaults to 0 (Zero) which means there is no limit. |
| -ow | --output-overwrite | If the URL output file (default `waymore.txt`, or specified by `-oU`) already exists, it will be overwritten instead of being appended to. |
-| -nlf | --new-links-file | If this argument is passed, a `waymore.new` file (or if `-oU` is used it will be the name of that file suffixed with `.new`) will also be written, and will contain links for the latest run. This can be used for continuous monitoring of a target. |
+| -nlf | --new-links-file | If this argument is passed, a `waymore.new` file (or if `-oU` is used it will be the name of that file suffixed with `.new`) will also be written, and will contain links for the latest run. This can be used for continuous monitoring of a target (only for `mode U`, not `mode R`). |
| -c | --config | Path to the YML config file. If not passed, it looks for file `config.yml` in the default directory, typically `~/.config/waymore`. |
| -wrlr | --wayback-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on Wayback Machine (archive.org) instead of stopping with a `429` error (default: 3). |
| -urlr | --urlscan-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a `429` error (default: 1). |
2 changes: 1 addition & 1 deletion waymore/__init__.py
@@ -1 +1 @@
-__version__="3.6"
+__version__="3.7"
27 changes: 15 additions & 12 deletions waymore/waymore.py
@@ -756,6 +756,7 @@ def processArchiveUrl(url):
# Remove all web archive references in the response
archiveHtml = re.sub(r'\<script type=\"text\/javascript" src=\"\/_static\/js\/bundle-playback\.js\?v=[A-Za-z0-9]*" charset="utf-8"><\/script>\n<script type="text\/javascript" src="\/_static\/js\/wombat\.js.*\<\!-- End Wayback Rewrite JS Include --\>','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
archiveHtml = re.sub(r'\<script src=\"\/\/archive\.org.*\<\!-- End Wayback Rewrite JS Include --\>','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
+archiveHtml = re.sub(r'\<script\>window\.RufflePlayer[^\<]*\<\/script\>','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
archiveHtml = re.sub(r'\<\!-- BEGIN WAYBACK TOOLBAR INSERT --\>.*\<\!-- END WAYBACK TOOLBAR INSERT --\>','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
archiveHtml = re.sub(r'(}\n)?(\/\*|<!--\n)\s*FILE ARCHIVED ON.*108\(a\)\(3\)\)\.\n(\*\/|-->)','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
archiveHtml = re.sub(r'var\s_____WB\$wombat\$assign\$function.*WB\$wombat\$assign\$function_____\(\"opener\"\);','',archiveHtml,1,flags=re.DOTALL|re.IGNORECASE)
@@ -767,6 +768,7 @@ def processArchiveUrl(url):
archiveHtml = re.sub(r'\<script type=\"text\/javascript\">\s*__wm\.init\(\"https:\/\/web\.archive\.org\/web\"\);[^\<]*\<\/script\>','',archiveHtml,flags=re.IGNORECASE)
archiveHtml = re.sub(r'\<script type=\"text\/javascript\" src="https:\/\/web-static\.archive\.org[^\<]*\<\/script\>','',archiveHtml,flags=re.IGNORECASE)
archiveHtml = re.sub(r'\<link rel=\"stylesheet\" type=\"text\/css\" href=\"https:\/\/web-static\.archive\.org[^\<]*\/\>','',archiveHtml,flags=re.IGNORECASE)
+archiveHtml = re.sub(r'\<\!-- End Wayback Rewrite JS Include --\>','',archiveHtml,flags=re.IGNORECASE)

# If there is a specific Wayback error in the response, raise an exception
if archiveHtml.lower().find('wayback machine has not archived that url') > 0 or archiveHtml.lower().find('snapshot cannot be displayed due to an internal error') > 0:
@@ -2598,17 +2600,18 @@ def createDirs():
domain_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
pass
-else:
-print("HERE2")
-try:
-if args.output_responses == '':
-responseDir = Path(args.output_responses)
-responseDir.mkdir(parents=True, exist_ok=True)
-if args.output_urls == '':
-responseDir = Path(args.output_urls)
-responseDir.mkdir(parents=True, exist_ok=True)
-except Exception as e:
-pass
+try:
+# Create specified directory for -oR if required
+if args.output_responses != '':
+responseDir = Path(args.output_responses)
+responseDir.mkdir(parents=True, exist_ok=True)
+# If -oU was passed and is prefixed with a directory, create it
+if args.output_urls != '' and '/' in args.output_urls:
+directoriesOnly = os.path.dirname(args.output_urls)
+responseDir = Path(directoriesOnly)
+responseDir.mkdir(parents=True, exist_ok=True)
+except Exception as e:
+pass
except Exception as e:
writerr(colored(getSPACER('ERROR createDirs 1: ' + str(e)), 'red'))
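
The directory handling changed in `createDirs()` can be illustrated on its own. This is a minimal sketch under stated assumptions: `ensure_output_dirs` is a hypothetical helper that takes the `-oR` and `-oU` values as plain strings, and it checks whether `os.path.dirname()` returns an empty string instead of testing for `'/'` as the commit does, so it is a variation on the committed logic and not a copy of it.

```python
import os
import tempfile
from pathlib import Path


def ensure_output_dirs(output_responses: str, output_urls: str) -> None:
    """Hypothetical helper: create the directories the two output arguments need."""
    # -oR names a directory: create it whenever a value was given
    if output_responses != '':
        Path(output_responses).mkdir(parents=True, exist_ok=True)
    # -oU names a file: create only its parent directory, if it has one
    parent = os.path.dirname(output_urls)
    if parent != '':
        Path(parent).mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Work inside a throwaway directory so the demo leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        responses = os.path.join(tmp, "testing", "responses")
        urls = os.path.join(tmp, "links", "urls.txt")
        ensure_output_dirs(responses, urls)
        print(os.path.isdir(responses), os.path.isdir(os.path.dirname(urls)))
```

Only the parent of the URL file is created; the file itself is left for the writer to open later.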

@@ -2867,7 +2870,7 @@ def main():
"-nlf",
"--new-links-file",
action="store_true",
-help="If this argument is passed, a .new file will also be written that will contain links for the latest run.",
+help="If this argument is passed, a .new file will also be written that will contain links for the latest run. This is only relevant for mode U.",
)
parser.add_argument(
"-c",
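
For readers unfamiliar with `action="store_true"`, the `-nlf` definition above can be exercised in a standalone parser. This is a small sketch that reuses only the option strings and help text shown in the diff; the parser description and the variable names are assumptions, and none of waymore's other arguments are reproduced.

```python
import argparse

# Standalone parser for illustration; waymore's real parser defines many more arguments
parser = argparse.ArgumentParser(description="store_true flag sketch")
parser.add_argument(
    "-nlf",
    "--new-links-file",
    action="store_true",
    help="If this argument is passed, a .new file will also be written that will contain links for the latest run. This is only relevant for mode U.",
)

# Parsing explicit lists keeps the example independent of sys.argv
without_flag = parser.parse_args([])
with_flag = parser.parse_args(["-nlf"])

# The attribute name is derived from the long option: --new-links-file -> new_links_file
print(without_flag.new_links_file, with_flag.new_links_file)  # → False True
```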
