v3.6 - see CHANGELOG.md for details
xnl-h4ck3r committed Mar 14, 2024
1 parent 7185aed commit 8e5da4c
Showing 5 changed files with 21 additions and 11 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
## Changelog

- v3.6

- Changed

- Added `-ko` to the suggestions displayed for Responses when the `-co`/`--check-only` option is used and there is a huge number of requests to be made.
- Remove `-ko` from the suggestion displayed for URLs when the `-co`/`--check-only` option is used, because `-ko` does not reduce the number of requests needed to get them: it is applied after the links are retrieved.
- Add a statement to `setup.py` to show where `config.yml` is created if it doesn't already exist. This is to help in figuring out Issue #41.

- v3.5

- Changed
6 changes: 3 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
<center><img src="https://github.com/xnl-h4ck3r/waymore/blob/main/waymore/images/title.png"></center>

## About - v3.5
## About - v3.6

The idea behind **waymore** is to find even more links from the Wayback Machine than other existing tools.

@@ -57,7 +57,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
| -f | --filter-responses-only | The initial links from sources will not be filtered, only the responses that are downloaded, e.g. it may be useful to still see all available paths from the links, even if you don't want to check the content. |
| -fc | | Filter HTTP status codes for retrieved URLs and responses. Comma separated list of codes (default: the `FILTER_CODE` values from `config.yml`). Passing this argument will override the value from `config.yml` |
| -mc | | Only Match HTTP status codes for retrieved URLs and responses. Comma separated list of codes. Passing this argument overrides the config `FILTER_CODE` and `-fc`. |
| -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
| -l | --limit | How many responses will be saved (if `-mode R` or `-mode B` is passed). A positive value will get the **first N** results, a negative value will get the **last N** results. A value of 0 will get **ALL** responses (default: 5000) |
| -from | --from-date | What date to get responses from. If not specified it will get from the earliest possible results. A partial value can be passed, e.g. `2016`, `201805`, etc. |
| -to | --to-date | What date to get responses to. If not specified it will get to the latest possible results. A partial value can be passed, e.g. `2021`, `202112`, etc. |
| -ci | --capture-interval | Filters the search on archive.org to only get at most 1 capture per hour (`h`), day (`d`) or month (`m`). This filter is used for responses only. The default is `d` but can also be set to `none` to not filter anything and get all responses. |
@@ -77,7 +77,7 @@ pipx install git+https://github.com/xnl-h4ck3r/waymore.git
| -ko | --keywords-only | Only return links and responses that contain keywords that you are interested in. This can reduce the time it takes to get results. If you provide the flag with no value, Keywords are taken from the comma separated list in the `config.yml` file (typically in `~/.config/waymore/`) with the `FILTER_KEYWORDS` key, otherwise you can pass a specific Regex value to use, e.g. `-ko "admin"` to only get links containing the word `admin`, or `-ko "\.js(\?\|$)"` to only get JS files. The Regex check is NOT case sensitive. |
| -lr | --limit-requests | Limit the number of requests that will be made when getting links from a source (this doesn't apply to Common Crawl). Some targets need a huge number of requests that are just not feasible to make, so this can be used to manage that situation. This defaults to 0 (Zero) which means there is no limit. |
| -ow | --output-overwrite | If the URL output file (default `waymore.txt`, or specified by `-oU`) already exists, it will be overwritten instead of being appended to. |
| -nlf | --new-links-file | If this argument is passed, a waymore.new file (or if `-oU` is used it will be the name of that file suffixed with `.new`) will also be written that will contain links for the latest run. This can be used for continuous monitoring of a target. |
| -nlf | --new-links-file | If this argument is passed, a `waymore.new` file (or if `-oU` is used it will be the name of that file suffixed with `.new`) will also be written, and will contain links for the latest run. This can be used for continuous monitoring of a target. |
| -c | --config | Path to the YML config file. If not passed, it looks for file `config.yml` in the default directory, typically `~/.config/waymore`. |
| -wrlr | --wayback-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on Wayback Machine (archive.org) instead of stopping with a `429` error (default: 3). |
| -urlr | --urlscan-rate-limit-retry | The number of minutes the user wants to wait for a rate limit pause on URLScan.io instead of stopping with a `429` error (default: 1). |
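
The `-ko` row above describes a case-insensitive Regex check, and the changelog notes that it is applied after the links are retrieved. A minimal sketch of that kind of post-retrieval filter is shown here; `filter_links_by_keyword` and the sample links are hypothetical and are not taken from `waymore.py`. Note that the README writes the pattern as `\.js(\?\|$)` only to escape the pipe inside the Markdown table; the Regex itself is `\.js(\?|$)`.

```python
import re


def filter_links_by_keyword(links, pattern):
    """Keep only the links that match the Regex, ignoring case.

    Hypothetical helper for illustration; waymore's own implementation may differ.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [link for link in links if regex.search(link)]


# Made-up links standing in for what a source has already returned.
links = [
    "https://example.com/static/app.js",
    "https://example.com/static/APP.JS?v=2",
    "https://example.com/admin/login",
    "https://example.com/js-guide.html",
]

# Equivalent in spirit to -ko "\.js(\?|$)" and -ko "admin".
js_only = filter_links_by_keyword(links, r"\.js(\?|$)")
admin_only = filter_links_by_keyword(links, "admin")
```

Because a filter like this runs on links that have already been fetched, it cannot lower the number of requests needed to get URLs, which is consistent with `-ko` being dropped from the URL suggestion in this commit.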
4 changes: 3 additions & 1 deletion setup.py
@@ -42,4 +42,6 @@
)

if configNew:
    print('\n\033[33mIMPORTANT: The file '+target_directory+'/config.yml already exists.\nCreating config.yml.NEW but leaving existing config.\nIf you need the new file, then remove the current one and rename config.yml.NEW to config.yml\n\033[0m')
    print('\n\033[33mIMPORTANT: The file '+target_directory+'/config.yml already exists.\nCreating config.yml.NEW but leaving existing config.\nIf you need the new file, then remove the current one and rename config.yml.NEW to config.yml\n\033[0m')
else:
    print('\n\033[92mThe file '+target_directory+'/config.yml has been created.\n\033[0m')
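
The two messages in this hunk imply a "create it, or write a `.NEW` copy next to the existing one" flow. The sketch below shows one way such a flow could look; `install_config`, `describe_result` and their parameters are hypothetical names, and the part of `setup.py` that actually sets `configNew` is not shown in this diff.

```python
import os
import shutil


def install_config(source_file, target_directory):
    """Copy a config file into target_directory without clobbering an existing one.

    Returns (path_written, config_new), where config_new is True when a
    config.yml was already present and config.yml.NEW was written instead.
    Hypothetical helper; not the code used by setup.py.
    """
    os.makedirs(target_directory, exist_ok=True)
    target = os.path.join(target_directory, "config.yml")
    config_new = os.path.exists(target)
    if config_new:
        target += ".NEW"  # leave the user's existing config untouched
    shutil.copy(source_file, target)
    return target, config_new


def describe_result(target_directory, config_new):
    """Build a short message mirroring the two branches in the hunk above."""
    path = os.path.join(target_directory, "config.yml")
    if config_new:
        return "IMPORTANT: The file " + path + " already exists. Creating config.yml.NEW but leaving existing config."
    return "The file " + path + " has been created."
```

Reporting the path in both branches is what lets a user confirm which directory the installer actually wrote to.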
2 changes: 1 addition & 1 deletion waymore/__init__.py
@@ -1 +1 @@
__version__="3.5"
__version__="3.6"
12 changes: 6 additions & 6 deletions waymore/waymore.py
@@ -948,9 +948,9 @@ def processURLOutput():
elif hours < 24:
write(colored('\n-> Getting URLs (e.g. at 1 req/sec) take more than '+str(hours)+' hours.','yellow'))
elif days < 7:
write(colored('\n-> Getting URLs (e.g. at 1 req/sec) could take more than '+str(days)+' days. Consider using arguments -ko, -lr, -ci, -from and -to wisely!','red'))
write(colored('\n-> Getting URLs (e.g. at 1 req/sec) could take more than '+str(days)+' days. Consider using arguments -lr, -ci, -from and -to wisely!','red'))
else:
write(colored('\n-> Getting URLs (e.g. at 1 req/sec) could take more than '+str(days)+' days!!! Consider using arguments -ko, -lr, -ci, -from and -to wisely!','red'))
write(colored('\n-> Getting URLs (e.g. at 1 req/sec) could take more than '+str(days)+' days!!! Consider using arguments -lr, -ci, -from and -to wisely!','red'))
write('')
else:
linkCount = len(linksFound)
@@ -2532,17 +2532,17 @@ def processResponses():
elif hours < 24:
write(colored('\n-> Downloading the responses (depending on their size) could take more than '+str(hours)+' hours.','yellow'))
elif days < 7:
write(colored('\n-> Downloading the responses (depending on their size) could take more than '+str(days)+' days. Consider using arguments -l, -ci, -from and -to wisely! ','red'))
write(colored('\n-> Downloading the responses (depending on their size) could take more than '+str(days)+' days. Consider using arguments -ko, -l, -ci, -from and -to wisely! ','red'))
else:
write(colored('\n-> Downloading the responses (depending on their size) could take more than '+str(days)+' days!!! Consider using arguments -l, -ci, -from and -to wisely!','red'))
write(colored('\n-> Downloading the responses (depending on their size) could take more than '+str(days)+' days!!! Consider using arguments -ko, -l, -ci, -from and -to wisely!','red'))
write('')
else:
# If the limit has been set over the default, give a warning that this could take a long time!
if totalResponses - successCount > DEFAULT_LIMIT:
if successCount > 0:
writerr(colored(getSPACER('WARNING: Downloading remaining ' + str(totalResponses - successCount) + ' responses may take a loooooooong time! Consider using arguments -l, -ci, -from and -to wisely!'),'yellow'))
writerr(colored(getSPACER('WARNING: Downloading remaining ' + str(totalResponses - successCount) + ' responses may take a loooooooong time! Consider using arguments -ko, -l, -ci, -from and -to wisely!'),'yellow'))
else:
writerr(colored(getSPACER('WARNING: Downloading ' + str(totalResponses) + ' responses may take a loooooooong time! Consider using arguments -l, -ci, -from and -to wisely!'),'yellow'))
writerr(colored(getSPACER('WARNING: Downloading ' + str(totalResponses) + ' responses may take a loooooooong time! Consider using arguments -ko, -l, -ci, -from and -to wisely!'),'yellow'))

# Open the index file if hash value is going to be used (not URL)
if not args.url_filename:
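
The `processResponses()` hunk above picks a warning by comparing `hours` and `days` against fixed thresholds. A minimal sketch of that tiering follows, using the post-commit suggestion text. The function name, the way `hours` and `days` are derived from a number of seconds, and the behaviour below one hour are all assumptions, since those parts of `waymore.py` are not shown in this diff.

```python
def response_download_warning(total_seconds):
    """Return (colour, message) for an estimated download time, or None.

    Hypothetical helper: only the `hours < 24`, `days < 7` and final `else`
    tiers come from the diff; everything else is assumed for illustration.
    """
    hours = total_seconds // 3600
    days = hours // 24
    suggestion = "Consider using arguments -ko, -l, -ci, -from and -to wisely!"

    if hours < 1:
        # Assumption: shorter estimates are handled by branches not shown in the diff.
        return None
    if hours < 24:
        return ("yellow", "could take more than " + str(hours) + " hours.")
    if days < 7:
        return ("red", "could take more than " + str(days) + " days. " + suggestion)
    return ("red", "could take more than " + str(days) + " days!!! " + suggestion)
```

For example, `response_download_warning(3 * 24 * 3600)` falls into the `days < 7` tier and returns a red message that includes the `-ko` suggestion.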