- v4.8
  - Changed
    - BUG FIX: When downloading responses and creating the file name, sometimes the file extension is derived incorrectly and has a `/` in it, e.g. `5146045725697.well-known/openid-configuration`, and this causes the writing of the file to fail. If the derived extension contains a `/` then it will be reset to blank and determined a different way (see the sketch below).

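    A minimal sketch of the guard, assuming the extension has already been derived by an earlier heuristic (hypothetical helper, not waymore's actual code):

    ```python
    def sanitize_extension(ext: str) -> str:
        # An extension containing '/' (e.g. 'well-known/openid-configuration')
        # cannot be part of a file name, so reset it to blank and let a
        # fallback method determine it instead.
        return "" if "/" in ext else ext

    print(sanitize_extension("well-known/openid-configuration"))  # ''
    ```
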
- v4.7
  - New
    - BUG FIX: If an input domain contains Unicode, e.g. `xñl.uk`, then it will be converted to the punycode version, e.g. `xn--xl-zja.uk`, and that will be used as the input instead. This ensures the URLs and responses are correctly retrieved from the archive sources (see the sketch below).

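    A minimal sketch of the conversion using Python's built-in `idna` codec (waymore's actual implementation may differ):

    ```python
    # Encode each label of the domain to its punycode (IDNA) form
    domain = "xñl.uk"
    punycode = domain.encode("idna").decode("ascii")
    print(punycode)  # xn--xl-zja.uk
    ```
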
- v4.6
  - New
    - Add argument `-ft` to specify a list of MIME types to filter. This will override the `FILTER_MIME` list in `config.yml`. NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME type. Sometimes URLScan does not have a MIME type defined - these will always be included. Consider excluding sources if this matters to you.
    - Add argument `-mt` to specify a list of MIME types to match. This will be used instead of the default filtering using the `FILTER_MIME` list in `config.yml`, or filtering using `-ft`. NOTE: This will NOT be applied to Alien Vault OTX and Virus Total because they don't have the ability to filter on MIME type. Sometimes URLScan does not have a MIME type defined - these will always be included. Consider excluding sources if this matters to you.
    - Add argument `--providers` in the same way as `gau`: a comma-separated list of source providers that you want to get URLs from. The values can be `wayback`, `commoncrawl`, `otx`, `urlscan` and `virustotal`. Passing this will override any exclude arguments (e.g. `-xwm`, `-xcc`, etc.) passed to exclude sources, and reset those based on what was passed with this argument.

  - Changed
    - When argument `--verbose` has been used and the options are shown, show the names of the providers that will be searched instead of the exclude arguments, e.g. `-xwm`, `-xcc`, etc.
    - Change `HTTP_ADAPTER_CC` used for Common Crawl requests to use `retries+3` instead of `retries+20`. The higher value was originally suggested by Common Crawl, but there are so many issues it can just take forever to get anything from their API, and often fail anyway.
    - Change the default of `-lcc` to 1 instead of 3 because of so many problems with Common Crawl.
    - BUG FIX: If a connection error occurs when getting the Common Crawl index file, then the error `ERROR getCommonCrawlUrls 1: object of type 'NoneType' has no len()` is displayed. This will now be suppressed.
    - BUG FIX: If arg `-mc` was not passed and `-ft` was, when options were shown to the user (in the `showOptions` function), the value of `-mc` was shown for `-ft`.
    - BUG FIX: When a MIME type that has a `+` in it (e.g. `image/svg+xml`) is used in a filter for Wayback Machine, the `+` is replaced because that's the only way Wayback recognises it. However, it was being escaped first, so it was converted to `image/svg\.xml` instead of `image/svg.xml` and was not recognised in the filter (see the sketch below).

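    A minimal Python sketch of the escaping-order bug (not waymore's exact code):

    ```python
    import re

    mime = "image/svg+xml"

    # Buggy order: escaping first leaves a backslash behind after the swap
    buggy = re.escape(mime).replace("+", ".")
    print(buggy)  # image/svg\.xml  -- not recognised by the Wayback filter

    # Fixed: swap '+' for '.' on the raw value, as Wayback expects
    fixed = mime.replace("+", ".")
    print(fixed)  # image/svg.xml
    ```
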
- v4.5
  - Changed
    - BUG FIX: When `-f` / `--filter-responses-only` is used, and retrieving Wayback Archive links, the links were still being filtered for URL exclusions, e.g. the extensions. This has been fixed and should return more links in that situation.
    - BUG FIX: If there is an invalid response from Alien Vault, then the error `ERROR: getAlienVaultUrls 1: Expecting value: line 1 column 1 (char 0)` is raised. This will be handled properly.
    - BUG FIX: If there is an invalid response from URLScan, then the error `ERROR getURLScanUrls 1: local variable 'jsonResp' referenced before assignment` is raised. This will be handled properly.
    - BUG FIX: If there is an invalid response from Virus Total, then the error `ERROR getVirusTotalUrls 1: Expecting value: line 1 column 1 (char 0)` is raised. This will be handled properly.
    - BUG FIX: When retrieving links from the Wayback Archive, and the user presses Ctrl-C to cancel the program, the error `[ ERR ] Error getting response for page - local variable 'resp' referenced before assignment` was displayed. This will no longer be shown.

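    A minimal sketch of the defensive parsing these fixes imply (hypothetical helper, not waymore's actual code). `Expecting value: line 1 column 1 (char 0)` is what `json` raises on an empty or non-JSON body:

    ```python
    import json

    def parse_json_or_none(body: str):
        try:
            return json.loads(body)
        except json.JSONDecodeError:
            return None  # handle the invalid response gracefully instead of crashing

    print(parse_json_or_none(""))              # None
    print(parse_json_or_none('{"ok": true}'))  # {'ok': True}
    ```
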
- v4.4
  - New
    - When using `-mode R`, if input is used that does find results, but those results don't match the input given, then display a message. For example, if the input is `www.hackerone.com/xnl` then Wayback Machine returns links for `http://hackerone.com/xnl` (without the `www.`). These don't match so aren't returned, but a message will give the user a clue as to what to change the input to if they did want those.

  - Changed
    - BUG FIX: Rewrite the logic in `linksFoundAdd` and correct a typo that always caused a runtime error and always added a link, without doing the check to see if the domain matches what was searched for (it's rare that other URLs are included anyway). Also use a new `linksFoundResponseAdd` with similar logic, but remove the prefixed timestamp which occurs with response links.
    - BUG FIX: If a URL is passed (instead of just a domain) as input for `-mode R` to download archived responses, it would not download anything, because it checked that the result contains the input, but the default port number is included in Wayback results and not in the input. This has been corrected.
    - Remove `argparse` from `setup.py` and `requirements.txt` because it is a standard Python module.

- v4.3
  - Changed
    - Wayback Machine seems to have made some changes to their CDX API without any notice or documentation. This caused problems getting URLs for `-mode U` because the API pagination no longer worked. If the number of pages cannot be returned, then all links will be retrieved in one request (see the sketch below). However, if they "fix" the problem and pagination starts working again, it will revert to the previous code that gets results a page at a time.
    - Although the bug fix for Github Issue #45 appeared to be working fine since the last version, the "changes" made by Wayback Machine seem to have broken that too. The code had to be refactored to work (i.e. don't include the `collapse` parameter at all if `none`), but it also no longer works with multiple fields.
    - When `-co` is used, there is no way to tell how long the results will take from Wayback Machine now, because all the data is retrieved in one request. While pagination is broken, this will just return `Unknown`, but will revert back to the previous functionality if pagination is fixed.

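    A minimal sketch of the pagination fallback, using the public CDX endpoint and its `showNumPages` parameter (illustrative only, not waymore's exact requests):

    ```python
    import requests

    CDX = "https://web.archive.org/cdx/search/cdx"
    params = {"url": "example.com/*", "output": "json"}

    # Ask the CDX API how many pages of results there are
    try:
        r = requests.get(CDX, params={**params, "showNumPages": "true"}, timeout=30)
        pages = int(r.text.strip())
    except (requests.RequestException, ValueError):
        pages = None  # pagination is broken or unavailable

    if pages is None:
        # Fall back to retrieving all links in one request
        data = requests.get(CDX, params=params, timeout=300).text
    else:
        # Pagination works: get results a page at a time
        data = "".join(
            requests.get(CDX, params={**params, "page": p}, timeout=300).text
            for p in range(pages)
        )
    ```
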
- v4.2
  - Changed
    - BUG FIX: Github Issue #45 - When getting archived responses from Wayback Machine, by default it is supposed to get one capture per day per URL (this interval can be changed with `-ci`). But it was only getting one response per day, not for all the different URLs per day. Thanks to @zakaria_ounissi for raising this.
    - BUG FIX: Github Issue #46 - The config `FILTER_URL` list was being applied to links found from all sources except Wayback Machine. So if the MIME type wasn't correct, it was possible that links that matched `FILTER_URL` were still included in the output. Thanks to @brutexploiter for raising this.

- v4.1
  - Changed
    - Removed the line `from tqdm import tqdm` as it is not needed and will cause errors if not installed.

- v4.0
  - New
    - Add argument `-oijs` / `--output-inline-js`. If passed, and archived responses are requested, all unique scripts from the responses (excluding `.js`, `.csv`, `.xls`, `.xslx`, `.doc`, `.docx`, `.pdf`, `.msi`, `.zip`, `.gzip`, `.gz`, `.tar`, `.rar`, `.json`) will be extracted and written to files `combinedInline{}.js` (in the same response directory), where `{}` is the number of the file, with a new file for every 1000 unique scripts. There will also be a file `combinedInlineSrc.txt` written that will contain the `src` value for all inline external scripts.
    - Exclude SOME downloaded custom 404 responses for `-mode R` if 404 status is to be excluded. The custom 404 pages will be identified by the regex `<title>[^\<]*(404|not found)[^\<]*</title>` (see the sketch below this list).
    - Add `long_description_content_type` to `setup.py` to upload to PyPi.
    - Add `waymore` to PyPi so it can be installed with `pip install waymore`.

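    A minimal sketch of the custom 404 check using the regex quoted above (case-insensitive matching is an assumption here):

    ```python
    import re

    CUSTOM_404 = re.compile(r"<title>[^\<]*(404|not found)[^\<]*</title>", re.IGNORECASE)

    html = "<html><head><title>Page 404 - nothing here</title></head></html>"
    print(bool(CUSTOM_404.search(html)))  # True
    ```
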
  - Changed
    - When getting the `DEFAULT_OUTPUT_DIR`, use `os.path.expanduser` to ensure that the full path is used.

- v3.7
  - Changed
    - Fix a bug that can occur in some situations where the error `ERROR processResponses 1: [Errno 2] No such file or directory: 'testing/responses.tmp'` is shown. The required directories weren't being created correctly.
    - Remove a debug print line I left in!
    - Remove this script from downloaded responses that's now being included by archive.org: `<script>window.RufflePlayer=window.RufflePlayer||{};window.RufflePlayer.config={"autoplay":"on","unmuteOverlay":"hidden"};</script>`
    - Remove the comment `<!-- End Wayback Rewrite JS Include -->` from the downloaded responses.
    - Clarify that the `-nlf` argument is only relevant to `-mode U`.

- v3.6
  - Changed
    - Added `-ko` to the suggestions displayed for Responses when the `-co` / `--check-only` option is used and there are a huge number of requests to be made.
    - Removed `-ko` from the suggestion displayed for URLs when the `-co` / `--check-only` option is used, because it doesn't affect the count; `-ko` is applied after the links are retrieved.
    - Add a statement to `setup.py` to show where `config.yml` is created if it doesn't already exist. This is to help in figuring out Issue #41.

- v3.5
  - Changed
    - Change the `README` descriptions of `-oU` and `-oR` to reference the recent `DEFAULT_OUTPUT_DIR`.
    - Change the description of the `-ra` arg in the code and in the `README` to say all sources.
    - Other small improvements to the `README`.

- v3.4
  - New
    - Add `DEFAULT_OUTPUT_DIR` to the `config.yml` file. This will be used to specify the default directory where output will be written if the `-oU` and `-oR` options aren't used. If blank, this defaults to the directory where `config.yml` is stored (typically `~/.config/waymore/`).

- v3.3
  - New
    - Add `WEBHOOK_DISCORD` to `config.yml` to provide a webhook to be notified when `waymore` has finished, because in some cases it can take a looooooong time!
    - Add arg `-nd` / `--notify-discord` to send a notification to the specified Discord webhook in `config.yml` when `waymore` completes (see the sketch below). This is useful because `waymore` can take a looooong time to complete for some targets.

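    A minimal sketch of the Discord notification (the webhook URL is a placeholder; waymore reads the real one from `WEBHOOK_DISCORD` in `config.yml`):

    ```python
    import requests

    # Discord webhooks accept a simple JSON payload with a "content" field
    webhook = "https://discord.com/api/webhooks/123456/abcdef"
    requests.post(webhook, json={"content": "waymore has finished!"}, timeout=10)
    ```
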
- v3.2
  - New
    - When getting the Common Crawl index file, if the response is 503, then let the user know it's unavailable. If it is anything other than 429 or 403, then print an error.

  - Changed
    - Don't show the coffee link if the output is piped to something else.
    - Remove the `mimetypes` library as it turned out to be quite inaccurate compared to just getting the path extension and using the content type of the response. Also improve the extension logic.
    - If the input has a path, then make sure it is treated as if no subdomains are wanted, i.e. don't prefix it with `*.`. This was stopping links coming back from archive.org.
    - Change the messaging to make more sense when multiple sources are used, showing `Links found...` for the first, but `Extra links found...` for the rest.

- v3.1
  - Changed
    - Improve the identification of the extension type when creating the archived hash files. First try the `mimetypes` library, which guesses the extension based on the MIME type. If that doesn't work, try to get the extension from the path. If the extension cannot be retrieved from the path, it will be derived from the `content-type` header. If a generic type still can't be obtained, it will be set to the second part of the `content-type` after the `/`. If still unknown, it will be set to `.unknown`. There will be no more `.xnl` extensions by default. A sketch of this fallback chain is shown below.
    - Updated the `README` and images to reflect the most recent version.

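    A minimal sketch of the fallback chain (a hypothetical helper that simplifies the steps described above; waymore's actual logic may differ):

    ```python
    import mimetypes
    import os

    def derive_extension(url_path: str, content_type: str) -> str:
        ext = mimetypes.guess_extension(content_type) or ""  # 1. mimetypes guess
        if not ext:
            ext = os.path.splitext(url_path)[1]              # 2. extension from the path
        if not ext and "/" in content_type:
            ext = "." + content_type.split("/")[1]           # 3. 2nd part of content-type
        return ext or ".unknown"                             # 4. last resort

    print(derive_extension("/index", "text/html"))  # e.g. '.html'
    ```
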
- v3.0
  - New
    - Allow `waymore` to be installed using `pip` or `pipx` so it can be run from any directory.
    - Show the current version of the tool in the banner, and whether it is the latest or outdated.
    - When installing `waymore`, if `config.yml` already exists then it will be kept and `config.yml.NEW` created in case you need to replace the old config.
    - Add a reference to the VirusTotal v2 API in `README.md`.
    - Fix a bug where the `results/target` folder was being created every time, even if the `-oU` and `-oR` arguments were passed.
    - Include a "Buy Me a Coffee" link at the end of the output.

  - Changed
    - Change the installation instructions in `README.md`.
    - If `--check-only` was passed and it looks like it will take a long time, include the `-ko` argument in the message describing arguments to consider.

- v2.6
  - New
    - Use the `tldextract` library to determine whether the input is a subdomain or just a domain (see the sketch below).
    - Include `tldextract` in `setup.py`.

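    A minimal sketch of the check with `tldextract`:

    ```python
    import tldextract

    # ExtractResult splits a host into subdomain, registered domain and suffix
    parts = tldextract.extract("sub.example.co.uk")
    print(parts.subdomain, parts.domain, parts.suffix)  # sub example co.uk
    is_subdomain = bool(parts.subdomain)  # True -> treat the input as a subdomain
    ```
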
  - Changed
    - Fix a bug that caused Alien Vault to not return any links if a subdomain is passed as input. This happens because the API is called with `/indicators/domain/`. If a URL is passed, it will use `/indicators/hostname/` instead and return links successfully.
    - Fix a bug that caused URLScan to fail with the error `[ 400 ] Unable to get links from urlscan.io`. This happens when a URL is sent as input, because URLScan.io can only retrieve information for hosts. Also, if a host is sent with a trailing `/` then it will be stripped for URLScan.io so it doesn't think there is a path.
    - Fix a bug that caused Alien Vault to fail with the runtime error `ERROR getAlienVaultUrls 1: 'full_size'`. This happens when a URL is sent as input. This will now successfully return links for passed URLs.

- v2.5
  - New
    - Show a warning if the user may be passing a subdomain. The chances are that they want all subdomains of the domain, so they should just call for the domain only.

- v2.4
  - New
    - Add lots of extra search terms to the `DEFAULT_FILTER_KEYWORDS` and `FILTER_KEYWORDS` in `config.yml`.

  - Changed
    - The Common Crawl HTTPAdapter retry strategy will now only be applied for code 503 (see the sketch below). There was an issue with 504 errors happening, and then waymore effectively freezes because of the retry strategy. The Common Crawl documentation (https://commoncrawl.org/blog/oct-nov-2023-performance-issues) just says to retry on 503.

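    A minimal sketch of a 503-only retry strategy with `requests`/`urllib3` (illustrative values, not waymore's exact `HTTP_ADAPTER_CC` settings):

    ```python
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Retry only on 503, with exponential backoff between attempts
    retry = Retry(total=5, backoff_factor=1, status_forcelist=[503])
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))

    resp = session.get("https://index.commoncrawl.org/collinfo.json", timeout=30)
    ```
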
- v2.3
  - New
    - Add `jira` as a search term to the `DEFAULT_FILTER_KEYWORDS` and `FILTER_KEYWORDS` in `config.yml`.

- v2.2
  - New
    - Add the `-lcy` argument. This lets you limit the number of Common Crawl index collections searched by the year of the index data. The earliest index has data from 2008. Setting it to 0 (default) will search collections of any year (but in conjunction with `-lcc`). For example, if you are only interested in data from 2015 and after, pass `-lcy 2015`. This will override the value of `-lcc` if passed.

- v2.1
  - New
    - When the responses are downloaded from archive.org they include some archive.org code, such as scripts and stylesheets. This is usually removed, but they may have changed something so it was being included again. This change ensures the new code is removed so the response doesn't include the archive.org code.

- v2.0
  - New
    - Add VirusTotal as a source of URLs. We get URLs from the v2 API domain report (see the sketch below this list). This can include subdomains, detected URLs, and undetected URLs in the response. It does not give you the status code or MIME type of the links, so we just check against the extension.
    - Show a specific message for Wayback Machine if there is a Connection Refused error. This happens when they have blocked the user's IP.
    - Add some pointless celebration messages to the banner for a few different dates!

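    A minimal sketch of the VirusTotal v2 domain report call (the API key is a placeholder and the field handling is a simplified assumption about the v2 response shape):

    ```python
    import requests

    resp = requests.get(
        "https://www.virustotal.com/vtapi/v2/domain/report",
        params={"apikey": "YOUR_VT_API_KEY", "domain": "example.com"},
        timeout=30,
    )
    report = resp.json()
    urls = [d["url"] for d in report.get("detected_urls", [])]
    urls += [u[0] for u in report.get("undetected_urls", [])]  # first element is the URL
    print(len(urls))
    ```
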
- v1.37
  - New
    - Add argument `-co` / `--check-only`. If passed, it will just get the count of requests that need to be made to get URLs from the sources, and how many archived responses will be downloaded. It will try to give an idea of the time the tool could take with the given settings.

- v1.36
  - New
    - Add argument `-wrlr` / `--wayback-rate-limit-retry`, which is the number of minutes the user wants to wait for a rate-limit pause on Wayback Machine (archive.org) instead of stopping with a `429` error. This defaults to 3 minutes, which seems to be enough for requests to work again for a while afterwards.
    - Add some additional User-Agents to use when making requests to the API providers.
    - Add new MIME exclusions `video/x-ms-wmv`, `image/x-png`, `video/quicktime`, `image/x-ms-bmp`, `font/opentype`, `application/x-font-opentype`, `application/x-woff` and `audio/aiff`.

  - Changed
    - Change the default `-p` / `--processes` to 1 instead of 3. This is to help with the rate limiting now put in place by web.archive.org. If set to 1, we can also ensure that the pages are processed in order and save where we stopped.
    - Change the `backoff_factor` on `HTTP_ADAPTER` from 1 to 1.1 to help with the rate limiting now put in place by web.archive.org (see the note below).
    - Change the `pages` set to a list to ensure pages are processed in order (this only applies if `--processes` is 1).

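    For reference, urllib3 calculates its retry delay as `backoff_factor * 2**(n-1)` for the n-th retry (whether the first retry sleeps varies by urllib3 version), so even a small bump from 1 to 1.1 lengthens every pause:

    ```python
    # Delays for retries 1..4 with backoff_factor = 1.1
    for n in range(1, 5):
        print(1.1 * 2 ** (n - 1))  # 1.1, 2.2, 4.4, 8.8 seconds
    ```
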
- v1.35
  - New
    - I had a specific problem with my ISP blocking archive.org for adult content (!) which resulted in a large and confusing error message. This has been replaced by a more useful message if this happens for anyone else.

- v1.34
  - Changed
    - Any scheme, port number, query string, or URL fragment will be removed from the input values.
    - Only show the warning `No value for "URLSCAN_API_KEY" in config.yml - consider adding (you can get a FREE api key at urlscan.io)` if the `-xus` argument wasn't passed.
    - If the input has a domain AND a path, then it will still be searched for links, and the mode will not be forced to R.
    - When the input value is validated and `<stdin>` is used, just assume one line is a domain/URL, and multiple lines are treated as a file (so the correct description is shown).

- v1.33
  - Changed
    - A bug existed that would cause any site that only had one page of links to not be completely retrieved. Change the processing for Wayback Machine that gets the number of pages: if the total number of pages is 1, then don't pass a page number at all.
    - In the `getSPACER` function, add 5 to the length instead of taking 1 away, so text artifacts aren't left.

- v1.32
  - Changed
    - Changes to prevent `SyntaxWarning: invalid escape sequence` errors when Python 3.12 is used.

- v1.31
  - New
    - Add new argument `-urlr` / `--urlscan-rate-limit-retry` to pass the number of minutes to wait between each rate-limit pause from URLScan.
    - Add new MIME exclusions `application/x-msdownload` and `application/x-ms-application`.

  - Changed
    - When getting URLs from the results of URLScan, also get the `[task][url]` values. Thanks to @Ali45598547 for highlighting this!
    - When URLScan rate limits, it says how many seconds you need to wait until you can try again. If it is less than 1 minute, the program will wait automatically to get more results. If more than 1 minute, the code will wait for the length of time specified by the `-urlr` / `--urlscan-rate-limit-retry` argument, if passed.
    - For Common Crawl, do at least 20 retries. This helps reduce the problem of `503` errors, and doing many retries was suggested by Common Crawl themselves to deal with the problem.

- v1.30
  - Changed
    - If there are any `+` characters in the MIME types, e.g. `image/svg+xml`, then replace the `+` with a `.`, otherwise the Wayback API does not recognise it.
    - Add `application/font-otf` to the `FILTER_MIME` value in `config.yml`.

- v1.29
  - New
    - Check for specific text in the body of a 503 response (which usually means the site is down for maintenance or not available) and return a specific message instead of the full response.

- v1.28
  - New
    - Added `application/font-otf` to `DEFAULT_FILTER_MIME`.

  - Changed
    - Fix a bug that overwrites the output URLs file if the input is a file that contains different hosts.

- v1.27
  - Changed
    - Set the default for `-lcc` to 3 instead of 0, to only search the 3 latest indexes for Common Crawl instead of all of them.

- v1.26
  - Changed
    - Allow an input value of just a TLD, e.g. `.mil`. If a TLD is passed, then resources for all domains with that TLD will be retrieved. NOTE: If a TLD is passed, the Alien Vault OTX source is excluded because it needs a full domain.

- v1.25
  - Changed
    - Fix a bug that always stripped the port number from URLs found. It should only remove the port if it is `:80` or `:443`.

- v1.24
  - Changed
    - Handle errors with the config file better. Display a specific message to say if the file isn't found or if there is a formatting error. If there is any other kind of error, the error message will be displayed. The default values will be used in the case of any of these errors.

- v1.23
  - Changed
    - The `-ko` / `--keywords-only` argument can now be passed without a value, which will use the `FILTER_KEYWORDS` in `config.yml` as before, or passed with a regex value that will be used instead. For example, `-ko "admin"` to only get links containing the word `admin`, or `-ko "\.js(\?|$)"` to only get JS files. The regex check is NOT case sensitive (see the sketch below).

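    A minimal sketch of the case-insensitive regex check, using the `-ko` example above written as a Python pattern:

    ```python
    import re

    link = "https://example.com/scripts/app.JS?v=2"
    # '.JS?' matches despite the different case because of re.IGNORECASE
    print(bool(re.search(r"\.js(\?|$)", link, re.IGNORECASE)))  # True
    ```
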
- v1.22
  - Changed
    - Fix issue #23. If a file is passed as input, an error would occur if any of the domains in the file contained a capital letter or ended with a full stop. The regex in `validateArgInput` has been amended to fix this, and any `.` on the end of a domain is stripped and the domain converted to lowercase before processing.

- v1.21
  - Changed
    - Fix issue #24. If the `FILTER_CODE` in `config.yml` is set to one status code, then it needs to be explicitly set to a string in `getConfig()`.

- v1.20
  - New
    - Add argument `-fc` for filtering HTTP status codes. Using this will override the `FILTER_CODE` value from `config.yml`. This is for specifying HTTP status codes you want to exclude from the results, provided as a comma-separated list.
    - Add argument `-mc` for matching HTTP status codes. Using this will override the `FILTER_CODE` value from `config.yml` AND the `-fc` argument. This is for specifying HTTP status codes you want to match in the results, provided as a comma-separated list.

  - Changed
    - Changed how filters are specified in the request to the Common Crawl API. Removes the regex negative lookahead, which is not needed if you use `filter=!`.

- v1.19
  - Changed
    - Bug fix: ignore any blank lines in the input file when validating whether the input is in the correct format.

- v1.18
  - Changed
    - Cache the Common Crawl `collinfo.json` file locally (see the sketch below). The file is only updated a few times per year, so there is no point in requesting it every time waymore is run. Common Crawl can struggle with volume against its API, which can cause timeouts, and currently about 10% of all requests they get are for `collinfo.json`!
    - Add an HTTPAdapter specifically for Common Crawl, with `retries` and `backoff_factor` increased, which seems to reduce the errors and maximize the results found.

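    A minimal sketch of the local cache (the cache path is hypothetical; waymore's actual location may differ):

    ```python
    import json
    import os

    import requests

    CACHE = os.path.expanduser("~/.config/waymore/collinfo.json")

    def get_collections():
        # Serve from the local copy if we already have one
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                return json.load(f)
        # Otherwise fetch it once and store it for future runs
        resp = requests.get("https://index.commoncrawl.org/collinfo.json", timeout=30)
        os.makedirs(os.path.dirname(CACHE), exist_ok=True)
        with open(CACHE, "w") as f:
            f.write(resp.text)
        return resp.json()
    ```
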
- v1.17
  - Changed
    - If an input file had a subdomain starting with `_` or `-` then an error was raised, but these are valid. This bug has been fixed.
    - In addition to the fix above, the error message will show what line was flagged in error, so the user can raise an issue on Github about it if they believe it is an error.

- v1.16
  - Changed
    - Fix a bug that raises `ERROR processURLOutput 6: [Errno 2] No such file or directory: ''` if the value passed to `-oU` has no directory specified as part of the file name.

- v1.15
  - Changed
    - Fix a bug that shows an error when `-v` is passed and `-oU` does not specify a directory, just a filename.

- v1.14
  - Changed
    - Fix a bug with the `-c` / `--config` option.

- v1.13
  - New
    - Added argument `-oU` / `--output-urls` to allow the user to specify a filename (including path) for the URL links file when `-mode U` (or `B`oth) is used. If not passed, then the file `waymore.txt` will be created in the `results/{target.domain}` directory as normal. If a path is passed with the file, then any directories will be created. For example: `-oU ~/Recon/Redbull/waymoreUrls.txt`
    - Added argument `-oR` / `--output-responses` to allow the user to specify a directory (or path) where the archived responses and `index.txt` file are written when `-mode R` (or `B`oth) is used. If any directories in the path do not exist they will be created. For example: `-oR ~/Recon/Redbull/waymoreResponses`

  - Changed
    - When removing all web archive references in the downloaded archived response, there were a few occasions where this wasn't working, so the regex has been changed to be more specific to ensure this works.

- v1.12
  - New
    - Added argument `-c` / `--config` to specify the full path of a YML config file. If not passed, it looks for the file `config.yml` in the same directory as the runtime file `waymore.py`.

- v1.11
  - New
    - Added argument `-nlf` / `--new-links-file`. If passed, and you run `-mode U` or `-mode B` to get URLs more than once for the same target, `waymore.txt` will still be appended with new links (unless `-ow` is passed), but a new output file called `waymore.new` will also be written. If there are no new links, the empty file will still be created. This can be used for continuous monitoring of a target.
    - Added a `waymore` folder containing a new `__init__.py` file that contains the `__version__` value.
    - Added argument `--version` to display the current version.
    - Show better error messages if the archive.org site returns a `Blocked Site Error`.
  - Changed
    - If a file of domains is passed as input, make sure spaces are stripped from the lines.
    - Change `.gitignore` to include `__pycache__`.
    - Move images to the `waymore/images` folder.

- v1.10
  - New
    - If `-mode U` is run for the same target again, by default new links found will be added to the `waymore.txt` file and duplicates removed.
    - Added argument `-ow` / `--output-overwrite` that can be passed to force the `waymore.txt` file to be overwritten with newly found links instead of being appended.
  - Changed
    - Change the README.md to reflect the new changes.

- v1.9
  - New
    - Add functionality to continue downloading archived responses if a run does not complete for any reason. When downloading archived responses, a file called `responses.tmp` will be created with the links of all responses that will be downloaded. There will also be a `continueresp.tmp` that stores the index of the current response being saved. If these files exist when run again, the user will be prompted whether to continue the previous run (so new filters will be ignored) or start a new one.
    - Add `CONTINUE_RESPONSES_IF_PIPED` to `config.yml`. If `stdin` or `stdout` is piped from/to another process, the user is not prompted about continuing a previous run of downloading responses. This value will determine whether to continue a previous run, or start a new one, in that situation.
  - Changed
    - Corrected the total pages shown when getting Wayback URLs.
    - Included missing packages in the `requirements.txt` document.
    - Fix Issue #16.

- v1.8
  - Changed
    - When archived responses are saved as files, the extension `.xnl` will no longer be used if `-url-filename` is passed. If `-url-filename` is not passed, then the filename is represented by a hash value. The extension of these files will be set to `.xnl` only if the original file type cannot be derived from the original URL.

- v1.7
  - New
    - Added the `-xwm` parameter to exclude getting URLs from Wayback Machine (archive.org).
    - Added `-lr` / `--limit-requests`, which can be used to limit the number of requests made per source (excluding Common Crawl) when getting URLs. For example, if you run waymore with `-i twitter.com` it says there are 28,903,799 requests to archive.org that need to be made (that could take almost 1000 days for some people!!!). The default value for the argument is 0 (zero), which applies no limit as before. There is also a problem with the Wayback Machine CDX API where the number of pages returned is not correct when filters are applied, and this can cause issues. Setting this parameter to a sensible value can relieve that issue (it may take a while for archive.org to address the problem).
  - Changed
    - Make sure that filters in API URLs are escaped correctly.
    - Add error handling to `getMemory()` to avoid any errors if `psutil` is not installed.

- v1.6
  - New
    - Add a docker option to run `waymore`. Include instructions in `README.md` and a new `DockerFile`.
  - Changed
    - If multiple domains/URLs are passed by file or STDIN, remove `*.` from the start of any input values.
    - Change the default `FILTER_KEYWORDS` to include more useful words.
    - If a link found from an API has port 80 or 443 specified, e.g. `https://example.com:80/example`, then remove the `:80`. Many links have this in archive.org, so this could reduce the number of similar links reported.
    - Amend `setup.py` to include `urlparse3`, which is now used to get the domain and port of links found.

- v1.5
  - New
    - Add argument `-ko` / `--keywords-only` which, if passed, will only get links (unless `-f` is passed) that have a specified keyword in the URL, and will only download responses (regardless of `-f`) where the keyword is in the URL. Multiple keywords can be specified in `config.yml` in a comma-separated list.
    - Add a `FILTER_KEYWORDS` key/pair to `config.yml` (and a default value in code) initially set to `admin,login,logon,signin,register,dashboard,portal,ftp,cpanel`.
  - Changed
    - Only add to the MIME type list if the `-v` option is used, because the types are not displayed otherwise.
    - Warn the user if there is a value missing from the `config.yml` file.
    - Fixed a small bug in `getURLScanUrls` that raised an error for `getSPACER`.

- v1.4
  - New
    - Added the `-m` / `--memory-threshold` argument to set the memory threshold percentage. If the machine's memory goes above the threshold, the program will be stopped and ended gracefully before running out of memory (default: 95).
    - If `-v` verbose output is used, memory stats will be output at the end, and also shown on the progress bar while downloading responses.
    - Included `psutil` in `setup.py`.
  - Changed
    - Fix some display issues not completely done in v1.3, regarding trailing spaces when errors are displayed.
    - Remove the line `os.kill(os.getpid(),SIGINT)` from `processArchiveUrl`, which isn't needed and just causes more errors if a user does press Ctrl-C.

- v1.3
  - New
    - Added functionality to allow the Links output to be piped to another program (the output file will still be written). Errors and the progress bar are written to STDERR. No information about archived responses will be piped.
    - Added functionality to allow input to be piped to waymore. This is the same as passing it to the `-i` argument.
  - Changed
    - Use a better way to add trailing spaces to strings to cover up other strings (like the progress bar), regardless of terminal width.
    - Change the README to mention the `-xus` argument and how to get a URLScan API key to add to the config file.

- v1.2
  - Changed
    - Removed the User-Agent `Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1` because it caused problems on some domains in the `xnLinkFinder` tool, so it is removed from here too.
    - Base the length of the progress bar (shown when downloading archived responses) on the width of the terminal so it displays better and you don't get multiple lines on smaller windows.
    - Amend `.gitignore` to include other unwanted files.

- v1.1
  - New
    - Allow a file of domains/URLs to be passed as input with `-i` instead of just one.
  - Changed
    - Remove version numbers from `requirements.txt` as these aren't really needed and may cause some issues.

- v1.0
  - New
    - Added URLScan as a source of URLs. Waymore now has all the same sources for URLs as gau.
    - Added the `-xus` parameter to exclude URLScan when getting URLs.
    - Added the `-r` parameter to specify the number of times requests are retried if they return 429, 500, 502, 503 or 504 (default: 1).
    - Made requests use a retry strategy using the `-r` value, and also a `backoff_factor` of 1 for Too Many Requests (429) responses.
    - General bug fixes.
  - Changed
    - Fixed a bug that was preventing HTTP status code filtering from working on Alien Vault requests.
    - Fixed a bug that was preventing MIME type filtering from working on Common Crawl requests.
    - Correctly escape all characters in strings compared in regexes with `re.escape`, instead of just changing `.` to `\.`.
    - Changed the default MIME type filter to include: `video/webm`, `video/3gpp`, `application/font-ttf`, `audio/mp3`, `audio/x-wav`, `image/pjpeg`, `audio/basic`.
    - Changed the default URL filter to include: `/jquery`, `/bootstrap`.
    - If Ctrl-C is used to end the program, try to ensure that results at that point are still saved before ending.

- v0.3
  - New
    - Added Alien Vault OTX as a source of URLs. Results cannot be checked against MIME filters though, because that info is not available in the API response.
    - Added the `-xav` parameter to exclude Alien Vault when getting URLs.
  - Changed
    - Improved the regex for the `-i` input value to ensure it's a valid domain, with or without subdomains and a path, but no query string or fragments.
    - General tidying up and improvements.

- v0.2
  - New
    - Added to the TODO list on README.md of changes coming soon.
  - Changed
    - When getting URLs from archive.org it now uses pagination. Instead of one API call (how waybackurls does it), it makes one call per page of URLs (how gau does it). This actually results in a lot more URLs being returned, even though the archive.org API docs seem to imply it should be the same. So in comparison to gau, it now returns the same number of URLs from archive.org.
    - Ensure the input domain/path is URL encoded when added to the API call URLs.

- v0.1
  - Initial release