
Deduplication of identical results #1433

Conversation

mikhailevtikhov

Description

What will it do?


We actively use dirsearch, and we identified a problem that is present in all similar projects: when an HTTP 200 response code is received, dirsearch reports a finding without verifying that the same content has not already been found. This initial PR highlights the problem so that the deduplication logic can be developed further within it.

These situations arise at scale under various web server configurations:

  1. The server returns 404 for all requests to files/folders, but any request to an admin* path returns 200 with identical content. In this case there will be as many false positives as there are wordlist entries beginning with admin (a minimal local reproduction of this scenario follows the list).
  2. A WAF is triggered during the scan and starts returning 200 to every subsequent request. In this case the number of false positives equals the number of words remaining in the wordlist.
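
To see scenario 1 in isolation, a throwaway local server is enough. The sketch below is illustrative only and not part of the PR; the handler name, response body, and port are arbitrary assumptions:

from http.server import BaseHTTPRequestHandler, HTTPServer

class AdminWildcardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Any path starting with /admin gets the same 200 body, so every
        # admin* wordlist entry looks like a separate finding to a scanner.
        if self.path.startswith("/admin"):
            body = b"<html>admin area</html>"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

HTTPServer(("127.0.0.1", 8000), AdminWildcardHandler).serve_forever()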

You can reproduce the problem against the following target:

python3 dirsearch.py --url=http://testphp.vulnweb.com:80/ --wordlist=/<PATH>/dirserach_dict.txt

  _|. _ _  _  _  _ _|_    v0.4.3
 (_||| _) (/_(_|| (_| )

Extensions: php, aspx, jsp, html, js | HTTP method: GET | Threads: 25 | Wordlist size: 22

Target: http://testphp.vulnweb.com/

[17:58:47] Starting:
[17:58:51] 200 -   262B - /admin/?/?/
[17:58:51] 200 -   262B - /admin/
[17:58:51] 200 -   262B - /admin/?/123
[17:58:51] 200 -   262B - /admin/?/
[17:58:51] 200 -   262B - /admin/?/test
[17:58:51] 200 -   262B - /admin/?/something
[17:58:51] 301 -   169B - /admin  ->  http://testphp.vulnweb.com/admin/

Task Completed

The following solution is proposed:
Add a check_duplicate function responsible for clustering all detections by response size. When a new detection with an already-seen size arrives, its similarity ratio is computed against the stored content of that size.
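
In outline, the idea looks roughly like this (an illustrative sketch: the function name comes from this PR, but the storage layout and the 0.9 threshold are assumptions, not final code):

from difflib import SequenceMatcher

seen_bodies_by_size = {}  # response size -> bodies already reported

def check_duplicate(body, threshold=0.9):
    # Cluster by exact size first, so the expensive ratio computation
    # only runs against responses whose length already matches.
    size = len(body)
    for seen in seen_bodies_by_size.get(size, []):
        if SequenceMatcher(None, seen, body).ratio() >= threshold:
            return True  # near-identical content was already reported
    seen_bodies_by_size.setdefault(size, []).append(body)
    return False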

There are also plans to implement saving of the deduplicated paths and displaying this information in reports. This functionality would make it possible to reveal non-standard request-handling logic in a web server.

@shelld3v (Collaborator) commented Nov 7, 2024

Hi @mikhailevtikhov, the idea is good, but the code has way too many problems; for example, why round to the nearest thousand bytes? None of this belongs in DynamicContentParser anyway; that is not its purpose. You are also doing too much checking on responses that probably don't even need to be checked: just because a response is proven "not wildcard" doesn't mean it will be considered a valid "result", as it still goes through several validations. So if we ever do something like this, it should be done somewhere in Controller or Fuzzer. Not to mention that storing too many responses can cause memory problems; I'm considering checking only the hash (even though it might not be the best).
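
A hash-only check along those lines might look roughly like this (an illustrative sketch, not dirsearch code; the digest algorithm is an arbitrary choice):

import hashlib

seen_hashes = set()

def is_duplicate(body):
    # Keep one fixed-size digest per unique response instead of the body
    # itself, so memory use stays flat regardless of response sizes.
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False

The trade-off is the caveat above: an exact hash misses near-duplicates whose dynamic parts (timestamps, tokens) differ slightly, which a ratio-based comparison would still catch.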

Thank you for your effort, but I think I will close this and try to address this problem soon.

@shelld3v closed this Nov 7, 2024
@shelld3v (Collaborator) commented Nov 7, 2024

This functionality would make it possible to reveal non-standard request-handling logic in a web server.

This is also one of the reasons I haven't added this yet: most of the time these "duplicates" are genuinely useful to me.

@mikhailevtikhov (Author) commented Nov 12, 2024

Hi @shelld3v, thank you very much for your reply and comments; they make sense to me. This was meant as an initial commit to highlight the problem, and I am ready to work on solving the issues you described within my PR. Would that be the right way to go, or do you want to implement this functionality yourself?

@shelld3v (Collaborator)

Hi @mikhailevtikhov, I have already submitted a PR; you can check it out: #1436

@mikhailevtikhov (Author)

@shelld3v Wow, thanks a lot! Great job! I'm going to test this on all my cases.
