Deduplication of identical results #1433
Closed
Description
What will it do?
If this PR will fix an issue, please address it:
Fix #{issue}
Requirements
CONTRIBUTORS.md
CHANGELOG.md
While actively using dirsearch, we identified a problem (one that is present in all similar projects): whenever a response with HTTP status code 200 is received, dirsearch reports a detection without checking whether the same content has already been found. This initial PR highlights the problem so that the deduplication logic can be developed further within this PR.
Such situations occur frequently across many different web server configurations:
You can reproduce this problem with the following target:
The following solution is proposed:
Add a `check_duplicate` function responsible for clustering all detections by response size. When a new detection has the same size as earlier ones, calculate its similarity ratio against the content already stored for that size, so that near-identical responses can be discarded as duplicates.
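A minimal sketch of the proposed check follows. It is not dirsearch's actual code. The `DuplicateChecker` class, the 0.98 threshold, and the use of the standard library's `difflib.SequenceMatcher.ratio()` as the "ratio" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical cut-off: same-size bodies at or above this ratio count as duplicates.
SIMILARITY_THRESHOLD = 0.98


class DuplicateChecker:
    """Hypothetical helper that clusters detections by response size."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        # response size (in characters) -> bodies already reported with that size
        self.clusters: Dict[int, List[str]] = {}

    def check_duplicate(self, body: str) -> bool:
        """Return True if body duplicates an earlier detection.

        Otherwise store it in its size cluster and return False.
        """
        cluster = self.clusters.setdefault(len(body), [])
        for seen in cluster:
            # ratio() is 1.0 for identical strings and lower as they diverge
            if SequenceMatcher(None, seen, body).ratio() >= self.threshold:
                return True
        cluster.append(body)
        return False


if __name__ == "__main__":
    checker = DuplicateChecker()
    print(checker.check_duplicate("<html>Welcome</html>"))  # False: first time seen
    print(checker.check_duplicate("<html>Welcome</html>"))  # True: identical body
    print(checker.check_duplicate("<html>Another</html>"))  # False: same size, different content
```

Only bodies of exactly the same size are compared, which keeps the number of (relatively expensive) ratio computations low. The trade-off is that a page differing only by a reflected path of a different length lands in a different cluster and is not caught.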
We also plan to save the deduplicated paths and display this information in reports. This would help reveal non-standard request-handling logic on the web server.
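The planned reporting of deduplicated paths could be sketched as follows, under the same assumptions as above (size clustering plus a `difflib` similarity ratio with a hypothetical 0.98 threshold). The `(path, body)` input format and the `group_duplicates` and `format_report` helpers are hypothetical. Each first-seen path keeps the list of later paths that returned the same content.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

THRESHOLD = 0.98  # hypothetical similarity cut-off


def group_duplicates(results: List[Tuple[str, str]],
                     threshold: float = THRESHOLD) -> Dict[str, List[str]]:
    """Map each first-seen path to the later paths whose bodies duplicate it."""
    groups: Dict[str, List[str]] = {}
    # response size -> (representative path, body) pairs with that size
    clusters: Dict[int, List[Tuple[str, str]]] = {}
    for path, body in results:
        cluster = clusters.setdefault(len(body), [])
        for rep_path, rep_body in cluster:
            if SequenceMatcher(None, rep_body, body).ratio() >= threshold:
                groups[rep_path].append(path)  # remember the deduplicated path
                break
        else:
            # no similar body of this size yet: this path becomes a representative
            cluster.append((path, body))
            groups[path] = []
    return groups


def format_report(groups: Dict[str, List[str]]) -> str:
    """Render one line per unique result, listing the paths folded into it."""
    lines = []
    for path, dupes in groups.items():
        suffix = f" (+{len(dupes)} duplicates: {', '.join(dupes)})" if dupes else ""
        lines.append(f"{path}{suffix}")
    return "\n".join(lines)
```

For example, if `/admin`, `/backup` and `/old` all return the same page, the report shows `/admin` once and lists the other two as its duplicates. A single page answering many unrelated paths like this is the kind of non-standard request handling the report is meant to surface.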