Update deduplication fields for Trufflehog (and other scanners...) #10271

Open
brieucR opened this issue May 27, 2024 · 1 comment

brieucR commented May 27, 2024

⚠️ Is your feature request related to a problem? Please describe

I'm always frustrated with deduplication in the Trufflehog parser.
Any change made by the Trufflehog scanner or by Security Operators can alter the description field, which is used as a key for deduplication.

✔️ Describe the solution you'd like

As a Security Operator, I want to update the deduplication mechanism so that changes to the description field no longer affect how duplicates are detected.

💡 Describe alternatives you've considered

There are two main solutions; both consist of updating the hashcode configuration.

Solution A
We can rely on the payload field, which would be filled with the Raw or RawV2 field from Trufflehog. Adding file_path ensures that we capture separate findings when the same secret is found across several files/repositories:

'Trufflehog Scan': ['payload', 'file_path']

Solution B
We can rely on the url field which would be filled with the link field from Trufflehog. The link value is a unique identifier: https://github.com/[organization]/[project]/blob/[commit_hash]/[file_path]#[line_number]

'Trufflehog Scan': ['url']
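
To illustrate why either configuration would make deduplication insensitive to description edits, here is a minimal sketch of a field-based hash. The `hash_code` helper, the `HASHCODE_FIELDS` dict and the sample finding are hypothetical and only mimic the idea of hashing the configured fields; DefectDojo's actual implementation may differ.

```python
import hashlib

# Hypothetical per-proposal field lists, mirroring the two solutions above.
HASHCODE_FIELDS = {
    "solution_a": ["payload", "file_path"],
    "solution_b": ["url"],
    "current": ["description"],  # assumed stand-in for today's behaviour
}

def hash_code(finding: dict, fields: list) -> str:
    """Hash only the configured fields; every other field is ignored."""
    joined = "|".join(str(finding.get(name, "")) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Made-up finding, for illustration only.
original = {
    "description": "Secret found by detector X",
    "payload": "EXAMPLE-SECRET-VALUE",
    "file_path": "config/settings.py",
    "url": "https://github.com/example-org/example-project/blob/abc123/config/settings.py#12",
}
# Same finding after an operator (or a scanner upgrade) rewrites the description.
edited = dict(original, description="Reworded description")

for name, fields in HASHCODE_FIELDS.items():
    stable = hash_code(original, fields) == hash_code(edited, fields)
    print(f"{name}: hash unchanged after description edit = {stable}")
```

With either proposed field list the hash is unaffected by the description change, whereas a description-based hash treats the edited finding as new.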


Additional context

Migration step
payload and url are not sent to DefectDojo by default. A migration step would be required to calculate these fields for existing findings. Otherwise, hashes would be computed from empty fields, so unrelated findings would be wrongly flagged as duplicates.
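
The risk described above can be sketched as follows, assuming a hash computed only from the configured fields. The `hash_code` helper and the two legacy findings are hypothetical; they stand in for findings imported before `url` was populated.

```python
import hashlib

def hash_code(finding: dict, fields: list) -> str:
    """Hypothetical hash over the configured fields; missing fields count as empty."""
    joined = "|".join(str(finding.get(name, "")) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Two unrelated legacy findings that never received a url value.
legacy_1 = {"description": "Secret in a.py", "url": ""}
legacy_2 = {"description": "Secret in b.py"}

# Without a migration, both hash the same empty value and collide.
collide = hash_code(legacy_1, ["url"]) == hash_code(legacy_2, ["url"])
print(collide)
```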

Pros and Cons
Also, each solution comes with both pros and cons:

  • Solution A
    • Pros: Most straightforward solution. Customers can easily customize how they want to consider duplicates: 1 finding / secret, 1 finding / secret / project, 1 finding / secret / file... etc
    • Cons: It can be very difficult to retrieve the payload field for existing findings
  • Solution B
    • Pros: The url field can be calculated for old findings. If we consider that one project in DefectDojo = 1 repository in GitHub, we just need to parse the existing description field to build a URL path (organization, project, commit_hash, file_path and line_number are all known)
    • Cons: The url field is not exactly meant to receive this value (it is intended for an external source of documentation)
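
As a rough sketch of the backfill idea in Solution B, the link could be rebuilt from its known components using the URL pattern quoted earlier. The `build_link` function is hypothetical, and extracting the components from an existing description is left out because the description layout is not specified here.

```python
def build_link(organization: str, project: str, commit_hash: str,
               file_path: str, line_number: int) -> str:
    """Assemble the link following the pattern
    https://github.com/[organization]/[project]/blob/[commit_hash]/[file_path]#[line_number]
    """
    # Drop a leading slash so the path does not produce a double "/".
    clean_path = file_path.lstrip("/")
    return (
        f"https://github.com/{organization}/{project}"
        f"/blob/{commit_hash}/{clean_path}#{line_number}"
    )

# Made-up values, for illustration only.
link = build_link("example-org", "example-project", "abc123", "/src/app.py", 42)
print(link)
```

The fragment format follows the pattern given in this issue; if Trufflehog emits a different anchor format, the migration would need to match it exactly so that old and new findings hash identically.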

Finally, Solution B seems the most realistic. This reasoning applies to Trufflehog, but not only to it: several other parsers compute duplicate findings from the `description` field and could be updated in the same way.

brieucR commented May 27, 2024

See #10118 for Solution B proposal
