Update deduplication fields for Trufflehog (and other scanners...) #10271

brieucR · 2024-05-27T09:02:25Z

⚠️ Is your feature request related to a problem? Please describe

I'm always frustrated by deduplication with the Trufflehog parser.
Any change made by the Trufflehog scanner or by Security Operators can alter the description field, which is used as a key for deduplication.

✔️ Describe the solution you'd like

As a Security Operator, I want to update the deduplication mechanism so that editing the description field does not affect duplicate detection.

💡 Describe alternatives you've considered

There are two main solutions; both consist of updating the hashcode configuration.

Solution A
We can rely on the payload field, which would be filled with the Raw or RawV2 field from Trufflehog. Adding file_path would ensure that we capture several findings if the same secret is found across several files/repositories:

'Trufflehog Scan': ['payload', 'file_path']

Solution B ⭐
We can rely on the url field which would be filled with the link field from Trufflehog. The link value is a unique identifier: https://github.com/[organization]/[project]/blob/[commit_hash]/[file_path]#[line_number]

'Trufflehog Scan': ['url']
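
To illustrate why the choice of fields matters, here is a minimal sketch of a field-based hash code. It is a simplified stand-in and not DefectDojo's actual implementation: `compute_hash`, the two config dictionaries, and the sample finding values are all hypothetical, and the `['description']` entry is only an assumption about what a description-based configuration looks like.

```python
import hashlib

# Hypothetical configurations (assumptions for illustration only).
FIELDS_BY_DESCRIPTION = {"Trufflehog Scan": ["description"]}
FIELDS_BY_URL = {"Trufflehog Scan": ["url"]}  # Solution B


def compute_hash(finding: dict, scanner: str, config: dict) -> str:
    """Join the configured fields and hash them (simplified sketch)."""
    values = [str(finding.get(field) or "") for field in config[scanner]]
    return hashlib.sha256("|".join(values).encode("utf-8")).hexdigest()


# The same finding before and after an operator edits its description.
original = {
    "description": "Secret found in config.py",
    "url": "https://github.com/org/proj/blob/abc123/config.py#12",
}
edited = dict(original, description="Secret found in config.py (triaged)")

# Description-based key: the edit changes the hash, so dedup breaks.
changed_by_description = (
    compute_hash(original, "Trufflehog Scan", FIELDS_BY_DESCRIPTION)
    != compute_hash(edited, "Trufflehog Scan", FIELDS_BY_DESCRIPTION)
)

# URL-based key: the edit leaves the hash untouched.
stable_by_url = (
    compute_hash(original, "Trufflehog Scan", FIELDS_BY_URL)
    == compute_hash(edited, "Trufflehog Scan", FIELDS_BY_URL)
)
```

With the url-based key, only a change of commit, file or line produces a new hash; wording changes in the description no longer matter.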

➕ Additional context

Migration step
payload and url are not sent to DefectDojo by default. A migration step would be required to compute these fields for existing findings. Otherwise deduplication would be based on empty fields, causing a lot of unwanted duplicates.
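
The risk with empty fields can be sketched as follows. This is a simplified model of a field-based hash, not DefectDojo's code; `hash_fields` and the two sample findings are hypothetical.

```python
import hashlib


def hash_fields(finding: dict, fields: list) -> str:
    """Hash the selected fields; missing or None values become empty strings."""
    joined = "|".join(str(finding.get(name) or "") for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two unrelated legacy findings imported before url was populated.
legacy_a = {"title": "AWS key", "file_path": "a.py", "url": None}
legacy_b = {"title": "Slack token", "file_path": "b.py", "url": None}

# Both hash the same empty value, so they would look like duplicates.
collide = hash_fields(legacy_a, ["url"]) == hash_fields(legacy_b, ["url"])
```

Under this model every finding without a url ends up with the same key, which is why the fields need to be backfilled before the new configuration is enabled.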

Pros and Cons
Also, each solution comes with both pros and cons:

Solution A
- Pros: Most straightforward solution. Customers can easily customize how they want to consider duplicates: 1 finding / secret, 1 finding / secret / project, 1 finding / secret / file... etc
- Cons: It can be very difficult to retrieve the payload field for existing findings
Solution B ⭐
- Pros: The url field can be calculated for old findings. If we consider that 1 project in DefectDojo = 1 repository in GitHub, we just need to parse the existing description field to build a URL path (organization, project, commit_hash, file_path and line_number are all known)
- Cons: The url field is not exactly meant to receive this value (it is rather meant for a link to an external source of documentation)
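
For Solution B, the backfill could look roughly like the sketch below. It assumes the five components have already been extracted from each old finding; the parsing of the description itself is left out because its exact format is not specified here. `LegacyFinding`, `build_link` and `backfill_urls` are hypothetical names, not DefectDojo APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LegacyFinding:
    """Hypothetical record holding the components known for an old finding."""
    organization: str
    project: str
    commit_hash: str
    file_path: str
    line_number: int
    url: Optional[str] = None


def build_link(finding: LegacyFinding) -> str:
    """Rebuild the link using the template quoted in Solution B, as written."""
    return (
        f"https://github.com/{finding.organization}/{finding.project}"
        f"/blob/{finding.commit_hash}/{finding.file_path}#{finding.line_number}"
    )


def backfill_urls(findings: List[LegacyFinding]) -> int:
    """Fill url on findings that lack one; return how many were updated."""
    updated = 0
    for finding in findings:
        if not finding.url:
            finding.url = build_link(finding)
            updated += 1
    return updated


findings = [
    LegacyFinding("acme", "shop", "abc123", "src/settings.py", 12),
    LegacyFinding("acme", "shop", "abc123", "src/db.py", 7, url="already-set"),
]
updated_count = backfill_urls(findings)
```

Findings that already carry a url are left untouched, so the step can be re-run safely.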

Finally, Solution B seems the most realistic. This reasoning applies to Trufflehog, but not only to it: several parsers compute duplicate findings based on the `description` field and could be updated as well.


brieucR · 2024-05-27T09:11:37Z

See #10118 for Solution B proposal

brieucR added the enhancement label May 27, 2024

brieucR mentioned this issue May 27, 2024

feat(trufflehog): add link field and deduplicate issues based on it #10118

Closed

