Validate input data before training the models #342

barjin · 2024-12-12T10:30:50Z

As mentioned in #339 (and the related comments), the collected input data can contain arbitrary values (e.g. as a result of a penetration test run against the collecting server). This leads to the generation of less believable (or even potentially dangerous) fingerprints.

The input data should be validated before training the models with generator-networks-creator to ensure we only generate real fingerprints. This could be simple for some properties (e.g. Navigator.appCodeName should always be Mozilla), but may be impossible for others (e.g. Navigator.userAgent can be a pretty much arbitrary string, constrained only by its syntax).
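To make the "simple for some properties, loose for others" idea concrete, here is a minimal sketch of per-property rules. The `RawRecord` shape, the rule names, the 512-character cap and the `Mozilla/5.0 (` prefix check are all assumptions made for illustration; the real record structure of the collected dataset is not shown in this thread.

```typescript
// Hypothetical record shape: the real dataset structure may differ.
interface RawRecord {
  navigator?: {
    appCodeName?: unknown;
    userAgent?: unknown;
  };
}

type Rule = (record: RawRecord) => boolean;

const rules: Record<string, Rule> = {
  // Strict rule: appCodeName is expected to be the constant "Mozilla".
  appCodeName: (r) => r.navigator?.appCodeName === "Mozilla",
  // Loose rule: only the rough syntax of the user agent can be checked.
  // The length cap and the prefix are illustrative assumptions.
  userAgent: (r) => {
    const ua = r.navigator?.userAgent;
    return typeof ua === "string" && ua.length <= 512 && /^Mozilla\/5\.0 \(/.test(ua);
  },
};

// Returns the names of all rules the record violates.
function failedRules(record: RawRecord): string[] {
  return Object.entries(rules)
    .filter(([, rule]) => !rule(record))
    .map(([name]) => name);
}

// Keeps only the records that pass every rule.
function validateRecords(records: RawRecord[]): RawRecord[] {
  return records.filter((r) => failedRules(r).length === 0);
}
```

Returning the list of failed rule names (rather than a plain boolean) would make it easier to log why a record was rejected before the training step.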

Note that this blocks re-enabling the automatic updates of the models.

0xARYA · 2025-01-04T04:30:44Z

Hey @barjin

I wanted to check in on the progress of this issue. Has anyone internally started work on it? I have been looking at potential solutions and would love to help in any way.

barjin · 2025-01-06T14:10:15Z

Hello @0xARYA and thank you for your interest in this project.

There was an open community PR adding basic validation before the model generation step, but the author decided to delete it (I can find the GitHub notifications in my email inbox, but the links are dead). We didn't get much time to look into this yet, so any expertise or ideas on how to validate separate parts of the fingerprints are definitely welcome!

Btw today, while solving an unrelated issue, I regenerated the models in the packages, manually checked those for the bad values and triggered a new release. This means there is a new version (2.1.62) of the fingerprint-suite packages with fresh models available on npm.

0xARYA · 2025-01-21T13:29:08Z

https://github.com/kkoooqq/fakebrowser/blob/586e85c0ed872513d2e0703d8c516250a8a4365b/src/core/DeviceDescriptor.ts#L239

I think this could be a good reference for a basic starting point. Dealing with the poisoning issue is obviously a whole other can of worms, though. I have not reached a conclusion on whether it is better solved by taking the blacklist or the whitelist route...
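The trade-off between the two routes can be sketched as follows. The patterns and the set of platform values are illustrative assumptions only, not an exhaustive or recommended list.

```typescript
// Blacklist (denylist) route: reject values that match known-bad patterns.
// These patterns are illustrative assumptions.
const denyPatterns: RegExp[] = [/<script/i, /\$\{/, /\.\.\//];

function passesDenylist(value: string): boolean {
  return !denyPatterns.some((pattern) => pattern.test(value));
}

// Whitelist (allowlist) route: accept only values from a known-good set.
// A few common navigator.platform values, deliberately incomplete.
const allowedPlatforms = new Set<string>(["Win32", "MacIntel", "Linux x86_64"]);

function passesAllowlist(value: string): boolean {
  return allowedPlatforms.has(value);
}

// A payload nobody anticipated slips through the denylist...
console.log(passesDenylist("' OR 1=1 --"));
// ...while a legitimate but unlisted value is rejected by the allowlist.
console.log(passesAllowlist("Linux armv8l"));
```

In short, the denylist fails open on unseen payloads, while the allowlist fails closed and can drop rare but genuine values, so it may make sense to pick the route per property.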

0xARYA · 2025-01-21T13:31:08Z

I assume any sort of filtering logic would be implemented in the following function?

fingerprint-suite/packages/generator-networks-creator/src/generator-networks-creator.ts, line 59 in b42c60a

0xARYA · 2025-01-22T00:25:02Z

I'm now trying to tackle this issue and hopefully increase quality across the board. One really trivial step is eliminating fingerprints with a truthy webdriver value.

I am currently stuck on understanding the structure of the records. It seems like I could reverse engineer it, but I would appreciate some guidance, as I cannot currently download the dataset to inspect it myself.
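For reference, the webdriver filter could look roughly like the sketch below. Since the record structure is not known here, the `CollectedRecord` shape (with `webdriver` nested under `navigator`) is purely an assumption and would need to be adapted to the real dataset.

```typescript
// Hypothetical shape: where `webdriver` actually lives in the records is an assumption.
interface CollectedRecord {
  navigator?: { webdriver?: unknown };
}

// Drops every record whose webdriver value is truthy.
// Note that a string such as "false" is truthy and is therefore dropped as well.
function dropAutomated(records: CollectedRecord[]): CollectedRecord[] {
  return records.filter((record) => !record.navigator?.webdriver);
}
```

Records without a `navigator.webdriver` field are kept by this check; whether a missing value should count as suspicious is a separate decision.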

0xARYA · 2025-01-22T05:22:15Z

Another thing we need to address to bring this library back up to speed is the new(-er?!) client hint headers. We're missing a sizeable number of them, which causes issues with sites that do pre-response validation, like Amazon and Google.
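One way to see which headers are absent is to compare a generated header set against a list of User-Agent Client Hints names. The sketch below uses real header names from the UA Client Hints family, but the lists are not exhaustive, and the `missingClientHints` helper is hypothetical rather than part of the library. Which headers the library actually lacks is not stated in this thread.

```typescript
// Low-entropy hints, which Chromium-based browsers send by default.
const LOW_ENTROPY_HINTS: string[] = ["sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"];

// A few high-entropy hints, normally sent only after the server opts in via Accept-CH.
const HIGH_ENTROPY_HINTS: string[] = [
  "sec-ch-ua-arch",
  "sec-ch-ua-bitness",
  "sec-ch-ua-full-version-list",
  "sec-ch-ua-model",
  "sec-ch-ua-platform-version",
];

// Hypothetical helper: returns the expected header names that are absent.
// Header names are compared case-insensitively.
function missingClientHints(headers: Record<string, string>, expected: string[]): string[] {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return expected.filter((name) => !present.has(name));
}
```

Running such a check over generated header sets for Chromium-like fingerprints would give a quick list of gaps to close.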

barjin added the labels debt (Code quality improvement or decrease of technical debt.) and t-tooling (Issues with this label are in the ownership of the tooling team.) · Dec 12, 2024
