Skip to content
This repository has been archived by the owner on Feb 10, 2021. It is now read-only.

Situations where our result doesn't match upstream #4

Closed
11 tasks done
clamburger opened this issue Jul 13, 2018 · 2 comments
Closed
11 tasks done

Situations where our result doesn't match upstream #4

clamburger opened this issue Jul 13, 2018 · 2 comments
Milestone

Comments

@clamburger
Copy link
Collaborator

clamburger commented Jul 13, 2018

Use this issue for tracking situations we need to resolve before release

  • L33tMatch::match returns 6 matches for Password1, should return 0 (fixed, single character detection wasn't working)
  • Password rockyou returns 10,718 guesses, should return 45,900 guesses (fixed, minimum guesses wasn't being applied to all matches)
  • Passwords 098765 and 09876 are detected as dvorak instead of qwerty - this results in a different number of guesses (fixed, typo in SpatialMatch caused dvorak matches to return an incorrect number of guesses)
  • For the password marie1, upstream detects a L33tMatch from the female_names dictionary, but we detect a normal DictionaryMatch from the passwords dictionary (fixed, our L33tMatch algorithm didn't match upstream properly)
  • Passwords ABC123 and PASSWORD1 are missing the 'All-uppercase is almost as easy...' suggestion (fixed, the 'all uppercase' regex was incorrect)
  • SpatialMatch casts the result of getGuesses to int, which upstream doesn't do (it's returns a float) - this has a maximum error of 1 guess, but can lead to greater effects when it's part of a set of multiple matches (fixed, now returns float)
  • Password j123456 is detected as a Bruteforce + SequenceMatch instead of Bruteforce + DictionaryMatch (fixed, matchers were in a different order than upstream which led to SequenceMatch being chosen over DictionaryMatches with equal guesses)
  • YearMatch should have a pattern type of regex to match upstream (instead of year) (fixed)
  • Multibyte characters are treated differently compared to upstream: the smiley face emoji 🙂 is treated as 1 character in JavaScript, but 3 characters in PHP (fixed, move to using the mb_ string functions)
  • When two SpatialMatches are returned (such as with !QAZ1qaz), the second match has 4 times as many guesses as it should (fixed, affected passwords where the first character was shifted)
  • Some passwords that contain multiple Bruteforce and Dictionary matches return different results. Examples: hitenmitsurugi, soldemedianoche, inthenameofgod. In all three cases we return slightly less guesses than upstream.
@clamburger clamburger added this to the 1.0 milestone Jul 13, 2018
@clamburger
Copy link
Collaborator Author

clamburger commented Jul 17, 2018

These three are unfixable:

  • Password h123456 is detected as a SequenceMatch + SequenceMatch instead of SequenceMatch + DictionaryMatch (seems to affect any password with 12345 in it)
  • The letters on or no at the start or end of a password will sometimes cause a DictionaryMatch to be returned instead of a ReverseDictionaryMatch, or vice versa
  • For certain SpatialMatches, the wrong graph will be returned (dvorak instead of qwerty, or keypad instead of mac_keypad)

Each issue is caused by the same thing: when the upstream library sorts the matches coming out of the Matcher (before sending them into the Scorer), it only checks the i and j members of the match. If both are equal, it's arbitrary as to which one is chosen - it's an implementation detail of the browser as to which sorting algorithm is used. This means that browsers experience the same issue: for some matches, Firefox will return a different sequence to Chrome.

Luckily the fallout isn't that bad. guesses and score are still guaranteed to be the same everywhere. All we can really do is add a note to the docs.

edit: more investigation revealed that the majority of browsers perform a stable sort (the main exception being Chrome when there are more than ten elements). As such, I've used a stable sort in PHP as well so that it matches most of the time.

@clamburger
Copy link
Collaborator Author

I've checked the first million rows of the rockyou password list. Haven't found any further issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant