Use only valid characters for benchmarking #73

Closed
wants to merge 1 commit into from

harendra-kumar
Member

I hardcoded the valid chars in the benchmarks, and now the results seem to make some sense. However, I cannot explain why many of the benchmarks took about 400 μs when we used the full range and now take milliseconds. Were we erroring out or something on encountering the first invalid char?
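For illustration only, here is a minimal sketch of one way to restrict a benchmark input to "valid" characters using `base` alone. It is not the change made in this PR, which hardcodes the ranges. It assumes that "valid" means a code point that is assigned and is neither a surrogate nor private-use; `isValidChar` and `validChars` are hypothetical names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Assumption: a "valid" char is an assigned code point that is neither a
-- surrogate nor a private-use character. The PR hardcodes ranges instead.
isValidChar :: Char -> Bool
isValidChar c = case generalCategory c of
  NotAssigned -> False
  Surrogate   -> False
  PrivateUse  -> False
  _           -> True

-- All chars in the full range that pass the check above.
validChars :: [Char]
validChars = filter isValidChar [minBound .. maxBound]

main :: IO ()
main = do
  print (isValidChar 'a')
  print (isValidChar '\xD800')
  -- The filtered list is strictly shorter than the full range.
  print (length validChars < length [minBound .. maxBound :: Char])
```

The exact number of characters kept depends on the Unicode version shipped with the compiler's `base`, so the sketch only checks that some characters are filtered out.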

Old results with full range including invalid chars:

All
  Unicode.Char.Case
    isLowerCase
      unicode-data:   OK (0.22s)
        416  μs ±  22 μs
    isUpperCase
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
  Unicode.Char.Case.Compat
    isLower
      base:           OK (0.18s)
        25.3 ms ± 1.6 ms
      unicode-data:   OK (0.21s)
        6.52 ms ± 350 μs, 0.26x
    isUpper
      base:           OK (0.18s)
        25.4 ms ± 1.4 ms
      unicode-data:   OK (0.20s)
        6.52 ms ± 357 μs, 0.26x
    toLower
      base:           OK (0.17s)
        24.2 ms ± 1.4 ms
      unicode-data:   OK (0.14s)
        9.06 ms ± 687 μs, 0.37x
    toTitle
      base:           OK (0.17s)
        24.6 ms ± 1.6 ms
      unicode-data:   OK (0.14s)
        9.05 ms ± 692 μs, 0.37x
    toUpper
      base:           OK (0.17s)
        24.3 ms ± 1.4 ms
      unicode-data:   OK (0.14s)
        9.43 ms ± 713 μs, 0.39x
  Unicode.Char.General
    generalCategory
      base:           OK (0.43s)
        144  ms ± 4.4 ms
      unicode-data:   OK (0.38s)
        128  ms ± 4.8 ms, 0.89x
    isAlphabetic
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    isAlphaNum
      base:           OK (0.18s)
        25.4 ms ± 1.6 ms
      unicode-data:   OK (0.23s)
        7.37 ms ± 342 μs, 0.29x
    isControl
      base:           OK (0.18s)
        25.6 ms ± 1.4 ms
      unicode-data:   OK (2.66s)
        5.29 ms ± 515 μs, 0.21x
    isMark
      base:           OK (0.19s)
        26.7 ms ± 1.8 ms
      unicode-data:   OK (0.21s)
        6.56 ms ± 345 μs, 0.25x
    isPrint
      base:           OK (0.18s)
        25.5 ms ± 1.7 ms
      unicode-data:   OK (0.21s)
        6.54 ms ± 433 μs, 0.26x
    isPunctuation
      base:           OK (0.18s)
        25.4 ms ± 1.9 ms
      unicode-data:   OK (0.22s)
        6.99 ms ± 372 μs, 0.27x
    isSeparator
      base:           OK (0.18s)
        25.9 ms ± 1.6 ms
      unicode-data:   OK (0.20s)
        6.36 ms ± 429 μs, 0.25x
    isSymbol
      base:           OK (0.18s)
        25.4 ms ± 1.8 ms
      unicode-data:   OK (0.20s)
        6.26 ms ± 430 μs, 0.25x
    isWhiteSpace
      unicode-data:   OK (0.22s)
        416  μs ±  21 μs
    isHangul
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    isHangulLV
      unicode-data:   OK (0.22s)
        418  μs ±  24 μs
    isJamo
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    jamoLIndex
      unicode-data:   OK (0.22s)
        416  μs ±  21 μs
    jamoVIndex
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    jamoTIndex
      unicode-data:   OK (0.22s)
        417  μs ±  22 μs
  Unicode.Char.General.Compat
    isAlpha
      base:           OK (0.18s)
        25.5 ms ± 1.6 ms
      unicode-data:   OK (0.16s)
        4.97 ms ± 463 μs, 0.20x
    isLetter
      base:           OK (0.19s)
        26.6 ms ± 1.4 ms
      unicode-data:   OK (0.36s)
        5.67 ms ± 227 μs, 0.21x
    isSpace
      base:           OK (0.11s)
        14.8 ms ± 1.3 ms
      unicode-data:   OK (6.08s)
        5.89 ms ±  40 μs, 0.40x
  Unicode.Char.Identifiers
    isIDContinue
      unicode-data:   OK (0.22s)
        416  μs ±  21 μs
    isIDStart
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    isXIDContinue
      unicode-data:   OK (0.22s)
        418  μs ±  21 μs
    isXIDStart
      unicode-data:   OK (0.22s)
        417  μs ±  22 μs
    isPatternSyntax
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    isPatternWhitespace
      unicode-data:   OK (0.22s)
        417  μs ±  22 μs
  Unicode.Char.Normalization
    isCombining
      unicode-data:   OK (0.22s)
        417  μs ±  21 μs
    combiningClass
      unicode-data:   OK (0.19s)
        2.98 ms ± 170 μs
    isCombiningStarter
      unicode-data:   OK (0.22s)
        424  μs ±  27 μs
    isDecomposable
      Canonical
        unicode-data: OK (0.22s)
          417  μs ±  22 μs
      Kompat
        unicode-data: OK (0.22s)
          417  μs ±  22 μs
    decomposeHangul
      unicode-data:   OK (0.22s)
        417  μs ±  22 μs
  Unicode.Char.Numeric
    isNumeric
      unicode-data:   OK (0.21s)
        3.27 ms ± 186 μs
    numericValue
      unicode-data:   OK (0.13s)
        4.02 ms ± 400 μs
    integerValue
      unicode-data:   OK (0.24s)
        3.70 ms ± 183 μs
  Unicode.Char.Numeric.Compat
    isNumber
      base:           OK (0.19s)
        26.4 ms ± 1.5 ms
      unicode-data:   OK (0.11s)
        7.09 ms ± 707 μs, 0.27x

New results with only valid char blocks:

All
  Unicode.Char.Case
    isLowerCase
      unicode-data:   OK (0.47s)
        1.39 ms ±  52 μs
    isUpperCase
      unicode-data:   OK (0.21s)
        1.19 ms ± 111 μs
  Unicode.Char.Case.Compat
    isLower
      base:           OK (0.26s)
        7.08 ms ± 392 μs
      unicode-data:   OK (0.17s)
        1.95 ms ± 179 μs, 0.28x
    isUpper
      base:           OK (0.25s)
        7.02 ms ± 384 μs
      unicode-data:   OK (0.23s)
        1.37 ms ±  95 μs, 0.20x
    toLower
      base:           OK (0.24s)
        6.66 ms ± 355 μs
      unicode-data:   OK (0.20s)
        2.56 ms ± 252 μs, 0.38x
    toTitle
      base:           OK (0.23s)
        6.57 ms ± 351 μs
      unicode-data:   OK (0.21s)
        2.66 ms ± 185 μs, 0.40x
    toUpper
      base:           OK (0.24s)
        6.59 ms ± 348 μs
      unicode-data:   OK (0.20s)
        2.54 ms ± 219 μs, 0.38x
  Unicode.Char.General
    generalCategory
      base:           OK (0.28s)
        38.0 ms ± 3.6 ms
      unicode-data:   OK (0.25s)
        32.5 ms ± 1.4 ms, 0.86x
    isAlphabetic
      unicode-data:   OK (0.36s)
        1.18 ms ±  56 μs
    isAlphaNum
      base:           OK (0.26s)
        7.16 ms ± 456 μs
      unicode-data:   OK (0.24s)
        1.45 ms ± 112 μs, 0.20x
    isControl
      base:           OK (0.13s)
        6.99 ms ± 689 μs
      unicode-data:   OK (0.74s)
        1.30 ms ±  29 μs, 0.19x
    isMark
      base:           OK (0.14s)
        7.56 ms ± 678 μs
      unicode-data:   OK (0.42s)
        1.39 ms ±  49 μs, 0.18x
    isPrint
      base:           OK (0.26s)
        7.08 ms ± 394 μs
      unicode-data:   OK (0.26s)
        1.62 ms ±  87 μs, 0.23x
    isPunctuation
      base:           OK (0.14s)
        7.19 ms ± 691 μs
      unicode-data:   OK (0.24s)
        1.41 ms ± 127 μs, 0.20x
    isSeparator
      base:           OK (0.27s)
        7.47 ms ± 368 μs
      unicode-data:   OK (0.23s)
        1.40 ms ±  87 μs, 0.19x
    isSymbol
      base:           OK (0.14s)
        7.22 ms ± 719 μs
      unicode-data:   OK (0.23s)
        1.42 ms ±  88 μs, 0.20x
    isWhiteSpace
      unicode-data:   OK (0.21s)
        1.22 ms ±  88 μs
    isHangul
      unicode-data:   OK (0.37s)
        1.20 ms ±  62 μs
    isHangulLV
      unicode-data:   OK (0.20s)
        1.17 ms ± 108 μs
    isJamo
      unicode-data:   OK (0.20s)
        1.19 ms ± 101 μs
    jamoLIndex
      unicode-data:   OK (0.21s)
        1.23 ms ±  87 μs
    jamoVIndex
      unicode-data:   OK (0.20s)
        1.18 ms ±  91 μs
    jamoTIndex
      unicode-data:   OK (0.20s)
        1.20 ms ± 103 μs
  Unicode.Char.General.Compat
    isAlpha
      base:           OK (0.26s)
        7.09 ms ± 360 μs
      unicode-data:   OK (0.23s)
        1.39 ms ±  87 μs, 0.20x
    isLetter
      base:           OK (0.15s)
        7.70 ms ± 711 μs
      unicode-data:   OK (0.23s)
        1.40 ms ± 106 μs, 0.18x
    isSpace
      base:           OK (0.28s)
        3.70 ms ± 242 μs
      unicode-data:   OK (0.16s)
        1.42 ms ±  90 μs, 0.38x
  Unicode.Char.Identifiers
    isIDContinue
      unicode-data:   OK (0.20s)
        1.18 ms ±  94 μs
    isIDStart
      unicode-data:   OK (0.20s)
        1.20 ms ±  89 μs
    isXIDContinue
      unicode-data:   OK (0.36s)
        1.19 ms ±  61 μs
    isXIDStart
      unicode-data:   OK (0.20s)
        1.18 ms ± 104 μs
    isPatternSyntax
      unicode-data:   OK (0.36s)
        1.18 ms ±  70 μs
    isPatternWhitespace
      unicode-data:   OK (0.37s)
        1.21 ms ±  72 μs
  Unicode.Char.Normalization
    isCombining
      unicode-data:   OK (0.21s)
        1.21 ms ± 111 μs
    combiningClass
      unicode-data:   OK (0.24s)
        1.45 ms ± 140 μs
    isCombiningStarter
      unicode-data:   OK (0.20s)
        1.18 ms ± 100 μs
    isDecomposable
      Canonical
        unicode-data: OK (0.36s)
          1.18 ms ±  50 μs
      Kompat
        unicode-data: OK (0.37s)
          1.20 ms ±  53 μs
    decomposeHangul
      unicode-data:   OK (0.21s)
        1.20 ms ±  93 μs
  Unicode.Char.Numeric
    isNumeric
      unicode-data:   OK (0.43s)
        1.48 ms ±  59 μs
    numericValue
      unicode-data:   OK (0.27s)
        1.71 ms ± 139 μs
    integerValue
      unicode-data:   OK (0.39s)
        2.64 ms ± 221 μs
  Unicode.Char.Numeric.Compat
    isNumber
      base:           OK (0.14s)
        7.64 ms ± 724 μs
      unicode-data:   OK (0.56s)
        1.95 ms ± 120 μs, 0.25x

harendra-kumar requested a review from wismill on June 14, 2022 at 07:14
@wismill
Collaborator

wismill commented Jun 14, 2022

> However, I cannot explain why the benchmarks were 400 us when we used the full range and now they are in milliseconds. Were we erroring out or something on encountering the first invalid char?

I think it is due to how `nf` works. Try using `range` with a tuple of bounds.
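A minimal sketch of what this suggestion could look like, assuming the benchmarks use tasty-bench, whose `nf` has the type `NFData b => (a -> b) -> a -> Benchmarkable`. The idea is to pass only the `(lower, upper)` bounds as the benchmark argument and build the character list with `Data.Ix.range` inside the benchmarked function, so each run has to traverse the range and apply the predicate. `countInRange` is a hypothetical helper, not code from this repository, and whether this accounts for the 400 μs figures is not confirmed in this thread.

```haskell
import Data.Char (isLower)
import Data.Ix (range)
import Data.List (foldl')

-- Hypothetical helper: count the chars in an inclusive range that satisfy
-- a predicate. The list is produced by `range` inside the function, so the
-- benchmark argument is just a pair of bounds.
countInRange :: (Char -> Bool) -> (Char, Char) -> Int
countInRange p = foldl' step 0 . range
  where
    step acc c = if p c then acc + 1 else acc

main :: IO ()
main = do
  -- With tasty-bench, this could be benchmarked as, for example:
  --   bench "isLower" $ nf (countInRange isLower) (minBound, maxBound)
  print (countInRange isLower ('a', 'z'))
  print (countInRange isLower ('A', 'Z'))
```

Returning a strict count gives `nf` a result that can only be produced by applying the predicate to every character in the range.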

@wismill
Collaborator

wismill commented Jun 15, 2022

You may want to check #75.

@harendra-kumar
Member Author

> You may want to check #75.

We can close this one. You can add relevant changes on top of #75.
