Too many Pretouchs will cost a large amount of time #703

Closed
lvan100 opened this issue Oct 18, 2024 · 10 comments

@lvan100 (Contributor) commented Oct 18, 2024

In our case, a project has almost 3000 structs, and the number of structs will keep growing. Pretouching all of these types costs a large amount of time: in one project, almost 5 minutes. What can we do to reduce this cost?

@lvan100 changed the title from "Too many calls to Pretouch will cost a large amount of time" to "Too many Pretouchs will cost a large amount of time" on Oct 18, 2024
@liuq19 (Collaborator) commented Oct 18, 2024

Could you try the option:

WithCompileMaxInlineDepth(2) // default is 3

reference: https://github.com/bytedance/sonic?tab=readme-ov-file#pretouch
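For reference, a minimal sketch of a Pretouch call with that compile option, following the pattern shown in the README linked above. `HugeStruct` is a placeholder for one of your own types, and this assumes the `github.com/bytedance/sonic/option` package:

```go
package main

import (
	"reflect"

	"github.com/bytedance/sonic"
	"github.com/bytedance/sonic/option"
)

// HugeStruct is a placeholder for one of your own large types.
type HugeStruct struct {
	Field map[string][]int64 `json:"field"`
}

func init() {
	// For very large or deeply nested types, lowering the max inline
	// depth trades some steady-state speed for faster compilation.
	err := sonic.Pretouch(reflect.TypeOf(HugeStruct{}),
		option.WithCompileMaxInlineDepth(2),
	)
	if err != nil {
		panic(err)
	}
}

func main() {}
```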

@lvan100 (Author) commented Oct 18, 2024

[image: screenshot of the options in use]

We use the default options, and the default value of the option mentioned above is 1, not 3.

@liuq19 (Collaborator) commented Oct 18, 2024

Yeah, I recommend using WithCompileMaxInlineDepth(2).

@lvan100 (Author) commented Oct 18, 2024

I tested the cost of Pretouch for the same type on different CPUs, and the difference is huge:

Apple M3: 54.792µs
Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz: 70.892763ms

How can this be? Is it because the test runs in Docker and the CPU is oversold? I'm at a loss.

@liuq19 (Collaborator) commented Oct 18, 2024

> I tested the cost of Pretouch for the same type on different CPUs, and the difference is huge:
>
> Apple M3: 54.792µs
> Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz: 70.892763ms
>
> How can this be? Is it because the test runs in Docker and the CPU is oversold?

There are two separate questions:

  1. Is the Pretouch cost different between x86 and aarch64? Yes: the implementations are different.

  2. Is the Pretouch cost on x86 still high even with WithCompileMaxInlineDepth(2)? Could you provide more details?

@lvan100 (Author) commented Oct 23, 2024

I set an environment variable with export SONIC_USE_OPTDEC=1, and Pretouch ran much faster: 70s -> 16s. Does this mean that JIT cannot work on my machine?

@liuq19 (Collaborator) commented Oct 23, 2024

Yeah, you can enable the non-JIT implementation via that environment variable, and Pretouch will be faster.

On x86_64 there are two implementations:
  • JIT: the default
  • non-JIT: enabled with SONIC_USE_OPTDEC=1

On aarch64 the only implementation is non-JIT.

Compared with JIT, the non-JIT implementation has faster Pretouch, faster decode, and slower encode.
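Collecting the environment switches mentioned in this thread into one place (these must be set in the environment before the program starts; both variables appear in the measurements later in this thread):

```shell
# Force the non-JIT decoder on x86_64 (aarch64 uses non-JIT by default):
export SONIC_USE_OPTDEC=1

# Optionally also use the non-JIT encoder (faster Pretouch, slower encode):
export SONIC_ENCODER_USE_VM=1
```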

@lvan100 (Author) commented Oct 23, 2024

I conducted some experiments and gathered data, which led me to a few conclusions:

  • The decode performance of both JIT and non-JIT is similar, while the encode performance of non-JIT is slightly worse, but still better than the standard library.

  • In terms of warm-up, non-JIT is significantly faster than JIT, especially for decoding.

Therefore, in most cases, using non-JIT is sufficient, particularly when it's uncertain whether the CPU supports JIT.

Does my conclusion seem correct?

The data was measured on Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz.

Both = SONIC_ENCODER_USE_VM=1;SONIC_USE_OPTDEC=1

|                   | Default        | Both           | ENCODER_USE_VM=1 | USE_OPTDEC=1   |
| ----------------- | -------------- | -------------- | ---------------- | -------------- |
| sonic.Pretouch    | 638.024441ms   | 13.132942ms    | 568.038219ms     | 92.612697ms    |
| sonic.Unmarshal   | 1.911174528s   | 1.736119887s   | 1.896258507s     | 1.747521228s   |
| sonic.Marshal     | 581.627122ms   | 1.03100576s    | 1.064066789s     | 591.692847ms   |
| stdjson.Unmarshal | 6.640908859s   | 6.503718488s   | 6.551327596s     | 6.480074988s   |
| stdjson.Marshal   | 1.397290711s   | 1.416261616s   | 1.390964037s     | 1.400312776s   |

Notes:
  • Unmarshal: the default (JIT) is about 3 times the performance of the standard library; non-JIT matches JIT's performance but with a much faster Pretouch.
  • Marshal: the default (JIT) is about 3 times the performance of the standard library; non-JIT has about half the performance of JIT but a faster Pretouch.

@liuq19 (Collaborator) commented Oct 23, 2024

Yeah, the conclusion is similar to our tests.

Thanks for your test data. In the future (though probably not soon), we may remove the JIT, but we need to fix the marshal performance problem first.

@lvan100 (Author) commented Oct 23, 2024

OK, thank you for your reply; it's truly an honor to take part in Sonic's discussion. Salute to your work!

@lvan100 lvan100 closed this as completed Oct 23, 2024