Too many Pretouchs will cost a large amount of time #703

Closed
lvan100 opened this issue Oct 18, 2024 · 10 comments

@lvan100 (Contributor) commented Oct 18, 2024

In our case, a project has almost 3000 structs, and the number of structs will keep growing. Pretouching all of these types costs a large amount of time: in one project, almost 5 minutes. What can we do to reduce this cost?

@lvan100 changed the title from "Too many calls to Pretouch will cost a large amount of time" to "Too many Pretouchs will cost a large amount of time" on Oct 18, 2024
@liuq19 (Collaborator) commented Oct 18, 2024

Could you try the option:

WithCompileMaxInlineDepth(2) // default is 3

reference: https://github.com/bytedance/sonic?tab=readme-ov-file#pretouch
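For reference, a minimal sketch of a Pretouch call with that compile option, following the pattern shown in the README linked above. `HugeStruct` is a placeholder for one of your own types, and this assumes the `github.com/bytedance/sonic/option` package:

```go
package main

import (
	"reflect"

	"github.com/bytedance/sonic"
	"github.com/bytedance/sonic/option"
)

// HugeStruct is a placeholder for one of your own large types.
type HugeStruct struct {
	Field map[string][]int64 `json:"field"`
}

func init() {
	// For very large or deeply nested types, lowering the max inline
	// depth trades some steady-state speed for faster compilation.
	err := sonic.Pretouch(reflect.TypeOf(HugeStruct{}),
		option.WithCompileMaxInlineDepth(2),
	)
	if err != nil {
		panic(err)
	}
}

func main() {}
```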

@lvan100 (Author) commented Oct 18, 2024

[image: screenshot of the options in use]

We use the default options, and the default value of the option mentioned above is 1, not 3.

@liuq19 (Collaborator) commented Oct 18, 2024

Yeah, I recommend using WithCompileMaxInlineDepth(2).

@lvan100 (Author) commented Oct 18, 2024

I tested the cost of Pretouch for the same type on different CPUs, and the difference is huge:

Apple M3: 54.792µs
Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz: 70.892763ms

How can this be? Is it because the test runs in Docker and the CPU is oversold? I'm at a loss.

@liuq19 (Collaborator) commented Oct 18, 2024

> I tested the cost of Pretouch for the same type on different CPUs, and the difference is huge:
>
> Apple M3: 54.792µs
> Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz: 70.892763ms
>
> How can this be? Is it because the test runs in Docker and the CPU is oversold?

There are two separate questions:

  1. Is the Pretouch cost different between x86 and aarch64? Yes: the implementations are different.

  2. Is the Pretouch cost on x86 still high even with WithCompileMaxInlineDepth(2)? Could you provide more details?

@lvan100 (Author) commented Oct 23, 2024

I set an environment variable with export SONIC_USE_OPTDEC=1, and Pretouch ran much faster: 70s -> 16s. Does this mean that JIT cannot work on my machine?

@liuq19 (Collaborator) commented Oct 23, 2024

Yeah, you can enable the non-JIT implementation via that environment variable, and Pretouch will be faster.

On x86_64 there are two implementations:
  • JIT: the default
  • non-JIT: enabled with SONIC_USE_OPTDEC=1

On aarch64 the only implementation is non-JIT.

Compared with JIT, the non-JIT implementation has faster Pretouch, faster decode, and slower encode.
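Collecting the environment switches mentioned in this thread into one place (these must be set in the environment before the program starts; both variables appear in the measurements later in this thread):

```shell
# Force the non-JIT decoder on x86_64 (aarch64 uses non-JIT by default):
export SONIC_USE_OPTDEC=1

# Optionally also use the non-JIT encoder (faster Pretouch, slower encode):
export SONIC_ENCODER_USE_VM=1
```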

@lvan100 (Author) commented Oct 23, 2024

I conducted some experiments and gathered data, which led me to a few conclusions:

  • The decode performance of both JIT and non-JIT is similar, while the encode performance of non-JIT is slightly worse, but still better than the standard library.

  • In terms of warm-up, non-JIT is significantly faster than JIT, especially for decoding.

Therefore, in most cases, using non-JIT is sufficient, particularly when it's uncertain whether the CPU supports JIT.

Does my conclusion seem correct?

The data was measured on Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz.

Both = SONIC_ENCODER_USE_VM=1;SONIC_USE_OPTDEC=1

|                   | Default        | Both           | ENCODER_USE_VM=1 | USE_OPTDEC=1   |
| ----------------- | -------------- | -------------- | ---------------- | -------------- |
| sonic.Pretouch    | 638.024441ms   | 13.132942ms    | 568.038219ms     | 92.612697ms    |
| sonic.Unmarshal   | 1.911174528s   | 1.736119887s   | 1.896258507s     | 1.747521228s   |
| sonic.Marshal     | 581.627122ms   | 1.03100576s    | 1.064066789s     | 591.692847ms   |
| stdjson.Unmarshal | 6.640908859s   | 6.503718488s   | 6.551327596s     | 6.480074988s   |
| stdjson.Marshal   | 1.397290711s   | 1.416261616s   | 1.390964037s     | 1.400312776s   |

Notes:
  • Unmarshal: the default (JIT) is about 3 times the performance of the standard library; non-JIT matches JIT's performance but with a much faster Pretouch.
  • Marshal: the default (JIT) is about 3 times the performance of the standard library; non-JIT has about half the performance of JIT but a faster Pretouch.

@liuq19 (Collaborator) commented Oct 23, 2024

Yeah, the conclusion is similar to our tests.

Thanks for your test data. In the future (though probably not soon), we may remove the JIT, but we need to fix the marshal performance problem first.

@lvan100 (Author) commented Oct 23, 2024

OK, thank you for your reply; it's truly an honor to take part in Sonic's discussion. Salute to your work!

@lvan100 lvan100 closed this as completed Oct 23, 2024