Hello,
I am trying to run the following example:
https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm
I use the script below:
However, in htop I see that only a single thread is being used, even if I set torch.set_num_threads(32). It is extremely slow, making SmoothQuant unusable in my case.
I have a system with an Intel® Xeon® Gold 5218 processor.
Am I missing something? Thanks!

Hi @akhauriyash, I have not been able to reproduce this issue on several machines so far. Could you please share the environment where the issue occurs (output of pip list)?
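As a quick sanity check, something along these lines (a minimal sketch, assuming a standard PyTorch install on Linux; the thread counts are illustrative, not taken from the linked script) reports how many threads the process actually has available:

```python
import os
import torch

# OpenMP/MKL thread-pool environment variables must be set before the process
# starts (e.g. OMP_NUM_THREADS=32 python run.py); here we only report what the
# process inherited.
print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
print("MKL_NUM_THREADS =", os.environ.get("MKL_NUM_THREADS"))

torch.set_num_interop_threads(32)  # must be called before any parallel work starts
torch.set_num_threads(32)          # intra-op thread pool used by most CPU kernels

print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())
print(torch.__config__.parallel_info())  # ATen/OpenMP/MKL parallel configuration
```

If torch.get_num_threads() already reports 32 here but htop still shows a single busy core during quantization, the slow path is likely outside PyTorch's intra-op thread pool, which would help narrow the issue down.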