This document lists the steps to reproduce Intel® Neural Compressor quantization and smooth quantization of TensorFlow language models such as OPT and GPT2.
# Install Intel® Neural Compressor
pip install neural-compressor
pip install -r requirements
python main.py --model_name_or_path <MODEL_NAME>
<MODEL_NAME> can be one of the following:
- gpt2-medium
- facebook/opt-125m
bash run_quant.sh --input_model=<MODEL_NAME>
Alternatively, run main.py directly with the --sq option:
python main.py --model_name_or_path <MODEL_NAME> --sq
bash run_benchmark.sh --input_model=<MODEL_NAME>
bash run_benchmark.sh --input_model=<MODEL_NAME> --int8=true
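The commands above can be chained for every listed model. The following is a minimal sketch, not part of the original scripts: `DRY_RUN`, `run_cmd`, and `run_all` are hypothetical names introduced here for illustration, and the ordering assumes that the int8 benchmark needs the quantization step to have run first. By default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: queue quantization and benchmark commands for each listed model.
# DRY_RUN, run_cmd and run_all are hypothetical helpers for this example only.
set -euo pipefail

# Model names taken from the list in this document.
MODELS=("gpt2-medium" "facebook/opt-125m")

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    # Echo the command, then execute it unless this is a dry run.
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run_all() {
    local model
    for model in "${MODELS[@]}"; do
        # Assumption: quantize first so the int8 benchmark has a model to load.
        run_cmd bash run_quant.sh "--input_model=${model}"
        run_cmd bash run_benchmark.sh "--input_model=${model}"
        run_cmd bash run_benchmark.sh "--input_model=${model}" --int8=true
    done
}

run_all
```

Saving this under a name of your choice and running it with `DRY_RUN=0` from the directory that contains run_quant.sh and run_benchmark.sh would execute the commands instead of printing them; because of `set -e`, the loop stops at the first command that fails.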