Step-by-Step

This document lists the steps to reproduce Intel® Neural Compressor quantization and smooth quantization of TensorFlow language models such as OPT and GPT2.

Prerequisite

# Install Intel® Neural Compressor
pip install neural-compressor
pip install -r requirements.txt

Run

Basic quantization

python main.py --model_name_or_path <MODEL_NAME>

<MODEL_NAME> can be one of the following:

gpt2-medium
facebook/opt-125m
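
For readers new to the idea, the sketch below illustrates what INT8 quantization does to a tensor in general terms. It is a generic symmetric per-tensor scheme written for illustration only, not the algorithm main.py or Intel® Neural Compressor applies; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.02, -0.5, 0.31, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per element is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Because a single scale covers the whole tensor, one large outlier stretches the step size for every other value, which is the problem smooth quantization targets.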

Smooth quant

bash run_quant.sh --input_model=<MODEL_NAME>

Alternatively, call main.py directly with the --sq flag:

python main.py --model_name_or_path <MODEL_NAME> --sq
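
As background on what the --sq option enables, the following is a minimal sketch of the smoothing step described in the SmoothQuant method: each input channel j gets a factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), activations are divided by it and the matching weight rows are multiplied by it, so the layer output is unchanged while activation outliers shrink. This is an illustration in plain Python, not Neural Compressor's implementation; the helper names, the sample matrices, and the choice alpha = 0.5 are assumptions.

```python
def smooth_scales(x, w, alpha=0.5):
    """Per-input-channel factors s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    x has shape [tokens, in], w has shape [in, out].
    Assumes every channel has a nonzero maximum (no zero guard in this sketch).
    """
    scales = []
    for j in range(len(w)):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append((act_max ** alpha) / (w_max ** (1 - alpha)))
    return scales


def matmul(a, b):
    """Plain nested-list matrix product, enough for this small example."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical data: activation channel 1 carries large outliers.
x = [[0.1, 40.0], [-0.2, 64.0]]
w = [[0.5, -0.25], [0.04, 0.01]]

s = smooth_scales(x, w)
x_smooth = [[v / s[j] for j, v in enumerate(row)] for row in x]  # X / s
w_smooth = [[v * s[k] for v in w[k]] for k in range(len(w))]     # s * W

# The product is mathematically unchanged; only the value ranges move.
print(matmul(x, w))
print(matmul(x_smooth, w_smooth))
```

With alpha = 0.5 the activation and weight ranges of each channel end up equal, so neither side dominates the quantization error; other values of alpha shift more of the difficulty to one side or the other.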

Benchmark

Get the FP32 performance

bash run_benchmark.sh --input_model=<MODEL_NAME>

Get the INT8 performance

bash run_benchmark.sh --input_model=<MODEL_NAME> --int8=true
