This example loads a question answering model fine-tuned on SQuAD and confirms its accuracy and speed on the SQuAD task.
pip install neural-compressor
pip install -r requirements.txt
Note: check the list of validated ONNX Runtime versions before installing.
Supported model identifiers from huggingface.co:
Model Identifier |
---|
mrm8488/spanbert-finetuned-squadv1 |
salti/bert-base-multilingual-cased-finetuned-squad |
distilbert-base-uncased-distilled-squad |
bert-large-uncased-whole-word-masking-finetuned-squad |
deepset/roberta-large-squad2 |
python prepare_model.py --input_model=mrm8488/spanbert-finetuned-squadv1 --output_model=spanbert-finetuned-squadv1.onnx # or other supported model identifier
Download the SQuAD dataset (the dev set is used for evaluation) from the official SQuAD page.
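The SQuAD file is nested JSON (articles → paragraphs → question/answer pairs). A minimal sketch of walking it into (question, context, answer) triples; the filename and helper name are illustrative, not part of this example's scripts:

```python
import json

def iter_squad(path):
    """Yield (question, context, first answer text) triples from a SQuAD v1.1 JSON file."""
    with open(path) as f:
        data = json.load(f)["data"]
    for article in data:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # each qa may carry several reference answers; take the first if present
                answer = qa["answers"][0]["text"] if qa["answers"] else ""
                yield qa["question"], context, answer
```

The evaluation scripts consume the same structure, so this is a quick way to sanity-check a downloaded file.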
Dynamic quantization:
bash run_quant.sh --input_model=/path/to/model \ # model path as *.onnx
--output_model=/path/to/model_tune
bash run_benchmark.sh --input_model=/path/to/model \ # model path as *.onnx
--batch_size=batch_size \
--mode=performance # or accuracy
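In performance mode the benchmark reports latency and throughput; the measurement boils down to a warmed-up timing loop like the one below (a generic sketch, not the script's internals):

```python
import time

def benchmark(run_once, warmup=3, iters=10):
    """Return (average latency in seconds, throughput in runs/sec) for a callable."""
    for _ in range(warmup):              # warm caches and lazy initialization before timing
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    latency = (time.perf_counter() - start) / iters
    return latency, 1.0 / latency
```

For a real session, `run_once` would wrap an ONNX Runtime `session.run(...)` call on a fixed batch, which is why `--batch_size` matters: throughput scales with the number of samples per run.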