cm docker script --tags=build,nvidia,inference,server
cmr "run-mlperf inference _find-performance _all-scenarios" \
--model=gptj-99 --implementation=nvidia-original --device=cuda --backend=tensorrt \
--category=edge --division=open --quiet
- Use --division=closed to run all scenarios for the closed division.
- Use --category=datacenter to run datacenter scenarios.
- Use --model=gptj-99.9 to run the high-accuracy model.
- Use --rerun to force a rerun even when result files (from a previous run) exist.
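
The options above can be combined in a single invocation. The sketch below only prints the resulting command rather than running it, so it can be checked before launching a long benchmark. The helper name find_perf_cmd and its positional arguments are hypothetical and exist only for this illustration; the flags themselves are the ones listed above, and whether a given combination is supported on your system is an assumption to verify.

```shell
# Hypothetical helper (not part of CM): print the find-performance command
# with the documented options substituted in. Nothing is executed.
find_perf_cmd() {
  division="${1:-open}"   # open or closed
  category="${2:-edge}"   # edge or datacenter
  model="${3:-gptj-99}"   # gptj-99 or gptj-99.9 (high accuracy)
  rerun="${4:-}"          # pass --rerun to force a rerun
  printf 'cmr "run-mlperf inference _find-performance _all-scenarios" --model=%s --implementation=nvidia-original --device=cuda --backend=tensorrt --category=%s --division=%s --quiet%s\n' \
    "$model" "$category" "$division" "${rerun:+ $rerun}"
}

# Closed division, datacenter scenarios, high-accuracy model, forced rerun.
find_perf_cmd closed datacenter gptj-99.9 --rerun
```

Called with no arguments, the helper prints the default command shown earlier (open division, edge category, gptj-99).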
cmr "run-mlperf inference _submission _all-scenarios" --model=gptj-99 \
--device=cuda --implementation=nvidia-original --backend=tensorrt \
--execution-mode=valid \
--category=edge --division=open --quiet --skip_submission_generation=yes
- Use --power=yes to measure power. It is ignored for accuracy and compliance runs.
- Use --division=closed to run all scenarios for the closed division. There are no compliance runs for gptj.
- --offline_target_qps, --server_target_qps, and --singlestream_target_latency can be used to override the determined performance numbers.
- Use --model=gptj-99.9 to run the high-accuracy model.
- Use --rerun to force a rerun even when result files (from a previous run) exist.
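
As a rough illustration of the override flags, the sketch below appends a target to the submission command only when a value is supplied, and prints the result instead of running it. The helper name submission_cmd is hypothetical, the --flag=value form is assumed from the other options in this document, and the value 10 in the usage line is a placeholder, not a measured or recommended number.

```shell
# Hypothetical helper (not part of CM): print the submission command,
# adding an override flag only when a value is passed. Nothing is executed.
submission_cmd() {
  offline_qps="${1:-}"      # optional value for --offline_target_qps
  single_latency="${2:-}"   # optional value for --singlestream_target_latency
  cmd='cmr "run-mlperf inference _submission _all-scenarios" --model=gptj-99 --device=cuda --implementation=nvidia-original --backend=tensorrt --execution-mode=valid --category=edge --division=open --quiet --skip_submission_generation=yes'
  if [ -n "$offline_qps" ]; then
    cmd="$cmd --offline_target_qps=$offline_qps"
  fi
  if [ -n "$single_latency" ]; then
    cmd="$cmd --singlestream_target_latency=$single_latency"
  fi
  printf '%s\n' "$cmd"
}

# Placeholder value: substitute the target you actually want to use.
submission_cmd 10
```

With no arguments the helper prints the submission command unchanged, which keeps the performance numbers determined by the earlier find-performance run.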
Follow this guide to generate the submission tree and upload your results.
Check the MLCommons Task Force on Automation and Reproducibility and get in touch via the public Discord server.
- CM automation for Nvidia's MLPerf inference implementation was developed by Arjun Suresh and Grigori Fursin.
- Nvidia's MLPerf inference implementation was developed by Zhihan Jiang, Ethan Cheng, Yiheng Zhang and Jinho Suh.