(TG) TG model perf tests #475

Triggered via schedule November 27, 2024 00:10
Status Failure
Total duration 5h 32m 30s
Artifacts 3
build-artifact-profiler  /  ...  /  build-docker-image
1m 10s
build-artifact-profiler / build-docker-image / build-docker-image
Matrix: build-artifact-profiler / build-artifact
Matrix: tg-model-perf-tests / tg-model-perf-tests

Annotations

5 errors, 9 warnings, and 11 notices
tg-model-perf-tests / TG CNN model perf tests
Process completed with exit code 1.
pcie-cards-are-being-used-cleanup
Tenstorrent cards seem to be in use. Killing PIDs and exiting unsuccessfully. This can happen if a test hung and is normally an issue with the test, rather than infra.
tg-model-perf-tests / TG LLM model perf tests
Process completed with exit code 1.
tg-model-perf-tests / TG LLM model perf tests
Process completed with exit code 1.
tg-model-perf-tests / TG LLM model perf tests
The action 'Run model perf regression tests' has timed out after 60 minutes.
tg-model-perf-tests / t3k CCL all_gather perf tests
Failed to download action 'https://api.github.com/repos/tenstorrent/tt-metal/tarball/0390e0cf2c4b29d59fcc5bd3c61ca06caa69de7e'. Error: Resource temporarily unavailable (api.github.com:443)
tg-model-perf-tests / t3k CCL all_gather perf tests
Back off 27.062 seconds before retry.
tg-model-perf-tests / t3k CCL all_gather perf tests
Failed to download action 'https://api.github.com/repos/getsentry/action-setup-venv/tarball/a133e6fd5fa6abd3f590a1c106abda344f5df69f'. Error: Resource temporarily unavailable (api.github.com:443)
tg-model-perf-tests / t3k CCL all_gather perf tests
Back off 26.216 seconds before retry.
tg-model-perf-tests / TG LLM model perf tests
Failed to restore: getCacheEntry failed: Request timeout: /kE5pH1GYM3Yhxzhzfofu0B5IIUz8dzneBRSYuWcoacd9fWqJEP/_apis/artifactcache/cache?keys=setup-venv-Linux-py-3.8.18-%2Fhome%2Fubuntu%2Factions-runner%2F_work%2F_tool%2FPython%2F3.8.18%2Fx64%2Fbin%2Fpython-6e53e915dc6cae7bc216bca21416e65c2c37d74d62bc7e916a52ccd90b584ee7-.%2Fcreate_venv.sh&version=0f2a4d78a25b8dc6a98c7870cee2871c84b54ade7e9a0c38e3b80906041e7a71
unsuccessful-reset-attempt-cleanup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-cleanup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-cleanup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-cleanup
Unsuccessful board reset, trying again in 1 minute ...
printing-out-smi-info-cleanup
Touching and printing out SMI info
printing-out-smi-info-cleanup
Touching and printing out SMI info
successful-reset-cleanup
tt-smi reset was successful
reset-successful-cleanup
tt-smi reset was successful
printing-out-smi-info-cleanup
Touching and printing out SMI info
successful-reset-cleanup
tt-smi reset was successful
reset-successful-cleanup
tt-smi reset was successful
printing-out-smi-info-cleanup
Touching and printing out SMI info
attempting-reset-cleanup
Attempting to reset card(s).
successful-reset-cleanup
tt-smi reset was successful
reset-successful-cleanup
tt-smi reset was successful

Artifacts

Produced during runtime
Name                                        Size
TTMetal_build_wormhole_b0_profiler          306 MB
perf-report-csv--wormhole_b0--bare-metal    1.51 KB
perf-report-csv-CNN-wormhole_b0-            521 Bytes