Self-hosted runner (nightly-past-ci-caller) #148
self-nightly-past-ci-caller.yml
on: schedule
Build Nightly CI Docker Images / Nightly PyTorch + Stable TensorFlow
48m 2s
Build Nightly CI Docker Images / Nightly PyTorch + DeepSpeed
8m 18s
Matrix: Nightly CI / Setup
Waiting for pending jobs
Matrix: Nightly CI / Torch CUDA extension tests
Waiting for pending jobs
Matrix: Nightly CI / Model tests
Waiting for pending jobs
Matrix: Nightly CI / Model tests
Waiting for pending jobs
Matrix: PyTorch 1.13 / Setup
Matrix: PyTorch 1.13 / Torch CUDA extension tests
Matrix: PyTorch 1.13 / Model tests
Waiting for pending jobs
Matrix: PyTorch 1.13 / Model tests
Waiting for pending jobs
Matrix: PyTorch 1.12 / Setup
Matrix: PyTorch 1.12 / Torch CUDA extension tests
Matrix: PyTorch 1.12 / Model tests
Waiting for pending jobs
Matrix: PyTorch 1.12 / Model tests
Waiting for pending jobs
Matrix: PyTorch 1.11 / Setup
Matrix: PyTorch 1.11 / Torch CUDA extension tests
Matrix: PyTorch 1.11 / Model tests
Matrix: PyTorch 1.11 / Model tests
Matrix: TensorFlow 2.11 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.11 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.11 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.11 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.10 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.10 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.10 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.10 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.9 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.9 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.9 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.9 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.8 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.8 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.8 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.8 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.7 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.7 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.7 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.7 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.6 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.6 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.6 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.6 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.5 / Setup
Waiting for pending jobs
Matrix: TensorFlow 2.5 / Torch CUDA extension tests
Waiting for pending jobs
Matrix: TensorFlow 2.5 / Model tests
Waiting for pending jobs
Matrix: TensorFlow 2.5 / Model tests
Waiting for pending jobs
TensorFlow 2.5 / Send results to webhook
Annotations
13 errors and 15 warnings
Build Nightly CI Docker Images / Nightly PyTorch + DeepSpeed
System.IO.IOException: No space left on device : '/home/runner/runners/2.314.1/_diag/Worker_20240331-022643-utc.log'
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.314.1/_diag/Worker_20240331-022643-utc.log'
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.314.1/_diag/Worker_20240331-022643-utc.log'
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at System.Diagnostics.TraceSource.Flush()
at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
at GitHub.Runner.Common.TraceManager.Dispose()
at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
at GitHub.Runner.Common.HostContext.Dispose()
at GitHub.Runner.Worker.Program.Main(String[] args)
at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.Diagnostics.TextWriterTraceListener.Flush()
at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
at GitHub.Runner.Common.Tracing.Error(Exception exception)
at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
|
Build Nightly CI Docker Images / Nightly PyTorch + Stable TensorFlow
The hosted runner: GitHub Actions 6 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
|
PyTorch 1.13 / Torch CUDA extension tests (single-gpu)
Process completed with exit code 1.
|
PyTorch 1.13 / Torch CUDA extension tests (multi-gpu)
Process completed with exit code 1.
|
PyTorch 1.12 / Torch CUDA extension tests (single-gpu)
Process completed with exit code 1.
|
PyTorch 1.12 / Torch CUDA extension tests (multi-gpu)
Process completed with exit code 1.
|
PyTorch 1.11 / Setup (multi-gpu)
Docker pull failed with exit code 1
|
PyTorch 1.11 / Setup (single-gpu)
The job was canceled because "multi-gpu" failed.
|
PyTorch 1.11 / Setup (single-gpu)
The operation was canceled.
|
Self-hosted runner (nightly-past-ci-caller)
Strategy expansion exceeded 256 results for job 'run_past_ci_pytorch_1-13.run_tests_single_gpu'
|
Self-hosted runner (nightly-past-ci-caller)
Strategy expansion exceeded 256 results for job 'run_past_ci_pytorch_1-12.run_tests_multi_gpu'
|
Self-hosted runner (nightly-past-ci-caller)
Strategy expansion exceeded 256 results for job 'run_past_ci_pytorch_1-13.run_tests_multi_gpu'
|
Self-hosted runner (nightly-past-ci-caller)
Strategy expansion exceeded 256 results for job 'run_past_ci_pytorch_1-12.run_tests_single_gpu'
|
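The four "Strategy expansion exceeded 256 results" errors above indicate that a single job matrix expanded to more than the 256 jobs GitHub Actions allows per matrix. One common workaround is to split the list of matrix entries into slices and run each slice as its own matrix. The sketch below is only an illustration under assumptions: the log does not show how the workflow builds its matrix, so `split_matrix`, the `machine_types` parameter, and the folder names are hypothetical.

```python
from typing import List

# Limit reported in the annotations: one matrix may expand to at most 256 jobs.
MAX_MATRIX_JOBS = 256


def split_matrix(items: List[str], machine_types: int = 1,
                 limit: int = MAX_MATRIX_JOBS) -> List[List[str]]:
    """Split `items` into consecutive slices so that
    len(slice) * machine_types never exceeds `limit`.

    `machine_types` is a hypothetical second matrix dimension
    (for example single-gpu and multi-gpu would give 2).
    """
    if machine_types < 1:
        raise ValueError("machine_types must be at least 1")
    per_slice = limit // machine_types
    if per_slice < 1:
        raise ValueError("limit is too small for the number of machine types")
    return [items[i:i + per_slice] for i in range(0, len(items), per_slice)]


if __name__ == "__main__":
    # Hypothetical test folders; the real list is not shown in the log.
    folders = [f"models/m{i}" for i in range(600)]
    for index, chunk in enumerate(split_matrix(folders)):
        print(index, len(chunk))
```

Each returned slice could then be passed to a separate job (or a separate reusable-workflow call) so that no single matrix crosses the limit.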
Build Nightly CI Docker Images / Nightly PyTorch + Stable TensorFlow
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 95 MB
|
PyTorch 1.13 / Torch CUDA extension tests (single-gpu)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/upload-artifact@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.13 / Torch CUDA extension tests (multi-gpu)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/upload-artifact@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.13 / Send results to webhook
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/download-artifact@v3, actions/upload-artifact@v3, geekyeggo/delete-artifact@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.13 / Send results to webhook
No files were found with the provided path: test_failure_tables. No artifacts will be uploaded.
|
PyTorch 1.12 / Setup (multi-gpu)
Docker pull failed with exit code 1, back off 2.088 seconds before retry.
|
PyTorch 1.12 / Torch CUDA extension tests (single-gpu)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/upload-artifact@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.12 / Torch CUDA extension tests (multi-gpu)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/upload-artifact@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.12 / Send results to webhook
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/download-artifact@v3, actions/upload-artifact@v3, geekyeggo/delete-artifact@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.12 / Send results to webhook
No files were found with the provided path: test_failure_tables. No artifacts will be uploaded.
|
PyTorch 1.11 / Setup (multi-gpu)
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 12 MB
|
PyTorch 1.11 / Setup (multi-gpu)
Docker pull failed with exit code 1, back off 8.736 seconds before retry.
|
PyTorch 1.11 / Setup (multi-gpu)
Docker pull failed with exit code 1, back off 1.697 seconds before retry.
|
PyTorch 1.11 / Send results to webhook
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/download-artifact@v3, actions/upload-artifact@v3, geekyeggo/delete-artifact@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
|
PyTorch 1.11 / Send results to webhook
No files were found with the provided path: test_failure_tables. No artifacts will be uploaded.
|
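Several annotations above ("No space left on device", "Free space left: 95 MB", "Free space left: 12 MB", and the repeated Docker pull failures on the same job) point to the runner disk filling up. A pre-flight check that fails fast when free space is low can make this easier to spot. The following is a minimal sketch, not part of the workflow shown here: the function names and the 10 GB threshold are assumptions chosen for illustration.

```python
import shutil
import sys


def enough_space(free_bytes: int, min_free_mb: int) -> bool:
    """Return True if `free_bytes` is at least `min_free_mb` megabytes."""
    return free_bytes >= min_free_mb * 1024 * 1024


def check_disk(path: str = "/", min_free_mb: int = 10_240) -> bool:
    """Check free space on the filesystem containing `path`.

    The 10_240 MB default is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return enough_space(usage.free, min_free_mb)


if __name__ == "__main__":
    if not check_disk("/"):
        print("Not enough free disk space; clean up before pulling images.")
        sys.exit(1)
    print("Disk space check passed.")
```

Run as an early step, a non-zero exit code would stop the job before a large image pull begins.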