[FEATURE] CI Test should always terminate after 1 hour #14680

Closed
1 task done
lupyuen opened this issue Nov 7, 2024 · 8 comments · Fixed by #14849
Labels
Type: Enhancement New feature or request

Comments

@lupyuen
Member

lupyuen commented Nov 7, 2024

Is your feature request related to a problem? Please describe.

CI Test will sometimes run for 6 hours (before getting killed by GitHub):

This is not so great because:

  1. It increases our usage of GitHub Runners, which may overrun the GitHub Actions Budget allocated by ASF.
  2. Suppose right after CI Test there's another build. If CI Test runs for all 6 hours, then the build after CI Test will never run.
  3. We are now running our own Ubuntu PCs as a NuttX Build Farm. The PCs will hang forever until we restart the Build Jobs.

Describe the solution you'd like

CI Test should complete within 1 hour. It should gracefully terminate itself (and report an error) if the runtime exceeds 1 hour.
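
As a minimal sketch of "terminate and report an error", assuming GNU coreutils `timeout` is available on the runner (it is not part of stock macOS): `run_with_limit` is a hypothetical helper name, not something from the NuttX CI scripts. `timeout` exits with status 124 when the limit is hit, which the helper turns into an explicit error message.

```shell
# Hypothetical helper (not part of NuttX CI): run a command under a time limit.
#   $1 = limit in seconds, remaining arguments = command to run
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
run_with_limit() {
  limit="$1"; shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # Report the overrun on stderr so it shows up in the build log
    echo "ERROR: '$*' exceeded ${limit}s and was terminated" >&2
  fi
  return "$rc"
}
```

For a 1-hour limit this might be called as `run_with_limit 3600 <test command>`. The non-zero return value lets the calling script decide whether to fail the job or move on.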

Describe alternatives you've considered

Right now I'm manually killing all CI Jobs that run over 3 hours, and restarting the Ubuntu PCs in our NuttX Build Farm.

Verification

  • I have verified before submitting the report.
@lupyuen lupyuen added the Type: Enhancement New feature or request label Nov 7, 2024
@simbit18
Contributor

simbit18 commented Nov 7, 2024

@lupyuen maybe we also need to set a maximum number of minutes for a job to run.

GitHub Actions timeout
https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes

@lupyuen
Member Author

lupyuen commented Nov 7, 2024

@simbit18 Yep right now it quits after 6 hours: https://github.com/NuttX/nuttx/actions/runs/11714861244

@simbit18
Contributor

simbit18 commented Nov 7, 2024

@lupyuen so we should put

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 180 #  (3 hours) Decrease this timeout value as needed

@lupyuen
Member Author

lupyuen commented Nov 7, 2024

@simbit18 Hmmm suppose right after CI Test there's another build. If CI Test runs for all 3 hours, then the build after CI Test will never run. So actually I prefer if CI Test could terminate itself (after 1 hour) and let other builds run.

Unless we always park CI Test at the end of the job?
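
The concern above (one hung step starving the builds that follow it) can be sketched as a loop in which each step has its own limit. This is only an illustration: `run_steps` and the step commands are hypothetical, and it assumes GNU coreutils `timeout` is installed. A step that hangs is reported as failed and the loop continues, so the whole run still fails at the end but later steps get their turn.

```shell
# Hypothetical sketch: give every step its own time limit so that one hung
# step cannot block the steps after it.
#   $1 = per-step limit in seconds, remaining arguments = step commands
# Assumes GNU coreutils `timeout`.
run_steps() {
  limit="$1"; shift
  failed=0
  for step in "$@"; do
    if timeout "$limit" sh -c "$step"; then
      echo "OK: $step"
    else
      echo "FAILED or timed out: $step"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every step succeeded
  [ "$failed" -eq 0 ]
}
```

By contrast, a job-level `timeout-minutes` cancels the whole job, including any steps that have not started yet.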

@simbit18
Contributor

simbit18 commented Nov 7, 2024

> @simbit18 Hmmm suppose right after CI Test there's another build. If CI Test runs for all 3 hours, then the build after CI Test will never run. So actually I prefer if CI Test could terminate itself (after 1 hour) and let other builds run.

Right!

> Describe the solution you'd like
> CI Test should complete within 1 hour. It should gracefully terminate itself (and report an error) if the runtime exceeds 1 hour.

In my opinion, this is the right solution.

The GitHub Actions timeout is only a safety net, so that we don't fall back into the tunnel of #14376.

@lupyuen
Member Author

lupyuen commented Nov 15, 2024

@lupyuen
Member Author

lupyuen commented Nov 16, 2024

Wonder if this will work for GitHub CI? I'm testing it for macOS Build Farm:
https://github.com/lupyuen/nuttx-build-farm/blob/main/run-job-macos.sh#L131-L144

## If CI Test Hangs: Kill it after 1 hour
( sleep 3600 ; echo Killing pytest... ; pkill -f pytest )&

## Run the CI Job
./cibuild.sh -i -c -A -R testlist/$job.dat
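
One thing to watch with a plain background `sleep` is that it keeps running after the job finishes, so on a long-lived build farm PC it could fire later and kill an unrelated `pytest`. A possible variant, sketched here under the assumption of a POSIX-style shell with `pkill` available, cancels the watchdog once the job is done. `run_with_watchdog` is a hypothetical name, not part of the build farm scripts.

```shell
# Hypothetical helper: run a command with a watchdog that is cancelled as
# soon as the command finishes on its own.
#   $1 = limit in seconds, $2 = `pkill -f` pattern, remaining = command to run
run_with_watchdog() {
  limit="$1"; pattern="$2"; shift 2

  (
    sleeper=
    # If the parent cancels us, stop the pending sleep and exit quietly
    trap 'kill "$sleeper" 2>/dev/null; exit 0' TERM
    sleep "$limit" &
    sleeper=$!
    wait "$sleeper"
    # Limit reached: kill every process whose command line matches the pattern
    echo "Killing $pattern..."
    pkill -f "$pattern"
  ) &
  watchdog=$!

  rc=0
  "$@" || rc=$?

  # The job is done: cancel the watchdog so it cannot fire later
  kill "$watchdog" 2>/dev/null || true
  wait "$watchdog" 2>/dev/null || true
  return "$rc"
}
```

With the values from the snippet above, the call would look like `run_with_watchdog 3600 pytest ./cibuild.sh -i -c -A -R testlist/$job.dat`. Note that `pkill -f` matches against full command lines, so the pattern should be specific enough not to hit unrelated processes.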

@lupyuen
Member Author

lupyuen commented Nov 16, 2024

Yep this kills the CI Test after 2 hours! (Assuming our jobs are not supposed to exceed 2 hours)

We changed build.yml:

cd sources/nuttx/tools/ci
if [ "X${{matrix.boards}}" = "Xcodechecker" ]; then
  ./cibuild.sh -c -A -N -R --codechecker testlist/${{matrix.boards}}.dat
else
  ## Inserted this
  ( sleep 7200 ; echo Killing pytest... ; pkill -f pytest )&
  ./cibuild.sh -c -A -N -R -S testlist/${{matrix.boards}}.dat
fi

(Build Log says "Killing pytest... Terminated" and fails correctly later)
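
For context on the "Terminated" message: `pkill` sends SIGTERM by default, and a shell reports a command killed by a signal as exit status 128 plus the signal number (143 for SIGTERM). How `cibuild.sh` itself reacts to the killed `pytest` is not shown in this thread, so the following is only a generic sketch for reading such exit codes in a build log. `describe_exit` is a hypothetical helper.

```shell
# Hypothetical helper: turn an exit status into a readable log message.
# Shells report "killed by signal N" as status 128 + N (SIGTERM = 15 -> 143).
# A program can also return a value above 128 on its own, so treat the
# signal interpretation as the usual case rather than a guarantee.
describe_exit() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "failed with exit code $code"
  fi
}
```

A wrapper script could call `describe_exit $?` right after the test command to make a watchdog kill easy to tell apart from an ordinary test failure.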

lupyuen added a commit to lupyuen2/wip-nuttx that referenced this issue Nov 19, 2024
CI Test will sometimes run for 6 hours (before getting auto-terminated by GitHub):
- apache#14808
- apache#14680

This is a problem because:
- It increases our usage of GitHub Runners, which may overrun the [GitHub Actions Budget](https://infra.apache.org/github-actions-policy.html) allocated by ASF.
- Suppose right after CI Test there's another build. If CI Test runs for all 6 hours, then the build after CI Test will never run.

For this PR: We assume that every CI Job (e.g. risc-v-05) will complete normally within 2 hours. If any CI Job exceeds 2 hours, this PR will kill the CI Test Process `pytest` and allow the next build to run.
@lupyuen lupyuen linked a pull request Nov 19, 2024 that will close this issue
xiaoxiang781216 pushed a commit that referenced this issue Nov 19, 2024
JaeheeKwon pushed a commit to JaeheeKwon/nuttx that referenced this issue Nov 28, 2024