Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(docker): integrate cuda/no-cuda jobs into single job #5363

Merged
merged 28 commits into from
Nov 7, 2024

Conversation

youtalk
Copy link
Member

@youtalk youtalk commented Oct 23, 2024

Description

Resolved #5082

As a result of the activity autowarefoundation/autoware.universe#8695, the only packages dependent on CUDA are the sensing/perception components of autoware.universe. Therefore, the container images with the -cuda suffix are actually only needed for universe-sensing-perception-devel and universe-sensing-perception.

Additionally, since there is currently only one arm64 self-hosted runner, the no-cuda/cuda jobs in the docker-build-and-push-arm64 workflow experience double the waiting time.

This PR consolidates the no-cuda/cuda jobs and changes the process so that only the CUDA dependent packages in the sensing/perception component are built in an additional stage.

Dockerfile

Tests performed

https://github.com/autowarefoundation/autoware/actions/runs/11718481450
https://github.com/autowarefoundation/autoware/actions/runs/11718483615

Effects on system behavior

Not applicable.

Interface changes

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

  • There are no open discussions or they are tracked via tickets.

After all checkboxes are checked, anyone who has write access can merge the PR.

Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Yutaka Kondo <[email protected]>
This reverts commit 0c7459d.
@youtalk youtalk added type:containers Docker containers, containerization of components, or container orchestration. component:openadkit Issues or Features related to Open AD Kit tag:run-health-check Run health-check labels Oct 23, 2024
@youtalk youtalk self-assigned this Oct 23, 2024
Signed-off-by: Yutaka Kondo <[email protected]>
@youtalk youtalk force-pushed the upstream-refine-cuda-workflow branch from d63ab93 to 37adf25 Compare October 23, 2024 21:44
Signed-off-by: Yutaka Kondo <[email protected]>
@youtalk youtalk marked this pull request as ready for review October 23, 2024 22:10
@youtalk youtalk requested a review from oguzkaganozt as a code owner October 23, 2024 22:10
Signed-off-by: Yutaka Kondo <[email protected]>
@youtalk
Copy link
Member Author

youtalk commented Oct 28, 2024

@wenrir

Additionally, have you considered using matrix instead? In order to combine the cude and non-cuda workflows? they seem to do basically the same thing.

One of the purposes of this PR is rather to stop the matrix, because in this organization, there is currently only one arm64 self-hosted runner available, and using the matrix would require double the execution time.
https://github.com/autowarefoundation/autoware/pull/5363/files#diff-de0ada9957e6211b26d1e4d4d49c0525a9f36224a5b8c743b5ff7339130f2aabL17-L34

@wenrir
Copy link

wenrir commented Oct 28, 2024

@youtalk, thank you for the quick action and response, I understand (and agree) with the feedback.

(I'm unable to resolve my own comments, sorry)

Copy link
Contributor

@oguzkaganozt oguzkaganozt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youtalk I think in overal terms the PR solves the complicated nature of docker-build workflows and better utilizes the cache by seperating CUDA and non-CUDA builds.

But in the other hand Dockerfile would become rather complicated to track and maintain, since it is not preferrable to have multiple Dockerfiles I suggest putting some of the recurring scripts to a directory.

For example:

docker/
├── Dockerfile
├── scripts/
│   ├── install-rosdep-dependencies.sh
│   ├── build-ros-packages.sh
│   └── cleanup.sh
└── etc/
    └── ros_entrypoint.sh

I think this kind of improvement can be done in a seperate PR, and I can draft a PR for that after this PR.

And also with this PR we are not doing health-check builds with CUDA anymore as i can understand or am i missing something ? And if so would it miss some CUDA related build fails ?

@youtalk
Copy link
Member Author

youtalk commented Nov 7, 2024

@oguzkaganozt I've fix a minor cache problem c009784. I hope the CI jobs will be succeeded.

https://github.com/autowarefoundation/autoware/actions/runs/11718481450
https://github.com/autowarefoundation/autoware/actions/runs/11718483615

@mitsudome-r @xmfcx docker-build (cuda) and docker-build (no-cuda) will no longer be executed, so please remove them from the required CI checks if you approved this PR. And after this PR is merged, please add a required check for docker-build.

Screenshot 2024-11-07 at 16 31 33

@xmfcx xmfcx closed this Nov 7, 2024
@xmfcx xmfcx reopened this Nov 7, 2024
@xmfcx
Copy link
Contributor

xmfcx commented Nov 7, 2024

Removed ✔️

@oguzkaganozt oguzkaganozt self-requested a review November 7, 2024 15:12
Copy link
Contributor

@oguzkaganozt oguzkaganozt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@youtalk youtalk merged commit 9dca304 into main Nov 7, 2024
29 of 30 checks passed
@youtalk youtalk deleted the upstream-refine-cuda-workflow branch November 7, 2024 22:39
youtalk added a commit to youtalk/autoware that referenced this pull request Nov 8, 2024
* chore(docker): remove `/autoware/log` after `colcon build` (autowarefoundation#5329)

* chore(.github): always run `Show disk space` (autowarefoundation#5354)

always show disk space

Signed-off-by: Yutaka Kondo <[email protected]>

* build(autoware.repos): remove ament_cmake fork repository (autowarefoundation#5360)

Signed-off-by: Chengyong Lin <[email protected]>

* fix: remove `ndt_omp` (autowarefoundation#5390)

Removed ndt_omp

Signed-off-by: Shintaro Sakoda <[email protected]>

* fix: change the organization of `awsim_sensor_kit_launch` from RobotecAI to tier4 (autowarefoundation#5403)

Fixed organization of `awsim_sensor_kit_launch` from RobotecAI to tier4

Signed-off-by: Shintaro Sakoda <[email protected]>

* chore(.devcontainer): rename `.devcontainer` directories (autowarefoundation#5407)

rename .devcontainer dirs

Signed-off-by: Yutaka Kondo <[email protected]>

* feat(docker): integrate `cuda`/`no-cuda` jobs into single job (autowarefoundation#5363)

---------

Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Chengyong Lin <[email protected]>
Signed-off-by: Shintaro Sakoda <[email protected]>
Co-authored-by: chgyg <[email protected]>
Co-authored-by: SakodaShintaro <[email protected]>
Co-authored-by: SakodaShintaro <[email protected]>
@youtalk
Copy link
Member Author

youtalk commented Nov 8, 2024

@mitsudome-r @xmfcx docker-build (cuda) and docker-build (no-cuda) will no longer be executed, so please remove them from the required CI checks if you approved this PR. And after this PR is merged, please add a required check for docker-build.

@mitsudome-r @xmfcx Please add a required check for docker-build.

@xmfcx
Copy link
Contributor

xmfcx commented Nov 8, 2024

@youtalk -san, docker-build is a composite action, not a workflow.

Do you mean to add docker-build-and-push as a required workflow?

But that workflow only runs on:

on:
  push:
    branches:
      - main
    tags:
  workflow_dispatch:

not every PR.

What should I do then?

@xmfcx
Copy link
Contributor

xmfcx commented Nov 8, 2024

Sorry, I've just realized it was part of the health-check.
I've added it to the required checks, tested in:

@youtalk
Copy link
Member Author

youtalk commented Nov 9, 2024

@xmfcx Thank you!

youtalk added a commit to youtalk/autoware that referenced this pull request Nov 14, 2024
* chore(docker): remove `/autoware/log` after `colcon build` (autowarefoundation#5329)

* chore(.github): always run `Show disk space` (autowarefoundation#5354)

always show disk space

Signed-off-by: Yutaka Kondo <[email protected]>

* build(autoware.repos): remove ament_cmake fork repository (autowarefoundation#5360)

Signed-off-by: Chengyong Lin <[email protected]>

* fix: remove `ndt_omp` (autowarefoundation#5390)

Removed ndt_omp

Signed-off-by: Shintaro Sakoda <[email protected]>

* fix: change the organization of `awsim_sensor_kit_launch` from RobotecAI to tier4 (autowarefoundation#5403)

Fixed organization of `awsim_sensor_kit_launch` from RobotecAI to tier4

Signed-off-by: Shintaro Sakoda <[email protected]>

* chore(.devcontainer): rename `.devcontainer` directories (autowarefoundation#5407)

rename .devcontainer dirs

Signed-off-by: Yutaka Kondo <[email protected]>

* feat(docker): integrate `cuda`/`no-cuda` jobs into single job (autowarefoundation#5363)

* chore(.github): rename `bake-target` to `target-image` and add descriptions to args (autowarefoundation#5413)

* rename target-image and add descriptions

Signed-off-by: Yutaka Kondo <[email protected]>

* fix feedback

Signed-off-by: Yutaka Kondo <[email protected]>

---------

Signed-off-by: Yutaka Kondo <[email protected]>

* feat(docker): install CUDA development/runtime libraries only 1 time (autowarefoundation#5419)

* add base-cuda

Signed-off-by: Yutaka Kondo <[email protected]>

* update svg

Signed-off-by: Yutaka Kondo <[email protected]>

* not cache cuda packages

Signed-off-by: Yutaka Kondo <[email protected]>

* push base-cuda image

Signed-off-by: Yutaka Kondo <[email protected]>

* Update docker-build-and-push.yaml

* Update docker-build-and-push-arm64.yaml

* separate jobs

Signed-off-by: Yutaka Kondo <[email protected]>

---------

Signed-off-by: Yutaka Kondo <[email protected]>

* refactor(docker): replace the multiple `rosdep` commands with `resolve_rosdep_keys.sh`. (autowarefoundation#5424)

* run resolve_rosdep_keys.sh

Signed-off-by: Yutaka Kondo <[email protected]>

* style(pre-commit): autofix

* update .dockerignore

Signed-off-by: Yutaka Kondo <[email protected]>

* chmod +x

Signed-off-by: Yutaka Kondo <[email protected]>

* fix location

Signed-off-by: Yutaka Kondo <[email protected]>

* remove rosdep update

Signed-off-by: Yutaka Kondo <[email protected]>

---------

Signed-off-by: Yutaka Kondo <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor(docker): replace the multiple `colcon` commands with `build_and_clean.sh`. (autowarefoundation#5425)

* run resolve_rosdep_keys.sh

Signed-off-by: Yutaka Kondo <[email protected]>

* style(pre-commit): autofix

* update .dockerignore

Signed-off-by: Yutaka Kondo <[email protected]>

* chmod +x

Signed-off-by: Yutaka Kondo <[email protected]>

* fix location

Signed-off-by: Yutaka Kondo <[email protected]>

* remove rosdep update

Signed-off-by: Yutaka Kondo <[email protected]>

* run build_and_clean.sh

Signed-off-by: Yutaka Kondo <[email protected]>

* style(pre-commit): autofix

---------

Signed-off-by: Yutaka Kondo <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* refactor(docker): replace the multiple `rm -rf` commands with `cleanup_system.sh`. (autowarefoundation#5426)

* fix(.github): fix `target-image` (autowarefoundation#5428)

fix target-image

Signed-off-by: Yutaka Kondo <[email protected]>

* chore: remove tvm_utility artifacts from Ansible tasks (autowarefoundation#5427)

Signed-off-by: Esteve Fernandez <[email protected]>

---------

Signed-off-by: Yutaka Kondo <[email protected]>
Signed-off-by: Chengyong Lin <[email protected]>
Signed-off-by: Shintaro Sakoda <[email protected]>
Signed-off-by: Esteve Fernandez <[email protected]>
Co-authored-by: chgyg <[email protected]>
Co-authored-by: SakodaShintaro <[email protected]>
Co-authored-by: SakodaShintaro <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Esteve Fernandez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:openadkit Issues or Features related to Open AD Kit tag:run-health-check Run health-check type:containers Docker containers, containerization of components, or container orchestration.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Build the CUDA packages with the sensing/perception container as the parent
4 participants