Pipelines fail frequently #112

Open
patrickakk opened this issue Oct 22, 2024 · 11 comments
Assignees
Labels
blocked Blocked by other action – Status of issue bug Something isn't working - Kind of issue
Milestone

Comments

@patrickakk
Contributor

@dpancic
Recently a lot of pipelines have been failing with the error shown below. Choosing "re-run failed" doesn't help. Usually it only works after changing a file (adding a space), committing, and pushing.
So it doesn't have anything to do with the code or the tests.

In the past, re-running the failed jobs sometimes worked. Now it usually doesn't, and the error seems to happen more frequently.

Is there anything we can do about this?

Thanks in advance for your help.

image

@patrickakk patrickakk added the bug Something isn't working - Kind of issue label Oct 22, 2024
@patrickakk patrickakk added this to the 2024-11 milestone Oct 22, 2024
@patrickakk
Contributor Author

Update:
It seems like everything works fine as long as the pipelines are triggered often. Maybe at least once a week?

As soon as there were no commits, and therefore no pipeline runs, for approx. 3 weeks or longer, this problem seems to happen. Maybe because something else changed in the meantime?

Unfortunately, at this moment I can't describe it in more detail. Does this information help?

@patrickakk
Contributor Author

Update: The situation above does not apply. It just failed again, although other pipelines succeeded on the same day.

@acdh-ch
Contributor

acdh-ch commented Dec 2, 2024

@patrickakk the problem is caused by a missing image tag. It seems that the image tag is sometimes not forwarded from the build stage to the deployment stage. I tried to fix it by adding 30-second delays before the deployment stage.
Please test it and, if it works, add it to the test and prod branches. If the problem appears again, please check the generated auto-deploy YAML files first, before you try again with the next commit. I assume that you will not see an image tag in the YAML file. The delay is an attempt to give the runner more time to generate the auto-deploy.yaml file with all necessary values.
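
Editor's note: the check described above (does the generated YAML contain an image tag?) could be scripted roughly as follows. This is only a sketch. It assumes the generated file has a line of the form `tag: <value>`; the real layout of the auto-deploy YAML is not shown in this thread, and `findTag` is a hypothetical helper that does a plain line scan rather than real YAML parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// findTag scans YAML text line by line for the first "tag:" key and returns
// its value with surrounding quotes removed. The boolean is false when no
// "tag:" line exists or when its value is empty.
// Assumption: the tag appears as "tag: <value>" on a single line.
func findTag(yamlText string) (string, bool) {
	for _, line := range strings.Split(yamlText, "\n") {
		trimmed := strings.TrimSpace(line)
		if !strings.HasPrefix(trimmed, "tag:") {
			continue
		}
		value := strings.TrimSpace(strings.TrimPrefix(trimmed, "tag:"))
		value = strings.Trim(value, `"'`)
		return value, value != ""
	}
	return "", false
}

func main() {
	// Hypothetical file contents, for illustration only.
	withTag := "image:\n  repository: example/app\n  tag: 'abc1234'\n"
	withoutTag := "image:\n  repository: example/app\n  tag:\n"

	for _, doc := range []string{withTag, withoutTag} {
		tag, ok := findTag(doc)
		fmt.Printf("tag=%q present=%v\n", tag, ok)
	}
}
```

Running a check like this against the file produced by a failed run would show whether the tag was already missing at that point or got lost later.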

@patrickakk patrickakk assigned patrickakk and unassigned dpancic Dec 2, 2024
@patrickakk patrickakk added the blocked Blocked by other action – Status of issue label Dec 2, 2024
@patrickakk patrickakk modified the milestones: 2024-11, 2024-12 Dec 2, 2024
@patrickakk
Contributor Author

@acdh-ch
Thanks for the explanation and fix.

How should I test it, since the problem does not always occur, but only "sometimes"?

Will this be added to the test and prod branches "automatically" when the code from dev is merged while working towards the next release?

For now I'll move the issue to the January milestone and keep it open, as a reminder to check it.

@patrickakk patrickakk modified the milestones: 2024-12, 2025-01 Dec 5, 2024
@patrickakk
Contributor Author

@acdh-ch
Today a few pipelines worked fine.

Then they failed again:
https://github.com/acdh-oeaw/dhcr-main/actions/runs/12226073948
https://github.com/acdh-oeaw/dhcr-main/actions/runs/12226098669

For the first one I checked the auto-deploy-app-values.yaml and it does contain an image tag.

Does this provide you sufficient info?

@patrickakk patrickakk assigned acdh-ch and unassigned patrickakk Dec 8, 2024
@patrickakk patrickakk removed the blocked Blocked by other action – Status of issue label Dec 8, 2024
@acdh-ch
Contributor

acdh-ch commented Dec 9, 2024

It seems that although auto-deploy-app-values.yaml contains an image tag, it is not always forwarded to the auto deploy application.

The next attempt is to set the image tag to latest in the auto-deploy app template helper. With this change, the auto-deploy app should deploy the image even if the tag is not forwarded. The only difference will be in the image name: when the tag is not forwarded, the deployed image will have the tag latest.
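
Editor's note: the fallback described above can be illustrated with Go's standard `text/template` package, which uses the same template syntax as Helm charts. This is a minimal sketch and not the actual helper from the auto-deploy app: the template string, the `image` type, and the `render` function are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// imageTmpl is a hypothetical helper template: it uses .Tag when it is set
// and falls back to "latest" when it is empty.
const imageTmpl = `{{ .Repo }}:{{ if .Tag }}{{ .Tag }}{{ else }}latest{{ end }}`

// image holds the two values the template needs.
type image struct {
	Repo string
	Tag  string
}

// render executes imageTmpl for the given image and returns the reference.
func render(img image) (string, error) {
	t, err := template.New("image").Parse(imageTmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, img); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	for _, img := range []image{{Repo: "test", Tag: "abc1234"}, {Repo: "test", Tag: ""}} {
		ref, err := render(img)
		if err != nil {
			fmt.Println("render error:", err)
			continue
		}
		fmt.Println(ref)
	}
}
```

With a tag the sketch renders `test:abc1234`; with an empty tag it renders `test:latest`. One trade-off of such a fallback is that a lost tag no longer fails the deployment, so the deployed image may not match the commit that triggered the run.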

@patrickakk patrickakk assigned patrickakk and unassigned acdh-ch Dec 9, 2024
@patrickakk patrickakk added the blocked Blocked by other action – Status of issue label Dec 9, 2024
@patrickakk
Contributor Author

@acdh-ch

Thanks.

Both pipelines (dev and test) ran fine now. Unfortunately, that doesn't say anything, since this error only occurs "sometimes".

I'll mark the issue as "blocked" and wait until the end of January to see how it goes.

@patrickakk patrickakk assigned acdh-ch and unassigned patrickakk Dec 11, 2024
@patrickakk patrickakk removed the blocked Blocked by other action – Status of issue label Dec 11, 2024
@patrickakk
Contributor Author

@acdh-ch
Today, same problem, see here:
https://github.com/acdh-oeaw/dhcr-main/actions/runs/12280448811

Since I needed the pipelines to work, I used the usual trick:
Add one space in a file, commit and push.

Does this provide you with sufficient info?

@acdh-ch
Contributor

acdh-ch commented Dec 11, 2024

@patrickakk There is a new error. The image is tagged with test:%!s(float64=9.544892e+06)
It seems that when the image tag (commit) starts with a number, auto-deploy-app misinterprets it as a floating-point number instead of treating it as a string. That causes an error and the pipeline fails.
I tried to fix this by wrapping the tag in single quotes. The tag is now set as '$tag'.
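
Editor's note: the string `%!s(float64=9.544892e+06)` is what Go's `fmt` package prints when a float64 value is formatted with the `%s` verb. The sketch below reproduces that effect under stated assumptions: it uses `encoding/json` as a stand-in for the values decoding (untyped JSON numbers decode to float64), an all-digit tag `9544892` inferred from the error text, and hypothetical helpers `decodeTag` and `imageRef`. How auto-deploy-app actually decodes and formats the tag is not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageRef mimics a template that formats the tag with the %s verb.
// The tag is an interface{} because untyped config values arrive that way.
func imageRef(repo string, tag interface{}) string {
	return fmt.Sprintf("%s:%s", repo, tag)
}

// decodeTag decodes a hypothetical values document and returns its "tag"
// field without forcing it to any particular type.
func decodeTag(doc string) (interface{}, error) {
	var values map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &values); err != nil {
		return nil, err
	}
	return values["tag"], nil
}

func main() {
	// First document: unquoted, all-digit tag. Second: the same tag quoted.
	for _, doc := range []string{`{"tag": 9544892}`, `{"tag": "9544892"}`} {
		tag, err := decodeTag(doc)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", doc, imageRef("test", tag))
	}
}
```

The unquoted tag comes out as `test:%!s(float64=9.544892e+06)`, matching the error above, while the quoted tag comes out as `test:9544892`. That is consistent with quoting the tag being the fix.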

@patrickakk
Contributor Author

@acdh-ch

Everything worked fine today.

Thank you.

@patrickakk patrickakk added the blocked Blocked by other action – Status of issue label Dec 12, 2024
@patrickakk patrickakk assigned patrickakk and unassigned acdh-ch Dec 12, 2024
@patrickakk
Contributor Author

Status update:
Keep issue open until end of January and see if further problems occur.
