
KEDA agentpool pod failing with "stopped hearing from agent deployment-687dd7d466-vtmx4" #5053

Closed
vijayakrishnar opened this issue Oct 5, 2023 · 3 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@vijayakrishnar

We have ADO pipelines configured to use self-hosted agents that run as pods on AKS. We use KEDA to autoscale based on the number of pending jobs in ADO, so each job triggers a new pod (agent) on AKS.

This works perfectly for short jobs that complete within 10-15 or 30 minutes. If a job takes longer, for example more than 40 minutes or 1-2 hours, it fails with "We stopped hearing from agent" for no apparent reason.

We don't have any time limit set to kill the pod, and we have a large AKS cluster with multiple nodes. Even so, pods still get killed after 40 minutes, sometimes after 1-2 hours, and so on.

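Expected Behavior

The agent should run the job to completion without the pod being killed. The agent or pod should be terminated only when there has been no activity on the agent for more than 15 minutes.

Actual Behavior

Long-running jobs fail with "We stopped hearing from agent" after roughly 40 minutes to 1-2 hours, even though nothing is configured to kill the pod.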

Steps to Reproduce the Problem

  1. Set up an ADO pipeline that runs jobs on AKS with KEDA autoscaling.
  2. Configure KEDA to connect using an ADO personal access token (PAT).
  3. The agent pod scaled by KEDA should be killed only if there is no activity for more than 15 minutes.

Logs from KEDA operator

##[error]We stopped hearing from agent deployment-687dd7d466-vtmx4. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

KEDA Version

2.9.3

Kubernetes Version

1.25

Platform

Microsoft Azure

Scaler Details

Azure Pipelines

Anything else?

No response

@vijayakrishnar vijayakrishnar added the bug Something isn't working label Oct 5, 2023
@JorTurFer
Member

JorTurFer commented Oct 5, 2023

Hello,
KEDA doesn't kill jobs, except when a ScaledJob is updated and its rollout.strategy is default.

KEDA doesn't manage the Azure Pipelines agent either; it just creates the jobs using the given spec. So I think the issue is on the agent side, not in KEDA.

Do you see any logs in the KEDA operator pod? The log you sent isn't from KEDA components.

@JorTurFer JorTurFer moved this from To Triage to To Do in Roadmap - KEDA Core Oct 15, 2023

stale bot commented Dec 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Dec 4, 2023
@JorTurFer
Member

I think we can close this.

@github-project-automation github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Dec 4, 2023
Projects
Archived in project
Development

No branches or pull requests

2 participants