KEDA agentpool pod failing with stopped hear from agent deployment-687dd7d466-vtmx4 #5053

vijayakrishnar · 2023-10-05T04:01:04Z

Report

We have ADO pipelines configured to use self-hosted agents that run as pods on AKS. We use KEDA to autoscale based on the number of jobs in ADO, and each job triggers a new agent pod on AKS.

This works perfectly for short jobs that complete within 10-15 or 30 minutes. When a job takes longer, for example more than 40 minutes or 1-2 hours, it fails with "We stopped hearing from agent" without any apparent reason.

We have not set any time limit that would kill the pod, and we have a large AKS cluster with multiple nodes. Pods are still getting killed after 40 minutes, sometimes after 1-2 hours, and so on.

Expected Behavior

The agent or pod should be killed only when there has been no activity on the agent for more than 15 minutes.

Actual Behavior

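The agent is supposed to run the job to completion without the pod being killed. Instead, long-running jobs fail with "We stopped hearing from agent".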

Steps to Reproduce the Problem

Set up an ADO pipeline that runs jobs on AKS with KEDA autoscaling
KEDA uses an ADO PAT to connect
The KEDA-scaled pod should be killed only if there is no activity for more than 15 minutes

Logs from KEDA operator

##[error]We stopped hearing from agent deployment-687dd7d466-vtmx4. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

KEDA Version

2.9.3

Kubernetes Version

1.25

Platform

Microsoft Azure

Scaler Details

Azure Pipelines

Anything else?

No response

JorTurFer · 2023-10-05T08:16:10Z

Hello,
KEDA doesn't kill jobs, except on ScaledJob updates when the rollout.strategy is default.

KEDA doesn't manage the az agent either; KEDA just creates the jobs using the given spec, so I think the issue is on the agent side, not in KEDA.

Do you see anything in the KEDA operator pod logs? The log that you sent isn't from KEDA components.
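
For context on the rollout strategy mentioned above, here is a minimal, illustrative ScaledJob sketch. It is not the reporter's configuration: the resource name, image, pool ID, and environment variable names are placeholders chosen for the example. It assumes the `rollout.strategy` spelling used in the comment; some older KEDA releases use a top-level `rolloutStrategy` field instead, so check the ScaledJob documentation for the version you run.

```yaml
# Illustrative sketch only; names, image, and values are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azdevops-agent-scaledjob        # hypothetical name
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: azdevops-agent
            image: example.azurecr.io/azdevops-agent:latest   # hypothetical image
            # AZP_URL and AZP_TOKEN are assumed to be set as env vars on this container
  rollout:
    strategy: gradual                   # keep existing Jobs when the ScaledJob is updated
  triggers:
    - type: azure-pipelines
      metadata:
        poolID: "1"                     # placeholder pool ID
        organizationURLFromEnv: "AZP_URL"
        personalAccessTokenFromEnv: "AZP_TOKEN"
```

With the `default` strategy, updating the ScaledJob terminates the Jobs that already exist and recreates them from the new spec. With `gradual`, existing Jobs are left to finish and only newly created Jobs pick up the new definition.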

stale · 2023-12-04T09:19:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

JorTurFer · 2023-12-04T10:28:07Z

I think that we can close this

vijayakrishnar added the bug label Oct 5, 2023

keda-automation added this to Roadmap - KEDA Core Oct 5, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA Core Oct 5, 2023

JorTurFer moved this from To Triage to To Do in Roadmap - KEDA Core Oct 15, 2023

stale bot added the stale label Dec 4, 2023

JorTurFer closed this as completed Dec 4, 2023

github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Dec 4, 2023
