
yamux: keepalive failed: i/o deadline reached #11509

Open
pgrmega opened this issue Jan 20, 2022 · 5 comments
Labels: bug, stage/thinking, stage/waiting-on-upstream

Comments

@pgrmega (Contributor) commented Jan 20, 2022

Hello,

Overview of the Issue

We use Packer to automatically build Hyper-V VMs.
The scripts are executed directly on the Hyper-V server, which is hosted in our own local infrastructure.
We build several VMs in parallel (up to 7).

Sometimes we encounter the error "yamux: keepalive failed: i/o deadline reached".
We cannot correlate this error with any particular resource (CPU, memory, network, disk, etc.).

We looked for known issues resembling what we are experiencing and found #3378, #5360, #6410, and #6475, but they didn't help us: we are not running in Azure but on Windows Server 2016.

Reproduction Steps

We have not identified a specific way to reproduce the issue.
We lowered the number of parallel VMs and checked whether rebooting the server improves the behavior, but we could not find a determining factor.

Packer version

Packer version 1.7.8

Operating system and Environment details

Windows Server 2016
Intel Xeon CPU X5650 @ 2.67GHz (2 processors)
RAM: 192 GB

Log Fragments and crash.log files

2022/01/19 23:01:22 [ERR] yamux: keepalive failed: i/o deadline reached
2022/01/19 23:02:07 [INFO] (telemetry) ending powershell
2022/01/19 23:02:22 [ERR] yamux: keepalive failed: i/o deadline reached
2022/01/19 23:02:22 [INFO] (telemetry) ending hyperv-vmcx
2022/01/19 23:02:22 [INFO] 537 bytes written for 'stderr'
2022/01/19 23:02:22 [INFO] 20095 bytes written for 'stdout'
2022/01/19 23:02:22 [ERR] Error decoding response stream 6: EOF
2022/01/19 23:02:22 [INFO] RPC endpoint: Communicator ended with: 123
2022/01/19 23:03:12 packer.exe plugin: [INFO] 537 bytes written for 'stderr'
2022/01/19 23:03:12 packer.exe plugin: [INFO] 20095 bytes written for 'stdout'
2022/01/19 23:03:12 packer.exe plugin: [ERR] Error decoding response stream 5: EOF
2022/01/19 23:03:12 packer.exe plugin: c:/Windows/Temp/script-61e85116-8b32-7360-3555-f83ea4f243bc.ps1 returned with exit code 123
2022/01/19 23:03:12 packer.exe plugin: waiting for all plugin processes to complete...
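
For context, this log line is emitted by the yamux connection-multiplexing library that Packer uses for its plugin RPC transport: a session pings its peer periodically and closes the connection when a ping goes unanswered. Below is a minimal Go sketch that reproduces the exact log line in isolation, assuming the current github.com/hashicorp/yamux API; the silent peer and the shortened timeouts are illustrative assumptions, not taken from our setup.

package main

import (
    "io"
    "net"
    "time"

    "github.com/hashicorp/yamux"
)

func main() {
    clientConn, serverConn := net.Pipe()

    // A peer that swallows bytes but never replies: keepalive pings go
    // out, pongs never come back.
    go io.Copy(io.Discard, serverConn)

    cfg := yamux.DefaultConfig()
    cfg.KeepAliveInterval = 1 * time.Second      // library default is 30s
    cfg.ConnectionWriteTimeout = 2 * time.Second // library default is 10s; bounds the wait for a ping reply

    session, err := yamux.Client(clientConn, cfg)
    if err != nil {
        panic(err)
    }

    // The session closes itself once a ping times out; by then stderr shows:
    //   [ERR] yamux: keepalive failed: i/o deadline reached
    <-session.CloseChan()
}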

Thank you for any help.

pgrmega added the bug label on Jan 20, 2022
@nywilken (Contributor) commented

Thanks for reaching out. We’ll take a look to see what might be causing this. Thanks for linking the other issues in the description.

nywilken added the stage/thinking label on Jan 26, 2022
@azr (Contributor) commented Feb 15, 2022

Hey @pgrmega, do you have more info for us to be able to repro this, like a simplified Packer build config? And maybe the system and shell this is running on?

We will also need the full log.

Thanks!

@pgrmega (Contributor, Author) commented Mar 1, 2022

Hello,

We migrated the server on which this was running from Windows Server 2016 to Windows Server 2019, and the problem has not occurred since the migration.

This was running on PowerShell 5.2 on Windows Server 2016.

JSON config file:

  "variables": {
	  ...
	},
  "builders": [
    {
	"type": "hyperv-vmcx",
	"clone_from_vmcx_path": "{{user `PackerVMCXPath`}}",
	"vm_name": "{{user `VMName`}}",
	"temp_path": "{{user `PackerTempPath`}}",
	"output_directory": "{{user `PackerExportPath`}}",
	"boot_wait": "5m",
	"headless": true,

	"generation": "{{user `VMGeneration`}}",
	"memory": 12288,
	"enable_dynamic_memory": true,
	"cpus": 6,
	"switch_name": "Microsoft Network Adapter Multiplexor Driver - Virtual Switch",
		
	"communicator": "winrm",
	"winrm_use_ssl": false, 
	"winrm_insecure": true,
	"winrm_port": 5985,
	"winrm_timeout": "1h",
	"winrm_username": "{{user `VMAdminUser`}}",
    "winrm_password": "{{user `VMAdminPwd`}}",
    
    "shutdown_command": "IF EXIST C:\\exploitation\\run-sysprep.cmd (CALL C:\\exploitation\\run-sysprep.cmd)",
    "shutdown_timeout": "30m"
    
    }
  ],
  "provisioners": [
    ...
  ]
}

azr added the stage/waiting-on-upstream label on Mar 1, 2022
@Hi-Angel commented Dec 1, 2023

I'm not even sure where this error is coming from. I cloned both the Packer repo and the plugin I use, and neither has the text "deadline reached" anywhere in its code 🤷‍♂️
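
For what it's worth, the string appears to come from the github.com/hashicorp/yamux dependency that both Packer and its plugins pull in for their RPC transport, rather than from either codebase itself: yamux's exported ErrTimeout error carries exactly this message and is returned when a keepalive ping goes unanswered. A minimal check, assuming the current yamux API:

package main

import (
    "fmt"

    "github.com/hashicorp/yamux"
)

func main() {
    // Prints "i/o deadline reached": the message belongs to the yamux
    // dependency, not to Packer or to any plugin's own sources.
    fmt.Println(yamux.ErrTimeout)
}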

@rsergeevbgc commented Jun 17, 2024

We seem to experience the same issue with the following Nomad version, running on an M3 MacBook Pro.

nomad -v
Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

This happens when we run Nomad in dev mode to construct a local cluster for testing. No other jobs run on the cluster. This is a portion of the log:

2024-06-17T11:51:57.163-0400 [DEBUG] worker: ack evaluation: worker_id=6b2571cd-ad0e-4bbe-eae8-8a1cf854b5d1 eval_id=550ddcd7-83f9-fc31-b812-cd91929c61b0 type=_core namespace=- job_id=deployment-gc node_id="" triggered_by=scheduled
2024-06-17T11:51:57.163-0400 [DEBUG] core.sched: node GC scanning before cutoff index: index=0 node_gc_threshold=24h0m0s
2024-06-17T11:51:57.163-0400 [DEBUG] worker: ack evaluation: worker_id=a024c56d-604e-12af-60dc-b3f050845abd eval_id=48563a36-3321-9500-dbe1-6a23c0906474 type=_core namespace=- job_id=job-gc node_id="" triggered_by=scheduled
2024-06-17T11:51:57.164-0400 [DEBUG] worker: ack evaluation: worker_id=1c06a282-c128-3f0e-3d3b-1dd79d22c8a9 eval_id=2df3a1de-a8f2-e7a2-a570-2deadf6815fd type=_core namespace=- job_id=node-gc node_id="" triggered_by=scheduled
2024-06-17T11:51:57.290-0400 [WARN]  nomad.heartbeat: node TTL expired: node_id=68db21fe-a70b-ae80-16d4-1a4465059846
2024-06-17T11:52:09.858-0400 [ERROR] client: yamux: keepalive failed: i/o deadline reached
2024-06-17T11:52:09.858-0400 [ERROR] nomad.rpc: yamux: keepalive failed: i/o deadline reached
2024-06-17T11:52:09.858-0400 [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.GetClientAllocs server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:52:09.859-0400 [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.UpdateStatus server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:52:09.859-0400 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: EOF" rpc=Node.UpdateStatus server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:52:09.859-0400 [ERROR] client: error heartbeating. retrying: error="failed to update status: rpc error: EOF" period=1s
2024-06-17T11:52:09.858-0400 [ERROR] nomad.rpc: multiplex_v2 conn accept failed: error="keepalive timeout"
2024-06-17T11:52:09.865-0400 [DEBUG] client.consul: bootstrap contacting Consul DCs: consul_dcs=["local"]
2024-06-17T11:52:09.869-0400 [INFO]  client.consul: discovered following servers: servers=[XX.XXX.XX.XXX:PPPP]
2024-06-17T11:52:10.862-0400 [DEBUG] client: state updated: node_status=ready
2024-06-17T11:52:10.862-0400 [WARN]  client: missed heartbeat: req_latency=1.723458ms heartbeat_ttl=17.502723309s since_last_heartbeat=41.074529s
2024-06-17T11:52:20.394-0400 [DEBUG] http: request complete: method=GET path=/v1/namespaces?index=1 duration=5m7.06801275s
2024-06-17T11:52:36.185-0400 [DEBUG] http: request complete: method=GET path="/v1/jobs?meta=true&namespace=%2A&index=1" duration=5m16.53492425s
2024-06-17T11:56:36.761-0400 [WARN]  nomad.heartbeat: node TTL expired: node_id=68db21fe-a70b-ae80-16d4-1a4465059846
2024-06-17T11:56:49.877-0400 [ERROR] client: yamux: keepalive failed: i/o deadline reached
2024-06-17T11:56:49.877-0400 [ERROR] nomad.rpc: yamux: keepalive failed: i/o deadline reached
2024-06-17T11:56:49.877-0400 [ERROR] nomad.rpc: multiplex_v2 conn accept failed: error="keepalive timeout"
2024-06-17T11:56:49.878-0400 [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.GetClientAllocs server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:56:49.878-0400 [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.UpdateStatus server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:56:49.878-0400 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: EOF" rpc=Node.UpdateStatus server=XX.XXX.XX.XXX:PPPP
2024-06-17T11:56:49.878-0400 [ERROR] client: error heartbeating. retrying: error="failed to update status: rpc error: EOF" period=1s
2024-06-17T11:56:49.882-0400 [DEBUG] client.consul: bootstrap contacting Consul DCs: consul_dcs=["local"]
2024-06-17T11:56:49.887-0400 [INFO]  client.consul: discovered following servers: servers=[XX.XXX.XX.XXX:PPPP]
2024-06-17T11:56:50.880-0400 [DEBUG] client: state updated: node_status=ready

This works normally when run on our CentOS servers.
