Cannot retrieve the pipelines after upgrading Kubeflow from 1.8 to 1.9 #584

Open
eleblebici opened this issue Nov 14, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@eleblebici

eleblebici commented Nov 14, 2024

Bug Description

After upgrading Kubeflow from 1.8 to 1.9, the pipelines can no longer be retrieved in the UI. The UI shows the following error:

An error occurred
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111

We see the following log entry in the "istio-ingressgateway-workload":

2024-11-08T07:10:45.273942613Z [2024-11-08T07:10:44.383Z] "GET /pipeline/apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 503 URX via_upstream - "-" 0 152 61 58 "X.Y.0.149,X.Z.103.216" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0" "43f03c5c-8bb4-421d-81c3-d2b4d91f190c" "test.com" "X.A.178.97:3000" outbound|3000||kfp-ui.kubeflow.svc.cluster.local X.A.178.87:34498 X.A.178.87:8080 X.A.103.216:59664 - -

And the following logs in the "ml-pipeline-ui" container:

2024-11-14T09:27:19.072Z [ml-pipeline-ui] (node:14) UnhandledPromiseRejectionWarning: FetchError: request to http://metadata/computeMetadata/v1/project/project-id failed, reason: getaddrinfo ENOTFOUND metadata
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at ClientRequest.<anonymous> (/server/node_modules/node-fetch/lib/index.js:1491:11)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at ClientRequest.emit (events.js:400:28)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at Socket.socketErrorListener (_http_client.js:475:9)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at Socket.emit (events.js:400:28)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at emitErrorNT (internal/streams/destroy.js:106:8)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at emitErrorCloseNT (internal/streams/destroy.js:74:3)
2024-11-14T09:27:19.072Z [ml-pipeline-ui]     at processTicksAndRejections (internal/process/task_queues.js:82:21)
2024-11-14T09:27:19.072Z [ml-pipeline-ui] (node:14) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 10)

It seems similar to this one: kubeflow/pipelines#11247

We tried setting the DISABLE_GKE_METADATA environment variable on the ml-pipeline-ui container and re-applied the StatefulSet, but the same error still occurs even though the variable appears to have been added.

We think this is because Pebble overwrites it: https://github.com/canonical/kfp-operators/blob/main/charms/kfp-ui/src/components/pebble_components.py#L61
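
If that guess is right, the variable would probably need to be part of the Pebble layer the charm applies, not only the StatefulSet spec. The sketch below is not the charm's actual code; it only shows the general shape of a Pebble layer with an `environment` map. The function name `build_ui_layer`, the service name, the command, the placeholder variable, and the value `"true"` are all assumptions made for illustration.

```python
from typing import Dict, Optional


def build_ui_layer(extra_env: Optional[Dict[str, str]] = None) -> dict:
    """Return a Pebble layer (as a plain dict) for a hypothetical UI service."""
    environment = {
        # Placeholder entry; the real charm defines its own variables.
        "EXAMPLE_SETTING": "example",
    }
    # Variables passed in here end up in the service's environment map.
    environment.update(extra_env or {})
    return {
        "summary": "example layer",
        "description": "illustrative Pebble layer, not the charm's real one",
        "services": {
            "ml-pipeline-ui": {  # hypothetical service name
                "override": "replace",
                "summary": "example UI service",
                "command": "node dist/server.js",  # hypothetical command
                "startup": "enabled",
                "environment": environment,
            }
        },
    }


layer = build_ui_layer({"DISABLE_GKE_METADATA": "true"})
print(layer["services"]["ml-pipeline-ui"]["environment"])
```

With `override: replace`, the service definition in a layer replaces any earlier definition of the same service, so a variable that is missing from the layer's `environment` map is not set by Pebble for that service.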

To Reproduce

I could not reproduce that after upgrading from 1.8 to 1.9.

Environment

Charmed Kubeflow 1.9
Juju 3.4.5

Additional Context

No response

@eleblebici added the bug label Nov 14, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6551.

This message was autogenerated

@kimwnasptd
Contributor

thanks for the issue @eleblebici!

We'll also cover this issue as part of #582

@eleblebici
Author

thank you @kimwnasptd

We are also observing "connection refused" errors in the logs of the "apiserver" container of the kfp-api pod:

2024-11-18T09:25:00.356074040Z 2024-11-18T09:25:00.355Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:25:15.240695126Z 2024-11-18T09:25:15.240Z [pebble] GET /v1/notices?timeout=30s 30.000203043s 200
2024-11-18T09:25:38.144743678Z 2024-11-18T09:25:38.144Z [pebble] GET /v1/plan?format=yaml 352.29µs 200
2024-11-18T09:25:38.147001963Z 2024-11-18T09:25:38.146Z [pebble] POST /v1/layers 449.603µs 200
2024-11-18T09:25:38.161284312Z 2024-11-18T09:25:38.161Z [pebble] POST /v1/services 6.767914ms 202
2024-11-18T09:25:38.179053207Z 2024-11-18T09:25:38.178Z [pebble] GET /v1/changes/1130/wait?timeout=4.000s 16.999503ms 200
2024-11-18T09:25:38.301617525Z 2024-11-18T09:25:38.301Z [pebble] GET /v1/checks?names=kfp-api-up 74.329µs 200
2024-11-18T09:25:45.241515279Z 2024-11-18T09:25:45.241Z [pebble] GET /v1/notices?timeout=30s 30.0003448s 200
2024-11-18T09:26:15.242530092Z 2024-11-18T09:26:15.242Z [pebble] GET /v1/notices?timeout=30s 30.000440524s 200
2024-11-18T09:26:45.243597346Z 2024-11-18T09:26:45.243Z [pebble] GET /v1/notices?timeout=30s 30.000517511s 200
2024-11-18T09:27:15.244901542Z 2024-11-18T09:27:15.244Z [pebble] GET /v1/notices?timeout=30s 30.000961745s 200
2024-11-18T09:27:45.246629417Z 2024-11-18T09:27:45.246Z [pebble] GET /v1/notices?timeout=30s 30.001126268s 200
2024-11-18T09:28:15.247376020Z 2024-11-18T09:28:15.247Z [pebble] GET /v1/notices?timeout=30s 30.000312421s 200
2024-11-18T09:28:45.248478808Z 2024-11-18T09:28:45.248Z [pebble] GET /v1/notices?timeout=30s 30.000430172s 200
2024-11-18T09:29:15.249946977Z 2024-11-18T09:29:15.249Z [pebble] GET /v1/notices?timeout=30s 30.001073468s 200
2024-11-18T09:29:45.251188958Z 2024-11-18T09:29:45.250Z [pebble] GET /v1/notices?timeout=30s 30.000593107s 200
2024-11-18T09:29:56.725191419Z 2024-11-18T09:29:56.725Z [pebble] GET /v1/plan?format=yaml 467.559µs 200
2024-11-18T09:29:56.727727979Z 2024-11-18T09:29:56.727Z [pebble] POST /v1/layers 764.095µs 200
2024-11-18T09:29:56.743053371Z 2024-11-18T09:29:56.742Z [pebble] POST /v1/services 7.757464ms 202
2024-11-18T09:29:56.762801129Z 2024-11-18T09:29:56.762Z [pebble] GET /v1/changes/1131/wait?timeout=4.000s 18.692434ms 200
2024-11-18T09:29:56.881044630Z 2024-11-18T09:29:56.880Z [pebble] GET /v1/checks?names=kfp-api-up 62.061µs 200
2024-11-18T09:30:15.252629072Z 2024-11-18T09:30:15.252Z [pebble] GET /v1/notices?timeout=30s 30.000955823s 200
2024-11-18T09:30:45.253894521Z 2024-11-18T09:30:45.253Z [pebble] GET /v1/notices?timeout=30s 30.000901806s 200
2024-11-18T09:31:15.254995847Z 2024-11-18T09:31:15.254Z [pebble] GET /v1/notices?timeout=30s 30.000644512s 200
2024-11-18T09:31:45.255716666Z 2024-11-18T09:31:45.255Z [pebble] GET /v1/notices?timeout=30s 30.000355191s 200
2024-11-18T09:32:15.257141106Z 2024-11-18T09:32:15.256Z [pebble] GET /v1/notices?timeout=30s 30.000882644s 200
2024-11-18T09:32:45.257923226Z 2024-11-18T09:32:45.257Z [pebble] GET /v1/notices?timeout=30s 30.000353311s 200
2024-11-18T09:33:15.258904802Z 2024-11-18T09:33:15.258Z [pebble] GET /v1/notices?timeout=30s 30.000515506s 200
2024-11-18T09:33:45.260400775Z 2024-11-18T09:33:45.260Z [pebble] GET /v1/notices?timeout=30s 30.001048313s 200
2024-11-18T09:34:15.261257405Z 2024-11-18T09:34:15.260Z [pebble] GET /v1/notices?timeout=30s 30.000259529s 200
2024-11-18T09:34:27.525540762Z 2024-11-18T09:34:27.525Z [pebble] GET /v1/plan?format=yaml 464.379µs 200
2024-11-18T09:34:27.527813468Z 2024-11-18T09:34:27.527Z [pebble] POST /v1/layers 584.352µs 200
2024-11-18T09:34:27.542909628Z 2024-11-18T09:34:27.542Z [pebble] POST /v1/services 7.216855ms 202
2024-11-18T09:34:27.560369106Z 2024-11-18T09:34:27.560Z [pebble] GET /v1/changes/1132/wait?timeout=4.000s 16.63498ms 200
2024-11-18T09:34:27.697026695Z 2024-11-18T09:34:27.696Z [pebble] GET /v1/checks?names=kfp-api-up 78.81µs 200
2024-11-18T09:34:45.261731382Z 2024-11-18T09:34:45.261Z [pebble] GET /v1/notices?timeout=30s 30.000177102s 200
2024-11-18T09:35:15.262580201Z 2024-11-18T09:35:15.262Z [pebble] GET /v1/notices?timeout=30s 30.000331611s 200
2024-11-18T09:35:45.263374675Z 2024-11-18T09:35:45.263Z [pebble] GET /v1/notices?timeout=30s 30.000379449s 200
2024-11-18T09:36:15.264846259Z 2024-11-18T09:36:15.264Z [pebble] GET /v1/notices?timeout=30s 30.001166415s 200
2024-11-18T09:36:45.265606909Z 2024-11-18T09:36:45.265Z [pebble] GET /v1/notices?timeout=30s 30.000429633s 200
2024-11-18T09:37:15.267185778Z 2024-11-18T09:37:15.267Z [pebble] GET /v1/notices?timeout=30s 30.00115639s 200
2024-11-18T09:37:45.267898122Z 2024-11-18T09:37:45.267Z [pebble] GET /v1/notices?timeout=30s 30.000257938s 200
2024-11-18T09:38:15.269088643Z 2024-11-18T09:38:15.268Z [pebble] GET /v1/notices?timeout=30s 30.000861213s 200
2024-11-18T09:38:45.269962779Z 2024-11-18T09:38:45.269Z [pebble] GET /v1/notices?timeout=30s 30.000224698s 200
2024-11-18T09:39:15.271592348Z 2024-11-18T09:39:15.271Z [pebble] GET /v1/notices?timeout=30s 30.001144879s 200
2024-11-18T09:39:27.528805796Z 2024-11-18T09:39:27.528Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:39:38.143636451Z 2024-11-18T09:39:38.143Z [pebble] GET /v1/plan?format=yaml 1.070653ms 200
2024-11-18T09:39:38.145699505Z 2024-11-18T09:39:38.145Z [pebble] POST /v1/layers 476.134µs 200
2024-11-18T09:39:38.159889966Z 2024-11-18T09:39:38.159Z [pebble] POST /v1/services 6.188255ms 202
2024-11-18T09:39:38.179424891Z 2024-11-18T09:39:38.179Z [pebble] GET /v1/changes/1133/wait?timeout=4.000s 18.597678ms 200
2024-11-18T09:39:38.305824945Z 2024-11-18T09:39:38.305Z [pebble] GET /v1/checks?names=kfp-api-up 69.856µs 200
2024-11-18T09:39:45.273247169Z 2024-11-18T09:39:45.272Z [pebble] GET /v1/notices?timeout=30s 30.001087391s 200
2024-11-18T09:40:15.274639645Z 2024-11-18T09:40:15.274Z [pebble] GET /v1/notices?timeout=30s 30.001133004s 200
2024-11-18T09:40:45.276196762Z 2024-11-18T09:40:45.275Z [pebble] GET /v1/notices?timeout=30s 30.001056298s 200
2024-11-18T09:41:15.277080708Z 2024-11-18T09:41:15.276Z [pebble] GET /v1/notices?timeout=30s 30.000547183s 200
2024-11-18T09:41:45.278838353Z 2024-11-18T09:41:45.278Z [pebble] GET /v1/notices?timeout=30s 30.001200037s 200
2024-11-18T09:42:15.279938218Z 2024-11-18T09:42:15.279Z [pebble] GET /v1/notices?timeout=30s 30.000619892s 200
2024-11-18T09:42:45.281421971Z 2024-11-18T09:42:45.281Z [pebble] GET /v1/notices?timeout=30s 30.001074684s 200
2024-11-18T09:43:15.282181145Z 2024-11-18T09:43:15.281Z [pebble] GET /v1/notices?timeout=30s 30.000172484s 200
2024-11-18T09:43:45.282771011Z 2024-11-18T09:43:45.282Z [pebble] GET /v1/notices?timeout=30s 30.000404198s 200
2024-11-18T09:44:15.283705759Z 2024-11-18T09:44:15.283Z [pebble] GET /v1/notices?timeout=30s 30.000260526s 200
2024-11-18T09:44:38.147367537Z 2024-11-18T09:44:38.147Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:44:41.769449130Z 2024-11-18T09:44:41.769Z [pebble] GET /v1/plan?format=yaml 420.141µs 200
2024-11-18T09:44:41.771614762Z 2024-11-18T09:44:41.771Z [pebble] POST /v1/layers 510.247µs 200
2024-11-18T09:44:41.788080205Z 2024-11-18T09:44:41.787Z [pebble] POST /v1/services 9.071798ms 202
2024-11-18T09:44:41.805457471Z 2024-11-18T09:44:41.805Z [pebble] GET /v1/changes/1134/wait?timeout=4.000s 16.574574ms 200
2024-11-18T09:44:41.928889550Z 2024-11-18T09:44:41.928Z [pebble] GET /v1/checks?names=kfp-api-up 61.348µs 200
2024-11-18T09:44:45.284183195Z 2024-11-18T09:44:45.284Z [pebble] GET /v1/notices?timeout=30s 30.000180313s 200
2024-11-18T09:45:15.284833920Z 2024-11-18T09:45:15.284Z [pebble] GET /v1/notices?timeout=30s 30.000150228s 200
2024-11-18T09:45:45.285359496Z 2024-11-18T09:45:45.285Z [pebble] GET /v1/notices?timeout=30s 30.000170229s 200
2024-11-18T09:46:15.286796650Z 2024-11-18T09:46:15.286Z [pebble] GET /v1/notices?timeout=30s 30.00108555s 200
2024-11-18T09:46:45.287507954Z 2024-11-18T09:46:45.287Z [pebble] GET /v1/notices?timeout=30s 30.000253215s 200
2024-11-18T09:47:15.288570513Z 2024-11-18T09:47:15.288Z [pebble] GET /v1/notices?timeout=30s 30.000662851s 200
2024-11-18T09:47:45.289625116Z 2024-11-18T09:47:45.289Z [pebble] GET /v1/notices?timeout=30s 30.000570751s 200
2024-11-18T09:48:15.290508511Z 2024-11-18T09:48:15.290Z [pebble] GET /v1/notices?timeout=30s 30.000385218s 200
2024-11-18T09:48:45.291765852Z 2024-11-18T09:48:45.291Z [pebble] GET /v1/notices?timeout=30s 30.000666432s 200
2024-11-18T09:49:15.292941261Z 2024-11-18T09:49:15.292Z [pebble] GET /v1/notices?timeout=30s 30.000592343s 200
2024-11-18T09:49:41.774002260Z 2024-11-18T09:49:41.773Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:49:45.294052807Z 2024-11-18T09:49:45.293Z [pebble] GET /v1/notices?timeout=30s 30.000601274s 200
2024-11-18T09:49:59.259960010Z 2024-11-18T09:49:59.259Z [pebble] GET /v1/plan?format=yaml 1.460104ms 200
2024-11-18T09:49:59.262885474Z 2024-11-18T09:49:59.262Z [pebble] POST /v1/layers 732.729µs 200
2024-11-18T09:49:59.280996270Z 2024-11-18T09:49:59.280Z [pebble] POST /v1/services 8.803332ms 202
2024-11-18T09:49:59.300681853Z 2024-11-18T09:49:59.300Z [pebble] GET /v1/changes/1135/wait?timeout=4.000s 18.586788ms 200
2024-11-18T09:49:59.436543363Z 2024-11-18T09:49:59.436Z [pebble] GET /v1/checks?names=kfp-api-up 113.87µs 200
2024-11-18T09:50:15.295400823Z 2024-11-18T09:50:15.295Z [pebble] GET /v1/notices?timeout=30s 30.000664943s 200
2024-11-18T09:50:45.296678601Z 2024-11-18T09:50:45.296Z [pebble] GET /v1/notices?timeout=30s 30.000910485s 200
2024-11-18T09:51:15.297625988Z 2024-11-18T09:51:15.297Z [pebble] GET /v1/notices?timeout=30s 30.000503061s 200
2024-11-18T09:51:45.299288132Z 2024-11-18T09:51:45.299Z [pebble] GET /v1/notices?timeout=30s 30.001188866s 200
2024-11-18T09:52:15.300450234Z 2024-11-18T09:52:15.300Z [pebble] GET /v1/notices?timeout=30s 30.000836577s 200
2024-11-18T09:52:45.301719238Z 2024-11-18T09:52:45.301Z [pebble] GET /v1/notices?timeout=30s 30.000967983s 200
2024-11-18T09:53:15.302638899Z 2024-11-18T09:53:15.302Z [pebble] GET /v1/notices?timeout=30s 30.000478762s 200
2024-11-18T09:53:45.304049297Z 2024-11-18T09:53:45.303Z [pebble] GET /v1/notices?timeout=30s 30.000915182s 200
2024-11-18T09:54:15.304719690Z 2024-11-18T09:54:15.304Z [pebble] GET /v1/notices?timeout=30s 30.000336128s 200
2024-11-18T09:54:16.582884168Z 2024-11-18T09:54:16.582Z [pebble] GET /v1/plan?format=yaml 414.417µs 200
2024-11-18T09:54:16.584950560Z 2024-11-18T09:54:16.584Z [pebble] POST /v1/layers 467.407µs 200
2024-11-18T09:54:16.600830393Z 2024-11-18T09:54:16.600Z [pebble] POST /v1/services 7.842136ms 202
2024-11-18T09:54:16.619580417Z 2024-11-18T09:54:16.619Z [pebble] GET /v1/changes/1136/wait?timeout=4.000s 17.848238ms 200
2024-11-18T09:54:16.746705331Z 2024-11-18T09:54:16.746Z [pebble] GET /v1/checks?names=kfp-api-up 223.081µs 200
2024-11-18T09:54:45.306518562Z 2024-11-18T09:54:45.306Z [pebble] GET /v1/notices?timeout=30s 30.000789093s 200
2024-11-18T09:55:15.308235453Z 2024-11-18T09:55:15.308Z [pebble] GET /v1/notices?timeout=30s 30.001160195s 200
2024-11-18T09:55:45.309954095Z 2024-11-18T09:55:45.309Z [pebble] GET /v1/notices?timeout=30s 30.001143531s 200
2024-11-18T09:56:15.310674320Z 2024-11-18T09:56:15.310Z [pebble] GET /v1/notices?timeout=30s 30.000207561s 200
2024-11-18T09:56:45.312075195Z 2024-11-18T09:56:45.311Z [pebble] GET /v1/notices?timeout=30s 30.000969338s 200
2024-11-18T09:57:15.312797357Z 2024-11-18T09:57:15.312Z [pebble] GET /v1/notices?timeout=30s 30.000274586s 200
2024-11-18T09:57:45.314028896Z 2024-11-18T09:57:45.313Z [pebble] GET /v1/notices?timeout=30s 30.000842164s 200
2024-11-18T09:58:15.314759821Z 2024-11-18T09:58:15.314Z [pebble] GET /v1/notices?timeout=30s 30.000294789s 200
2024-11-18T09:58:45.316124941Z 2024-11-18T09:58:45.315Z [pebble] GET /v1/notices?timeout=30s 30.000478817s 200

I think the check runs every 5 minutes and sometimes fails with the "connection refused" error.

I just wanted to share this, though I am not sure whether it is related.
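
One way to check the five-minute guess is to pull the timestamps of the `kfp-api-up` failures out of the log and look at the gaps between them. The sketch below does that for the four failure lines in the log above (keeping only the Pebble timestamp and message) plus one unrelated line. The names `failure_times` and `gaps_seconds` are made up for this example.

```python
import re
from datetime import datetime
from typing import List

# Matches the Pebble timestamp (millisecond precision) that directly
# precedes a failed "kfp-api-up" check message.
FAILURE_RE = re.compile(
    r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z \[pebble\] Check "kfp-api-up" failure'
)


def failure_times(log_text: str) -> List[datetime]:
    """Return the timestamps of all kfp-api-up check failures in the log."""
    return [
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        for match in FAILURE_RE.finditer(log_text)
    ]


def gaps_seconds(times: List[datetime]) -> List[int]:
    """Return the gaps between consecutive timestamps, rounded to seconds."""
    return [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]


SAMPLE = """\
2024-11-18T09:25:00.355Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:25:15.240Z [pebble] GET /v1/notices?timeout=30s 30.000203043s 200
2024-11-18T09:39:27.528Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:44:38.147Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
2024-11-18T09:49:41.773Z [pebble] Check "kfp-api-up" failure 1 (threshold 3): Get "http://localhost:8888/apis/v1beta1/healthz": dial tcp [::1]:8888: connect: connection refused
"""

print(gaps_seconds(failure_times(SAMPLE)))  # [867, 311, 304]
```

The last two gaps are close to five minutes and the first is about fourteen and a half, which would fit a check that runs roughly every five minutes and only fails on some runs. Four samples are not enough to be sure.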
