
bug: frontend needs restart to recover from breakdown #9239

Closed
zwang28 opened this issue Apr 17, 2023 · 6 comments
Assignees
Labels
component/batch Batch related issue. type/bug Something isn't working

Comments

zwang28 (Contributor) commented Apr 17, 2023

Describe the bug

We encountered a frontend breakdown, as shown below:

dev=> set visibility_mode to all;
SET_VARIABLE
dev=> select * from xxx limit 1;
ERROR:  QueryError: internal error: error trying to connect: deadline has elapsed
dev=> set visibility_mode to all;
SET_VARIABLE
dev=> select * from xxx limit 1;
ERROR:  QueryError: internal error: error trying to connect: dns error: failed to lookup address information: Name or service not known

After restarting the frontend, the breakdown is gone.

To Reproduce

No response

Expected behavior

No response

Additional context

Compute nodes have been leaving/joining the cluster. Could this be related to the frontend's compute node client pool?

zwang28 added the type/bug and component/batch labels Apr 17, 2023
github-actions bot added this to the release-0.19 milestone Apr 17, 2023
ZENOTME (Contributor) commented Apr 19, 2023

Is it possible that the following happens 🤔: a compute node leaves but the worker_node_manager has not been updated yet, and then the query tries to get an RPC client for the outdated worker node, so the connection fails.
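The hypothesized failure mode can be sketched in a few lines. This is a minimal illustration, not the actual risingwave_rpc_client API: `ComputeClientPool`, `ComputeClient`, and the method names here are hypothetical. The point is that a pool keyed by worker address keeps handing out a cached client for a node that has left the cluster unless something explicitly evicts it, which would match the reported symptom of the error persisting until restart.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a compute-node RPC client.
#[derive(Clone, Debug, PartialEq)]
struct ComputeClient {
    addr: String,
}

// Minimal client pool keyed by worker address (names illustrative).
struct ComputeClientPool {
    clients: HashMap<String, ComputeClient>,
}

impl ComputeClientPool {
    fn new() -> Self {
        Self { clients: HashMap::new() }
    }

    // Return a cached client, creating one on first use.
    fn get(&mut self, addr: &str) -> &ComputeClient {
        self.clients
            .entry(addr.to_string())
            .or_insert_with(|| ComputeClient { addr: addr.to_string() })
    }

    // Evict the cached client when a worker leaves the cluster (or when a
    // connection attempt fails), so later queries do not keep reusing a
    // stale entry pointing at a node that no longer exists.
    fn invalidate(&mut self, addr: &str) {
        self.clients.remove(addr);
    }

    fn contains(&self, addr: &str) -> bool {
        self.clients.contains_key(addr)
    }
}

fn main() {
    let mut pool = ComputeClientPool::new();
    pool.get("cn-1:5688");
    assert!(pool.contains("cn-1:5688"));

    // Worker cn-1 leaves the cluster. Without this invalidation step the
    // pool would keep returning the dead client on every query.
    pool.invalidate("cn-1:5688");
    assert!(!pool.contains("cn-1:5688"));
    println!("stale client evicted");
}
```

Under this hypothesis, a fix would wire cluster-membership updates (or connection errors) into an eviction path like `invalidate`, instead of relying on a frontend restart to flush the pool.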

zwang28 (Contributor, Author) commented Apr 19, 2023

Is it possible that the following happens 🤔: a compute node leaves but the worker_node_manager has not been updated yet, and then the query tries to get an RPC client for the outdated worker node, so the connection fails.

The error persists for a long time, until the frontend is restarted.

A comment by @ZENOTME was marked as resolved.

ZENOTME (Contributor) commented Apr 21, 2023

Do we have more concrete logs to reproduce this bug? For example, what reschedule happened before a compute node joined or left the cluster?

hzxa21 (Collaborator) commented May 19, 2023

Do we have more concrete logs to reproduce this bug? For example, what reschedule happened before a compute node joined or left the cluster?

cc @zwang28 Can any more information be provided?

Assign to @ZENOTME first. Feel free to reassign.

zwang28 (Contributor, Author) commented May 19, 2023

Can any more information be provided?

I informed @ZENOTME when it occurred again recently.
No other information from my side.

zwang28 removed this from the release-0.19 milestone Jul 14, 2023
zwang28 closed this as not planned Apr 8, 2024
Projects
None yet
Development

No branches or pull requests

3 participants