
Frontend unexpected end of file error all the time #13515

Closed
cyliu0 opened this issue Nov 20, 2023 · 12 comments · Fixed by #13849
Labels: priority/low, type/bug

cyliu0 (Collaborator) commented Nov 20, 2023

Describe the bug

When I tested with the nightly-20231116 image, I kept getting the following error in the frontend logs:

ERROR pgwire::pg_server: error when reading message error=unexpected end of file

Log

(screenshot of the frontend log)

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

nightly-20231116

Additional context

No response

cyliu0 added the type/bug label on Nov 20, 2023
github-actions bot added this to the release-1.5 milestone on Nov 20, 2023
lmatz (Contributor) commented Nov 20, 2023

How is the cluster started?

According to the timestamps, it looks like some service connects to RisingWave periodically.

cyliu0 (Collaborator, Author) commented Nov 21, 2023

> How is the cluster started?

Via kube-bench.

> According to the timestamps, it looks like some service connects to RisingWave periodically.

There are an upstream MySQL Direct CDC source, sysbench, and a psql client.

xiangjinwu (Contributor) commented:

Maybe a health check, similar to #12899?

fuyufjh modified the milestone from release-1.5 to release-1.6 on Dec 6, 2023
fuyufjh (Member) commented Dec 6, 2023

How can we reproduce this? It seems a client keeps connecting to the frontend.

cyliu0 (Collaborator, Author) commented Dec 6, 2023

Actually, I think all my clusters are hitting this issue in the test env, like this live one in namespace backfill-20231206-063706, currently running via https://buildkite.com/risingwave-test/backfill/builds/18#018c3dd6-93f9-4ad9-827d-83073cc2a20e

There is only one psql connection, which is executing a DELETE statement right now.

Frontend log:

(screenshot of the frontend log)

chenzl25 (Contributor) commented Dec 6, 2023

Could you run show processlist on this frontend?

chenzl25 (Contributor) commented Dec 6, 2023

According to the log, it seems 127.0.0.1 connects to the frontend once every 5 seconds and 10.0.48.11 connects twice every 10 seconds.

cyliu0 (Collaborator, Author) commented Dec 6, 2023

dev=> show processlist;
 Id | User |       Host        | Database |   Time    |         Info
----+------+-------------------+----------+-----------+-----------------------
 24 | root | 10.0.34.194:41750 | dev      | 7741800ms | DELETE FROM backfill;
 25 | root | 127.0.0.1:49326   | dev      | 0ms       | show processlist;

chenzl25 (Contributor) commented Dec 6, 2023

I think it is highly likely that some external program uses telnet 127.0.0.1 4566 to connect to the frontend.

That would produce the same log:

2023-12-06T17:37:05.272187+08:00  INFO pgwire::pg_server: accept connection peer_addr=127.0.0.1:49960
2023-12-06T17:37:09.90075+08:00 ERROR pgwire::pg_server: error when reading message error=unexpected end of file
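
For illustration, here is a minimal probe-like client that would produce that pair of log lines. This is only a sketch: it assumes the frontend is listening on 127.0.0.1:4566 and mimics the roughly 5-second cadence seen in the log.

```rust
// Probe-style client: connect to the frontend's SQL port and close the
// connection without sending a Postgres startup message (same effect as
// `telnet 127.0.0.1 4566` followed by an immediate quit).
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    loop {
        // Connect, then drop the stream right away; the frontend reads EOF
        // before any message arrives and logs "unexpected end of file".
        let stream = TcpStream::connect("127.0.0.1:4566")?;
        drop(stream);
        sleep(Duration::from_secs(5));
    }
}
```

Because the stream is dropped before any Postgres startup message is written, the frontend's read loop hits EOF immediately and logs the error.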

chenzl25 (Contributor) commented Dec 6, 2023

Maybe k8s also runs some health checks against the frontend? cc @arkbriar

arkbriar (Contributor) commented Dec 6, 2023

> Maybe k8s also runs some health checks against the frontend? cc @arkbriar

Yes. There's a probe checking TCP connectivity of the SQL port. K8s will kill the process if the probe fails 3 times in a row.

arkbriar (Contributor) commented Dec 6, 2023

Looks like the connections were started by the kubelet because of probing. Is it possible to hide the error when a TCP connection is closed without sending anything? That also makes sense, since such a client was never initialized and can fail silently.
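
One way to do that, as a rough sketch of the idea only (not the actual change in #13849; the function name log_read_error and the messages_read counter are hypothetical, and the tracing macros simply match the logging style seen in the output above), would be to downgrade the log level when EOF arrives before any message has been read:

```rust
// Hypothetical sketch: if the peer closes the socket before any message has
// been read, treat the UnexpectedEof as a probe-style disconnect and log it
// quietly instead of at ERROR level.
use std::io::ErrorKind;

fn log_read_error(err: &std::io::Error, messages_read: usize) {
    if err.kind() == ErrorKind::UnexpectedEof && messages_read == 0 {
        // Connection closed before the startup message: almost certainly a
        // TCP health check, so don't alarm operators with an ERROR line.
        tracing::debug!("connection closed before any message was received");
    } else {
        tracing::error!(error = %err, "error when reading message");
    }
}
```

A client that dies mid-session would still be logged at ERROR, since at least the startup message would already have been read by then.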
