Help Wanted: Not able to upgrade to 1.6.0 #14556

Closed
sheltonsuen opened this issue Jan 13, 2024 · 10 comments
Labels
type/bug Something isn't working
Milestone

Comments

@sheltonsuen

sheltonsuen commented Jan 13, 2024

Describe the bug

When trying to upgrade from 1.5.4 to 1.6.0,

the compute node keeps crashing because its probe fails, but there are no other error logs in the compute node or the meta node.


Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

helm upgrade

The version of RisingWave

1.6.0

Additional context

No response

@sheltonsuen sheltonsuen added the type/bug Something isn't working label Jan 13, 2024
@github-actions github-actions bot added this to the release-1.7 milestone Jan 13, 2024
@sheltonsuen
Author

And the k8s version: [screenshot]

@sheltonsuen sheltonsuen changed the title Not able to upgrade to 1.6.0 Help Wanted: Not able to upgrade to 1.6.0 Jan 13, 2024
@sheltonsuen
Author

It's a cluster issue, not related to RisingWave. Closing it.

@sheltonsuen
Author

I still have this issue: the compute pods keep crashing with exit code 1 after startup, and no other messages were collected.

@sheltonsuen sheltonsuen reopened this Jan 14, 2024
@sheltonsuen
Author

I have tried increasing the probe timeout and delay time, but it did not help.

@lmatz
Contributor

lmatz commented Jan 14, 2024

the compute pods keep crashing with exit code 1 after startup

What's in the RW log?
If RW crashes, there must be some error or panic logs.

What's the status of the pods, and what does the REASON field say?
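
One way to gather the pod status and REASON information asked about here is to parse the JSON that `kubectl get pods -o json` prints. The following is a minimal sketch, not part of RisingWave or its Helm chart: the helper names `summarize_crashes` and `fetch_pods` are hypothetical, and it assumes `kubectl` is on the PATH and already pointed at the right cluster. It reads only standard Kubernetes pod status fields (`restartCount`, `state.waiting.reason`, `lastState.terminated`).

```python
import json
import subprocess


def summarize_crashes(pod_list: dict) -> list:
    """Collect restart details for every container that has terminated at least once."""
    rows = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            terminated = cs.get("lastState", {}).get("terminated")
            if terminated is None:
                continue  # this container has no recorded previous termination
            waiting = cs.get("state", {}).get("waiting") or {}
            rows.append({
                "pod": name,
                "container": cs["name"],
                "restarts": cs.get("restartCount", 0),
                "exit_code": terminated.get("exitCode"),
                "reason": terminated.get("reason"),
                # e.g. "CrashLoopBackOff" while the kubelet backs off restarts
                "waiting_reason": waiting.get("reason"),
            })
    return rows


def fetch_pods(namespace: str) -> dict:
    """Run `kubectl get pods -o json` in the given namespace and parse the output."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `summarize_crashes(fetch_pods("<namespace>"))` returns one row per container that has restarted. If the crashed container has already been replaced by a new instance, `kubectl logs <pod> --previous` prints the log of the previous instance, which may contain the panic message.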

@lmatz
Contributor

lmatz commented Jan 15, 2024

I just reran the upgrade test for v1.5.4 and v1.6.0, and it works as expected.

Normally, if the upgrade fails, there should be some logs about certain inconsistencies, e.g.:

  1. key values in the old meta cannot be recognized by the new meta
  2. key values in the state of old operators cannot be recognized by the new operators

Either of these must trigger a panic, which will be shown in the log.

@sheltonsuen
Author

I just reran the upgrade test for v1.5.4 and v1.6.0, and it works as expected.

Normally, if the upgrade fails, there should be some logs about certain inconsistencies, e.g.:

  1. key values in the old meta cannot be recognized by the new meta
  2. key values in the state of old operators cannot be recognized by the new operators

Either of these must trigger a panic, which will be shown in the log.

Thank you for the test,

I didn't collect any panic log from the compute node; maybe a new instance had already come up, or something.

Based on your test, later today (once our tester is no longer using the current env) I will start another upgrade to see if there are any useful logs.

@lmatz
Contributor

lmatz commented Jan 16, 2024

Hi @sheltonsuen

Could you check whether the "connector node" is still set to replica "1" in the YAML file?

If it is, please delete it or set the replica to 0.

The connector node is now embedded into the compute node and no longer needs to be set explicitly.

@sheltonsuen
Author

Hi @sheltonsuen

Could you check whether the "connector node" is still set to replica "1" in the YAML file?

If it is, please delete it or set the replica to 0.

The connector node is now embedded into the compute node and no longer needs to be set explicitly.

It's working after setting the replica to 0, thanks.

@zwang28
Contributor

zwang28 commented Jan 18, 2024

FYI, the connector node should be disabled in Helm via

connectorComponent:
  enabled: false
