Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle Race Condition for Istio Gateway Secret #2078

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

LeelaChacha
Copy link
Contributor

Fix for #2004

@LeelaChacha LeelaChacha requested a review from a team as a code owner November 29, 2024 11:15
@kyma-bot kyma-bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 29, 2024
@kyma-bot kyma-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 29, 2024
@c-pius
Copy link
Contributor

c-pius commented Nov 29, 2024

FYI: we also had some checkmarx related issues: #2075
I think we can close my PR in favor of yours, would you mind double checking the fixes though? Maybe you will need to change the passing of the configs and the panics in the same way.

@LeelaChacha
Copy link
Contributor Author

LeelaChacha commented Nov 29, 2024

the panics

WDYT about my comment on the panic? Should the goroutine exit with a bootstrap error? @c-pius

kcpClient client.Client
type Handler struct {
kcpClient client.Client
kcpClientset *kubernetes.Clientset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it make more sense only use Clientset in this case, then this package is decoupled with controller-runtime

func (gsh *GatewaySecretHandler) StartRootCertificateWatch(clientset *kubernetes.Clientset,
log logr.Logger,
) {
func (h *Handler) StartRootCertificateWatch() {
ctx, cancel := context.WithCancel(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use TODO context, in this case, use background

}

func (h *Handler) handleAlreadyCreatedRootCertificate(ctx context.Context) {
rootCASecrets, err := h.kcpClientset.CoreV1().Secrets(istioNamespace).List(ctx, apimetav1.ListOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use .List but not .Get?

FieldSelector: fields.OneTermEqualSelector(apimetav1.ObjectNameField, kcpRootSecretName).String(),
})
if err != nil {
log.Error(err, "unable to start watching root certificate")
h.log.Error(err, "unable to start watching root certificate")
panic(err)
Copy link
Contributor

@ruanxin ruanxin Dec 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you panic in goruntine, it just crashed this goruntine itself, but in your case, if can't watching root certificate, the whole lifecycle manager should not started, right? That's basically the reason why no secret found, the lifecycle manager still running, instead it should crashed and new instance get created.

In this case, you need to send error back to go main goruntine and panic there. So you should use channel here.

Another option is not design whole StartRootCertificateWatch as goruntine, instead, before WatchEvents, you can still running in the main goruntine, you only need to make WatchEvents as a dedicated thread, it can simplify your panic handling.

if len(rootCASecrets.Items) != 1 {
h.log.Error(errExpectedOneRootCASecret, errExpectedOneRootCASecret.Error(),
"found", len(rootCASecrets.Items))
panic(fmt.Errorf("%w: found %d", errExpectedOneRootCASecret, len(rootCASecrets.Items)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, if you panic in a goruntine, the main thread is still running. https://github.com/kyma-project/lifecycle-manager/pull/2078/files#r1865020057

Copy link
Contributor

@ruanxin ruanxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check my comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Indicates the PR's author has signed the CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants