Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the potential race condition of InitializeSubnetService #879

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zhengxiexie
Copy link
Contributor

It is possible that an error occurs when trying to send to the already closed fatalErrors channel, resulting in a panic.

@codecov-commenter
Copy link

codecov-commenter commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 70.60%. Comparing base (bf1880a) to head (8fba48c).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/nsx/services/subnet/subnet.go 70.00% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #879      +/-   ##
==========================================
- Coverage   70.61%   70.60%   -0.01%     
==========================================
  Files          95       95              
  Lines       15094    15110      +16     
==========================================
+ Hits        10658    10669      +11     
- Misses       3710     3715       +5     
  Partials      726      726              
Flag Coverage Δ
unit-tests 70.60% <70.00%> (-0.01%) ⬇️
Files with missing lines Coverage Δ
pkg/nsx/services/subnet/subnet.go 30.17% <70.00%> (+1.85%) ⬆️

@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/race_condition branch from 2534c5b to 076af3a Compare November 11, 2024 04:46
It is possible that an error occurs when trying to send to the already closed `fatalErrors` channel, resulting in a panic.
@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/race_condition branch from 076af3a to 8fba48c Compare November 11, 2024 05:16
go.work.sum
.tool-versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed?

@@ -65,20 +65,39 @@ func InitializeSubnetService(service common.Service) (*SubnetService, error) {
},
}

// Use sync.Once to ensure channel is closed only once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to change like this,

func InitializeSubnetService(service common.Service) (*SubnetService, error) {
	wg := sync.WaitGroup{}
	fatalErrors := make(chan error, 1)
	defer close(fatalErrors)
	subnetService := &SubnetService{
		Service: service,
		SubnetStore: &SubnetStore{
			ResourceStore: common.ResourceStore{
				Indexer: cache.NewIndexer(keyFunc, cache.Indexers{
					common.TagScopeSubnetCRUID:    subnetIndexFunc,
					common.TagScopeSubnetSetCRUID: subnetSetIndexFunc,
					common.TagScopeVMNamespace:    subnetIndexVMNamespaceFunc,
					common.TagScopeNamespace:      subnetIndexNamespaceFunc,
				}),
				BindingType: model.VpcSubnetBindingType(),
			},
		},
	}

	wg.Add(1)
	go subnetService.InitializeResourceStore(&wg, fatalErrors, ResourceTypeSubnet, nil, subnetService.SubnetStore)
	wg.Wait()

	if len(fatalErrors) > 0 {
		err := <-fatalErrors
		return subnetService, err
	}

	return subnetService, nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/vmware-tanzu/nsx-operator/blob/main/pkg/nsx/services/securitypolicy/firewall.go#L116-L116
The initial reason is to return as early as possible if there are multiple resources.
You should consider closing channel fatalErrors in this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fatalErrors is closed in "defer" after it is declared in my example.
Even if there are multiple resources, we still need to wait until all sub-tasks are done rather than leave them out of control, otherwise it may lead to unexpected issues (e.g., a zombie routine).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this version has no zombie routine.

@zhengxiexie
Copy link
Contributor Author

/e2e

2 similar comments
@zhengxiexie
Copy link
Contributor Author

/e2e

@zhengxiexie
Copy link
Contributor Author

/e2e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants