Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xen-dom-mgmt: Fix potential concurrency issues #72

Merged
merged 2 commits into from
Aug 5, 2024

Conversation

Deedone
Copy link
Contributor

@Deedone Deedone commented Apr 19, 2024

Add refcounting to the domain structure to make it thread-safe. Fix possible deadlock in console deinit.

Copy link
Collaborator

@firscity firscity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 1 more comment:
commit message Change xs_remove_xenstore_backends() signature - this is a consequence, not the reason. Please add proper explanation to commit topic (use domain reference instead of domid for backend init etc.)

xen-dom-mgmt/src/xen-dom-mgmt.c Outdated Show resolved Hide resolved
xen-dom-mgmt/src/xen-dom-mgmt.c Outdated Show resolved Hide resolved
@GrygiriiS
Copy link

GrygiriiS commented Apr 22, 2024

Huh. lock on lock on lock + ref count. It looks very over-complicated and over-designed. In my strong opinion, the locking is the problem of Zephyr app and not a library. More over, domain management/control process should be simple and straight forward in Zephyr app which definitely should not require to have ref_count at minimum. And I think It all should just work with one global lock.

@lorc
Copy link
Collaborator

lorc commented Apr 23, 2024

domain_destroy routine gets a reference
and waits until it's reference is the only one before destroying the
domain.

A am not sure I got this logic. domain_destroy should put down one reference. Real domain destruction should be performed in put_domain when reference counter reaches zero. Take a look at linux kref as example.

@Deedone
Copy link
Contributor Author

Deedone commented Apr 25, 2024

Moved freeing to put_domain and removed the per-domain lock.

@GrygiriiS
Copy link

GrygiriiS commented May 7, 2024

@Deedone Could you send first patch alone as it's definitely fix "xen_console: Fix deadlock in xen_stop_domain_console()".

Regarding protection - My point stands, per-doms locks and ref_count is over complications and just domctrl_global_lock and global_console_lock should be enough.
Domain operation are heavy weight and time consuming and definitely it's not required to perform any domctrl operations even a dozen times per sec. More over, any domctrl ops (even more any call into Xen) are not Real-time compatible as can't be estimated). Any "performance" optimizations without strict use-case or open issue is, unfortunately, usually just waste of time.

@Deedone Deedone force-pushed the domain_refcount branch from eedc047 to 62fa7f1 Compare May 7, 2024 09:54
@lorc
Copy link
Collaborator

lorc commented May 9, 2024

Moved freeing to put_domain and removed the per-domain lock.

What I meant, is that whole domain_destroy should be triggered by put_domain(). Because current implementation has no sense: someone may be holding domain reference and trying to do something with it, while you are stopping console, destroying xenstore entries and so on. There is no sense in having reference for a domain, when it basically is destroyed already. At least I see no use case for this. Maybe I am wrong.

I believe @GrygiriiS is right and we have no use for reference counting for now. Maybe global domain_lock will be sufficient. On other hand, we had global pci_lock in Xen and it made lots and lots headache when we tried implement more complex logic in PCI subsystem... So I am a little hesitant about introducing huge global locks.

Anyways, @firscity is maintaining this project and he knows it better, so last word is his.

In the meantime, you can have my:

Reviewed-by: Volodymyr Babchuk <[email protected]> for the first two patches. Could you please create a separate PR with at least first one, so we can merge it faster?

@firscity
Copy link
Collaborator

firscity commented May 9, 2024

There is no sense in having reference for a domain, when it basically is destroyed already.

Yeah, this part should be changed as you suggested before.

So I am a little hesitant about introducing huge global locks.

We already have at least 3 ways to create and manage domains - directly from application via zephyr-xenlib, via xrun json configurations and via Zephyr shell command. I think that global locking (altogether with all other locks in console, xenstore etc.) will definitely bring more problems and will be less flexible than refcounting.
I may be wrong, but I think suggestion with decrementing refcount inside destroy_domain() and moving its actual logic to put_domain() should work fine.

Copy link
Collaborator

@firscity firscity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With checkpatch fixed:
Reviewed-by: Dmytro Firsov <[email protected]>

domain->refcount--;
if (domain->refcount == 0) {
if (domain->f_dom0less) {
LOG_ERR("dom0less domain#%u operation not supported", domain->domid);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully understand what will happen in this case. We will be stuck with struct xen_domain that has refcount==0 forever?

Also you need to unlock the mutex.

Shall we replace this with ASSERT(!domain->f_dom0less) ? @firscity, @GrygiriiS ?

@Deedone Deedone force-pushed the domain_refcount branch 2 times, most recently from c59bc5f to d3a1abc Compare August 1, 2024 04:05
@Deedone
Copy link
Contributor Author

Deedone commented Aug 1, 2024

Added assert, fixed checkpatch

Copy link
Collaborator

@lorc lorc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acked-by: Volodymyr Babchuk <[email protected]>

Deedone added 2 commits August 2, 2024 09:28
Make xs_remove_xenstore_backends accept a pointer to xen_domain instead
of domain_id to improve ownership management of the domain object.
Practically this removes the need to get an extra reference to the
domain during the destroy sequence.

Signed-off-by: Mykyta Poturai <[email protected]>
Reviewed-by: Dmytro Firsov <[email protected]>
Acked-by: Volodymyr Babchuk <[email protected]>
Add reference counting to the domain structure to make accessing it
safe without global locks. domain_create initializes the refcount to 1
to prevent the domain from being freed right away, this original
reference is dropped in domain_destroy to trigger freeing the domain.

Also update test headers to keep them in sync with the changes.

Signed-off-by: Mykyta Poturai <[email protected]>
Reviewed-by: Dmytro Firsov <[email protected]>
Acked-by: Volodymyr Babchuk <[email protected]>
@firscity firscity merged commit 801d3b1 into xen-troops:main Aug 5, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants