Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CI for CUPS 2.5 #857

Merged
merged 2 commits into from
Jan 5, 2024
Merged

Fix CI for CUPS 2.5 #857

merged 2 commits into from
Jan 5, 2024

Conversation

zdohnal
Copy link
Member

@zdohnal zdohnal commented Jan 4, 2024

I found time to look why CI fails on Linux and came up with two fixes - for deadlock and infinite loop.

@zdohnal
Copy link
Member Author

zdohnal commented Jan 4, 2024

@michaelrsweet linux tests probably passes on httpAddrGetList() because the Linux machine hostname is not .local.... so mDNS resolution might be broken there as well... but what is interesting, Alex's job which is not so old and Mac CI failed before on the issue, his job passes... looks like it depends on something inside CI...

@zdohnal zdohnal requested a review from michaelrsweet January 4, 2024 14:02
Copy link
Member

@michaelrsweet michaelrsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

FWIW, I've still not been able to reproduce the hangs outside of the CI services, so there is something different about that environment... :/

Update remaining time after `usleep()`, otherwise there is no difference
in elapsed time, so `remaining` will never be less or equal to 0.

Fixes CI on Linux
Deadlock happens when we are about to destroy DNSSD struct by the end
of `cups_enum_dests()`. The main thread locks the mutex when the other
thread is in avahi poll callback at function `poll()` and unlocked the
mutex before - the other thread tries to lock the mutex once poll
timeout expires, but it cannot because it was locked by the main thread
and wait there. Meanwhile the main thread tries to cancel the other
thread, but the function where the other thread is not a cancellation
point, thus the cancel event is ignored and the main thread thread waits
indefinitely for the end of the other thread.

We can make the other thread asynchronous (which would cancel the thread
immediately) or release the mutex earlier in `cupsDNSSDDelete()`. The
commit does the latter.

Fixes CI for Linux
@zdohnal
Copy link
Member Author

zdohnal commented Jan 5, 2024

@michaelrsweet If you mean the hangs in Linux pipelines, one was directly connected to code with Avahi support, so you would need to build CUPS with Avahi, ideally in some Linux VM. And the deadlock happens basically due race condition, when the detached thread just unlocked the mutex and waits in poll() and in this interval the main thread tries to destroy DNSSD struct and locks the mutex...
The infinite loop happened because those two cups_elapsed(), which change the time baseline, were close to each other, thus the time difference was too small or none to get the loop finish in actual 1s as the default remaining time is. What I can think of is what could happen and could cause normal behavior is that your machine takes more time between those calls (so there is a bigger difference) or the time is more precise, so a time difference is generated as well. This one was a tough to realize for me, since I always first connect by gdb when debugging, so of course I was not as quick as machine, so I didn't realize this until I watched strace output and thought about it :D .
But I could reproduce both issues locally on my Fedora 38 laptop.

I was thinking about the MacOS issue as well - we add .local. to machine's hostname in httpGetHostname() for MacOS if the hostname does not contain a dot. So as a workaround we can set the machine hostname to a new which will contain a dot, but this way we will lose the coverage of this part of code. Do you know what package/module/library does mDNS resolution in MacOS?
I know Mac uses mDNSResponder, but I don't know if it contains plugin for resolving .local addresses in glibc functions like getaddrinfo() - nss-mdns does it in Linux by adding a new value into the line 'hosts' in name service switch configuration file (either /etc/nsswitch.conf or via authselect), which makes glibc to load a library provided by nss-mdns, which sends the resolving request to Avahi, processes response and gives answer to the glibc function. Does it work a similar way on Mac?

@zdohnal zdohnal merged commit cbfa37b into OpenPrinting:master Jan 5, 2024
5 of 6 checks passed
@michaelrsweet
Copy link
Member

@zdohnal I've been mostly doing my local Linux testing on Ubuntu, either in a VM of some sort (multipass and Parallels) or on my dedicated Linux laptop. Like I said, I wasn't able to make any of them fail, but then I always have network printers that show up for cups_enum_dests...

WRT mDNS on macOS, it is indeed mDNSResponder and that software will always be able to resolve the local hostname + .local, even when all external network interfaces are down (I've tested that to try to isolate the cause of the CI hang...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants