Improve handling early startup issues #219

Open
wants to merge 3 commits into
base: main

marmarek
Member

@marmarek marmarek commented Dec 2, 2024

Handle stopping the service while it is still starting up more gracefully, and improve handling of Xorg startup errors (a much simpler alternative to #176).

If the service is stopped while Xorg is still starting up (and
gui-agent is still waiting for the Xorg connection in mkghandles),
gui-agent would exit before killing Xorg, and Xorg would keep trying to
connect back to the gui-agent forever, delaying the shutdown.

Fix this by moving signal registration earlier, before Xorg startup.
Since ghandles_for_vchan_reinitialize is now set before it is fully
initialized, initialize the x_pid field explicitly and leave all the
other fields zeroed (instead of random stack rubble).

Register a proper signal handler for SIGCHLD, and collect Xorg's
zombie in it.
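
To illustrate the ordering described above, here is a minimal sketch, not the code from this PR. `struct agent_state`, `register_signals`, `start_server` and `server_has_exited` are hypothetical names, and the struct is only a stand-in for the real handles structure. It registers the SIGCHLD handler before the child is started, sets `x_pid` explicitly while leaving every other field zeroed, and collects the zombie in the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the agent's state struct. */
struct agent_state {
    pid_t x_pid;        /* server PID, or -1 when there is none to wait for */
    int other_fields;   /* placeholder for everything else; stays zeroed */
};

static struct agent_state *state_for_signals;   /* read by the handler */
static volatile sig_atomic_t server_exited;

static void sigchld_handler(int sig)
{
    int saved_errno = errno;    /* waitpid() may clobber errno */
    struct agent_state *st = state_for_signals;

    (void)sig;
    if (st && st->x_pid > 0 &&
        waitpid(st->x_pid, NULL, WNOHANG) == st->x_pid) {
        st->x_pid = -1;         /* zombie collected */
        server_exited = 1;
    }
    errno = saved_errno;
}

/* Call this before starting the server, so a stop request or an early
 * server exit is never missed. */
int register_signals(struct agent_state *st)
{
    struct sigaction sa;

    memset(st, 0, sizeof(*st)); /* all fields zeroed ... */
    st->x_pid = -1;             /* ... except x_pid, which is set explicitly */
    state_for_signals = st;
    server_exited = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);   /* no SA_RESTART: blocking calls get EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Start the server; SIGCHLD is blocked until x_pid has been stored, so
 * the handler never sees a child it does not know about yet. */
pid_t start_server(struct agent_state *st, const char *path)
{
    sigset_t block, old;
    pid_t pid;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        execl(path, path, (char *)NULL);
        _exit(127);             /* exec failed */
    }
    st->x_pid = pid;            /* -1 if fork() failed */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

int server_has_exited(void)
{
    return server_exited;
}
```

A main loop can then poll `server_has_exited()` and leave explicitly, instead of relying only on EOF from the socket.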

This has two effects:
1. The main loop can explicitly exit on Xorg termination, not only via
   receiving EOF on the socket.
2. Because SIGCHLD is no longer ignored, accept() in mkghandles will also
   notice an early Xorg exit and not wait indefinitely (it will fail with
   EINTR). Improve the error message for this case.
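
The second effect can be sketched as follows. This is an illustration under assumptions, not the PR's code: `watch_children`, `accept_from_server` and the `server_exited` flag are hypothetical names, and the message text is made up. The handler is installed without `SA_RESTART`, so a child exit interrupts a blocking `accept()` with `EINTR`, and the caller can tell that case apart from other failures.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

volatile sig_atomic_t server_exited;

static void on_sigchld(int sig)
{
    int saved_errno = errno;    /* waitpid() may clobber errno */

    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        server_exited = 1;      /* a child was reaped */
    errno = saved_errno;
}

/* Install the handler without SA_RESTART so that accept() is interrupted
 * instead of being transparently restarted. */
int watch_children(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    server_exited = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Returns the connected fd, or -1 after printing a specific message. */
int accept_from_server(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd >= 0)
            return fd;
        if (errno == EINTR && server_exited) {
            fprintf(stderr, "server exited before connecting\n");
            return -1;
        }
        if (errno == EINTR)
            continue;           /* unrelated signal, keep waiting */
        perror("accept");
        return -1;
    }
}
```

Retrying on an unrelated `EINTR` is a design choice of this sketch; code that also has to react to termination signals would check its own stop flag at that point.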

There is still a small race on startup if Xorg exits before gui-agent
reaches the accept() (or listen()) call. Handle this by checking just
before the accept() call. It isn't perfect (a window of a few
instructions remains in which the exit would not be noticed in time),
but it's good enough for practical purposes.
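
A rough sketch of such a pre-check, assuming a plain `waitpid(..., WNOHANG)` probe rather than whatever state the real handler maintains; `child_is_gone` and `accept_unless_exited` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if the child has already exited, 0 if it is still running. */
int child_is_gone(pid_t child)
{
    pid_t r = waitpid(child, NULL, WNOHANG);

    /* ECHILD means it was already reaped elsewhere, e.g. by a handler. */
    return r == child || (r < 0 && errno == ECHILD);
}

/* Check for an early exit right before blocking in accept(). */
int accept_unless_exited(int listen_fd, pid_t child)
{
    if (child_is_gone(child)) {
        fprintf(stderr, "server exited before connecting\n");
        return -1;
    }
    /* A child that exits exactly here is not caught by the check above;
     * this is the remaining few-instruction window. */
    return accept(listen_fd, NULL, NULL);
}
```

Closing the window completely would need something like blocking SIGCHLD and waiting with `ppoll()` or `pselect()`, which is more machinery than this check.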

QubesOS/qubes-issues#8060

Use X's logging function instead of plain perror, to ensure the message
is written to the appropriate Xorg log.

qubesos-bot commented Dec 3, 2024

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024120315-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024111705-4.3&flavor=update

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_fedora-40-xfce: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-40-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_devices

    • TC_00_List_whonix-workstation-17: test_001_list_loop_mounted (failure)
      AssertionError: Device test-inst-vm:loop0::0 (/tmp/test.img) should...
  • system_tests_kde_gui_interactive

    • simple_gui_apps: unnamed test (unknown)

    • simple_gui_apps: Failed (test died)
      # Test died: no candidate needle with tag(s) 'menu-vm-work' matched...

    • simple_gui_apps: unnamed test (unknown)

  • system_tests_guivm_vnc_gui_interactive

    • gui_keyboard_layout: unnamed test (unknown)

    • gui_keyboard_layout: Failed (test died)
      # Test died: no candidate needle with tag(s) 'work-xterm, work-xter...

    • gui_keyboard_layout: unnamed test (unknown)

  • system_tests_audio

  • system_tests_audio@hw1

  • system_tests_gui_interactive@hw1

    • startup: unnamed test (unknown)
    • startup: Failed (test died)
      # Test died: no candidate needle with tag(s) 'nm-connection-establi...

Failed tests

19 failures
  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_fedora-40-xfce: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

    • TC_41_HVMGrub_fedora-40-xfce: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 120 seco...

  • system_tests_extra

    • TC_00_QVCTest_whonix-gateway-17: test_010_screenshare (failure)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 0 == 0
  • system_tests_devices

    • TC_00_List_whonix-workstation-17: test_001_list_loop_mounted (failure)
      AssertionError: Device test-inst-vm:loop0::0 (/tmp/test.img) should...
  • system_tests_kde_gui_interactive

    • simple_gui_apps: unnamed test (unknown)

    • simple_gui_apps: Failed (test died)
      # Test died: no candidate needle with tag(s) 'menu-vm-work' matched...

    • simple_gui_apps: unnamed test (unknown)

  • system_tests_basic_vm_qrexec_gui_zfs

    • switch_pool: Failed (test died)
      # Test died: command 'dnf install -y ./zfs-release.rpm' failed at /...
  • system_tests_guivm_vnc_gui_interactive

    • gui_keyboard_layout: unnamed test (unknown)

    • gui_keyboard_layout: Failed (test died)
      # Test died: no candidate needle with tag(s) 'work-xterm, work-xter...

    • gui_keyboard_layout: unnamed test (unknown)

  • system_tests_audio

  • system_tests_audio@hw1

  • system_tests_gui_interactive@hw1

    • startup: unnamed test (unknown)
    • startup: Failed (test died)
      # Test died: no candidate needle with tag(s) 'nm-connection-establi...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/119126#dependencies

2 fixed
  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...
  • system_tests_audio@hw1

Unstable tests

  • system_tests_audio

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
  • system_tests_audio@hw1

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...

2 participants