Skip to content

Commit

Permalink
Tracewrap 8.1 (#32)
Browse files Browse the repository at this point in the history
* target/s390x: Fix the "ignored match" case in VSTRS

Currently the emulation of VSTRS recognizes partial matches in presence
of \0 in the haystack, which, according to PoP, is not correct:

    If the ZS flag is one and a zero byte was detected
    in the second operand, then there can not be a
    partial match ...

Add a check for this. While at it, fold a number of explicitly handled
special cases into the generic logic.

Cc: [email protected]
Reported-by: Claudio Fontana <[email protected]>
Closes: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg00633.html
Fixes: 1d706f314191 ("target/s390x: vxeh2: vector string search")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Message-Id: <[email protected]>
Tested-by: Claudio Fontana <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit 791b2b6a930273db694b9ba48bbb406e78715927)
Signed-off-by: Michael Tokarev <[email protected]>

* target/s390x: Use a 16-bit immediate in VREP

Unlike most other instructions that contain an immediate element index,
VREP's one is 16-bit, and not 4-bit. The code uses only 8 bits, so
using, e.g., 0x101 does not lead to a specification exception.

Fix by checking all 16 bits.

Cc: [email protected]
Fixes: 28d08731b1d8 ("s390x/tcg: Implement VECTOR REPLICATE")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit 23e87d419f347b6b5f4da3bf70d222acc24cdb64)
Signed-off-by: Michael Tokarev <[email protected]>

* target/s390x: Fix VSTL with a large length

The length is always truncated to 16 bytes. Do not probe more than
that.

Cc: [email protected]
Fixes: 0e0a5b49ad58 ("s390x/tcg: Implement VECTOR STORE WITH LENGTH")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit 6db3518ba4fcddd71049718f138552999f0d97b4)
Signed-off-by: Michael Tokarev <[email protected]>

* target/s390x: Check reserved bits of VFMIN/VFMAX's M5

VFMIN and VFMAX should raise a specification exceptions when bits 1-3
of M5 are set.

Cc: [email protected]
Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit 6a2ea6151835aa4f5fee29382a421c13b0e6619f)
Signed-off-by: Michael Tokarev <[email protected]>

* include/hw/virtio/virtio-gpu: Fix virtio-gpu with blob on big endian hosts

Using "-device virtio-gpu,blob=true" currently does not work on big
endian hosts (like s390x). The guest kernel prints an error message
like:

 [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x10c)

and the display stays black. When running QEMU with "-d guest_errors",
it shows an error message like this:

 virtio_gpu_create_mapping_iov: nr_entries is too big (83886080 > 16384)

which indicates that this value has not been properly byte-swapped.
And indeed, the virtio_gpu_create_blob_bswap() function (that should
swap the fields in the related structure) fails to swap some of the
entries. After correctly swapping all missing values here, too, the
virtio-gpu device is now also working with blob=true on s390x hosts.

Fixes: e0933d91b1 ("virtio-gpu: Add virtio_gpu_resource_create_blob")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2230469
Message-Id: <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit d194362910138776e8abd6bb3c9fb3693254e95f)
Signed-off-by: Michael Tokarev <[email protected]>

* kvm: Introduce kvm_arch_get_default_type hook

kvm_arch_get_default_type() returns the default KVM type. This hook is
particularly useful to derive a KVM type that is valid for "none"
machine model, which is used by libvirt to probe the availability of
KVM.

For MIPS, the existing mips_kvm_type() is reused. This function ensures
the availability of VZ which is mandatory to use KVM on the current
QEMU.

Cc: [email protected]
Signed-off-by: Akihiko Odaki <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
[PMM: added doc comment for new function]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
(cherry picked from commit 5e0d65909c6f335d578b90491e165440c99adf81)
Signed-off-by: Michael Tokarev <[email protected]>

* accel/kvm: Specify default IPA size for arm64

Before this change, the default KVM type, which is used for non-virt
machine models, was 0.

The kernel documentation says:
> On arm64, the physical address size for a VM (IPA Size limit) is
> limited to 40bits by default. The limit can be configured if the host
> supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> identifier, where IPA_Bits is the maximum width of any physical
> address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
> machine type identifier.
>
> e.g, to configure a guest to use 48bit physical address size::
>
>     vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));
>
> The requested size (IPA_Bits) must be:
>
>  ==   =========================================================
>   0   Implies default size, 40bits (for backward compatibility)
>   N   Implies N bits, where N is a positive integer such that,
>       32 <= N <= Host_IPA_Limit
>  ==   =========================================================

> Host_IPA_Limit is the maximum possible value for IPA_Bits on the host
> and is dependent on the CPU capability and the kernel configuration.
> The limit can be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the
> KVM_CHECK_EXTENSION ioctl() at run-time.
>
> Creation of the VM will fail if the requested IPA size (whether it is
> implicit or explicit) is unsupported on the host.
https://docs.kernel.org/virt/kvm/api.html#kvm-create-vm

So if Host_IPA_Limit < 40, specifying 0 as the type will fail. This
actually confused libvirt, which uses "none" machine model to probe the
KVM availability, on M2 MacBook Air.

Fix this by using Host_IPA_Limit as the default type when
KVM_CAP_ARM_VM_IPA_SIZE is available.

Cc: [email protected]
Signed-off-by: Akihiko Odaki <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
(cherry picked from commit 1ab445af8cd99343f29032b5944023ad7d8edebf)
Signed-off-by: Michael Tokarev <[email protected]>

* target/arm: Fix SME ST1Q

A typo, noted in the bug report, resulting in an
incorrect write offset.

Cc: [email protected]
Fixes: 7390e0e9ab8 ("target/arm: Implement SME LD1, ST1")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
(cherry picked from commit 4b3520fd93cd49cc56dfcab45d90735cc2e35af7)
Signed-off-by: Michael Tokarev <[email protected]>

* target/arm: Fix 64-bit SSRA

Typo applied byte-wise shift instead of double-word shift.

Cc: [email protected]
Fixes: 631e565450c ("target/arm: Create gen_gvec_[us]sra")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1737
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
(cherry picked from commit cd1e4db73646006039f25879af3bff55b2295ff3)
Signed-off-by: Michael Tokarev <[email protected]>

* docs/about/license: Update LICENSE URL

In early 2021 (see commit 2ad784339e "docs: update README to use
GitLab repo URLs") almost all of the code base was converted to
point to GitLab instead of git.qemu.org. During 2023, git.qemu.org
switched from a git mirror to a http redirect to GitLab (see [1]).

Update the LICENSE URL to match its previous content, displaying
the file raw content similarly to gitweb 'blob_plain' format ([2]).

[1] https://lore.kernel.org/qemu-devel/CABgObfZu3mFc8tM20K-yXdt7F-7eV-uKZN4sKDarSeu7DYoRbA@mail.gmail.com/
[2] https://git-scm.com/docs/gitweb#Documentation/gitweb.txt-blobplain

Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-ID: <[email protected]>
(cherry picked from commit 09a3fffae00b042bed8ad9c351b1a58c505fde37)
Signed-off-by: Michael Tokarev <[email protected]>

* softmmu: Assert data in bounds in iotlb_to_section

Acked-by: Alex Bennée <[email protected]>
Suggested-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
(cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc)
Signed-off-by: Michael Tokarev <[email protected]>

* block-migration: Ensure we don't crash during migration cleanup

We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
359         BlockDriverState *bs = bitmap->bs;
 #0  0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
 #1  0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
7073        QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095
 #2  0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690

Signed-off-by: Fabiano Rosas <[email protected]>
Message-id: [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
(cherry picked from commit f187609f27b261702a17f79d20bf252ee0d4f9cd)
Signed-off-by: Michael Tokarev <[email protected]>

* target/arm: properly document FEAT_CRC32

This is a mandatory feature for Armv8.1 architectures but we don't
state the feature clearly in our emulation list. Also include
FEAT_CRC32 comment in aarch64_max_tcg_initfn for ease of grepping.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-id: [email protected]
Cc: [email protected]
Message-Id: <[email protected]>
[PMM: pluralize 'instructions' in docs]
Signed-off-by: Peter Maydell <[email protected]>
(cherry picked from commit 9e771a2fc68d98c5719b877e008d1dca64e6896e)
Signed-off-by: Michael Tokarev <[email protected]>

* linux-user: Adjust brk for load_bias

PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.

Cc: [email protected]
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to executable")
Tested-by: Helge Deller <[email protected]>
Reviewed-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
(cherry picked from commit aec338d63bc28f1f13d5e64c561d7f1dd0e4b07e)
Signed-off-by: Michael Tokarev <[email protected]>

* target/i386: raise FERR interrupt with iothread locked

Otherwise tcg_handle_interrupt() triggers an assertion failure:

  #5  0x0000555555c97369 in tcg_handle_interrupt (cpu=0x555557434cb0, mask=2) at ../accel/tcg/tcg-accel-ops.c:83
  #6  tcg_handle_interrupt (cpu=0x555557434cb0, mask=2) at ../accel/tcg/tcg-accel-ops.c:81
  #7  0x0000555555b4d58b in pic_irq_request (opaque=<optimized out>, irq=<optimized out>, level=1) at ../hw/i386/x86.c:555
  #8  0x0000555555b4f218 in gsi_handler (opaque=0x5555579423d0, n=13, level=1) at ../hw/i386/x86.c:611
  #9  0x00007fffa42bde14 in code_gen_buffer ()
  #10 0x0000555555c724bb in cpu_tb_exec (cpu=cpu@entry=0x555557434cb0, itb=<optimized out>, tb_exit=tb_exit@entry=0x7fffe9bfd658) at ../accel/tcg/cpu-exec.c:457

Cc: [email protected]
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1808
Reported-by: NyanCatTW1 <https://gitlab.com/a0939712328>
Co-developed-by: Richard Henderson <[email protected]>'
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
(cherry picked from commit c1f27a0c6ae4059a1d809e9c2bc4d47b823c32a3)
Signed-off-by: Michael Tokarev <[email protected]>

* ui/dbus: Properly dispose touch/mouse dbus objects

Fixes: 142ca628a7 ("ui: add a D-Bus display backend")
Fixes: de9f844ce2 ("ui/dbus: Expose a touch device interface")

Signed-off-by: Bilal Elmoussaoui <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
(cherry picked from commit cb6ccdc9ca705cd8c3ef50e51c16a3732c2fa734)
Signed-off-by: Michael Tokarev <[email protected]>

* ppc/vof: Fix missed fields in VOF cleanup

Failing to reset the of_instance_last makes ihandle allocation continue
to increase, which causes record-replay replay fail to match the
recorded trace.

Not resetting claimed_base makes VOF eventually run out of memory after
some resets.

Cc: Alexey Kardashevskiy <[email protected]>
Fixes: fc8c745d501 ("spapr: Implement Open Firmware client interface")
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 7b8589d7ce7e23f26ff53338d575a5cbd7818e28)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc/e500: fix broken snapshot replay

ppce500_reset_device_tree is registered for system reset, but after
c4b075318eb1 this function rerandomizes rng-seed via
qemu_guest_getrandom_nofail. And when loading a snapshot, it tries to read
EVENT_RANDOM that doesn't exist, so we have an error:

  qemu-system-ppc: Missing random event in the replay log

To fix this, use qemu_register_reset_nosnapshotload instead of
qemu_register_reset.

Reported-by: Vitaly Cheptsov <[email protected]>
Fixes: c4b075318eb1 ("hw/ppc: pass random seed to fdt ")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1634
Signed-off-by: Maksim Kostin <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 6ec65b69ba17c954414fa23a397fb8a3fcfb4a43)
Signed-off-by: Michael Tokarev <[email protected]>

* target/ppc: Flush inputs to zero with NJ in ppc_store_vscr

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1779
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit af03aeb631eeb81a44d2c0ff5b429cd4b5dc2799)
Signed-off-by: Michael Tokarev <[email protected]>

* target/ppc: Fix LQ, STQ register-pair order for big-endian

LQ, STQ have the same register-pair ordering as LQARX/STQARX., which is
the even (lower) register contains the most significant bits. This is
not implemented correctly for big-endian.

do_ldst_quad() has variables low_addr_gpr and high_addr_gpr which is
confusing because they are low and high addresses, whereas LQARX/STQARX.
and most such things use the low and high values for lo/hi variables.
The conversion to native 128-bit memory access functions missed this
strangeness.

Fix this by changing the if condition, and change the variable names to
hi/lo to match convention.

Cc: [email protected]
Reported-by: Ivan Warren <[email protected]>
Fixes: 57b38ffd0c6f ("target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1836
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 718209358f2e4f231cbacf974c3299c4fe7beb83)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/core: set ERR_STAT in unsupported command completion

Currently, the first time sending an unsupported command
(e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
Sending the unsupported command again, will correctly have ERR_STAT set.

When ide_cmd_permitted() returns false, it calls ide_abort_command().
ide_abort_command() first calls ide_transfer_stop(), which will call
ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
sets ERR_STAT in status.

ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
current status in the FIS, and raises an IRQ. (The status here will not
have ERR_STAT set!).

Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
ide_transfer_stop() will result in the FIS being written and an IRQ
being raised.

The reason why it works the second time, is that ERR_STAT will still
be set from the previous command, so when writing the FIS, the
completion will correctly have ERR_STAT set.

Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
raise an error IRQ correctly when receiving an unsupported command.

Signed-off-by: Niklas Cassel <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit c3461c6264a7c8ca15b117e91fe5da786924a784)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: write D2H FIS when processing NCQ command

The way that BUSY + PxCI is cleared for NCQ (FPDMA QUEUED) commands is
described in SATA 3.5a Gold:

11.15 FPDMA QUEUED command protocol
DFPDMAQ2: ClearInterfaceBsy
"Transmit Register Device to Host FIS with the BSY bit cleared to zero
and the DRQ bit cleared to zero and Interrupt bit cleared to zero to
mark interface ready for the next command."

PxCI is currently cleared by handle_cmd(), but we don't write the D2H
FIS to the FIS Receive Area that actually caused PxCI to be cleared.

Similar to how ahci_pio_transfer() calls ahci_write_fis_pio() with an
additional parameter to write a PIO Setup FIS without raising an IRQ,
add a parameter to ahci_write_fis_d2h() so that ahci_write_fis_d2h()
also can write the FIS to the FIS Receive Area without raising an IRQ.

Change process_ncq_command() to call ahci_write_fis_d2h() without
raising an IRQ (similar to ahci_pio_transfer()), such that the FIS
Receive Area is in sync with the PxTFD shadow register.

E.g. Linux reads status and error fields from the FIS Receive Area
directly, so it is wise to keep the FIS Receive Area and the PxTFD
shadow register in sync.

Signed-off-by: Niklas Cassel <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit 2967dc8209dd27b61a6ab7bad78cf7c6ec58ddb4)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: simplify and document PxCI handling

The AHCI spec states that:
For NCQ, PxCI is cleared on command queued successfully.

For non-NCQ, PxCI is cleared on command completed successfully.
(A non-NCQ command that completes with error does not clear PxCI.)

The current QEMU implementation either clears PxCI in check_cmd(),
or in ahci_cmd_done().

check_cmd() will clear PxCI for a command if handle_cmd() returns 0.
handle_cmd() will return -1 if BUSY or DRQ is set.

The QEMU implementation for NCQ commands will currently not set BUSY
or DRQ, so they will always have PxCI cleared by handle_cmd().
ahci_cmd_done() will never even get called for NCQ commands.

Non-NCQ commands are executed by ide_bus_exec_cmd().
Non-NCQ commands in QEMU are implemented either in a sync or in an async
way.

For non-NCQ commands implemented in a sync way, the command handler will
return true, and when ide_bus_exec_cmd() sees that a command handler
returns true, it will call ide_cmd_done() (which will call
ahci_cmd_done()). For a command implemented in a sync way,
ahci_cmd_done() will do nothing (since busy_slot is not set). Instead,
after ide_bus_exec_cmd() has finished, check_cmd() will clear PxCI for
these commands.

For non-NCQ commands implemented in an async way (using either aiocb or
pio_aiocb), the command handler will return false, ide_bus_exec_cmd()
will not call ide_cmd_done(), instead it is expected that the async
callback function will call ide_cmd_done() once the async command is
done. handle_cmd() will set busy_slot, if and only if BUSY or DRQ is
set, and this is checked _after_ ide_bus_exec_cmd() has returned.
handle_cmd() will return -1, so check_cmd() will not clear PxCI.
When the async callback calls ide_cmd_done() (which will call
ahci_cmd_done()), it will see that busy_slot is set, and
ahci_cmd_done() will clear PxCI.

This seems racy, since busy_slot is set _after_ ide_bus_exec_cmd() has
returned. The callback might come before busy_slot gets set. And it is
quite confusing that ahci_cmd_done() will be called for all non-NCQ
commands when the command is done, but will only clear PxCI in certain
cases, even though it will always write a D2H FIS and raise an IRQ.

Even worse, in the case where ahci_cmd_done() does not clear PxCI, it
still raises an IRQ. Host software might thus read an old PxCI value,
since PxCI is cleared (by check_cmd()) after the IRQ has been raised.

Try to simplify this by always setting busy_slot for non-NCQ commands,
such that ahci_cmd_done() will always be responsible for clearing PxCI
for non-NCQ commands.

For NCQ commands, clear PxCI when we receive the D2H FIS, but before
raising the IRQ, see AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and
RegFIS:ClearCI.

Signed-off-by: Niklas Cassel <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit e2a5d9b3d9c3d311618160603cc9bc04fbd98796)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared

According to AHCI 1.3.1 definition of PxSACT:
This field is cleared when PxCMD.ST is written from a '1' to a '0' by
software. This field is not cleared by a COMRESET or a software reset.

According to AHCI 1.3.1 definition of PxCI:
This field is also cleared when PxCMD.ST is written from a '1' to a '0'
by software.

Clearing PxCMD.ST is part of the error recovery procedure, see
AHCI 1.3.1, section "6.2 Error Recovery".

If we don't clear PxCI on error recovery, the previous command will
incorrectly still be marked as pending after error recovery.

Signed-off-by: Niklas Cassel <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit d73b84d0b664e60fffb66f46e84d0db4a8e1c713)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

For NCQ, PxCI is cleared on command queued successfully.
For non-NCQ, PxCI is cleared on command completed successfully.
Successfully means ERR_STAT, BUSY and DRQ are all cleared.

A command that has ERR_STAT set, does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

In the case of non-NCQ commands, not clearing PxCI is needed in order
for host software to be able to see which command slot that failed.

Signed-off-by: Niklas Cassel <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit 1a16ce64fda11bdf50f0c4ab5d9fdde72c1383a2)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: fix ahci_write_fis_sdb()

When there is an error, we need to raise a TFES error irq, see AHCI 1.3.1,
5.3.13.1 SDB:Entry.

If ERR_STAT is set, we jump to state ERR:FatalTaskfile, which will raise
a TFES IRQ unconditionally, regardless if the I bit is set in the FIS or
not.

Thus, we should never raise a normal IRQ after having sent an error IRQ.

It is valid to signal successfully completed commands as finished in the
same SDB FIS that generates the error IRQ. The important thing is that
commands that did not complete successfully (e.g. commands that were
aborted, do not get the finished bit set).

Before this commit, there was never a TFES IRQ raised on NCQ error.

Signed-off-by: Niklas Cassel <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit 7e85cb0db4c693b4e084a00e66fe73a22ed1688a)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ide/ahci: fix broken SError handling

When encountering an NCQ error, you should not write the NCQ tag to the
SError register. This is completely wrong.

The SError register has a clear definition, where each bit represents a
different error, see PxSERR definition in AHCI 1.3.1.

If we write a random value (like the NCQ tag) in SError, e.g. Linux will
read SError, and will trigger arbitrary error handling depending on the
NCQ tag that happened to be executing.

In case of success, ncq_cb() will call ncq_finish().
In case of error, ncq_cb() will call ncq_err() (which will clear
ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
sufficient to tell if finished should get set or not.

Signed-off-by: Niklas Cassel <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>
(cherry picked from commit 9f89423537653de07ca40c18b5ff5b70b104cc93)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/i2c/aspeed: Fix Tx count and Rx size error in buffer pool mode

Fixed inconsistency between the regisiter bit field definition header file
and the ast2600 datasheet. The reg name is I2CD1C:Pool Buffer Control
Register in old register mode and  I2CC0C: Master/Slave Pool Buffer Control
Register in new register mode. They share bit field
[12:8]:Transmit Data Byte Count and bit field
[29:24]:Actual Received Pool Buffer Size according to the datasheet.
According to the ast2600 datasheet,the actual Tx count is
Transmit Data Byte Count plus 1, and the max Rx size is
Receive Pool Buffer Size plus 1, both in Pool Buffer Control Register.
The version before forgot to plus 1, and mistake Rx count for Rx size.

Signed-off-by: Hang Yu <[email protected]>
Fixes: 3be3d6ccf2ad ("aspeed: i2c: Migrate to registerfields API")
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 97b8aa5ae9ff197394395eda5062ea3681e09c28)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/i2c/aspeed: Fix TXBUF transmission start position error

According to the ast2600 datasheet and the linux aspeed i2c driver,
the TXBUF transmission start position should be TXBUF[0] instead
of TXBUF[1],so the arg pool_start is useless,and the address is not
included in TXBUF.So even if Tx Count equals zero,there is at least
1 byte data needs to be transmitted,and M_TX_CMD should not be cleared
at this condition.The driver url is:
https://github.com/AspeedTech-BMC/linux/blob/aspeed-master-v5.15/drivers/i2c/busses/i2c-ast2600.c

Signed-off-by: Hang Yu <[email protected]>
Fixes: 6054fc73e8f4 ("aspeed/i2c: Add support for pool buffer transfers")
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 961faf3ddbd8ffcdf776bbcf88af0bc97218114a)
Signed-off-by: Michael Tokarev <[email protected]>

* qemu-options.hx: Rephrase the descriptions of the -hd* and -cdrom options

The current description says that these options will create a device
on the IDE bus, which is only true on x86. So rephrase these sentences
a little bit to speak of "default bus" instead.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
(cherry picked from commit bcd8e243083c878884e52d609deddbe6be17c730)
Signed-off-by: Michael Tokarev <[email protected]>

* docs tests: Fix use of migrate_set_parameter

docs/multi-thread-compression.txt uses parameter names with
underscores instead of dashes.  Wrong since day one.

docs/rdma.txt, tests/qemu-iotests/181, and tests/qtest/test-hmp.c are
wrong the same way since commit cbde7be900d2 (v6.0.0).  Hard to see,
as test-hmp doesn't check whether the commands work, and iotest 181
appears to be unaffected.

Fixes: 263170e679df (docs: Add a doc about multiple thread compression)
Fixes: cbde7be900d2 (migrate: remove QMP/HMP commands for speed, downtime and cache size)
Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
(cherry picked from commit b21a6e31a182a5ae7436a444f840d49aac07c94f)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/net/vmxnet3: Fix guest-triggerable assert()

The assert() that checks for valid MTU sizes can be triggered by
the guest (e.g. with the reproducer code from the bug ticket
https://gitlab.com/qemu-project/qemu/-/issues/517 ). Let's avoid
this problem by simply logging the error and refusing to activate
the device instead.

Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during activate")
Signed-off-by: Thomas Huth <[email protected]>
Cc: [email protected]
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
[Mjt: change format specifier from %d to %u for uint32_t argument]
(cherry picked from commit 90a0778421acdf4ca903be64c8ed19378183c944)
Signed-off-by: Michael Tokarev <[email protected]>

* qxl: don't assert() if device isn't yet initialized

If the PCI BAR isn't yet mapped or was unmapped, QXL_IO_SET_MODE will
assert(). Instead, report a guest bug and keep going.

This can be reproduced with:

cat << EOF | ./qemu-system-x86_64 -vga qxl -m 2048 -nodefaults -qtest stdio
outl 0xcf8 0x8000101c
outl 0xcfc 0xc000
outl 0xcf8 0x80001001
outl 0xcfc 0x01000000
outl 0xc006 0x00
EOF

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1829

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Cc: [email protected]
Signed-off-by: Michael Tokarev <[email protected]>
(cherry picked from commit 95bef686e490bc3afc3f51f5fc6e20bf260b938c)
Signed-off-by: Michael Tokarev <[email protected]>

* virtio: Drop out of coroutine context in virtio_load()

virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.

However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().

Without this fix, the following crash was reported:

  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
  #2  0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007efc738477f3 in __GI_abort () at abort.c:79
  #4  0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
     file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
  #5  0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
     function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
  #6  0x0000560aebcd8dd6 in bdrv_register_buf ()
  #7  0x0000560aeb97ed97 in ram_block_added.llvm ()
  #8  0x0000560aebb8303f in ram_block_add.llvm ()
  #9  0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
  #10 0x0000560aebb2ac98 in vfio_region_mmap ()
  #11 0x0000560aebb3ea0f in vfio_bars_register ()
  #12 0x0000560aebb3c628 in vfio_realize ()
  #13 0x0000560aeb90f0c2 in pci_qdev_realize ()
  #14 0x0000560aebc40305 in device_set_realized ()
  #15 0x0000560aebc48e07 in property_set_bool.llvm ()
  #16 0x0000560aebc46582 in object_property_set ()
  #17 0x0000560aebc4cd58 in object_property_set_qobject ()
  #18 0x0000560aebc46ba7 in object_property_set_bool ()
  #19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
  #20 0x0000560aebb1fbaf in virtio_net_set_features ()
  #21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
  #22 0x0000560aebb47107 in virtio_load ()
  #23 0x0000560aeb9ae7ce in vmstate_load_state ()
  #24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
  #25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
  #26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
  #27 0x0000560aebeace56 in coroutine_trampoline.llvm ()

Cc: [email protected]
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <[email protected]>
Message-ID: <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
(cherry picked from commit 92e2e6a867334a990f8d29f07ca34e3162fdd6ec)
Signed-off-by: Michael Tokarev <[email protected]>

* arm64: Restore trapless ptimer access

Due to recent KVM changes, QEMU is setting a ptimer offset resulting
in unintended trap and emulate access and a consequent performance
hit. Filter out the PTIMER_CNT register to restore trapless ptimer
access.

Quoting Andrew Jones:

Simply reading the CNT register and writing back the same value is
enough to set an offset, since the timer will have certainly moved
past whatever value was read by the time it's written.  QEMU
frequently saves and restores all registers in the get-reg-list array,
unless they've been explicitly filtered out (with Linux commit
680232a94c12, KVM_REG_ARM_PTIMER_CNT is now in the array). So, to
restore trapless ptimer accesses, we need a QEMU patch to filter out
the register.

See
https://lore.kernel.org/kvmarm/[email protected]/T/#m0770023762a821db2a3f0dd0a7dc6aa54e0d0da9
for additional context.

Cc: [email protected]
Signed-off-by: Andrew Jones <[email protected]>
Signed-off-by: Colton Lewis <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Tested-by: Colton Lewis <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
(cherry picked from commit 682814e2a3c883b27f24b9e7cab47313c49acbd4)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/char/riscv_htif: Fix printing of console characters on big endian hosts

The character that should be printed is stored in the 64 bit "payload"
variable. The code currently tries to print it by taking the address
of the variable and passing this pointer to qemu_chr_fe_write(). However,
this only works on little endian hosts where the least significant bits
are stored on the lowest address. To do this in a portable way, we have
to store the value in an uint8_t variable instead.

Fixes: 5033606780 ("RISC-V HTIF Console")
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Bin Meng <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit c255946e3df4d9660e4f468a456633c24393d468)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/char/riscv_htif: Fix the console syscall on big endian hosts

Values that have been read via cpu_physical_memory_read() from the
guest's memory have to be swapped in case the host endianess differs
from the guest.

Fixes: a6e13e31d5 ("riscv_htif: Support console output via proxy syscall")
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Bin Meng <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 058096f1c55ab688db7e1d6814aaefc1bcd87f7a)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv/cpu.c: add zmmul isa string

zmmul was promoted from experimental to ratified in commit 6d00ffad4e95.
Add a riscv,isa string for it.

Fixes: 6d00ffad4e95 ("target/riscv: move zmmul out of the experimental properties")
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Weiwei Li <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 50f9464962fb41f04fd5f42e7ee2cb60942aba89)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv: Fix page_check_range use in fault-only-first

Commit bef6f008b98(accel/tcg: Return bool from page_check_range) converts
integer return value to bool type. However, it wrongly converted the use
of the API in riscv fault-only-first, where page_check_range < = 0, should
be converted to !page_check_range.

Signed-off-by: LIU Zhiwei <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 4cc9f284d5971ecd8055d26ef74c23ef0be8b8f5)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv: Fix zfa fleq.d and fltq.d

Commit a47842d ("riscv: Add support for the Zfa extension") implemented the zfa extension.
However, it has some typos for fleq.d and fltq.d. Both of them misused the fltq.s
helper function.

Fixes: a47842d ("riscv: Add support for the Zfa extension")
Signed-off-by: LIU Zhiwei <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Weiwei Li <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit eda633a534f8af4abe3a88731bba6dacdb973993)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/intc: Fix upper/lower mtime write calculation

When writing the upper mtime, we should keep the original lower mtime
whose value is given by cpu_riscv_read_rtc() instead of
cpu_riscv_read_rtc_raw(). The same logic applies to writes to lower mtime.

Signed-off-by: Jason Chien <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit e0922b73baf00c4c19d4ad30d09bb94f7ffea0f4)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/intc: Make rtc variable names consistent

The variables whose values are given by cpu_riscv_read_rtc() should be named
"rtc". The variables whose value are given by cpu_riscv_read_rtc_raw()
should be named "rtc_r".

Signed-off-by: Jason Chien <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 9382a9eafccad8dc6a487ea3a8d2bed03dc35db9)
Signed-off-by: Michael Tokarev <[email protected]>

* linux-user/riscv: Use abi type for target_ucontext

We should not use types dependend on host arch for target_ucontext.
This bug is found when run rv32 applications.

Signed-off-by: LIU Zhiwei <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit ae7d4d625cab49657b9fc2be09d895afb9bcdaf0)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/riscv: virt: Fix riscv,pmu DT node path

On a dtb dumped from the virt machine, dt-validate complains:
soc: pmu: {'riscv,event-to-mhpmcounters': [[1, 1, 524281], [2, 2, 524284], [65561, 65561, 524280], [65563, 65563, 524280], [65569, 65569, 524280]], 'compatible': ['riscv,pmu']} should not be valid under {'type': 'object'}
        from schema $id: http://devicetree.org/schemas/simple-bus.yaml#
That's pretty cryptic, but running the dtb back through dtc produces
something a lot more reasonable:
Warning (simple_bus_reg): /soc/pmu: missing or empty reg/ranges property

Moving the riscv,pmu node out of the soc bus solves the problem.

Signed-off-by: Conor Dooley <[email protected]>
Acked-by: Alistair Francis <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Message-ID: <20230727-groom-decline-2c57ce42841c@spud>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 9ff31406312500053ecb5f92df01dd9ce52e635d)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0

In the same emulated RISC-V host, the 'host' KVM CPU takes 4 times
longer to boot than the 'rv64' KVM CPU.

The reason is an unintended behavior of riscv_cpu_satp_mode_finalize()
when satp_mode.supported = 0, i.e. when cpu_init() does not set
satp_mode_max_supported(). satp_mode_max_from_map(map) does:

31 - __builtin_clz(map)

This means that, if satp_mode.supported = 0, satp_mode_supported_max
wil be '31 - 32'. But this is C, so satp_mode_supported_max will gladly
set it to UINT_MAX (4294967295). After that, if the user didn't set a
satp_mode, set_satp_mode_default_map(cpu) will make

cfg.satp_mode.map = cfg.satp_mode.supported

So satp_mode.map = 0. And then satp_mode_map_max will be set to
satp_mode_max_from_map(cpu->cfg.satp_mode.map), i.e. also UINT_MAX. The
guard "satp_mode_map_max > satp_mode_supported_max" doesn't protect us
here since both are UINT_MAX.

And finally we have 2 loops:

        for (int i = satp_mode_map_max - 1; i >= 0; --i) {

Which are, in fact, 2 loops from UINT_MAX -1 to -1. This is where the
extra delay when booting the 'host' CPU is coming from.

Commit 43d1de32f8 already set a precedence for satp_mode.supported = 0
in a different manner. We're doing the same here. If supported == 0,
interpret as 'the CPU wants the OS to handle satp mode alone' and skip
satp_mode_finalize().

We'll also put a guard in satp_mode_max_from_map() to assert out if map
is 0 since the function is not ready to deal with it.

Cc: Alexandre Ghiti <[email protected]>
Fixes: 6f23aaeb9b ("riscv: Allow user to set the satp mode")
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 3a2fc23563885c219c73c8f24318921daf02f3f2)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv/pmp.c: respect mseccfg.RLB for pmpaddrX changes

When the rule-lock bypass (RLB) bit is set in the mseccfg CSR, the PMP
configuration lock bits must not apply. While this behavior is
implemented for the pmpcfgX CSRs, this bit is not respected for
changes to the pmpaddrX CSRs. This patch ensures that pmpaddrX CSR
writes work even on locked regions when the global rule-lock bypass is
enabled.

Signed-off-by: Leon Schuermann <[email protected]>
Reviewed-by: Mayuresh Chitale <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit 4e3adce1244e1ca30ec05874c3eca14911dc0825)
Signed-off-by: Michael Tokarev <[email protected]>

* target/riscv: Allocate itrigger timers only once

riscv_trigger_init() had been called on reset events that can happen
several times for a CPU and it allocated timers for itrigger. If old
timers were present, they were simply overwritten by the new timers,
resulting in a memory leak.

Divide riscv_trigger_init() into two functions, namely
riscv_trigger_realize() and riscv_trigger_reset() and call them in
appropriate timing. The timer allocation will happen only once for a
CPU in riscv_trigger_realize().

Fixes: 5a4ae64cac ("target/riscv: Add itrigger support when icount is enabled")
Signed-off-by: Akihiko Odaki <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: LIU Zhiwei <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alistair Francis <[email protected]>
(cherry picked from commit a7c272df82af11c568ea83921b04334791dccd5e)
Signed-off-by: Michael Tokarev <[email protected]>

* virtio-gpu/win32: set the destroy function on load

Don't forget to unmap the resource memory.

Fixes: commit 9462ff469 ("virtio-gpu/win32: allocate shareable 2d resources/images")

Signed-off-by: Marc-André Lureau <[email protected]>
(cherry picked from commit 04562ee88e99d71f4e6017f64123f726dd8b41e1)
Signed-off-by: Michael Tokarev <[email protected]>

* ui: fix crash when there are no active_console

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
812	    return con->hw_ops->ui_info != NULL;
(gdb) bt
#0  0x0000555555888630 in dpy_ui_info_supported (con=0x0) at ../ui/console.c:812
#1  0x00005555558a44b1 in protocol_client_msg (vs=0x5555578c76c0, data=0x5555581e93f0 <incomplete sequence \373>, len=24) at ../ui/vnc.c:2585
#2  0x00005555558a19ac in vnc_client_read (vs=0x5555578c76c0) at ../ui/vnc.c:1607
#3  0x00005555558a1ac2 in vnc_client_io (ioc=0x5555581eb0e0, condition=G_IO_IN, opaque=0x5555578c76c0) at ../ui/vnc.c:1635

Fixes:
https://issues.redhat.com/browse/RHEL-2600

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Albert Esteve <[email protected]>
(cherry picked from commit 48a35e12faf90a896c5aa4755812201e00d60316)
Signed-off-by: Michael Tokarev <[email protected]>

* s390x/ap: fix missing subsystem reset registration

A subsystem reset contains a reset of AP resources which has been
missing.  Adding the AP bridge to the list of device types that need
reset fixes this issue.

Reviewed-by: Jason J. Herne <[email protected]>
Reviewed-by: Tony Krowiak <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
Fixes: a51b3153 ("s390x/ap: base Adjunct Processor (AP) object model")
Message-ID: <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
(cherry picked from commit 297ec01f0b9864ea8209ca0ddc6643b4c0574bdb)
Signed-off-by: Michael Tokarev <[email protected]>

* meson: Fix targetos match for illumos and Solaris.

qemu 8.1.0 breaks on illumos platforms due to _XOPEN_SOURCE and others no longer being set correctly, leading to breakage such as:

  https://us-central.manta.mnx.io/pkgsrc/public/reports/trunk/tools/20230908.1404/qemu-8.1.0/build.log

This is a result of meson conversion which incorrectly matches against 'solaris' instead of 'sunos' for uname.

First time submitting a patch here, hope I did it correctly.  Thanks.

Signed-off-by: Jonathan Perkin <[email protected]>
Message-ID: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
(cherry picked from commit fb0a8b0e238277296907ffe765bf76874cfc1df6)
Signed-off-by: Michael Tokarev <[email protected]>
(Mjt: omit net/meson.build change before v8.1.0-279-g73258b3864, adjust context befor v8.1.0-288-g2fc36530de)

* tpm: fix crash when FD >= 1024 and unnecessary errors due to EINTR

Replace select() with poll() to fix a crash when QEMU has a large number
of FDs. Also use RETRY_ON_EINTR to avoid unnecessary errors due to EINTR.

Cc: [email protected]
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2020133
Fixes: 56a3c24ffc ("tpm: Probe for connected TPM 1.2 or TPM 2")
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Michael Tokarev <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
(cherry picked from commit 8e32ddff69b6b4547cc00592ad816484e160817a)
Signed-off-by: Michael Tokarev <[email protected]>

* Update version for 8.1.1 release

Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Introduce functions for conversion between timebase and nanoseconds

These calculations are repeated several times, and they will become
a little more complicated with subsequent changes.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 7798f5c576d898e7e10c4a2518f3f16411dedeb9)
Signed-off-by: Michael Tokarev <[email protected]>

* host-utils: Add muldiv64_round_up

This will be used for converting time intervals in different base units
to host units, for the purpose of scheduling timers to emulate target
timers. Timers typically must not fire before their requested expiry
time but may fire some time afterward, so rounding up is the right way
to implement these.

Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
[ clg: renamed __muldiv64() to muldiv64_rounding() ]
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 47de6c4c287079744ceb96f606b3c0457addf380)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Round up the decrementer interval when converting to ns

The rule of timers is typically that they should never expire before the
timeout, but some time afterward. Rounding timer intervals up when doing
conversion is the right thing to do.

Under most circumstances it is impossible observe the decrementer
interrupt before the dec register has triggered. However with icount
timing, problems can arise. For example setting DEC to 0 can schedule
the timer for now, causing it to fire before any more instructions
have been executed and DEC is still 0.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit eab0888418ab44344864965193cf6cd194ab6858)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Avoid decrementer rounding errors

The decrementer register contains a relative time in timebase units.
When writing to DECR this is converted and stored as an absolute value
in nanosecond units, reading DECR converts back to relative timebase.

The tb<->ns conversion of the relative part can cause rounding such that
a value writen to the decrementer can read back a different, with time
held constant. This is a particular problem for a deterministic icount
and record-replay trace.

Fix this by storing the absolute value in timebase units rather than
nanoseconds. The math before:
  store:  decr_next = now_ns + decr * ns_per_sec / tb_per_sec
  load:        decr = (decr_next - now_ns) * tb_per_sec / ns_per_sec
  load(store): decr = decr * ns_per_sec / tb_per_sec * tb_per_sec /
                      ns_per_sec

After:
  store:  decr_next = now_ns * tb_per_sec / ns_per_sec + decr
  load:        decr = decr_next - now_ns * tb_per_sec / ns_per_sec
  load(store): decr = decr

Fixes: 9fddaa0c0cab ("PowerPC merge: real time TB and decrementer - faster and simpler exception handling (Jocelyn Mayer)")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 8e0a5ac87800ccc6dd5013f89f27652f4480ab33)
Signed-off-by: Michael Tokarev <[email protected]>

* target/ppc: Sign-extend large decrementer to 64-bits

When storing a large decrementer value with the most significant
implemented bit set, it is to be treated as a negative and sign
extended.

This isn't hit for book3s DEC because of another bug, fixing it
in the next patch exposes this one and can cause additional
problems, so fix this first. It can be hit with HDECR and other
edge triggered types.

Fixes: a8dafa52518 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Nicholas Piggin <[email protected]>
[ clg: removed extra cpu and pcc variables shadowing local variables ]
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit c8fbc6b9f2f3c732ee3307093c1c5c367eaa64ae)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Always store the decrementer value

When writing a value to the decrementer that raises an exception, the
irq is raised, but the value is not stored so the store doesn't appear
to have changed the register when it is read again.

Always store the write value to the register.

Fixes: e81a982aa53 ("PPC: Clean up DECR implementation")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit febb71d543a8f747b2f8aaf0182d0a385c6a02c3)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Reset timebase facilities on machine reset

Lower interrupts, delete timers, and set time facility registers
back to initial state on machine reset.

This is not so important for record-replay since timebase and
decrementer are migrated, but it gives a cleaner reset state.

Cc: Mark Cave-Ayland <[email protected]>
Cc: BALATON Zoltan <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
[ clg: checkpatch.pl fixes ]
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit 30d0647bcfa99d4a141eaa843a9fb5b091ddbb76)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/ppc: Read time only once to perform decrementer write

Reading the time more than once to perform an operation always increases
complexity and fragility due to introduced deltas. Simplify the
decrementer write by reading the clock once for the operation.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
(cherry picked from commit ea62f8a5172cf5fcd97df143b758730f6865a625)
Signed-off-by: Michael Tokarev <[email protected]>

* linux-user/hppa: clear the PSW 'N' bit when delivering signals

qemu-hppa may crash when delivering a signal. It can be demonstrated with
this program. Compile the program with "hppa-linux-gnu-gcc -O2 signal.c"
and run it with "qemu-hppa -one-insn-per-tb a.out". It reports that the
address of the flag is 0xb4 and it crashes when attempting to touch it.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <signal.h>

sig_atomic_t flag;

void sig(int n)
{
	printf("&flag: %p\n", &flag);
	flag = 1;
}

int main(void)
{
	struct sigaction sa;
	struct itimerval it;

	sa.sa_handler = sig;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGALRM, &sa, NULL)) perror("sigaction"), exit(1);

	it.it_interval.tv_sec = 0;
	it.it_interval.tv_usec = 100;
	it.it_value.tv_sec = it.it_interval.tv_sec;
	it.it_value.tv_usec = it.it_interval.tv_usec;

	if (setitimer(ITIMER_REAL, &it, NULL)) perror("setitimer"), exit(1);

	while (1) {
	}
}

The reason for the crash is that the signal handling routine doesn't clear
the 'N' flag in the PSW. If the signal interrupts a thread when the 'N'
flag is set, the flag remains set at the beginning of the signal handler
and the first instruction of the signal handler is skipped.

Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Helge Deller <[email protected]>
(cherry picked from commit 2529497cb6b298e732e8dbe5212da7925240b4f4)
Signed-off-by: Michael Tokarev <[email protected]>

* linux-user/hppa: lock both words of function descriptor

The code in setup_rt_frame reads two words at haddr, but locks only one.
This patch fixes it to lock both.

Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Helge Deller <[email protected]>
(cherry picked from commit 5b1270ef1477bb7f240c3bfe2cd8b0fe4721fd51)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/cxl: Fix CFMW config memory leak

Allocate targets and targets[n] resources when all sanity checks are
passed to avoid memory leaks.

Cc: [email protected]
Suggested-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Reviewed-by: Fan Ni <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
(cherry picked from commit 7b165fa164022b756c2b001d0a1525f98199d3ac)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/cxl: Fix out of bound array access

According to cxl_interleave_ways_enc(), fw->num_targets is allowed to be up
to 16. This also corresponds to CXL r3.0 spec. So, the fw->target_hbs[]
array is iterated from 0 to 15. But it is statically declared of length 8.
Thus, out of bound array access may occur.

Fixes: c28db9e000 ("hw/pci-bridge: Make PCIe and CXL PXB Devices inherit from TYPE_PXB_DEV")
Signed-off-by: Dmitry Frolov <[email protected]>
Reviewed-by: Michael Tokarev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Jonathan Cameron <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
(cherry picked from commit de5bbfc602ef1b9b79c494a914c6083a1a23cca2)
Signed-off-by: Michael Tokarev <[email protected]>

* file-posix: Clear bs->bl.zoned on error

bs->bl.zoned is what indicates whether the zone information is present
and valid; it is the only thing that raw_refresh_zoned_limits() sets if
CONFIG_BLKZONED is not defined, and it is also the only thing that it
sets if CONFIG_BLKZONED is defined, but there are no zones.

Make sure that it is always set to BLK_Z_NONE if there is an error
anywhere in raw_refresh_zoned_limits() so that we do not accidentally
announce zones while our information is incomplete or invalid.

This also fixes a memory leak in the last error path in
raw_refresh_zoned_limits().

Signed-off-by: Hanna Czenczek <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Sam Li <[email protected]>
(cherry picked from commit 56d1a022a77ea2125564913665eeadf3e303a671)
Signed-off-by: Michael Tokarev <[email protected]>

* file-posix: Check bs->bl.zoned for zone info

Instead of checking bs->wps or bs->bl.zone_size for whether zone
information is present, check bs->bl.zoned.  That is the flag that
raw_refresh_zoned_limits() reliably sets to indicate zone support.  If
it is set to something other than BLK_Z_NONE, other values and objects
like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it
is not, we cannot rely on their validity.

Signed-off-by: Hanna Czenczek <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Sam Li <[email protected]>
(cherry picked from commit 4b5d80f3d02096a9bb1f651f6b3401ba40877159)
Signed-off-by: Michael Tokarev <[email protected]>

* file-posix: Fix zone update in I/O error path

We must check that zone information is present before running
update_zones_wp().

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
Fixes: Coverity CID 1512459
Signed-off-by: Hanna Czenczek <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Sam Li <[email protected]>
(cherry picked from commit deab5c9a4ed74f76a713008a42527762b30a7e84)
Signed-off-by: Michael Tokarev <[email protected]>

* file-posix: Simplify raw_co_prw's 'out' zone code

We duplicate the same condition three times here, pull it out to the top
level.

Signed-off-by: Hanna Czenczek <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Sam Li <[email protected]>
(cherry picked from commit d31b50a15dd25a560749b25fc40b6484fd1a57b7)
Signed-off-by: Michael Tokarev <[email protected]>

* tests/file-io-error: New test

This is a regression test for
https://bugzilla.redhat.com/show_bug.cgi?id=2234374.

All this test needs to do is trigger an I/O error inside of file-posix
(specifically raw_co_prw()).  One reliable way to do this without
requiring special privileges is to use a FUSE export, which allows us to
inject any error that we want, e.g. via blkdebug.

Signed-off-by: Hanna Czenczek <[email protected]>
Message-Id: <[email protected]>
[hreitz: Fixed test to be skipped when there is no FUSE support, to
         suppress fusermount's allow_other warning, and to be skipped
         with $IMGOPTSSYNTAX enabled]
Signed-off-by: Hanna Czenczek <[email protected]>
(cherry picked from commit 380448464dd89291cf7fd7434be6c225482a334d)
Signed-off-by: Michael Tokarev <[email protected]>

* include/exec: Widen tlb_hit/tlb_hit_page()

tlb_addr is changed from target_ulong to uint64_t to match the type of
a CPUTLBEntry value, and the addressed is changed to vaddr.

Signed-off-by: Anton Johansson <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
(cherry picked from commit c78edb563942ce80c9c6c03b07397725b006b625)
Signed-off-by: Michael Tokarev <[email protected]>

* hw/arm/boot: Set SCR_EL3.FGTEn when booting kernel

Just like d7ef5e16a17c sets SCR_EL3.HXEn for FEAT_HCX, this commit
handles SCR_EL3.FGTEn for FEAT_FGT:

When we direct boot a kernel on a CPU which emulates EL3, we need to
set up the EL3 system registers as the Linux kernel documentation
specifies:
    https://www.kernel.org/doc/Documentation/arm64/booting.rst

> For CPUs with the Fine Grained Traps (FEAT_FGT) extension present:
> - If EL3 is present and the kernel is entered at EL2:
>   - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1.

Cc: [email protected]
Signed-off-by: Fabian Vogt <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Sign…
  • Loading branch information
Show file tree
Hide file tree
Showing 133 changed files with 3,043 additions and 570 deletions.
41 changes: 41 additions & 0 deletions .github/workflows/build.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
name: Build target user

on: [pull_request, workflow_dispatch]

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.x
uses: actions/setup-python@v4
with:
python-version: '3.x'
- name: Copy source.list file to include deb-src
run: |
sudo cp /etc/apt/sources.list /etc/apt/sources.list.d/tmp.list
sudo sed -i "s/# deb-src/deb-src/g" /etc/apt/sources.list.d/tmp.list
- name: Install deps
run: |
sudo apt-get update
sudo apt-get --no-install-recommends -y build-dep qemu
sudo apt-get install -y autoconf libtool protobuf-c-compiler
pip3 install --user ninja
- name: Install OCaml
uses: ocaml/setup-ocaml@v2
with:
ocaml-compiler: 4.14.x
dune-cache: true
opam-disable-sandboxing: true
- name: Install piqi
run: |
opam install piqi
- name: Clone qemu and bap-frames
run: |
git clone --depth 1 http://github.com/BinaryAnalysisPlatform/bap-frames.git
git clone --depth 1 http://github.com/BinaryAnalysisPlatform/qemu.git
- name: Build without tracewrap
run: |
cd qemu
./configure --prefix=$HOME --target-list=arm-linux-user
ninja -C build
98 changes: 98 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
# Overview

Qemu tracer - a tracer based on [qemu](https://github.com/qemu/qemu)
project. It executes a binary executable and saves trace data using
[Protocol Buffer](https://developers.google.com/protocol-buffers/)
format. The contents of the trace data is defined in
[bap-traces](https://github.com/BinaryAnalysisPlatform/bap-traces)
project.

# Installing released binaries

If you don't want to mess with the source and building, then you can just
dowload a tarball with prebuilt binaries. Look at the latest release and
it might happen, that we have built binaries for your linux distribution,
if it is not the case, then create an issue, and we will build it for you.

Let's pretend, that you're using Ubuntu Trusty, and install it. First
download it with your favorite downloader:

```
wget https://github.com/BinaryAnalysisPlatform/qemu/releases/download/v2.0.0-tracewrap-2.0.0-rc1/qemu-tracewrap-ubuntu-14.04.4-LTS.tgz
```

Install it in the specified prefix with a command like `tar -C <prefix> -xf qemu-tracewrap-ubuntu-14.04.4-LTS.tgz`, e.g.,
to install in your home directory:
```
tar -C $HOME -xf qemu-tracewrap-ubuntu-14.04.4-LTS.tgz
```



# Build

## Preparation

Note: the instructions assume that you're using Ubuntu, but it
may work on other systems, that uses apt-get.

Before building the qemu-tracewrap, you need to install the following packages:
* qemu build dependencies
* autoconf, libtool, protobuf-c-compiler
* [piqi library](http://piqi.org/doc/ocaml)

To install qemu build dependencies, use the following command

```bash
$ sudo apt-get --no-install-recommends -y build-dep qemu
```

To install autoconf, libtool, protobuf-c-compiler, use the
following command

```bash
$ sudo apt-get install autoconf libtool protobuf-c-compiler
```

To install [piqi library](http://piqi.org/doc/ocaml) with
[opam](https://opam.ocaml.org/doc/Install.html), use the following command
```bash
$ opam install piqi
```

## Building

Download [bap-frames](https://github.com/BinaryAnalysisPlatform/bap-frames) with
following command

```bash
$ git clone https://github.com/BinaryAnalysisPlatform/bap-frames.git
```

Download qemu tracer with following command

```bash
$ git clone [email protected]:BinaryAnalysisPlatform/qemu.git
```

Change folder to qemu and build tracer:
```bash
$ cd qemu
$ ./configure --prefix=$HOME --with-tracewrap=<absolute-path-to>/bap-frames --target-list=<ARCH>-linux-user
$ ninja -C build
$ ninja -C build install
```

# Usage

To run executable `exec` compiled for `arch`, use `qemu-arch exec` command, e.g.,
`qemu-x86_64 /bin/ls`. It will dump the trace into `ls.frames` file. You can configure
the filename with `-tracefile` option, e.g., `qemu-arm -tracefile arm.ls.frames ls`


Hints: use option -L to set the elf interpreter prefix to 'path'. Use
[fetchlibs.sh](https://raw.githubusercontent.com/BinaryAnalysisPlatform/bap-frames/master/test/fetchlibs.sh)
to download arm and x86 libraries.

# Notes
Only ARM, X86, X86-64, MIPS targets are supported in this branch.
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
8.1.0
8.1.2
4 changes: 3 additions & 1 deletion accel/kvm/kvm-all.c
Original file line number Diff line number Diff line change
Expand Up @@ -2458,7 +2458,7 @@ static int kvm_init(MachineState *ms)
KVMState *s;
const KVMCapabilityInfo *missing_cap;
int ret;
int type = 0;
int type;
uint64_t dirty_log_manual_caps;

qemu_mutex_init(&kml_slots_lock);
Expand Down Expand Up @@ -2523,6 +2523,8 @@ static int kvm_init(MachineState *ms)
type = mc->kvm_type(ms, kvm_type);
} else if (mc->kvm_type) {
type = mc->kvm_type(ms, NULL);
} else {
type = kvm_arch_get_default_type(ms);
}

do {
Expand Down
30 changes: 0 additions & 30 deletions accel/tcg/cpu-exec-common.c
Original file line number Diff line number Diff line change
Expand Up @@ -33,36 +33,6 @@ void cpu_loop_exit_noexc(CPUState *cpu)
cpu_loop_exit(cpu);
}

#if defined(CONFIG_SOFTMMU)
void cpu_reloading_memory_map(void)
{
if (qemu_in_vcpu_thread() && current_cpu->running) {
/* The guest can in theory prolong the RCU critical section as long
* as it feels like. The major problem with this is that because it
* can do multiple reconfigurations of the memory map within the
* critical section, we could potentially accumulate an unbounded
* collection of memory data structures awaiting reclamation.
*
* Because the only thing we're currently protecting with RCU is the
* memory data structures, it's sufficient to break the critical section
* in this callback, which we know will get called every time the
* memory map is rearranged.
*
* (If we add anything else in the system that uses RCU to protect
* its data structures, we will need to implement some other mechanism
* to force TCG CPUs to exit the critical section, at which point this
* part of this callback might become unnecessary.)
*
* This pair matches cpu_exec's rcu_read_lock()/rcu_read_unlock(), which
* only protects cpu->as->dispatch. Since we know our caller is about
* to reload it, it's safe to split the critical section.
*/
rcu_read_unlock();
rcu_read_lock();
}
}
#endif

void cpu_loop_exit(CPUState *cpu)
{
/* Undo the setting in cpu_tb_exec. */
Expand Down
4 changes: 3 additions & 1 deletion accel/tcg/cpu-exec.c
Original file line number Diff line number Diff line change
Expand Up @@ -720,7 +720,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
&& cpu_neg(cpu)->icount_decr.u16.low + cpu->icount_extra == 0) {
/* Execute just one insn to trigger exception pending in the log */
cpu->cflags_next_tb = (curr_cflags(cpu) & ~CF_USE_ICOUNT)
| CF_NOIRQ | 1;
| CF_LAST_IO | CF_NOIRQ | 1;
}
#endif
return false;
Expand Down Expand Up @@ -1032,10 +1032,12 @@ cpu_exec_loop(CPUState *cpu, SyncClocks *sc)
last_tb = NULL;
}
#endif
#ifndef HAS_TRACEWRAP
/* See if we can patch the calling TB. */
if (last_tb) {
tb_add_jump(last_tb, tb_exit, tb);
}
#endif

cpu_loop_exec_tb(cpu, tb, pc, &last_tb, &tb_exit);

Expand Down
6 changes: 4 additions & 2 deletions accel/tcg/tb-maint.c
Original file line number Diff line number Diff line change
Expand Up @@ -1083,7 +1083,8 @@ bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
if (current_tb_modified) {
/* Force execution of one insn next time. */
CPUState *cpu = current_cpu;
cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
cpu->cflags_next_tb =
1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(current_cpu);
return true;
}
return false;
Expand Down Expand Up @@ -1153,7 +1154,8 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
if (current_tb_modified) {
page_collection_unlock(pages);
/* Force execution of one insn next time. */
current_cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
current_cpu->cflags_next_tb =
1 | CF_LAST_IO | CF_NOIRQ | curr_cflags(current_cpu);
mmap_unlock();
cpu_loop_exit_noexc(current_cpu);
}
Expand Down
9 changes: 2 additions & 7 deletions accel/tcg/tcg-accel-ops-mttcg.c
Original file line number Diff line number Diff line change
Expand Up @@ -100,14 +100,9 @@ static void *mttcg_cpu_thread_fn(void *arg)
break;
case EXCP_HALTED:
/*
* during start-up the vCPU is reset and the thread is
* kicked several times. If we don't ensure we go back
* to sleep in the halted state we won't cleanly
* start-up when the vCPU is enabled.
*
* cpu->halted should ensure we sleep in wait_io_event
* Usually cpu->halted is set, but may have already been
* reset by another thread by the time we arrive here.
*/
g_assert(cpu->halted);
break;
case EXCP_ATOMIC:
qemu_mutex_unlock_iothread();
Expand Down
72 changes: 34 additions & 38 deletions accel/tcg/translator.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,26 +16,19 @@
#include "tcg/tcg-op-common.h"
#include "internal.h"

static void gen_io_start(void)
static void set_can_do_io(DisasContextBase *db, bool val)
{
tcg_gen_st_i32(tcg_constant_i32(1), cpu_env,
offsetof(ArchCPU, parent_obj.can_do_io) -
offsetof(ArchCPU, env));
if (db->saved_can_do_io != val) {
db->saved_can_do_io = val;
tcg_gen_st_i32(tcg_constant_i32(val), cpu_env,
offsetof(ArchCPU, parent_obj.can_do_io) -
offsetof(ArchCPU, env));
}
}

bool translator_io_start(DisasContextBase *db)
{
uint32_t cflags = tb_cflags(db->tb);

if (!(cflags & CF_USE_ICOUNT)) {
return false;
}
if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
/* Already started in translator_loop. */
return true;
}

gen_io_start();
set_can_do_io(db, true);

/*
* Ensure that this instruction will be the last in the TB.
Expand All @@ -47,14 +40,17 @@ bool translator_io_start(DisasContextBase *db)
return true;
}

static TCGOp *gen_tb_start(uint32_t cflags)
static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
{
TCGv_i32 count = tcg_temp_new_i32();
TCGv_i32 count = NULL;
TCGOp *icount_start_insn = NULL;

tcg_gen_ld_i32(count, cpu_env,
offsetof(ArchCPU, neg.icount_decr.u32) -
offsetof(ArchCPU, env));
if ((cflags & CF_USE_ICOUNT) || !(cflags & CF_NOIRQ)) {
count = tcg_temp_new_i32();
tcg_gen_ld_i32(count, cpu_env,
offsetof(ArchCPU, neg.icount_decr.u32) -
offsetof(ArchCPU, env));
}

if (cflags & CF_USE_ICOUNT) {
/*
Expand Down Expand Up @@ -84,18 +80,15 @@ static TCGOp *gen_tb_start(uint32_t cflags)
tcg_gen_st16_i32(count, cpu_env,
offsetof(ArchCPU, neg.icount_decr.u16.low) -
offsetof(ArchCPU, env));
/*
* cpu->can_do_io is cleared automatically here at the beginning of
* each translation block. The cost is minimal and only paid for
* -icount, plus it would be very easy to forget doing it in the
* translator. Doing it here means we don't need a gen_io_end() to
* go with gen_io_start().
*/
tcg_gen_st_i32(tcg_constant_i32(0), cpu_env,
offsetof(ArchCPU, parent_obj.can_do_io) -
offsetof(ArchCPU, env));
}

/*
* cpu->can_do_io is set automatically here at the beginning of
* each translation block. The cost is minimal, plus it would be
* very easy to forget doing it in the translator.
*/
set_can_do_io(db, db->max_insns == 1 && (cflags & CF_LAST_IO));

return icount_start_insn;
}

Expand Down Expand Up @@ -144,18 +137,25 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
db->num_insns = 0;
db->max_insns = *max_insns;
db->singlestep_enabled = cflags & CF_SINGLE_STEP;
db->saved_can_do_io = -1;
db->host_addr[0] = host_pc;
db->host_addr[1] = NULL;

ops->init_disas_context(db, cpu);
tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */

/* Start translating. */
icount_start_insn = gen_tb_start(cflags);
icount_start_insn = gen_tb_start(db, cflags);
ops->tb_start(db, cpu);
tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */

plugin_enabled = plugin_gen_tb_start(cpu, db, cflags & CF_MEMI_ONLY);
if (cflags & CF_MEMI_ONLY) {
/* We should only see CF_MEMI_ONLY for io_recompile. */
assert(cflags & CF_LAST_IO);
plugin_enabled = plugin_gen_tb_start(cpu, db, true);
} else {
plugin_enabled = plugin_gen_tb_start(cpu, db, false);
}

while (true) {
*max_insns = ++db->num_insns;
Expand All @@ -172,13 +172,9 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
the next instruction. */
if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
/* Accept I/O on the last instruction. */
gen_io_start();
ops->translate_insn(db, cpu);
} else {
/* we should only see CF_MEMI_ONLY for io_recompile */
tcg_debug_assert(!(cflags & CF_MEMI_ONLY));
ops->translate_insn(db, cpu);
set_can_do_io(db, true);
}
ops->translate_insn(db, cpu);

/*
* We can't instrument after instructions that change control
Expand Down
Loading

0 comments on commit 885ca60

Please sign in to comment.