
[BUG] FW reported error: 13 - Requested power transition failed to complete #7969

Closed
keqiaozhang opened this issue Jul 20, 2023 · 18 comments
Labels
bug Something isn't working as expected I2S Applies to I2S bus for codec connection Intel Linux Daily tests This issue can be found in internal Linux daily tests IPC error IPC error is observed MTL Applies to Meteor Lake platform P1 Blocker bugs or important features SDW SoundWire suspend-resume Issues observed when doing system suspend and resume

@keqiaozhang
Collaborator

keqiaozhang commented Jul 20, 2023

Describe the bug
Observed this issue in the CI daily test; so far it only happens on MTL-NOCODEC. The reproduction rate is 100% when running the suspend/resume test with audio. It looks like a FW regression. Will do further checks.

[ 5405.125019] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0x47000000|0x0: MOD_SET_DX [data size: 8]
[ 5405.125608] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x6700000d|0x0: MOD_SET_DX
[ 5405.125616] sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 13 - Requested power transition failed to complete
[ 5405.125798] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x47000000|0x0
[ 5405.125806] sof-audio-pci-intel-mtl 0000:00:1f.3: ctx_save IPC error: -22, proceeding with suspend

To Reproduce
check-suspend-resume-with-audio.sh -l 50 -m playback

Reproduction Rate
100%

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: sof-ace-tplg/sof-mtl-nocodec.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: mtl-nocodec

dmesg.txt
mtrace.txt

@keqiaozhang keqiaozhang added bug Something isn't working as expected P1 Blocker bugs or important features IPC error IPC error is observed Intel Linux Daily tests This issue can be found in internal Linux daily tests suspend-resume Issues observed when doing system suspend and resume labels Jul 20, 2023
@keqiaozhang
Collaborator Author

This is a FW regression caused by #7325.
The offending commit should be 965e1c1.

@mengdonglin mengdonglin added MTL Applies to Meteor Lake platform I2S Applies to I2S bus for codec connection labels Jul 26, 2023
@lgirdwood lgirdwood added this to the v2.7 milestone Jul 26, 2023
@mengdonglin mengdonglin added IMR Related to IMR (Isolated Memory Region) on Intel platforms regression identified Identified the commit or PR that introduced a regression labels Jul 31, 2023
@mengdonglin
Collaborator

mengdonglin commented Jul 31, 2023

Commit series for the D3/IMR context save support:
Commit 53d471b ipc4: notify Host on the failed IPC Device power transition
Commit 965e1c1 board: intel_adsp_ace15_mtpm: Enabled MTL IMR context save
Commit 015488b ipc4: added D3 support using Zephyr Power Manager API
Commit 80f27b2 board: intel_adsp_ace15_mtpm: disabled CONFIG_PM_DEVICE_RUNTIME_EXCLUSIVE
Commit 4e8040c platform: ace: Add pm notifiers to support Zephyr's D3 transition


@mengdonglin
Collaborator

mengdonglin commented Jul 31, 2023

@keqiaozhang If we remove CONFIG_ADSP_IMR_CONTEXT_SAVE=y to disable IMR context save, can we still reproduce this issue with the tip of the SOF main branch? Commit 53d471b is kept, so the FW can still report a device power transition failure.

@mengdonglin
Collaborator

Submitted PR #7994 to check whether this issue can still be reproduced after disabling IMR context save on MTL.

@mengdonglin
Collaborator

mengdonglin commented Jul 31, 2023

This issue can still be reproduced on MTLP_RVP_NOCODEC with PR #7994 (IMR context save disabled) on the tip of the main branch (commit 79532b8). Test result:
https://sof-ci.ostc.intel.com/#/result/planresultdetail/29596

@lgirdwood
Member


OK, so it is unrelated to IMR context store. @mwasko FYI.

@mengdonglin mengdonglin removed the IMR Related to IMR (Isolated Memory Region) on Intel platforms label Aug 1, 2023
@mengdonglin
Collaborator

mengdonglin commented Aug 1, 2023

@lgirdwood @mwasko @tmleman @wszypelt @keqiaozhang @fredoh9 @alex-cri
Summary of this issue in ww31:

  • NOT reproduced on cavs2.5 platforms.
  • Only reproduced on MTL RVP with SDW and NOCODEC (both single core and multi-core).
  • Not reproduced on MTL with HD-Audio so far.
  • The reproduction rate seems higher with multi-core (MTLP_RVP_NOCODEC_MULTICORE).
  • Disabling IMR context save on MTL doesn't help.

Recent occurrences:

  • ww31.1 on MTLP_RVP_NOCODEC_MULTICORE (report ID 29577)
  • ww30.5 on MTLP_RVP_SDW, MTLP_RVP_NOCODEC, MTLP_RVP_NOCODEC_MULTICORE (report ID 29547: https://sof-ci.ostc.intel.com/#/result/planresultdetail/29547)
  • ww30.4 on MTLP_RVP_SDW and MTLP_RVP_NOCODEC_MULTICORE (report ID 29509)
  • ww30.3 on MTLP_RVP_NOCODEC_MULTICORE (report ID 29436)

@mengdonglin mengdonglin removed the regression identified Identified the commit or PR that introduced a regression label Aug 1, 2023
@mengdonglin
Collaborator

mengdonglin commented Aug 1, 2023

@lgirdwood @ujfalusi @mwasko @abonislawski @keqiaozhang @tmleman @wszypelt
It seems PR #7995 fixed this issue.
See the latest PR test result: https://sof-ci.ostc.intel.com/#/result/planresultdetail/29601
Let's wait for the new daily test result.

@keqiaozhang
Collaborator Author

keqiaozhang commented Aug 1, 2023


This issue still exists on MTL platforms. I can still reproduce it with the tip of the main branch on ww31.2 (daily test report ID 29617) on MTLP_RVP_NOCODEC_MULTICORE. @lgirdwood @tmleman

@lgirdwood
Member


@keqiaozhang did the repro rate change after #7995 was merged? Any other difference from before?

@keqiaozhang
Collaborator Author

@lgirdwood the repro rate is the same, no differences. This issue happened on 3 different platforms in today's daily test.

@tmleman
Contributor

tmleman commented Aug 3, 2023

23ww31.5: I don't see any error in mtrace, but I suspect the problem comes from the IPC device suspension:
https://github.com/thesofproject/sof/blob/main/src/ipc/ipc4/handler.c#L1120-L1123

This could easily be confirmed by finding one of these logs:
https://github.com/thesofproject/sof/blob/main/src/ipc/ipc-zephyr.c#L102-L117

I need to reproduce this behavior in a local environment.

@tmleman
Contributor

tmleman commented Aug 8, 2023

23ww32.3: I'm still unable to reproduce this issue; work in progress.

@tmleman
Contributor

tmleman commented Aug 11, 2023

23ww32.6: Root cause: a SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS notification was added to the IPC queue, and the FW did not manage to send it before the D3 entry procedure.

@keqiaozhang thanks for helping with the confirmation.

I see two possible solutions:

  • vA: if SOF_IPC4_NOTIFY_LOG_BUFFER_STATUS is the only message in the IPC queue, we continue the D3 transition procedure. The message will be sent when the DSP wakes up (PR #8029).
  • vB: we try to send the pending notification.

Initially, I would go with version A. With version B, I do not know how the HOST will behave if it receives a notification while waiting for the SET_DX response.

@kv2019i
Collaborator

kv2019i commented Aug 14, 2023

Option A merged via #8029

@fredoh9
Contributor

fredoh9 commented Aug 15, 2023

This problem has not been found for two consecutive days.

@lgirdwood
Member


@fredoh9 please close when you're happy it's resolved.

@fredoh9
Contributor

fredoh9 commented Aug 16, 2023

No issue found in today's daily test either. Closing now.

@fredoh9 fredoh9 closed this as completed Aug 16, 2023
7 participants