-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base_fw: sync time between host and fw for logging #8480
Conversation
8750270
to
a603a93
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the nest step is for the python experts to upstream a small script that takes the kernel log + FW log and muxes them based on the timestamp. @jsarha @marc-hb @aiChaoSONG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent @RanderWang , looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get the concept at all, and not even what this PR tries to fix.
We already have an IPC4 mechanism for passing the kernel time to the firmware, precisely to make sure the logs are aligned. We had a long discussion on this with @ujfalusi
So what does this PR do that wasn't done before?
src/audio/base_fw.c
Outdated
@@ -234,12 +236,21 @@ static int basefw_mem_state_info(uint32_t *data_offset, char *data) | |||
return 0; | |||
} | |||
|
|||
/* Assumed time interval between host sends timestamp and fw process it */ | |||
#define TIME_SETTING_LATENCY_NS 120000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this 0.120ms come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing is worse than doing nothing because it will sometimes ahead and sometimes behind, super confusing. Please leave this to zero so the skew is always one way.
Please correct me: in general I think it's pretty obvious that receiving some IPC happened after it was sent, I mean neither IPC3 not IPC4 has a lot of concurrency, has any?
(also: don't leave 30 lines between the definition and the only use)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a common issue that host builds the time and sends to fw which process it a little later. At the time fw receives the timestamp, the timestamp is out of date since host time is going forward. The latency value is built from many experiments. It is not accurate and universal. I have a better idea to do it in kernel. Let's wait for it.
host build timestamp ------> host send timestamp by ipc msg -------> fw recieve and process it
| <------- 60 us --------> | <-------- about 60 us ------> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check thesofproject/linux#4706. Kernel will build the latency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're assuming the IPC latency is symmetric. It's clearly not, the Linux driver deals with a single interrupt and will check stream, IPC and SoundWire (in that order). If the host driver is busy with a SoundWire IPC, the IPC will be handled after renabling the global interrupt and going back to the interrupt thread.
On the firmware side, one would hope that Zephyr has a real-time behavior, but IIRC the IPCs are only processed at every firmware tick to avoid disrupting processing, so there some scheduling delay to be added too.
In other words, averaging the IPC latency is just as bad as adding a random fudge factor to the timestamps. The goal is not precise configuration but a means to correlate kernel and firmware. I don't think there's an expectation that there is a better than 1ms alignment, is there?
Back to requirements I guess :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart sure, I fully understand your assumption. But consider the real usage: when we sync timestamp ? Driver just do it after all audio initialization is done after power on, At this time we don't have any IPC or any audio stream, and FW just boot up and have nothing to do. Yes, it is still not very accurate, but without it we will found FW receive IPC msg before kernel send it. This will confuse developer. Currently we have a time error of 100 us, if 1ms error can be expected, I am happy to remove the calibration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 1ms skew is more than good enough. The most important thing is for the clock skew to be very simple, make sense and always be on the same side => don't compute anything. But please mention it the clock skew briefly in the logs.
@plbossart wrote:
Nothing happened before. Kernel sent a message and FW put it to /dev/null This PR actually implements something to make FW react to said IPC message kernel sends. |
Hummm, I am rather puzzled. This time alignment was added for APL back in 2017 or 2018, it's a well-known requirement from the automotive days, and the gregorian Windows time was used. It would be a big miss in the firmware validation if this code was missing.... @mwasko please chime in please. And the 0.120ms remains a mystery. I can't relate it to any usual time constant. |
⏱️ perhaps? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with TIME_SETTING_LATENCY_NS, see below.
I don't understand why the same function is setting global_system_time_info
from data
, but now using the same data
in a different way to set up something totally different and new for logging. Why isn't one thing enough?
src/audio/base_fw.c
Outdated
@@ -234,12 +236,21 @@ static int basefw_mem_state_info(uint32_t *data_offset, char *data) | |||
return 0; | |||
} | |||
|
|||
/* Assumed time interval between host sends timestamp and fw process it */ | |||
#define TIME_SETTING_LATENCY_NS 120000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing is worse than doing nothing because it will sometimes ahead and sometimes behind, super confusing. Please leave this to zero so the skew is always one way.
Please correct me: in general I think it's pretty obvious that receiving some IPC happened after it was sent, I mean neither IPC3 not IPC4 has a lot of concurrency, has any?
(also: don't leave 30 lines between the definition and the only use)
src/audio/base_fw.c
Outdated
@@ -32,6 +33,7 @@ DECLARE_SOF_RT_UUID("basefw", basefw_comp_uuid, 0xe398c32, 0x5ade, 0xba4b, | |||
DECLARE_TR_CTX(basefw_comp_tr, SOF_UUID(basefw_comp_uuid), LOG_LEVEL_INFO); | |||
|
|||
static struct ipc4_system_time_info global_system_time_info; | |||
static uint64_t time_delta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static uint64_t time_delta; | |
static uint64_t host_cycles_delta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't resolve comments with no answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32_t should be enough, can delta > max(uint32_t), it is 112 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not related to precision. Log system use 64bits log so use 64bit
src/audio/base_fw.c
Outdated
if (!IS_ENABLED(CONFIG_LOG_TIMESTAMP_64BIT)) | ||
return IPC4_SUCCESS; | ||
|
||
host_time = k_ns_to_cyc_ceil64(*(uint64_t *)data + TIME_SETTING_LATENCY_NS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very strange: 10 lines earlier data
is cast to struct ipc4_system_time
. But here the same data
is cast to uint64_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do it with more code
host_time = low_part | ((uint64_t)high_part << 32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't make myself clear. My main question is: why is data
going to two different places? Why isn't one place enough?
EDIT: I had actually asked that already in #8480 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to simplify the code, I don't want to pack two 32bit number to 64bit one, so reuse data
That would be awesome but the
|
a123465
to
d46b1c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latency value is built from many experiments. It is not accurate and universal. I have a better idea to do it in kernel
@RanderWang please stop trying to re-invent https://www.intel.com/content/www/us/en/docs/programmable/683410/current/precision-time-protocol-ptp-synchronization.html and similar.
Because of interrupt latencies and other issues it's simply impossible to get something accurate, so keep things simple and don't add ANY offset, it will make everything simpler to understand and interpret. Maybe log a message clarifying that there is no PTP involved here, that this is just a loose synchronization for mere convenience.
PS: why is data
going to two different places? Why is one place not enough?
d46b1c4
to
25aff61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description of commit da5edbd which added global_system_time_info
stated "It is then used by FW to translate event timestamps e.g. logs to host system time domain".
So this PR is adding time_delta
which duplicates of global_system_time_info.host_time
, is changing the unit of global_system_time_info.dsp_time
(?) and seems to be generally duplicating the da5edbd feature without really explaining why it should be duplicated.
If what this PR is simpler and better then why not just delete the global_system_time_info
from da5edbd?
It's very confusing to have both features that seem to duplicate each other - in the same function!
src/audio/base_fw.c
Outdated
@@ -234,12 +236,21 @@ static int basefw_mem_state_info(uint32_t *data_offset, char *data) | |||
return 0; | |||
} | |||
|
|||
/* Assumed time interval between host sends timestamp and fw process it */ | |||
#define TIME_SETTING_LATENCY_NS 120000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 1ms skew is more than good enough. The most important thing is for the clock skew to be very simple, make sense and always be on the same side => don't compute anything. But please mention it the clock skew briefly in the logs.
src/audio/base_fw.c
Outdated
if (!(first_block && last_block)) | ||
return IPC4_INVALID_REQUEST; | ||
|
||
global_system_time_info.host_time.val_l = ((const struct ipc4_system_time *)data)->val_l; | ||
global_system_time_info.host_time.val_u = ((const struct ipc4_system_time *)data)->val_u; | ||
|
||
uint64_t current_dsp_time = sof_cycle_get_64(); | ||
current_dsp_cycle = sof_cycle_get_64(); | ||
current_dsp_time = k_cyc_to_us_ceil64(current_dsp_cycle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is changing the unit of global_system_time_info.dsp_time
, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both host & dsp time should be in us, not hw cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both host & dsp time should be in us, not hw cycle.
If you're fixing a bug in global_system_time_info.dsp_time
, then it must be described at the very least in the commit message of a separate and earlier commit (and even better: also described in a GitHub issue). Not secretly hidden in the addition of a new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer some inline comments here describing the flow, i.e. the host_time is the epoch set by host.
@RanderWang pls change the ceil() API to a floor() API as we could return a time in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aiChaoSONG. I'm just asking for this to be clarified in the code itself. One should not have to search PR comments to find this explanation.
I had a look at some of the timestamps in https://sof-ci.01.org/sofpr/PR8480/build14946/devicetest/index.html and they never seem to match. |
OMG I just discovered #8016 by chance just now: It's incredibly time-consuming to review code when the (lack of) explanations do not even include the URL to the corresponding discussion. |
time_delta is not a duplication of global_system_time_info.host_time. global_system_time_info.host_time is for time query but time_delta is for log time adjust |
Thanks, updated. |
May I ask question here?
|
mtrace only produces fw trace, not kernel logs which is provided by dmesg. Yes, both fw log & kernel log will use unified timestamp
This is defined by debug memory slots 4KB, we can't change it. you can hack the code to enlarge it.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base_fw: convert dsp hw cycle to dsp time
Dsp time is in format of micro second, not hw cycle.
Where does this (new?) information come from?
@RanderWang kernel part merged now, this can be resumed now |
SOFCI TEST |
Dsp time is in format of micro second, not hw cycle. Signed-off-by: Rander Wang <[email protected]>
e8fa893
to
399e055
Compare
It is a usage in host windows driver. I didn't find any documents. |
Remove TODO and restore the code style for C89 |
399e055
to
d48a0dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I have a one more style nit (please check inline). If you need to update for any reason, please update that as well. Otherwise good, thanks!
https://sof-ci.01.org/sofpr/PR8480/build1544/devicetest/index.html and https://sof-ci.01.org/sofpr/PR8480/build1545/devicetest/index.html are empty, no tests have run. This is because the corresponding kernel build is marked as FAIL. @keqiaozhang , after a lot of digging I found this socwatch.tar error below in some Jenkins log. I don't whether it's relevant but it should be fixed anyway.
|
SOFCI TEST EDIT: https://sof-ci.01.org/sofpr/PR8480/build1554/devicetest/index.html and https://sof-ci.01.org/sofpr/PR8480/build1555/devicetest/index.html are running this time. |
It is a general implementation for logging and it doesn't use intel audio hardware feature like ART counter. 64bits timestamp is needed for accuracy since the timestamp used by host is beyond 32bits in most cases. Signed-off-by: Rander Wang <[email protected]>
TGL, TGL-H and ACE platforms will use 64 bits timestamp. Signed-off-by: Rander Wang <[email protected]>
d48a0dc
to
f22f2d4
Compare
@wszypelt Strange error in quickbuild TGL IPC4 tests. I see an error rsyncing log files and one smoketest fails to 12:47:41,146 INFO - Unhandled exception during verification: System.BadImageFormatException: Field token out of range. ... the quality test "16_00_TestSdwTransferLoopback\m0_sourcePDI2_sinkPDI3_output_loopback_2ch_48000Hz_24b.wav" seems to pass before the exception is triggered. |
Internal CI now passing, must have been an unrelated issue. |
CI performance test is failing in recent daily build, this is due to timestamp are aligned between kernel and SOF, details are listed in below link. The firmware timestamp is in sync with the host side now, and becomes much more deterministic. The condition to use --skip-to-first-trace option in CI is eliminated (see the help info for the reason it is used in CI test). Based on this, skip-to-first-trace are not needed anymore, hence remove it from the script. Link: thesofproject/sof#8480 Signed-off-by: Baofeng Tian <[email protected]>
CI performance test is failing in recent daily build, this is due to timestamp are aligned between kernel and SOF, details are listed in below link. The firmware timestamp is in sync with the host side now, and becomes much more deterministic. The condition to use --skip-to-first-trace option in CI is eliminated (see the help info for the reason it is used in CI test). Based on this, skip-to-first-trace are not needed anymore, hence remove it from the script. Link: thesofproject/sof#8480 Signed-off-by: Baofeng Tian <[email protected]>
CI performance test is failing in recent daily build, this is due to timestamp are aligned between kernel and SOF, details are listed in below link. The firmware timestamp is in sync with the host side now, and becomes much more deterministic. The condition to use --skip-to-first-trace option in CI is eliminated (see the help info for the reason it is used in CI test). Based on this, skip-to-first-trace are not needed anymore, hence remove it from the script. Link: thesofproject/sof#8480 Signed-off-by: Baofeng Tian <[email protected]>
CI performance test is failing in recent daily build, this is due to timestamp are aligned between kernel and SOF, details are listed in below link. The firmware timestamp is in sync with the host side after DSP boot up, and becomes much more deterministic. The condition to use --skip-to-first-trace option in CI is eliminated (see the help info for the reason it is used in CI test). Based on this, skip-to-first-trace are not needed anymore, hence remove it from the performance test case script. Link: thesofproject/sof#8480 Signed-off-by: Baofeng Tian <[email protected]>
It is a general implementation for logging and it doesn't use intel audio hardware feature like ART counter. 64bits timerstamp is needed for accurate since the timestamp used by host is beyond 32bits in most cases.
For feature: #8016 Implement SOF_IPC4_FW_PARAM_SYSTEM_TIME to sync host and FW timestamps
The demo:
FW log:
[18014.307276] ipc: ipc_cmd: rx : 0x11000005|0x0
[18014.307286] pipe: pipeline_new: pipeline new pipe_id 0 priority 0
[18014.307910] ipc: ipc_cmd: rx : 0x40000004|0x15
[18014.307990] dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
[18014.308556] ipc: ipc_cmd: rx : 0x40000006|0x10
[18014.309081] ipc: ipc_cmd: rx : 0x44000006|0x30000018
[18014.309613] ipc: ipc_cmd: rx : 0x40000002|0xa
[18014.310131] ipc: ipc_cmd: rx : 0x11010004|0x0
[18014.310141] pipe: pipeline_new: pipeline new pipe_id 1 priority 0
Kernel log:
[18014.307175] sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget pipeline.1 instance 0 - pipe 1 - core 0
[18014.307196] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x11000005|0x0: GLB_CREATE_PIPELINE
[18014.307788] sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget host-copier.0.playback instance 0 - pipe 1 - core 0
[18014.307811] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x40000004|0x15: MOD_INIT_INSTANCE [data size: 84]
[18014.308437] sof-audio-pci-intel-tgl 0000:00:1f.3: widget host-copier.0.playback setup complete
[18014.308451] sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget gain.1.1 instance 0 - pipe 1 - core 0
[18014.308466] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x40000006|0x10: MOD_INIT_INSTANCE [data size: 64]
[18014.309003] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x44000006|0x30000018: MOD_LARGE_CONFIG_SET [data size:
[18014.309501] sof-audio-pci-intel-tgl 0000:00:1f.3: widget gain.1.1 setup complete
[18014.309519] sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget mixin.1.1 instance 0 - pipe 1 - core 0
[18014.309530] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x40000002|0xa: MOD_INIT_INSTANCE [data size: 40]