Profiling report #20
Replace brute force floating point IDCT with ‘obfuscated’ version, making floating point interpolation speed on par with fixed point. References #20.
Add ‘restrict’ keyword to input/output pointer types. Drop ‘inline’ keyword from static function declarations; the compiler is already smart enough to inline them. References #20.
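For readers unfamiliar with the keyword, here is a minimal sketch of what adding `restrict` to input/output pointers looks like. The function `scale_samples` and its parameters are hypothetical and not taken from dcadec; the sketch only assumes a C99 compiler. The qualifier is a promise from the caller that the two buffers do not overlap, which may let the compiler keep loads in registers or vectorize the loop.

```c
#include <stddef.h>

/* Hypothetical example, not dcadec code.
 * 'restrict' tells the compiler that 'in' and 'out' never alias,
 * so a store to out[i] cannot change any later in[j]. */
void scale_samples(const int *restrict in, int *restrict out,
                   size_t n, int gain)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * gain;
}
```

Calling such a function with overlapping buffers is undefined behavior, so the qualifier only belongs on pointers that are guaranteed to be distinct.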
Since we're already sharing... On my XLL sample it spends more time there. 64-bit build:
32-bit build:
(only functions with > 1% of time) EDIT: GCC 4.9.2, mingw-w64 3.3.0. dcadec compiled with unmodified settings apart from -pg, of course. DTS Core Audio: 5.1 ch, 48 kHz, 24 bit, 1536 kbps
Aren't compiler vendor and settings extremely important here? The project is 100% C code. So maybe you should post compiler version and optimization settings too. Also interesting to finally see a case where the restrict keyword actually does something.
References foo86#20.
I profiled a few samples, and this is what I get.
Master audio (XLL):
% time  cumulative seconds  self seconds  calls  self us/call  total us/call  name
24.42 0.21 0.21 37680 5.57 8.49 interpolate_sub32_fixed
17.44 0.36 0.15 7536 19.91 32.65 parse_frame_data
13.95 0.48 0.12 dcadec_waveout_write
12.79 0.59 0.11 602880 0.18 0.18 inverse_dct32_fixed
11.63 0.69 0.10 23145408 0.00 0.00 bits_get_signed_rice
6.98 0.75 0.06 8790544 0.01 0.01 bits_get_signed
3.49 0.78 0.03 5968286 0.01 0.01 bits_get
2.33 0.80 0.02 7536 2.65 2.65 interpolate_lfe_fixed_fir
2.33 0.82 0.02 7536 2.65 16.28 parse_frame
2.33 0.84 0.02 bits_get_unsigned_rice
1.16 0.85 0.01 1055040 0.01 0.01 bits_get_unsigned_vlc
1.16 0.86 0.01 7536 1.33 46.45 filter_hd_ma_frame
0.00 0.86 0.00 1521769 0.00 0.00 bits_get1
0.00 0.86 0.00 135648 0.00 0.00 xll_map_ch_to_spkr
0.00 0.86 0.00 120576 0.00 0.00 bits_skip
0.00 0.86 0.00 90434 0.00 0.00 ta_get_size
0.00 0.86 0.00 60900 0.00 0.00 bits_seek
0.00 0.86 0.00 45216 0.00 0.00 xll_get_lsb_width
0.00 0.86 0.00 37681 0.00 0.00 bits_init
0.00 0.86 0.00 30144 0.00 0.00 bits_check_crc
0.00 0.86 0.00 22609 0.00 0.00 bits_skip1
0.00 0.86 0.00 15073 0.00 0.01 read_frame
0.00 0.86 0.00 10528 0.00 0.00 bits_get_signed_linear
0.00 0.86 0.00 7536 0.00 0.00 bits_align1
0.00 0.86 0.00 7536 0.00 45.12 core_filter
0.00 0.86 0.00 7536 0.00 32.70 core_parse
0.00 0.86 0.00 7536 0.00 0.11 exss_parse
0.00 0.86 0.00 7536 0.00 0.00 reorder_samples
0.00 0.86 0.00 7536 0.00 0.00 xll_assemble_msbs_lsbs
0.00 0.86 0.00 7536 0.00 0.00 xll_filter_band_data
0.00 0.86 0.00 7536 0.00 16.28 xll_parse
0.00 0.86 0.00 22 0.00 0.00 ta_zalloc_size
0.00 0.86 0.00 17 0.00 0.00 ta_free
0.00 0.86 0.00 6 0.00 0.00 ta_alloc_size
0.00 0.86 0.00 5 0.00 0.00 interpolator_create
Core 96kHz:
% time  cumulative seconds  self seconds  calls  self us/call  total us/call  name
84.91 10.91 10.91 138360 78.86 78.86 interpolate_sub64_float
4.75 11.52 0.61 27672 22.05 28.27 parse_frame_data
4.20 12.06 0.54 dcadec_waveout_write
2.33 12.36 0.30 27672 10.84 20.08 parse_x96_frame_data
1.87 12.60 0.24 19098423 0.01 0.01 bits_get_signed_vlc
0.78 12.70 0.10 26956937 0.00 0.00 bits_get
0.31 12.74 0.04 19174096 0.00 0.00 bits_get_signed
0.23 12.77 0.03 4427520 0.01 0.01 bits_get_unsigned_vlc
0.16 12.79 0.02 8135577 0.00 0.00 bits_get1
0.16 12.81 0.02 27672 0.72 0.72 reorder_samples
0.08 12.82 0.01 166032 0.06 0.06 ta_get_size
0.08 12.83 0.01 27672 0.36 394.72 core_filter
0.08 12.84 0.01 27672 0.36 48.90 core_parse
0.04 12.85 0.01 dcadec_context_filter
0.04 12.85 0.01 dcadec_context_free_exss_info
0.00 12.85 0.00 138360 0.00 0.00 bits_skip
0.00 12.85 0.00 83016 0.00 0.00 bits_skip1
0.00 12.85 0.00 55345 0.00 0.07 read_frame
0.00 12.85 0.00 55344 0.00 0.00 bits_init
0.00 12.85 0.00 55344 0.00 0.00 bits_seek
0.00 12.85 0.00 27672 0.00 0.06 alloc_x96_sample_buffer
0.00 12.85 0.00 16 0.00 0.00 ta_zalloc_size
0.00 12.85 0.00 12 0.00 0.00 ta_free
0.00 12.85 0.00 6 0.00 0.00 ta_alloc_size
0.00 12.85 0.00 5 0.00 0.00 interpolate_sub64_float_init
0.00 12.85 0.00 5 0.00 0.00 interpolator_create
Core 48kHz:
% time  cumulative seconds  self seconds  calls  self us/call  total us/call  name
69.77 0.60 0.60 28884 20.77 20.77 interpolate_sub32_float
10.47 0.69 0.09 7221 12.46 21.97 parse_frame_data
9.30 0.77 0.08 dcadec_waveout_write
4.65 0.81 0.04 9415936 0.00 0.00 bits_get_signed
2.33 0.83 0.02 3399280 0.01 0.01 bits_get
1.16 0.84 0.01 1010940 0.01 0.01 bits_get1
1.16 0.85 0.01 7221 1.38 1.38 reorder_samples
1.16 0.86 0.01 dcadec_stream_read
0.00 0.86 0.00 808752 0.00 0.00 bits_get_unsigned_vlc
0.00 0.86 0.00 36105 0.00 0.00 bits_skip
0.00 0.86 0.00 36105 0.00 0.00 ta_get_size
0.00 0.86 0.00 21663 0.00 0.00 bits_skip1
0.00 0.86 0.00 14443 0.00 0.01 read_frame
0.00 0.86 0.00 14442 0.00 0.00 bits_init
0.00 0.86 0.00 7221 0.00 0.00 bits_seek
0.00 0.86 0.00 7221 0.00 83.10 core_filter
0.00 0.86 0.00 7221 0.00 22.13 core_parse
0.00 0.86 0.00 12 0.00 0.00 ta_zalloc_size
0.00 0.86 0.00 5 0.00 0.00 ta_alloc_size
0.00 0.86 0.00 4 0.00 0.00 interpolate_sub32_float_init
0.00 0.86 0.00 4 0.00 0.00 interpolator_create
0.00 0.86 0.00 4 0.00 0.00 ta_free
Not surprisingly, the transform takes most of the time, with frame parsing next.
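The percentages in the first column can be cross-checked against the seconds columns: each one is the function's self seconds divided by the final cumulative total. A minimal sketch, assuming a hypothetical helper `percent_of_total` that is not part of dcadec or gprof:

```c
/* Hypothetical helper: share of total run time spent in one function.
 * Returns 0 when the total is not positive, to avoid dividing by zero. */
double percent_of_total(double self_seconds, double total_seconds)
{
    if (total_seconds <= 0.0)
        return 0.0;
    return 100.0 * self_seconds / total_seconds;
}
```

For the XLL listing, `percent_of_total(0.21, 0.86)` gives roughly 24.42, matching the `interpolate_sub32_fixed` row. Small differences from the printed values are expected, because the seconds columns are themselves rounded to two decimals.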