F2FS Debugging
This page contains my notes on debugging and understanding the internal performance of F2FS, including locking overheads and performance profiling of functions, since I am aiming to increase its performance. These are also my notes on implementing multiple streams in F2FS, hence they often contain comments for me to track what did and did not work, and why.
This retrieves and shows the write call stack using perf, running on a 5.19 kernel without modifications. I don't include all of the output, only what is important, but I do include all commands in case I ever need them again (which is very likely).
user@stosys:~$ sudo perf record -F 99 -a --call-graph dwarf sudo env "NR_FILES=1" "NR_THREAD=1" ${FIO_HOME}/fio src/f2fs-zns-workloads/f2fs-multi-stream/jobs/f2fs_single_stream.fio
user@stosys:~$ sudo perf report
32.14% 0.00% kworker/u22:2-e [kernel.kallsyms] [k] ret_from_fork
|
---ret_from_fork
kthread
worker_thread
process_one_work
wb_workfn
wb_do_writeback
wb_writeback
__writeback_inodes_wb
writeback_sb_inodes
__writeback_single_inode
do_writepages
f2fs_write_data_pages
__f2fs_write_data_pages
|
--31.97%--f2fs_write_cache_pages
|
|--29.07%--f2fs_write_single_data_page
| |
| --27.08%--f2fs_do_write_data_page
| |
| |--14.29%--f2fs_outplace_write_data
| | |
| | |--11.11%--do_write_page
| | | |
| | | |--4.02%--f2fs_allocate_data_block
| | | | |
| | | | |--1.13%--update_sit_entry
| | | | | |
| | | | | --0.51%--f2fs_usable_zone_blks_in_seg
| | | | |
| | | | --0.82%--locate_dirty_segment
| | | | |
| | | | --0.53%--__locate_dirty_segment
| | | |
| | | |--3.63%--f2fs_submit_page_write
| | | | |
| | | | --0.66%--__wake_up
| | | | __wake_up_common_lock
| | | |
| | | --1.25%--f2fs_update_device_state.part.0
| | | |
| | | --0.97%--f2fs_set_dirty_device
| | | |
| | | --0.70%--__add_ino_entry
| | | |
| | | --0.68%--__radix_tree_lookup
| | |
| | |--1.89%--f2fs_update_data_blkaddr
| | | f2fs_update_extent_cache
| | | |
| | | --1.29%--f2fs_update_extent_tree_range
| | | |
| | | --0.68%--__try_merge_extent_node
| | | _raw_spin_unlock
| | | __raw_callee_save___pv_queued_spin_unlock
| | |
| | --1.05%--f2fs_update_extent_cache
| |
| |--5.91%--f2fs_get_dnode_of_data
| | |
| | --4.66%--__get_node_page
| | |
| | --4.28%--__get_node_page.part.0
| | |
| | |--3.23%--pagecache_get_page
| | | |
| | | --2.56%--__filemap_get_folio
| | |
| | --0.82%--read_node_page
| |
| |--3.97%--set_page_writeback
| | __folio_start_writeback
| | |
| | |--1.31%--__lock_text_start
| | | __raw_spin_lock_irqsave
| | |
| | |--0.61%--_raw_spin_unlock_irqrestore
| | |
| | --0.52%--mod_lruvec_page_state
| |
| --1.07%--f2fs_put_dnode
|
--1.82%--clear_page_dirty_for_io
|
--1.72%--folio_clear_dirty_for_io
When debugging kernel code we often need to see the contention points of the locks in use (semaphores, mutexes, etc.). This can easily be done with the built-in kernel lock statistics (the kernel must be built with CONFIG_LOCK_STAT=y), which collect information on acquisition times, hold times, and much more. Usage is as follows:
# Enable lock stats
root@stosys:/home/user# echo 1 > /proc/sys/kernel/lock_stat
# Get stats with cat/less/head etc.
root@stosys:/home/user# less /proc/lock_stat
# Reset the collected stats
root@stosys:/home/user# echo 0 > /proc/lock_stat
For this I was aiming to find the lock contention in F2FS, and to identify whether there are particular parts that can be parallelized with multiple locks. Note that this is run on a debug kernel with qemu -kernel, hence the times are inflated by all the kernel debugging tools (KASAN, etc.). I grep for the F2FS-related locks to only see what is needed. For convenience I also pasted in the column headers of lock_stat; these are not actually part of the grep output (get them with head -5 /proc/lock_stat).
root@stosys:/home/user# cat /proc/lock_stat | grep -E "f2fs|sbi|sit_i|nm_i|sm_info"
lock_stat version 0.4
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&sit_i->sentry_lock-W: 4644 4846 0.34 1179.80 9679.18 2.00 57326 5920627 1.04 294251.76 61121145.04 10.32
&sit_i->sentry_lock-R: 56802 56991 0.42 1080.25 1839784.99 32.28 57247 18361436 0.21 6758.12 11502506.77 0.63
&sit_i->sentry_lock 56991 [<0000000025468521>] check_valid_map+0x48/0xd0
&sit_i->sentry_lock 4846 [<000000000ad77fab>] f2fs_allocate_data_block+0x16b/0x1470
&sit_i->sentry_lock 56721 [<000000000ad77fab>] f2fs_allocate_data_block+0x16b/0x1470
&sit_i->sentry_lock 5116 [<0000000025468521>] check_valid_map+0x48/0xd0
&nm_i->nat_list_lock: 4331 4379 0.36 624.16 4840.38 1.11 93982 13225060 0.29 2130.67 8722715.54 0.66
&nm_i->nat_list_lock 4379 [<0000000092ecb2b0>] __lookup_nat_cache+0x62/0x110
&nm_i->nat_list_lock 4379 [<0000000092ecb2b0>] __lookup_nat_cache+0x62/0x110
&et->lock 54 [<000000008cdbedea>] f2fs_lookup_extent_cache+0x190/0x710
&et->lock 83 [<00000000607fc718>] f2fs_update_extent_tree_range+0x184/0x950
&et->lock 54 [<00000000607fc718>] f2fs_update_extent_tree_range+0x184/0x950
&et->lock 83 [<000000008cdbedea>] f2fs_lookup_extent_cache+0x190/0x710
&sbi->extent_lock: 117 120 0.45 40.12 218.89 1.82 77783 12040725 0.39 6024.66 8658383.88 0.72
&sbi->extent_lock 61 [<000000006d1d302b>] f2fs_lookup_extent_cache+0x292/0x710
&sbi->extent_lock 58 [<00000000e65f9d87>] __try_merge_extent_node+0x15e/0x450
&sbi->extent_lock 1 [<00000000d253b52e>] __insert_extent_tree+0x106/0x240
&sbi->extent_lock 56 [<00000000e65f9d87>] __try_merge_extent_node+0x15e/0x450
&sbi->extent_lock 63 [<000000006d1d302b>] f2fs_lookup_extent_cache+0x292/0x710
&sbi->extent_lock 1 [<00000000d253b52e>] __insert_extent_tree+0x106/0x240
&sbinfo->stat_lock: 0 0 0.00 0.00 0.00 0.00 833 1695 0.00 17.69 2011.67 1.19
f2fs_stat_lock: 0 0 0.00 0.00 0.00 0.00 5 5 1.29 23.21 38.30 7.66
&sbi->gc_thread->fggc_wq: 0 0 0.00 0.00 0.00 0.00 2 2 2.65 12.71 15.36 7.68
&sbi->s_es_lock: 0 0 0.00 0.00 0.00 0.00 661 7489 0.60 366.76 8040.86 1.07
&sbi->s_error_lock: 0 0 0.00 0.00 0.00 0.00 1 2 6.66 15.78 22.44 11.22
(work_completion)(&sbi->s_error_work): 0 0 0.00 0.00 0.00 0.00 0 1 15.33 15.33 15.33 15.33
&sbi->s_md_lock: 0 0 0.00 0.00 0.00 0.00 89 743 0.54 179.06 1871.22 2.52
&sbi->s_writepages_rwsem-R: 0 0 0.00 0.00 0.00 0.00 0 564 0.84 203262.41 3876161.76 6872.63
&sbi->s_orphan_lock: 0 0 0.00 0.00 0.00 0.00 115 876 1.26 226.80 3986.01 4.55
&sbi->s_mb_rb_lock-W: 0 0 0.00 0.00 0.00 0.00 62 444 0.69 23.42 1328.66 2.99
&sbi->s_mb_rb_lock-R: 0 0 0.00 0.00 0.00 0.00 0 47 1.95 216.46 745.21 15.86
&sbi->s_mb_largest_free_orders_locks[-W: 0 0 0.00 0.00 0.00 0.00 131 795 0.58 23.76 1788.86 2.25
&sbi->s_mb_largest_free_orders_locks[-R: 0 0 0.00 0.00 0.00 0.00 8 10 2.30 277.55 391.22 39.12
&sbi->fs_lock: 0 0 0.00 0.00 0.00 0.00 6 33 1.76 49.70 261.42 7.92
&sbi->wq_mutex: 0 0 0.00 0.00 0.00 0.00 3 3 28.37 278.69 341.32 113.77
&sbi->pipe_mutex: 0 0 0.00 0.00 0.00 0.00 1 1 568.39 568.39 568.39 568.39
&sbi->streammap_lock: 0 0 0.00 0.00 0.00 0.00 78 15203885 0.53 9988.61 12619640.08 0.83
&nm_i->nid_list_lock: 0 0 0.00 0.00 0.00 0.00 6 121398 0.58 2476.12 1789743.32 14.74
&nm_i->build_lock: 0 0 0.00 0.00 0.00 0.00 52 204 0.82 254992.18 314326.95 1540.82
&nm_i->nat_tree_lock-W: 0 0 0.00 0.00 0.00 0.00 4 13311 2.61 11699.63 230141.52 17.29
&nm_i->nat_tree_lock-R: 0 0 0.00 0.00 0.00 0.00 94083 15101090 0.67 254943.71 23852332.99 1.58
&sbi->cp_global_sem: 0 0 0.00 0.00 0.00 0.00 5 37 2.63 131434.47 1522091.42 41137.61
f2fs_list_lock: 0 0 0.00 0.00 0.00 0.00 5 5 1.00 16.19 27.66 5.53
&sbi->gc_thread->gc_wait_queue_head: 0 0 0.00 0.00 0.00 0.00 7 87 0.64 26.58 247.85 2.85
&sbi->stat_lock: 0 0 0.00 0.00 0.00 0.00 7 9320 0.60 125.66 33056.64 3.55
&sbi->cp_rwsem-W: 0 0 0.00 0.00 0.00 0.00 4 28 2.26 125212.94 1478869.86 52816.78
&sbi->cp_rwsem-R: 0 0 0.00 0.00 0.00 0.00 3 5911958 5.25 295129.33 193559611.41 32.74
&sbi->inode_lock[i]: 0 0 0.00 0.00 0.00 0.00 31 6331921 0.58 18825.87 5245790.91 0.83
&sbi->extent_tree_lock: 0 0 0.00 0.00 0.00 0.00 2 13 1.37 54017.61 54900.16 4223.09
&sbi->sb_lock-R: 0 0 0.00 0.00 0.00 0.00 1 7 2.55 71.76 89.13 12.73
&sbi->node_change-W: 0 0 0.00 0.00 0.00 0.00 6 25 8.06 96483.89 551267.14 22050.69
&sbi->node_change-R: 0 0 0.00 0.00 0.00 0.00 9 2883752 2.50 255611.36 19970056.97 6.93
&sbi->gc_lock: 0 0 0.00 0.00 0.00 0.00 4 52 1.96 9837437.61 106697626.03 2051877.42
&sbi->writepages: 0 0 0.00 0.00 0.00 0.00 39 535 33242.83 4954863.85 242587877.71 453435.29
&sbi->rr_active_stream_lock[i]: 0 0 0.00 0.00 0.00 0.00 55 5920572 0.60 21137.25 4544948.94 0.77
&sm_info->curseg_lock-R: 0 0 0.00 0.00 0.00 0.00 65 5920575 5.23 294307.94 83926844.31 14.18
&sbi->write_io[i][j].io_lock: 0 0 0.00 0.00 0.00 0.00 55 17761716 0.60 23164.41 13938704.95 0.78
&sbi->write_io[i][j].io_rwsem-W: 0 0 0.00 0.00 0.00 0.00 132 5933719 0.54 50951.26 25478283.65 4.29
&sbi->write_io[i][j].io_rwsem-R: 0 0 0.00 0.00 0.00 0.00 90 1614 0.63 27.19 1440.30 0.89
&sbi->dev_lock: 0 0 0.00 0.00 0.00 0.00 43 44 0.98 76.89 211.17 4.80
&sbi->node_write-W: 0 0 0.00 0.00 0.00 0.00 6 46 1.25 78238.63 927459.38 20162.16
&sbi->node_write-R: 0 0 0.00 0.00 0.00 0.00 1 8628 28.39 7867.27 503611.17 58.37
&sbi->fsync_node_lock: 0 0 0.00 0.00 0.00 0.00 7 17259 0.51 17.67 14546.59 0.84
&sbi->cp_lock: 0 0 0.00 0.00 0.00 0.00 5 50 0.85 9.60 92.48 1.85
&sbi->cp_wait: 0 0 0.00 0.00 0.00 0.00 7 100 0.64 191.37 773.98 7.74
&sbi->umount_mutex: 0 0 0.00 0.00 0.00 0.00 2 2 3511.32 56781.12 60292.44 30146.22
&sbi->io_order_lock-R: 0 0 0.00 0.00 0.00 0.00 28 3032831 23.31 7000.68 84128416.81 27.74
One of the earlier implementations I made was to handle the streams differently in the F2FS I/O management. By default F2FS manages I/O in write_io, which is indexed by the temperature type (hot/warm/cold). With multiple streams, and hence multiple instances of a single type, this indexing is no longer sufficient, so I extended it to also index by the stream id, as sketched below. I then profiled the locks and compared against the profiling of the unmodified F2FS.
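A rough sketch of the change, reconstructed from the lock class names in the stats below rather than from the exact patch; NR_STREAMS and the stream_write_io() helper are placeholders of mine:

```c
/*
 * In struct f2fs_sb_info. Vanilla F2FS keeps one f2fs_bio_info per page
 * type (each with one entry per temperature), so the locks appear in
 * lock_stat as write_io[i][j].io_rwsem / io_lock:
 *
 *	struct f2fs_bio_info *write_io[NR_PAGE_TYPE];
 *
 * The multi-stream sketch widens the first dimension so that every
 * (type, stream) pair owns its own f2fs_bio_info, and with it its own
 * io_rwsem/io_lock; the resulting lock classes show up below as
 * write_io[k + (MAX_ACTIVE_LOGS * ...)].
 */
struct f2fs_bio_info *write_io[MAX_ACTIVE_LOGS * NR_STREAMS];

static inline struct f2fs_bio_info *stream_write_io(struct f2fs_sb_info *sbi,
						    enum page_type type,
						    enum temp_type temp,
						    unsigned int stream)
{
	/* fold the stream id into the flat index */
	return &sbi->write_io[type + (MAX_ACTIVE_LOGS * stream)][temp];
}
```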
root@stosys:/home/user# cat /proc/lock_stat | grep -E "f2fs|sbi|sit_i|nm_i|sm_info"
&sit_i->sentry_lock-W: 14219 14278 0.34 4017.78 19863.39 1.39 182271 5308208 5.70 32587.04 63464553.30 11.96
&sit_i->sentry_lock-R: 181612 181768 0.40 20589.27 5625626.63 30.95 182176 25623969 0.21 2782.04 16956195.57 0.66
&sit_i->sentry_lock 181768 [<000000006ce8634d>] check_valid_map+0x48/0xd0
&sit_i->sentry_lock 14278 [<00000000bd1804fe>] f2fs_allocate_data_block+0x17d/0x14a0
&sit_i->sentry_lock 180963 [<00000000bd1804fe>] f2fs_allocate_data_block+0x17d/0x14a0
&sit_i->sentry_lock 15083 [<000000006ce8634d>] check_valid_map+0x48/0xd0
&sbi->inode_lock[i]: 103638 103724 0.41 12975.84 161188.09 1.55 951026 5939719 0.30 1645.99 4923380.33 0.83
&sbi->inode_lock[i] 103724 [<000000007e654fa2>] f2fs_update_dirty_folio+0xa6/0x490
&sbi->inode_lock[i] 103724 [<000000007e654fa2>] f2fs_update_dirty_folio+0xa6/0x490
&nm_i->nat_list_lock: 17429 17472 0.36 46.89 11330.13 0.65 307119 18167578 0.28 25467.76 12727449.60 0.70
&nm_i->nat_list_lock 17472 [<000000002abff106>] __lookup_nat_cache+0x62/0x110
&nm_i->nat_list_lock 17472 [<000000002abff106>] __lookup_nat_cache+0x62/0x110
&sbi->extent_lock: 164 165 0.46 17.18 230.38 1.40 246398 13294578 0.34 24788.96 9961278.19 0.75
&sbi->extent_lock 100 [<00000000fb493d48>] f2fs_lookup_extent_cache+0x292/0x710
&sbi->extent_lock 64 [<00000000e61a1502>] __try_merge_extent_node+0x15e/0x450
&sbi->extent_lock 1 [<000000009073a0ea>] __insert_extent_tree+0x106/0x240
&sbi->extent_lock 94 [<00000000e61a1502>] __try_merge_extent_node+0x15e/0x450
&sbi->extent_lock 69 [<00000000fb493d48>] f2fs_lookup_extent_cache+0x292/0x710
&sbi->extent_lock 2 [<000000009073a0ea>] __insert_extent_tree+0x106/0x240
&et->lock 70 [<00000000e7e0ee04>] f2fs_lookup_extent_cache+0x190/0x710
&et->lock 23 [<000000007c9309ff>] f2fs_update_extent_tree_range+0x184/0x950
&et->lock 70 [<000000007c9309ff>] f2fs_update_extent_tree_range+0x184/0x950
&et->lock 23 [<00000000e7e0ee04>] f2fs_lookup_extent_cache+0x190/0x710
&sbi->node_change-W: 1 1 417.07 417.07 417.07 417.07 3 30 28.71 343037.33 758207.58 25273.59
&sbi->node_change-R: 7 8 0.85 343830.49 2061876.06 257734.51 951017 1167127 3.37 29638.65 13200975.92 11.31
&sbi->node_change 8 [<00000000533e9bb7>] f2fs_do_map_lock+0x1e/0x40
&sbi->node_change 1 [<0000000057f046d0>] f2fs_write_checkpoint+0x3cf/0x2fd0
&sbi->node_change 2 [<0000000057f046d0>] f2fs_write_checkpoint+0x3cf/0x2fd0
&sbi->node_change 7 [<00000000533e9bb7>] f2fs_do_map_lock+0x1e/0x40
&sbinfo->stat_lock: 1 1 1.33 1.33 1.33 1.33 808 1615 0.00 21.69 1986.07 1.23
&sbinfo->stat_lock 1 [<0000000052969bfb>] shmem_free_inode+0x4a/0x70
&sbinfo->stat_lock 1 [<0000000052969bfb>] shmem_free_inode+0x4a/0x70
&sbi->sb_lock-R: 0 0 0.00 0.00 0.00 0.00 1 6 2.92 19.77 35.19 5.86
&sbi->rr_active_stream_lock[i]: 0 0 0.00 0.00 0.00 0.00 55 5308147 0.63 19083.77 4083601.88 0.77
&sbi->s_es_lock: 0 0 0.00 0.00 0.00 0.00 507 6874 0.58 26.90 6777.34 0.99
&sbi->s_error_lock: 0 0 0.00 0.00 0.00 0.00 2 2 5.81 39.45 45.26 22.63
&sbi->s_mb_largest_free_orders_locks[-W: 0 0 0.00 0.00 0.00 0.00 107 428 0.66 21.76 740.25 1.73
&sbi->s_mb_largest_free_orders_locks[-R: 0 0 0.00 0.00 0.00 0.00 6 7 2.65 38.33 73.39 10.48
&sbi->s_mb_rb_lock-W: 0 0 0.00 0.00 0.00 0.00 74 258 0.89 21.49 663.33 2.57
&sbi->s_mb_rb_lock-R: 0 0 0.00 0.00 0.00 0.00 0 28 7.92 191.82 698.42 24.94
(work_completion)(&sbi->s_error_work): 0 0 0.00 0.00 0.00 0.00 0 1 11.83 11.83 11.83 11.83
&sbi->s_md_lock: 0 0 0.00 0.00 0.00 0.00 78 488 0.58 45.25 907.16 1.86
&sbi->s_writepages_rwsem-R: 0 0 0.00 0.00 0.00 0.00 0 882 0.78 118426.33 1354388.06 1535.59
&sbi->s_orphan_lock: 0 0 0.00 0.00 0.00 0.00 91 416 1.27 218.72 2659.34 6.39
&sbi->fs_lock: 0 0 0.00 0.00 0.00 0.00 5 33 1.73 47.74 259.33 7.86
&sbi->wq_mutex: 0 0 0.00 0.00 0.00 0.00 2 3 32.37 295.96 369.09 123.03
&sbi->pipe_mutex: 0 0 0.00 0.00 0.00 0.00 0 1 255.44 255.44 255.44 255.44
&sbi->streammap_lock: 0 0 0.00 0.00 0.00 0.00 97 17767752 0.56 26587.72 15591418.66 0.88
&nm_i->nid_list_lock: 0 0 0.00 0.00 0.00 0.00 2 73619 0.62 2182.86 741000.55 10.07
&nm_i->build_lock: 0 0 0.00 0.00 0.00 0.00 78 145 1.13 29530.74 83847.61 578.26
&nm_i->nat_tree_lock-W: 0 0 0.00 0.00 0.00 0.00 1 23934 2.59 26425.01 205609.87 8.59
&nm_i->nat_tree_lock-R: 0 0 0.00 0.00 0.00 0.00 307024 18120088 0.69 29506.25 32271001.18 1.78
f2fs_stat_lock: 0 0 0.00 0.00 0.00 0.00 1 1 9.88 9.88 9.88 9.88
&sbi->cp_global_sem: 0 0 0.00 0.00 0.00 0.00 2 33 2.51 460989.54 2656346.58 80495.35
f2fs_list_lock: 0 0 0.00 0.00 0.00 0.00 1 1 129.95 129.95 129.95 129.95
&sbi->gc_thread->gc_wait_queue_head: 0 0 0.00 0.00 0.00 0.00 2 96 0.64 20.53 200.48 2.09
&sbi->stat_lock: 0 0 0.00 0.00 0.00 0.00 2 18594 0.59 44.40 12536.14 0.67
&sbi->cp_rwsem-W: 0 0 0.00 0.00 0.00 0.00 1 36 1.83 451498.90 2623255.31 72868.20
&sbi->cp_rwsem-R: 0 0 0.00 0.00 0.00 0.00 1 5293538 5.77 38946.28 186140159.31 35.16
&sbi->extent_tree_lock: 0 0 0.00 0.00 0.00 0.00 1 6 11.07 70.25 169.07 28.18
&sbi->gc_lock: 0 0 0.00 0.00 0.00 0.00 1 62 1.86 11082989.32 157825138.93 2545566.76
&sm_info->curseg_lock-R: 0 0 0.00 0.00 0.00 0.00 78 5308147 8.98 32591.70 85100803.94 16.03
&sbi->write_io[k + (MAX_ACTIVE_LOGS *: 0 0 0.00 0.00 0.00 0.00 55 15924441 0.63 32671.30 13189589.10 0.83
&sbi->write_io[k + (MAX_ACTIVE_LOGS *-W: 0 0 0.00 0.00 0.00 0.00 124 5320593 0.62 45234.11 22182713.59 4.17
&sbi->write_io[k + (MAX_ACTIVE_LOGS *-R: 0 0 0.00 0.00 0.00 0.00 108 2565 0.63 16.36 2629.88 1.03
&sbi->dev_lock: 0 0 0.00 0.00 0.00 0.00 51 58 1.09 24.72 198.29 3.42
&sbi->node_write-W: 0 0 0.00 0.00 0.00 0.00 1 58 1.22 108041.75 1864362.02 32144.17
&sbi->node_write-R: 0 0 0.00 0.00 0.00 0.00 0 14616 28.13 1291.81 657126.17 44.96
&sbi->fsync_node_lock: 0 0 0.00 0.00 0.00 0.00 1 29225 0.52 27.45 22347.58 0.76
&sbi->cp_lock: 0 0 0.00 0.00 0.00 0.00 1 58 0.86 8.79 94.32 1.63
&sbi->cp_wait: 0 0 0.00 0.00 0.00 0.00 1 116 0.63 505.93 864.57 7.45
&sbi->writepages: 0 0 0.00 0.00 0.00 0.00 43 850 3.94 849710.55 233279903.39 274446.95
&sbi->io_order_lock-R: 0 0 0.00 0.00 0.00 0.00 48 4135679 23.15 38928.24 116564852.66 28.19
Performance for this particular change is the same, which logically makes sense: I/O is still managed (submitted) in the same way, and hence the locks are held for the same amount of time. The issue now is that performance drops significantly, because we do not know which stream to flush when syncing the page cache to the device, and hence have to flush all of them. Syncing is triggered just as often, but each sync now issues writeback for every stream, which increases the number of writeback calls and drops performance. Next we do some more profiling on what else we can improve (from the lock profiling the obvious candidate is the sbi->writepages lock, under which I/Os are submitted as a single global lock).
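To sketch that direction: in mainline F2FS, data writeback is serialized under the single sbi->writepages mutex taken in __f2fs_write_data_pages. A per-stream lock array would let writeback of different streams proceed in parallel. This is only an illustration of the idea, not my actual patch; NR_STREAMS and inode_stream_id() are hypothetical:

```c
/* In struct f2fs_sb_info: one writepages mutex per stream instead of the
 * single global one (hypothetical; mainline has `struct mutex writepages;`). */
struct mutex writepages[NR_STREAMS];

static int __f2fs_write_data_pages(struct address_space *mapping,
				   struct writeback_control *wbc,
				   enum iostat_type io_type)
{
	struct inode *inode = mapping->host;
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	unsigned int stream = inode_stream_id(inode);	/* hypothetical helper */
	int ret;

	/* serialize only against writers targeting the same stream */
	mutex_lock(&sbi->writepages[stream]);
	ret = f2fs_write_cache_pages(mapping, wbc, io_type);
	mutex_unlock(&sbi->writepages[stream]);
	return ret;
}
```

Splitting the lock only pays off once writeback for different streams is actually issued concurrently; with a single writeback worker (as in the call stacks above) the submission path remains serial either way.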
To find those, I trace the I/O submission in order to easily see entire call stacks and the overheads of particular functions. This is again retrieved with perf, and I only include the part that is of interest.
user@stosys:~$ sudo perf record -F 99 -a --call-graph dwarf sudo env "NR_FILES=1" "NR_THREAD=1" ${FIO_HOME}/fio src/f2fs-zns-workloads/f2fs-multi-stream/jobs/f2fs_single_stream.fio
user@stosys:~$ sudo perf report --symbol-filter=f2fs_write_cache_pages
43.24% 0.00% kworker/u22:3-e [kernel.kallsyms] [k] ret_from_fork
|
---ret_from_fork
kthread
worker_thread
process_one_work
wb_workfn
wb_writeback
__writeback_inodes_wb
writeback_sb_inodes
__writeback_single_inode
do_writepages
f2fs_write_data_pages
|
--43.23%--f2fs_write_cache_pages
|
--42.09%--f2fs_write_single_data_page
|
|--39.98%--f2fs_do_write_data_page
| |
| |--27.12%--f2fs_outplace_write_data
| | |
| | |--22.98%--do_write_page
| | | |
| | | |--14.48%--f2fs_allocate_data_block
| | | | |
| | | | |--3.50%--update_segment_mtime
| | | | | |
| | | | | --3.45%--ktime_get_coarse_with_offset
| | | | | |
| | | | | |--1.30%--trace_hardirqs_on
| | | | | | |
| | | | | | --1.23%--lockdep_hardirqs_on_prepare
| | | | | | |
| | | | | | --1.05%--mark_held_locks
| | | | | | |
| | | | | | --0.94%--mark_lock.part.0
| | | | | |
| | | | | |--1.15%--lock_acquire
| | | | | | |
| | | | | | --0.80%--__lock_acquire
| | | | | |
| | | | | --0.71%--lock_release
| | | | |
| | | | |--2.55%--mutex_lock_nested
| | | | | |
| | | | | --2.50%--__mutex_lock
| | | | | |
| | | | | --1.53%--lock_acquire
| | | | | |
| | | | | --1.02%--__lock_acquire
| | | | |
| | | | |--1.72%--__lock_text_start
| | | | | |
| | | | | --1.29%--lock_acquire
| | | | | |
| | | | | --0.95%--__lock_acquire
| | | | |
| | | | |--1.70%--locate_dirty_segment
| | | | | |
| | | | | --1.68%--IS_CURSEG
| | | | | |
| | | | | --1.34%--__lock_text_start
| | | | | |
| | | | | --0.87%--lock_acquire
| | | | | |
| | | | | --0.52%--__lock_acquire
| | | | |
| | | | |--1.01%--down_write
| | | | | |
| | | | | --0.75%--lock_acquire
| | | | |
| | | | |--0.92%--down_read
| | | | |
| | | | |--0.67%--_raw_spin_unlock
| | | | | |
| | | | | --0.55%--lock_release
| | | | |
| | | | --0.52%--up_write
| | | |
| | | |--4.35%--f2fs_submit_page_write
| | | | |
| | | | |--1.74%--__lock_text_start
| | | | | |
| | | | | --1.32%--lock_acquire
| | | | | |
| | | | | --0.99%--__lock_acquire
| | | | |
| | | | |--0.90%--down_write
| | | | | |
| | | | | --0.68%--lock_acquire
| | | | |
| | | | --0.59%--_raw_spin_unlock
| | | | |
| | | | --0.53%--lock_release
| | | |
| | | --3.74%--f2fs_update_device_state.part.0
| | | |
| | | --3.72%--f2fs_set_dirty_device
| | | |
| | | --3.71%--__add_ino_entry
| | | |
| | | |--1.07%--__lock_text_start
| | | | |
| | | | --0.66%--lock_acquire
| | | | |
| | | | --0.53%--__lock_acquire
| | | |
| | | |--0.82%--lock_acquire
| | | | |
| | | | --0.58%--__lock_acquire
| | | |
| | | |--0.66%--radix_tree_preload
| | | | |
| | | | --0.64%--__radix_tree_preload
| | | | |
| | | | --0.61%--lock_acquire
| | | |
| | | --0.54%--lock_release
| | |
| | --4.05%--f2fs_update_data_blkaddr
| | |
| | --3.71%--f2fs_update_extent_cache
| | |
| | --3.50%--f2fs_update_extent_tree_range
| | |
| | |--1.40%--__try_merge_extent_node
| | | |
| | | --1.06%--__lock_text_start
| | | |
| | | --0.72%--lock_acquire
| | | |
| | | --0.54%--__lock_acquire
| | |
| | --1.03%--_raw_write_lock
| | |
| | --0.77%--lock_acquire
| | |
| | --0.61%--__lock_acquire
| |
| |--6.10%--f2fs_get_dnode_of_data
| | |
| | --5.66%--__get_node_page
| | |
| | --5.30%--pagecache_get_page
| | |
| | --5.14%--__filemap_get_folio
| | |
| | |--2.07%--xas_load
| | | |
| | | --1.49%--xas_descend
| | | |
| | | --0.90%--rcu_read_lock_held
| | |
| | |--1.56%--lock_acquire
| | | |
| | | --1.13%--__lock_acquire
| | |
| | --0.71%--lock_release
| |
| |--2.72%--f2fs_get_node_info
| | |
| | |--1.19%--__lookup_nat_cache
| | | |
| | | --0.79%--__lock_text_start
| | |
| | --0.84%--down_read
| | |
| | --0.60%--lock_acquire
| |
| |--2.68%--set_page_writeback
| | |
| | --2.62%--__folio_start_writeback
| | |
| | |--1.02%--_raw_spin_unlock_irqrestore
| | |
| | --0.81%--_raw_spin_lock_irqsave
| | |
| | --0.54%--lock_acquire
| |
| --0.72%--down_read_trylock
| |
| --0.55%--lock_acquire
|
--1.12%--__lock_text_start
|
--0.82%--lock_acquire
Check out more stuff I am/was working on and other notes on my website.