Blueprint execution is surprisingly slow #7217

Open
jgallagher opened this issue Dec 9, 2024 · 7 comments
@jgallagher
Contributor

Poking around on dogfood, I noted blueprint execution was reporting success but taking several minutes:

task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 1003, triggered by a periodic timer firing
    started at 2024-12-09T20:10:42.700Z, running for 2497ms
  last completed activation: iter 1002, triggered by a periodic timer firing
    started at 2024-12-09T20:06:21.942Z (263s ago) and ran for 260744ms
    target blueprint: d08ce52e-7af3-4caa-9505-d695236c18b0
    execution:        enabled
    status:           completed (16 steps)
    error:            (none)

Looking at the status report for one execution, it's the three steps that send disks, datasets, and zones to each sled agent that are taking almost all of the time (steps 3-5):

root@oxz_switch0:~# omdb nexus background-tasks print-report blueprint_executor
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:104::3]:12221
[Dec 09 20:15:05]   Running ( 1/16) Ensure external networking resources
[Dec 09 20:15:05] Completed ( 1/16) Ensure external networking resources: after 64.24ms
[Dec 09 20:15:05]   Running ( 2/16) Fetch sled list
[Dec 09 20:15:05] Completed ( 2/16) Fetch sled list: after 17.27ms
[Dec 09 20:15:05]   Running ( 3/16) Deploy physical disks
[Dec 09 20:16:18] Completed ( 3/16) Deploy physical disks: after 73.64s
[Dec 09 20:16:18]   Running ( 4/16) Deploy datasets
[Dec 09 20:17:46] Completed ( 4/16) Deploy datasets: after 87.63s
[Dec 09 20:17:46]   Running ( 5/16) Deploy Omicron zones
[Dec 09 20:19:15] Completed ( 5/16) Deploy Omicron zones: after 88.76s
[Dec 09 20:19:15]   Running ( 6/16) Plumb service firewall rules
[Dec 09 20:19:15] Completed ( 6/16) Plumb service firewall rules: after 470.23ms
[Dec 09 20:19:15]   Running ( 7/16) Ensure dataset records
[Dec 09 20:19:29] Completed ( 7/16) Ensure dataset records: after 13.77s
[Dec 09 20:19:29]   Running ( 8/16) Deploy DNS records
[Dec 09 20:19:29] Completed ( 8/16) Deploy DNS records: after 210.87ms
[Dec 09 20:19:29]   Running ( 9/16) Cleanup expunged zones
[Dec 09 20:19:29] Completed ( 9/16) Cleanup expunged zones: after 17.28µs
[Dec 09 20:19:29]   Running (10/16) Decommission sleds
[Dec 09 20:19:29] Completed (10/16) Decommission sleds: after 5.01µs
[Dec 09 20:19:29]   Running (11/16) Decommission expunged disks
[Dec 09 20:19:29] Completed (11/16) Decommission expunged disks: after 88.53ms
[Dec 09 20:19:29]   Running (12/16) Deploy clickhouse cluster nodes
[Dec 09 20:19:29] Completed (12/16) Deploy clickhouse cluster nodes: after 4.17µs
[Dec 09 20:19:29]   Running (13/16) Deploy single-node clickhouse cluster
[Dec 09 20:19:30] Completed (13/16) Deploy single-node clickhouse cluster: after 240.05ms
[Dec 09 20:19:30]   Running (14/16) Reassign sagas
[Dec 09 20:19:30] Completed (14/16) Reassign sagas: after 37.94ms
[Dec 09 20:19:30]   Running (15/16) Ensure CockroachDB settings
[Dec 09 20:19:30] Completed (15/16) Ensure CockroachDB settings: after 7.91ms
[Dec 09 20:19:30]   Running (16/16) Finalize and check for errors
[Dec 09 20:19:30] Completed (16/16) Finalize and check for errors: after 5.96µs

Grepping out the start and stop of each PUT request for the "Deploy physical disks" step in one execution shows we don't have one outlying sled; rather, every sled takes a handful of seconds (which is surprising!), and we don't currently parallelize this step:

2024-12-09 20:02:01.632Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:03.956Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:03.994Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:10.293Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:10.479Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:16.788Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:16.828Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:24.219Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:24.256Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:31.144Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:31.183Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:40.839Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:41.015Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:48.313Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:48.353Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:02:57.234Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:02:57.273Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:03:01.811Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:03:01.850Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:03:08.276Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:03:08.313Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:03:15.376Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
2024-12-09 20:03:15.415Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client request
2024-12-09 20:03:24.249Z DEBG 20b100d0-84c3-4119-aa9b-0c632b0b6a3a/13246 (ServerContext) on oxz_nexus_20b100d0-84c3-4119-aa9b-0c632b0b6a3a: client response
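
For illustration, here's a minimal sketch (not the actual executor code) of what fanning the per-sled requests out concurrently could look like; the Sled, DeployError, and put_physical_disks names are placeholders, and futures::future::join_all stands in for whatever bounded-concurrency primitive we'd actually want:

use futures::future::join_all;

// Placeholder types standing in for the sled-agent client and its error type.
struct Sled;
struct DeployError;

// Hypothetical stand-in for the PUT /omicron-physical-disks request to one
// sled-agent; today the executor awaits each of these in turn.
async fn put_physical_disks(_sled: &Sled) -> Result<(), DeployError> {
    Ok(())
}

// Awaiting each sled sequentially makes the step's runtime the sum of the
// per-sled latencies; issuing the requests together makes it roughly the
// latency of the slowest sled.
async fn deploy_physical_disks(sleds: &[Sled]) -> Vec<Result<(), DeployError>> {
    join_all(sleds.iter().map(|sled| put_physical_disks(sled))).await
}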
@leftwo
Contributor

leftwo commented Dec 9, 2024

On dublin, with four sleds running the same build as dogfood, I see something similar:

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_executor                           
task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 4151, triggered by a periodic timer firing
    started at 2024-12-09T22:28:48.455Z, running for 87505ms
  last completed activation: iter 4150, triggered by a periodic timer firing
    started at 2024-12-09T22:27:27.751Z (168s ago) and ran for 80699ms
    target blueprint: 4c81a2a4-f927-4692-8f2e-f7916043f006 
    execution:        enabled                              
    status:           completed (16 steps)                 
    error:            (none)             

Then, the breakdown has:

root@oxz_switch0:~# omdb nexus background-tasks print-report blueprint_executor 2> /dev/null
[Dec 09 22:27:27]   Running ( 1/16) Ensure external networking resources
[Dec 09 22:27:27] Completed ( 1/16) Ensure external networking resources: after 32.27ms
[Dec 09 22:27:27]   Running ( 2/16) Fetch sled list
[Dec 09 22:27:27] Completed ( 2/16) Fetch sled list: after 10.79ms
[Dec 09 22:27:27]   Running ( 3/16) Deploy physical disks
[Dec 09 22:27:54] Completed ( 3/16) Deploy physical disks: after 26.97s
[Dec 09 22:27:54]   Running ( 4/16) Deploy datasets
[Dec 09 22:28:24] Completed ( 4/16) Deploy datasets: after 29.41s
[Dec 09 22:28:24]   Running ( 5/16) Deploy Omicron zones
[Dec 09 22:28:47] Completed ( 5/16) Deploy Omicron zones: after 23.43s
[Dec 09 22:28:47]   Running ( 6/16) Plumb service firewall rules
[Dec 09 22:28:48] Completed ( 6/16) Plumb service firewall rules: after 496.70ms
[Dec 09 22:28:48]   Running ( 7/16) Ensure dataset records
[Dec 09 22:28:48] Completed ( 7/16) Ensure dataset records: after 14.44ms
[Dec 09 22:28:48]   Running ( 8/16) Deploy DNS records
[Dec 09 22:28:48] Completed ( 8/16) Deploy DNS records: after 168.84ms
[Dec 09 22:28:48]   Running ( 9/16) Cleanup expunged zones
[Dec 09 22:28:48] Completed ( 9/16) Cleanup expunged zones: after 3.72µs
[Dec 09 22:28:48]   Running (10/16) Decommission sleds
[Dec 09 22:28:48] Completed (10/16) Decommission sleds: after 2.26µs
[Dec 09 22:28:48]   Running (11/16) Decommission expunged disks
[Dec 09 22:28:48] Completed (11/16) Decommission expunged disks: after 70.30ms
[Dec 09 22:28:48]   Running (12/16) Deploy clickhouse cluster nodes
[Dec 09 22:28:48] Completed (12/16) Deploy clickhouse cluster nodes: after 3.24µs
[Dec 09 22:28:48]   Running (13/16) Deploy single-node clickhouse cluster
[Dec 09 22:28:48] Completed (13/16) Deploy single-node clickhouse cluster: after 52.36ms
[Dec 09 22:28:48]   Running (14/16) Reassign sagas
[Dec 09 22:28:48] Completed (14/16) Reassign sagas: after 38.35ms
[Dec 09 22:28:48]   Running (15/16) Ensure CockroachDB settings
[Dec 09 22:28:48] Completed (15/16) Ensure CockroachDB settings: after 7.54ms
[Dec 09 22:28:48]   Running (16/16) Finalize and check for errors
[Dec 09 22:28:48] Completed (16/16) Finalize and check for errors: after 6.05µs

With the long poles being:

[Dec 09 22:27:54] Completed ( 3/16) Deploy physical disks: after 26.97s
[Dec 09 22:28:24] Completed ( 4/16) Deploy datasets: after 29.41s
[Dec 09 22:28:47] Completed ( 5/16) Deploy Omicron zones: after 23.43s

@jgallagher
Contributor Author

jgallagher commented Dec 10, 2024

I think the sled-agent /datasets endpoint is sluggish, and also it's locking out any concurrent requests to /omicron-zones, /omicron-physical-disks, and /inventory while it runs. I grepped out some requests to various endpoints from a sled on dogfood while blueprint execution was running right after the rack was updated and restarted:

endpoint                 start                              end                                elapsed (ms)
/omicron-physical-disks  2024-12-10 16:32:19.587296382 UTC  2024-12-10 16:32:19.593388052 UTC             6
/omicron-physical-disks  2024-12-10 16:32:23.304662626 UTC  2024-12-10 16:32:23.305645873 UTC             0
/omicron-physical-disks  2024-12-10 16:33:09.163541478 UTC  2024-12-10 16:33:09.164933338 UTC             1
/datasets                2024-12-10 16:33:09.189697079 UTC  2024-12-10 16:33:13.320749909 UTC          4131
/datasets                2024-12-10 16:33:18.675847180 UTC  2024-12-10 16:33:22.652683754 UTC          3976
/datasets                2024-12-10 16:34:02.052130582 UTC  2024-12-10 16:34:06.186403126 UTC          4134
/omicron-zones           2024-12-10 16:34:03.966964868 UTC  2024-12-10 16:34:07.356459703 UTC          3389
/omicron-zones           2024-12-10 16:34:03.978573740 UTC  2024-12-10 16:34:07.372660449 UTC          3394
/omicron-physical-disks  2024-12-10 16:34:29.942903743 UTC  2024-12-10 16:34:29.944226793 UTC             1
/omicron-physical-disks  2024-12-10 16:34:29.945366486 UTC  2024-12-10 16:34:29.946320128 UTC             0
/omicron-zones           2024-12-10 16:34:35.124805339 UTC  2024-12-10 16:34:35.772979916 UTC           648
/datasets                2024-12-10 16:35:23.675288547 UTC  2024-12-10 16:35:27.871868851 UTC          4196
/datasets                2024-12-10 16:35:33.370408785 UTC  2024-12-10 16:35:37.474463893 UTC          4104
/omicron-physical-disks  2024-12-10 16:35:33.370539122 UTC  2024-12-10 16:35:37.474601062 UTC          4104
/inventory               2024-12-10 16:35:33.670486328 UTC  2024-12-10 16:35:38.145684914 UTC          4475
/inventory               2024-12-10 16:35:33.670624679 UTC  2024-12-10 16:35:38.203924377 UTC          4533
/inventory               2024-12-10 16:35:33.770949805 UTC  2024-12-10 16:35:38.250641330 UTC          4479
/omicron-zones           2024-12-10 16:35:58.021789480 UTC  2024-12-10 16:35:58.639234478 UTC           617
/datasets                2024-12-10 16:36:55.508769934 UTC  2024-12-10 16:36:59.745901648 UTC          4237
/omicron-physical-disks  2024-12-10 16:36:55.928026961 UTC  2024-12-10 16:36:59.882643310 UTC          3954
/omicron-zones           2024-12-10 16:36:56.715665970 UTC  2024-12-10 16:37:00.386476496 UTC          3670
/inventory               2024-12-10 16:37:08.977147900 UTC  2024-12-10 16:37:09.348134194 UTC           370
/inventory               2024-12-10 16:37:12.483570064 UTC  2024-12-10 16:37:12.847962095 UTC           364
/datasets                2024-12-10 16:38:15.471668355 UTC  2024-12-10 16:38:19.768682809 UTC          4297
/omicron-physical-disks  2024-12-10 16:38:15.785652961 UTC  2024-12-10 16:38:19.913397321 UTC          4127
/omicron-zones           2024-12-10 16:38:17.079578845 UTC  2024-12-10 16:38:20.804307740 UTC          3724
/inventory               2024-12-10 16:38:17.152064783 UTC  2024-12-10 16:38:20.892912282 UTC          3740
/inventory               2024-12-10 16:38:17.256640580 UTC  2024-12-10 16:38:20.918535894 UTC          3661
/datasets                2024-12-10 16:39:34.029437184 UTC  2024-12-10 16:39:38.182095033 UTC          4152
/omicron-physical-disks  2024-12-10 16:39:34.148391858 UTC  2024-12-10 16:39:38.253062752 UTC          4104
/omicron-zones           2024-12-10 16:39:35.486425043 UTC  2024-12-10 16:39:39.114789585 UTC          3628
/inventory               2024-12-10 16:39:35.459812054 UTC  2024-12-10 16:39:39.126371609 UTC          3666
/datasets                2024-12-10 16:40:55.260288302 UTC  2024-12-10 16:40:59.501616856 UTC          4241
/omicron-physical-disks  2024-12-10 16:40:55.632899841 UTC  2024-12-10 16:40:59.565914172 UTC          3933
/omicron-zones           2024-12-10 16:40:56.717174803 UTC  2024-12-10 16:41:00.423526334 UTC          3706
/inventory               2024-12-10 16:40:56.700412108 UTC  2024-12-10 16:41:00.435287383 UTC          3734
/datasets                2024-12-10 16:42:14.080859458 UTC  2024-12-10 16:42:18.340363882 UTC          4259
/omicron-physical-disks  2024-12-10 16:42:14.449373397 UTC  2024-12-10 16:42:18.418007297 UTC          3968
/omicron-zones           2024-12-10 16:42:15.539233466 UTC  2024-12-10 16:42:19.277100534 UTC          3737
/inventory               2024-12-10 16:42:15.520360269 UTC  2024-12-10 16:42:19.288969678 UTC          3768
/inventory               2024-12-10 16:42:28.173088082 UTC  2024-12-10 16:42:28.547470097 UTC           374

There are calls where each of the three non-/datasets endpoints is relatively fast (/omicron-physical-disks is occasionally single-digit milliseconds; /inventory and /omicron-zones are occasionally a few hundred milliseconds), but /datasets always takes a few seconds, and any other request that comes in while a /datasets request is outstanding completes shortly after the overlapping /datasets call completes.

Waiting until the sled has been up for longer, we start to see /datasets calls overlap with each other, pushing their times up to double what they were above:

endpoint                 start                              end                                elapsed (ms)
/datasets                2024-12-10 19:30:04.417608704 UTC  2024-12-10 19:30:13.674219143 UTC          9256
/omicron-physical-disks  2024-12-10 19:30:04.634426756 UTC  2024-12-10 19:30:13.951781871 UTC          9317
/omicron-zones           2024-12-10 19:30:05.421701072 UTC  2024-12-10 19:30:15.255703981 UTC          9834
/datasets                2024-12-10 19:31:26.680641343 UTC  2024-12-10 19:31:35.502109929 UTC          8821
/omicron-physical-disks  2024-12-10 19:31:26.868816815 UTC  2024-12-10 19:31:35.739100922 UTC          8870
/omicron-zones           2024-12-10 19:31:27.689736560 UTC  2024-12-10 19:31:37.047586365 UTC          9357
/datasets                2024-12-10 19:32:50.476296290 UTC  2024-12-10 19:32:59.377338381 UTC          8901
/omicron-physical-disks  2024-12-10 19:32:50.476355874 UTC  2024-12-10 19:32:59.488445416 UTC          9012
/omicron-zones           2024-12-10 19:32:51.473115964 UTC  2024-12-10 19:33:01.273352927 UTC          9800
/inventory               2024-12-10 19:32:51.604135644 UTC  2024-12-10 19:33:01.388828302 UTC          9784
/inventory               2024-12-10 19:32:51.627207232 UTC  2024-12-10 19:33:01.457466407 UTC          9830
/inventory               2024-12-10 19:33:01.793520882 UTC  2024-12-10 19:33:02.374974605 UTC           581
/omicron-physical-disks  2024-12-10 19:34:13.670806718 UTC  2024-12-10 19:34:13.672299645 UTC             1
/datasets                2024-12-10 19:34:13.919867645 UTC  2024-12-10 19:34:24.145159957 UTC         10225
/omicron-zones           2024-12-10 19:34:14.629696736 UTC  2024-12-10 19:34:25.520971762 UTC         10891
/datasets                2024-12-10 19:34:48.058007411 UTC  2024-12-10 19:34:56.895304978 UTC          8837
/inventory               2024-12-10 19:34:48.770993832 UTC  2024-12-10 19:34:57.726492139 UTC          8955
/inventory               2024-12-10 19:34:48.745288403 UTC  2024-12-10 19:34:57.726604423 UTC          8981
/omicron-physical-disks  2024-12-10 19:35:08.455756233 UTC  2024-12-10 19:35:08.457224402 UTC             1
/omicron-zones           2024-12-10 19:35:10.952099058 UTC  2024-12-10 19:35:12.510152131 UTC          1558
/datasets                2024-12-10 19:36:10.180233743 UTC  2024-12-10 19:36:20.315956080 UTC         10135
/omicron-physical-disks  2024-12-10 19:36:10.383876446 UTC  2024-12-10 19:36:20.537502087 UTC         10153
/omicron-zones           2024-12-10 19:36:11.197412397 UTC  2024-12-10 19:36:21.742602388 UTC         10545
/inventory               2024-12-10 19:36:25.462015200 UTC  2024-12-10 19:36:26.008669566 UTC           546
/datasets                2024-12-10 19:37:35.663462110 UTC  2024-12-10 19:37:44.297856734 UTC          8634
/omicron-physical-disks  2024-12-10 19:37:35.678431540 UTC  2024-12-10 19:37:44.412639087 UTC          8734
/inventory               2024-12-10 19:37:37.016285594 UTC  2024-12-10 19:37:46.096294923 UTC          9080
/omicron-zones           2024-12-10 19:37:37.044056302 UTC  2024-12-10 19:37:46.105494963 UTC          9061
/datasets                2024-12-10 19:38:56.487077869 UTC  2024-12-10 19:39:05.378389249 UTC          8891
/omicron-physical-disks  2024-12-10 19:38:57.885434739 UTC  2024-12-10 19:39:05.578758209 UTC          7693
/omicron-zones           2024-12-10 19:38:57.885573553 UTC  2024-12-10 19:39:06.707469495 UTC          8821
/inventory               2024-12-10 19:39:14.047075616 UTC  2024-12-10 19:39:14.736069062 UTC           688
/inventory               2024-12-10 19:39:16.914554164 UTC  2024-12-10 19:39:17.457563298 UTC           543
/inventory               2024-12-10 19:39:32.202042994 UTC  2024-12-10 19:39:32.765631603 UTC           563
/datasets                2024-12-10 19:40:19.458998465 UTC  2024-12-10 19:40:28.967428579 UTC          9508
/omicron-physical-disks  2024-12-10 19:40:19.663157497 UTC  2024-12-10 19:40:29.210618973 UTC          9547
/omicron-zones           2024-12-10 19:40:20.511784729 UTC  2024-12-10 19:40:30.385916591 UTC          9874
/inventory               2024-12-10 19:40:42.567889652 UTC  2024-12-10 19:40:43.111313338 UTC           543
/datasets                2024-12-10 19:41:42.307544032 UTC  2024-12-10 19:41:52.106217013 UTC          9798
/omicron-physical-disks  2024-12-10 19:41:42.307715148 UTC  2024-12-10 19:41:52.229141210 UTC          9921
/omicron-zones           2024-12-10 19:41:42.829732289 UTC  2024-12-10 19:41:53.540548291 UTC         10710

Two questions:

  • Why is /datasets so slow?
  • Why does /datasets lock out the other endpoints?

I think the second question is easy to answer: the sled-agent StorageManager is structured as a task with a channel for requests. All four of these endpoints interact with the storage manager task by sending it a request over that channel (e.g., datasets_ensure, omicron_physical_disks_ensure). The storage manager task serializes these requests, so if one of them (e.g., ensuring datasets) is slow, any other requests on the channel block until it has completed all prior requests.
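
Roughly, that pattern looks like the following simplified sketch (not the actual StorageManager code; the request variants and reply payloads here are illustrative):

use tokio::sync::{mpsc, oneshot};

// Illustrative request type; the real StorageManager has more variants and
// richer payloads.
enum StorageRequest {
    DatasetsEnsure { reply: oneshot::Sender<()> },
    PhysicalDisksEnsure { reply: oneshot::Sender<()> },
}

// The manager task pulls one request off the channel at a time and fully
// completes it before looking at the next, so a multi-second DatasetsEnsure
// delays every request queued behind it.
async fn storage_manager_task(mut rx: mpsc::Receiver<StorageRequest>) {
    while let Some(req) = rx.recv().await {
        match req {
            StorageRequest::DatasetsEnsure { reply } => {
                // ...ensure each dataset, shelling out to zfs along the way...
                let _ = reply.send(());
            }
            StorageRequest::PhysicalDisksEnsure { reply } => {
                // ...fast in the common case, but stuck behind DatasetsEnsure...
                let _ = reply.send(());
            }
        }
    }
}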

I'm less sure about the first question. We're ensuring roughly 40-50 datasets per sled (four per disk: Debug, ZoneRoot, the disk's crucible zone's transient data, and the disk's crucible zone's durable data; plus any other datasets for extra control plane zones). It looks like ensuring each individual dataset takes a couple hundred milliseconds, which does add up to the ~4 seconds I'm seeing above:

2024-12-10 21:08:11.490Z INFO oxp_0c4ef358-5533-43db-ad38-a8eff716e53a/crypt/debug/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:11.715Z INFO oxp_34dadf3f-f60c-4acc-b82b-4b0c82224222/crypt/debug/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:11.931Z INFO oxp_a726cacd-fa35-4ed2-ade6-31ad928b24cb/crypt/debug/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:12.158Z INFO oxp_34dadf3f-f60c-4acc-b82b-4b0c82224222/crypt/zone/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:12.381Z INFO oxp_416232c1-bc8f-403f-bacb-28403dd8fced/crypt/zone/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:12.615Z INFO oxp_f522118c-5dcd-4116-8044-07f0cceec52e/crucible/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:12.838Z INFO oxp_f522118c-5dcd-4116-8044-07f0cceec52e/crypt/zone/655 (StorageManager) on BRM42220014: Ensuring dataset
2024-12-10 21:08:13.066Z INFO oxp_0c4ef358-5533-43db-ad38-a8eff716e53a/crypt/zone/655 (StorageManager) on BRM42220014: Ensuring dataset

That seems like a long time, but maybe it's reasonable? We do have to shell out to ZFS at least twice even for datasets that already exist (to confirm they exist and that they have the correct UUID property).

@davepacheco
Collaborator

> We do have to shell out to ZFS at least twice even for datasets that already exist (to confirm they exist and that they have the correct UUID property).

Might it be possible to issue one zfs list for all of them, using the parseable output format and specifically requesting the custom property? I'd expect that to be much faster unless there are many more datasets on the system that we don't care about.
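
For example (a sketch only; the property name oxide:uuid below is a stand-in for whatever user property sled-agent actually stores the dataset UUID in), a single zfs get in scripted mode can return name/value pairs for every dataset of interest in one spawned process:

use std::io;
use std::process::Command;

// One `zfs get` over all datasets of interest replaces the per-dataset
// process spawns. `-H` drops headers and tab-separates fields, `-p` asks for
// parseable values, and `-o name,value` limits output to the dataset name and
// the property value.
fn fetch_dataset_uuids(datasets: &[String]) -> io::Result<Vec<(String, String)>> {
    let output = Command::new("zfs")
        .args(["get", "-H", "-p", "-o", "name,value", "oxide:uuid"])
        .args(datasets)
        .output()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    Ok(stdout
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            Some((fields.next()?.to_string(), fields.next()?.to_string()))
        })
        .collect())
}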

@smklein smklein self-assigned this Dec 10, 2024
smklein added a commit that referenced this issue Dec 13, 2024
Major part of #7217

Optimizations made:
- `zfs get` is queried, so properties are sampled for all datasets of
interest ahead-of-time. No subsequent processes are exec'd for each
dataset that needs no changes.
- The "dataset ensure" process is now concurrent

These optimizations should significantly improve the latency of the "no
(or few) changes necessary" cases.

Optimizations still left to be made:
- Making blueprint execution concurrent, rather than blocking
one-sled-at-a-time
- Patching the illumos_utils "Zfs::ensure_filesystem" to re-use the
pre-fetched properties, and minimize re-querying
@smklein
Collaborator

smklein commented Dec 16, 2024

I expect #7236 to mitigate the performance issues in the common cases, but would still be interested in keeping an eye on this in dogfood. Keeping this open in case further optimizations are warranted.

@jgallagher
Contributor Author

Checking in on dogfood (which has #7236), things have improved significantly; the most recent execution was ~95 seconds:

task_name                                      PGEN# PSTART               Psecs P CSTART               C
blueprint_executor                             2879  2024-12-16T19:04:30Z  95.5 T 2024-12-16T19:06:05Z T

almost all of which was putting zones:

[Dec 16 19:04:30]   Running ( 1/16) Ensure external networking resources
[Dec 16 19:04:30] Completed ( 1/16) Ensure external networking resources: after 53.60ms
[Dec 16 19:04:30]   Running ( 2/16) Fetch sled list
[Dec 16 19:04:30] Completed ( 2/16) Fetch sled list: after 13.80ms
[Dec 16 19:04:30]   Running ( 3/16) Deploy physical disks
[Dec 16 19:04:34] Completed ( 3/16) Deploy physical disks: after 4.71s
[Dec 16 19:04:34]   Running ( 4/16) Deploy datasets
[Dec 16 19:04:41] Completed ( 4/16) Deploy datasets: after 6.29s
[Dec 16 19:04:41]   Running ( 5/16) Deploy Omicron zones
[Dec 16 19:06:04] Completed ( 5/16) Deploy Omicron zones: after 83.51s
[Dec 16 19:06:04]   Running ( 6/16) Plumb service firewall rules
[Dec 16 19:06:05] Completed ( 6/16) Plumb service firewall rules: after 470.13ms
[Dec 16 19:06:05]   Running ( 7/16) Ensure dataset records
[Dec 16 19:06:05] Completed ( 7/16) Ensure dataset records: after 16.80ms
[Dec 16 19:06:05]   Running ( 8/16) Deploy DNS records
[Dec 16 19:06:05] Completed ( 8/16) Deploy DNS records: after 224.08ms
[Dec 16 19:06:05]   Running ( 9/16) Cleanup expunged zones
[Dec 16 19:06:05] Completed ( 9/16) Cleanup expunged zones: after 13.42µs
[Dec 16 19:06:05]   Running (10/16) Decommission sleds
[Dec 16 19:06:05] Completed (10/16) Decommission sleds: after 4.23µs
[Dec 16 19:06:05]   Running (11/16) Decommission expunged disks
[Dec 16 19:06:05] Completed (11/16) Decommission expunged disks: after 41.09ms
[Dec 16 19:06:05]   Running (12/16) Deploy clickhouse cluster nodes
[Dec 16 19:06:05] Completed (12/16) Deploy clickhouse cluster nodes: after 2.41µs
[Dec 16 19:06:05]   Running (13/16) Deploy single-node clickhouse cluster
[Dec 16 19:06:05] Completed (13/16) Deploy single-node clickhouse cluster: after 87.87ms
[Dec 16 19:06:05]   Running (14/16) Reassign sagas
[Dec 16 19:06:05] Completed (14/16) Reassign sagas: after 107.43ms
[Dec 16 19:06:05]   Running (15/16) Ensure CockroachDB settings
[Dec 16 19:06:05] Completed (15/16) Ensure CockroachDB settings: after 8.19ms
[Dec 16 19:06:05]   Running (16/16) Finalize and check for errors
[Dec 16 19:06:05] Completed (16/16) Finalize and check for errors: after 6.08µs

It looks like the zones handler ensures all of its datasets one by one, which bypasses the batch improvements made in #7236:

// TODO(https://github.com/oxidecomputer/omicron/issues/6043):
// - If these are the set of filesystems, we should also consider
//   removing the ones which are not listed here.
// - It's probably worth sending a bulk request to the storage system,
//   rather than requesting individual datasets.
for zone in &requested_zones.zones {
    let Some(dataset_name) = zone.dataset_name() else {
        continue;
    };

    // First, ensure the dataset exists
    let dataset_id = zone.id.into_untyped_uuid();
    self.inner
        .storage
        .upsert_filesystem(dataset_id, dataset_name)
        .await?;
}
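
For illustration, the bulk request that TODO mentions might look roughly like the sketch below, assuming a hypothetical batched upsert_filesystems entry point on the storage handle (no such method exists today):

// Hypothetical batched version: collect every (dataset_id, dataset_name)
// pair up front, then hand the whole set to the storage task in a single
// request so it can use the prefetched-properties path from #7236.
let datasets: Vec<_> = requested_zones
    .zones
    .iter()
    .filter_map(|zone| {
        let dataset_name = zone.dataset_name()?;
        Some((zone.id.into_untyped_uuid(), dataset_name))
    })
    .collect();
self.inner.storage.upsert_filesystems(datasets).await?;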

My gut feeling is that this isn't worth optimizing, though; we landed and then reverted #7006, which removes this loop entirely, and we want to put that back once we can. (Is that still R12 despite #7229?)

@smklein
Collaborator

smklein commented Dec 16, 2024

I do think #7160 (which re-lands this change, removing the loop) relies on #7229. We should fix #7229, then merge #7160, and then I expect the zone provisioning time to improve significantly.

I believe we just cut a release for R12, so unlikely to land there?

@andrewjstone
Contributor

Thanks for the great teamwork here @smklein @jgallagher
