Preamble:
---------
The Slurm Workload Simulator's aim is to provide a means of executing a set of jobs, a workload, in the Slurm system
without executing actual programs. The idea is to see how Slurm handles and schedules various workloads under different
configurations. For instance, an administrator may want to know how a job of a given size from a particular group
will be scheduled given the current workload on that system. The administrator could set up a simulator environment,
submit a workload representing that of the real system plus the hypothetical job, and observe how Slurm behaves in this
simulated environment faster than real time.
The approach initially taken to achieve this, which has been kept and expanded upon, is to speed up time and to allow
only some of the Slurm job attributes to be specified, without any actual job steps. From a user's perspective, one
creates a trace (workload) file that contains the specifications for each job to be simulated and then starts the
simulator with this file as input.
Entities:
---------
slurmctld [Modified]
slurmd [Modified]
slurmdbd [Modified]
sim_mgr [NEW]
Running the Simulator:
----------------------
The sim_mgr is the driver of the simulation. It maintains the concept of simulated time and other
pertinent values in shared memory. By default, it also launches the slurmctld and slurmd's.
sim_mgr [endtime] [OPTIONS]
Valid OPTIONS are:
  -c, --compath cpath       'cpath' is the path to the slurmctld and slurmd
                            (applicable only if launching daemons).
                            This option supersedes any setting of
                            SIM_DAEMONS_PATH. If neither is specified,
                            sim_mgr looks in a directory called sbin that
                            is a sibling of the directory in which it
                            resides. Finally, if the daemons are still not
                            found, the default is /sbin.
  -n, --nofork              Do NOT fork the controller and daemons. The
                            user is responsible for starting them
                            separately.
  -a, --accelerator secs    'secs' is the interval, in simulated seconds,
                            by which the simulated time is incremented
                            after each cycle (instead of merely one).
  -w, --wrkldfile filename  'filename' is the name of the trace file
                            containing the information of the jobs to
                            simulate.
  -s, --nodenames nodeexpr  'nodeexpr' is an expression representing all
                            of the slurmd names to use when launching the
                            daemons; it should correspond exactly to what
                            is defined in slurm.conf.
  -h, --help                This help message.
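The lookup order described for -c/--compath can be sketched in C as follows. This is a hedged illustration, not the sim_mgr source. The function names resolve_daemon_dir and has_slurmctld are hypothetical. Testing for an executable slurmctld with access() is an assumed way of deciding that the daemons were "found" in the sibling sbin directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PATH_BUF 4096

/* Hypothetical helper: does 'dir' contain an executable named slurmctld? */
static int has_slurmctld(const char *dir)
{
    char path[PATH_BUF];
    int n = snprintf(path, sizeof(path), "%s/slurmctld", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;
    return access(path, X_OK) == 0;
}

/*
 * Sketch of the documented lookup order for the daemon directory:
 *   1. the -c/--compath argument, if given;
 *   2. otherwise the SIM_DAEMONS_PATH environment variable, if set;
 *   3. otherwise a directory named "sbin" next to the directory that
 *      holds sim_mgr (e.g. /opt/slurm/bin/sim_mgr -> /opt/slurm/sbin),
 *      provided slurmctld is found there;
 *   4. otherwise /sbin.
 */
static void resolve_daemon_dir(const char *compath, const char *self_path,
                               char *out, size_t outlen)
{
    const char *env = getenv("SIM_DAEMONS_PATH");
    char copy[PATH_BUF], parent[PATH_BUF], sibling[PATH_BUF];

    if (compath != NULL) {
        snprintf(out, outlen, "%s", compath);
        return;
    }
    if (env != NULL) {
        snprintf(out, outlen, "%s", env);
        return;
    }
    /* dirname() may modify its argument, so always work on copies. */
    snprintf(copy, sizeof(copy), "%s", self_path);
    snprintf(parent, sizeof(parent), "%s", dirname(copy));   /* .../bin */
    snprintf(sibling, sizeof(sibling), "%s/sbin", dirname(parent));
    if (has_slurmctld(sibling)) {
        snprintf(out, outlen, "%s", sibling);
        return;
    }
    snprintf(out, outlen, "%s", "/sbin");
}
```

Only the order of precedence is taken from the help text above. The sketch trusts -c and SIM_DAEMONS_PATH as given and only probes the sibling directory.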
Notes:
  'endtime' is specified as seconds since Unix epoch. If 0 is specified
  then the simulator will run indefinitely.
  The debug level can be increased by sending the SIGUSR1 signal to
  the sim_mgr.
      $ kill -SIGUSR1 `pidof sim_mgr`
  The debug level will be incremented by one with each signal sent,
  wrapping back to zero after eight.
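The wrap-around rule for the debug level can be illustrated with a small signal handler. This is a sketch under assumptions. The variable debug_level and the use of sigaction() are illustrative and are not taken from the simulator sources. Only the behaviour (one step per SIGUSR1, back to zero after eight) comes from the notes above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical debug level in the range 0..8. sig_atomic_t makes the
 * update from a signal handler well defined. */
static volatile sig_atomic_t debug_level = 0;

/* Each SIGUSR1 raises the level by one; after 8 it wraps back to 0. */
static void on_sigusr1(int signo)
{
    (void)signo;
    debug_level = (debug_level + 1) % 9;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_debug_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Sending the signal nine times from a shell, as in the kill command above, would take such a counter from 0 through 8 and back to 0.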
Example 1:
To run a simulation with the default trace file name (test.trace) where the Slurm configuration simply specifies
a single slurmd:
$ sim_mgr
Example 2:
To run a simulation with the trace file /home/someuser/workload.trace where the Slurm
configuration simply specifies a single slurmd:
$ sim_mgr -w /home/someuser/workload.trace
Example 3:
To launch the simulation with a trace file called "workload.trace" and with a Slurm configuration that specifies
five front-ends named "node1", "node2", "node3", "node4" and "node5", one possible command line would be
(the node expression is quoted so that the shell does not try to expand the brackets):
$ sim_mgr -w workload.trace -s 'node[1-5]'
Example 4:
To run the same simulation as above but with a speed-up factor of 100 (the simulated time advances by
100 seconds per cycle):
$ sim_mgr -w workload.trace -s 'node[1-5]' -a 100
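The interaction between the -a value and 'endtime' can be pictured as a per-cycle check on the simulated clock. In this sketch the function names are hypothetical and each cycle is reduced to a single clock step. The real sim_mgr also coordinates with the daemons during every cycle.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical check made once per cycle: an endtime of 0 means "run
 * indefinitely"; otherwise stop once the simulated clock reaches it. */
static bool sim_should_continue(time_t sim_now, time_t endtime)
{
    return endtime == 0 || sim_now < endtime;
}

/*
 * Count the cycles needed to go from 'start' to 'endtime' when every
 * cycle advances the simulated clock by 'accel' seconds (the -a value,
 * 1 when -a is not given). Returns -1 for endtime == 0 (no end) or a
 * non-positive accel.
 */
static long cycles_until_end(time_t start, time_t accel, time_t endtime)
{
    long cycles = 0;
    if (endtime == 0 || accel <= 0)
        return -1;
    for (time_t now = start; sim_should_continue(now, endtime); now += accel)
        cycles++;
    return cycles;
}
```

In this model, covering one simulated hour takes 3600 cycles with the default step of one second, but only 36 cycles with -a 100.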
Monitoring Status of a Running Simulation:
------------------------------------------
Because an instance of Slurm (slurmctld) runs as part of the simulation, the normal Slurm commands such
as "squeue" and "scontrol" can be used to see the status of the queue and the jobs that are in it. Additionally, the
Simulator-specific command "simdate" is provided to view the date/time stamp of the simulation. This command was
originally intended to be the Simulator's equivalent of the "date" command, so that at any given moment the user
would know the simulator's current "time". However, the command has evolved to also display all the fields of the
special shared memory segment, and even to allow a few of them to be altered.
simdate [OPTIONS]
  -s, --showmem             Display contents of shared memory
  -i, --timeincr seconds    Increment simulated time by 'seconds'
  -f, --flag 1-n            Set the global synchronization flag
  -h, --help                This help message
Notes: The simdate command's primary function is to
       display the current simulated time. If no arguments are
       given, this is the only output. However, it can also
       display the synchronization semaphore value and the
       contents of the shared memory segment.
       Furthermore, it can increment the simulated time and set
       the global sync flag.
Example 1:
To simply display the current date/time stamp of the Simulator:
$ simdate
Example 2:
Display the entire contents of the shared memory segment:
$ simdate -s
Example 3:
Increase the simulator's time by three minutes and display memory:
$ simdate -s -i 180
Example 4:
Manually set the global synchronization flag to three:
$ simdate -f 3
Note: Manually altering the contents of the shared memory could lead to unexpected results and should be done
with caution.
* The time increment (-i) also accepts negative values, so the time can be set backwards. However, this does
not roll any jobs back, so setting the time backwards is not recommended.
* The global sync flag governs how the slurmctld and slurmd's coordinate with the sim_mgr. It is
currently a simple mechanism in which a global flag is maintained in the shared memory. When the value is 1,
it is the sim_mgr's turn to do what it needs in the given cycle. Once it has finished its work for the
cycle, it increments the value to 2. At this point, the slurmctld and slurmd's can all function as they
otherwise normally would, each incrementing the flag once it is done with its work for the given cycle until
it reaches 1 + slurmd_count. The final daemon to increment sets the value back to 1, indicating that it is
once again the sim_mgr's turn. The precise values that are acceptable depend upon how many daemons are
running. If there is only one slurmd, the range should be 1-3. If there are five slurmd's running, as in
some of the above examples, it would be 1-6. If the sim_mgr has previously been run, the help output of
simdate displays the acceptable range based upon the number of slurmd's configured in the shared memory
segment. This field should only be modified manually if the simulator gets stuck and the user wants to
experiment to see what would happen if a change in state is forced. It is not recommended for normal simulation.
* The slurmd_pid field in the shared memory segment currently shows the pid of only one slurmd. Thus,
if more than one slurmd is running, only one is displayed. Use the normal "pidof" or "ps" commands to see
the pids of the other slurmd's. This field is purely informational and is not essential to the execution
of the simulator.
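The turn-taking of the global sync flag can be modelled as a single-process sketch. The function names are hypothetical. 'participants' stands for the number of daemons that increment the flag in each cycle. It is left as a parameter because the acceptable range quoted above depends on which daemons are counted.

```c
/*
 * Hypothetical model of the global sync flag (not the shared-memory code).
 *   flag == 1             : sim_mgr's turn.
 *   flag == 2 .. N+1      : the daemons' turn; each one increments the
 *                           flag when it is done with the current cycle.
 * The final daemon sets the flag back to 1 instead of incrementing it.
 */
static int sim_mgr_done(int flag)
{
    return flag == 1 ? 2 : flag;              /* acts only on its own turn */
}

static int daemon_done(int flag, int participants)
{
    if (flag < 2)
        return flag;                          /* not the daemons' turn yet */
    return (flag == participants + 1) ? 1 : flag + 1;
}

/* Run one full cycle; returns the largest flag value observed,
 * or -1 if the cycle did not hand control back to sim_mgr. */
static int run_cycle(int participants)
{
    int flag = sim_mgr_done(1);
    int max_seen = flag;
    for (int i = 0; i < participants; i++) {
        flag = daemon_done(flag, participants);
        if (flag > max_seen)
            max_seen = flag;
    }
    return flag == 1 ? max_seen : -1;
}
```

With two participants (for example, a slurmctld and one slurmd), the flag in this model takes the values 1, 2 and 3 before returning to 1.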
Finally, normal system monitoring commands such as "ps" can always be used to see how the system is
behaving during the simulation. For our part, we like to use the "pidof" command to list the pids of the various
entities and to script the use of the "ps -eLF" command so that we can see all of the threads of the various
entities running at any given time.
Creating a Workload Trace File:
-------------------------------
There are three general methods of creating a workload trace file. The original method is to create
a completely "synthetic" workload. The second method is to take a "snapshot" of an actual running Slurm system,
and the third is to take historical job information from the Slurm DB.
trace_builder: Builds a generic workload trace file.
Usage: trace_builder [OPTIONS]
  -c <cpus_per_task>,      The number of CPUs per task
  -d <duration>,           The number of seconds that each job will run
  -i <id_start>,           The first jobid. Each subsequent jobid will
                           be one greater than the previous one.
  -l <number_of_jobs>,     The number of jobs to create in the trace file
  -n <tasks_per_node>,     The maximum number of tasks that a job is to
                           have per node.
  -o <output_file>,        Specifies a different output file name.
                           Default is simple.trace
  -r,                      The random argument. This specifies to use
                           random values for duration, tasks_per_node,
                           cpus_per_task and tasks. This option takes
                           no argument.
  -s <submission_step>,    The amount of time, in seconds, between
                           submission times of jobs.
  -S <initial timestamp>,  The initial time stamp to use for the first
                           job of the workload in lieu of the default.
                           This must be in Unix epoch time format.
  -t <tasks>,              The number of tasks that a job is to have.
  -u <user.sim_file>,      The name of the text file containing a list
                           of users and their system ids.
NOTES: If not specified in conjunction with the -r (random) option, the
       following options will have the following default values used in the
       computation of the upper bound of the random number range:
           duration = 1
           tasks = 10
           tasks_per_node = 8
           cpus_per_task = 10
       If not specified but still using the random option, the upper bounds will
       be based upon arbitrary default values.
       ... The wall clock value is always set greater than the duration.
       ... If using the random option, there is no guarantee that the combination
           of cpus_per_task and tasks_per_node will be valid for the system being
           emulated. Therefore, it is up to the user to use the -c and -n options
           with values whose product can never exceed the number of processors
           on the emulated nodes; otherwise, the trace file will have to be
           edited after it is created.
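The notes above leave it to the user to keep cpus_per_task x tasks_per_node within a node's processor count. The sketch below shows how a synthetic generator could apply the other documented rules and then check that product against a node size. Those rules are: job ids increase by one from -i, submit times are spaced by -s, and the wall clock limit is always above the duration. The struct, the helper names and cores_per_node are hypothetical, and this is not the trace_builder code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical subset of a trace record, for illustration only. */
struct sim_job {
    long   job_id;
    time_t submit;
    long   duration;        /* seconds */
    long   wclimit;         /* seconds, kept > duration */
    int    tasks_per_node;
    int    cpus_per_task;
};

/* Random integer in [1, max]; max must be positive. rand() suffices here. */
static long rand_upto(long max)
{
    return 1 + rand() % max;
}

/* Build job number i: ids increase by one from id_start and submit
 * times are spaced by 'step' seconds from 'start'. */
static struct sim_job make_random_job(long i, long id_start, time_t start,
                                      long step, long max_duration,
                                      int max_tpn, int max_cpt)
{
    struct sim_job j;
    j.job_id = id_start + i;
    j.submit = start + (time_t)(i * step);
    j.duration = rand_upto(max_duration);
    j.wclimit = j.duration + rand_upto(max_duration);   /* always > duration */
    j.tasks_per_node = (int)rand_upto(max_tpn);
    j.cpus_per_task = (int)rand_upto(max_cpt);
    return j;
}

/* The product must never exceed the processors on an emulated node. */
static bool fits_on_node(const struct sim_job *j, int cores_per_node)
{
    return (long)j->tasks_per_node * j->cpus_per_task <= cores_per_node;
}
```

A generated job that fails fits_on_node() is the kind of record that would otherwise have to be fixed afterwards with edit_trace (-C or -N).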
Example:
Build a workload trace named "new.trace" with ten jobs, starting at 1459000000
(26-March-2016 13:46:40 UTC).
$ trace_builder -u /slurm_install/slurm_conf/users.sim -o new.trace -l 10 -S 1459000000
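Because -S, like sim_mgr's 'endtime', takes Unix epoch seconds, it helps to check which calendar time a value corresponds to. Below is a minimal sketch using the standard gmtime_r() and strftime() calls. The helper name is hypothetical. The output is in UTC, so it differs from a local-time rendering by the time zone offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format Unix epoch seconds as "YYYY-MM-DD hh:mm:ss"
 * in UTC. Returns 0 on success, -1 on failure (buf is then empty).
 */
static int epoch_to_utc_string(time_t t, char *buf, size_t len)
{
    struct tm tm;

    if (len == 0)
        return -1;
    memset(&tm, 0, sizeof(tm));
    buf[0] = '\0';
    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    if (strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) == 0) {
        buf[0] = '\0';          /* did not fit: leave an empty string */
        return -1;
    }
    return 0;
}
```

Called with 1459000000, it produces "2016-03-26 13:46:40".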
mysql_trace_builder: Builds the workload trace file from the Slurm database (when using MySQL as the database).
Usage: mysql_trace_builder [OPTIONS]
  -s, --starttime time    Start selecting jobs from this time
                          format: "YYYY-MM-DD hh:mm:ss"
  -e, --endtime time      Stop selecting jobs at this time
                          format: "YYYY-MM-DD hh:mm:ss"
  -h, --host db_hostname  Name of the machine hosting the MySQL DB
  -u, --user dbuser       Name of the user with which to establish a
                          connection to the DB
  -t, --table db_table    Name of the MySQL table to query
  -v, --verbose           Increase verbosity of the messages
  -f, --file filename     Name of the output trace file being created
  -p, --help              This help message
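The -s/-e values use a calendar format, while trace files and sim_mgr work in epoch seconds. Below is a portable sketch of the conversion. It assumes the times are interpreted as UTC, since whether mysql_trace_builder uses UTC or local time is not stated here. The day count uses Howard Hinnant's days-from-civil algorithm, and the function names are hypothetical.

```c
#include <stdio.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm). */
static long long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                                     /* Jan/Feb count as end of previous year */
    int era = (y >= 0 ? y : y - 399) / 400;
    int yoe = y - era * 400;                         /* year of era, [0, 399] */
    int mp = (m + 9) % 12;                           /* March = 0 ... February = 11 */
    int doy = (153 * mp + 2) / 5 + d - 1;            /* day of year, [0, 365] */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* day of era, [0, 146096] */
    return (long long)era * 146097 + doe - 719468;
}

/*
 * Hypothetical parser for "YYYY-MM-DD hh:mm:ss", interpreted as UTC.
 * Returns the epoch seconds, or -1 if the string is malformed.
 */
static long long parse_db_time(const char *s)
{
    int y, mo, d, h, mi, sec;
    char extra;

    if (sscanf(s, "%4d-%2d-%2d %2d:%2d:%2d%c",
               &y, &mo, &d, &h, &mi, &sec, &extra) != 6)
        return -1;                                   /* missing fields or trailing text */
    if (mo < 1 || mo > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
        mi < 0 || mi > 59 || sec < 0 || sec > 59)
        return -1;
    return days_from_civil(y, mo, d) * 86400LL + h * 3600LL + mi * 60LL + sec;
}
```

Under the UTC assumption, parse_db_time("2016-03-26 13:46:40") returns 1459000000, the value used in the trace_builder example.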
simqsnap: Builds a workload trace file from a currently running Slurm system. All jobs in the queue are recorded.
By default, the end time is assumed to be the time limit, since it is not known how long a currently running or
pending job will actually take (the -d option selects other methods). As with all trace files, this field can be
edited with the "edit_trace" command.
Usage: simqsnap [OPTIONS]
  -o, --output file           The name of the workload trace file to produce.
  -d, --duration_method num   Number representing the method of determining
                              the expected end time of the job.
                                1 = random time length (1-max time for job)
                                2 = amount of time that the job had been
                                    running on the real system at snapshot
                                    time; if pending, use the expected time
                                3 = expected time (the time limit or wclimit)
                                    if not finished (default)
                                    (Note: finished jobs don't typically
                                    linger long in the queue)
  -h, --help                  This help message
Example 1:
Create a workload file (test.trace) based upon a currently running Slurm system using the expected
time of each job as its duration.
$ simqsnap
Example 2:
Create a workload file (new.trace) based upon a currently running Slurm system, using a random
value between 1 and its expected time as each job's duration.
$ simqsnap -o new.trace -d 1
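The three --duration_method choices can be summarised as a small function. This is an illustrative sketch with hypothetical types. 'elapsed' is how long a running job had run at snapshot time, and 'time_limit' is its expected time (wclimit). The random number is passed in so that the function stays deterministic.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one queued job. */
struct snap_job {
    bool running;      /* false = pending */
    long elapsed;      /* seconds run so far (running jobs only) */
    long time_limit;   /* expected time / wclimit, in seconds */
};

enum duration_method {
    DUR_RANDOM   = 1,  /* random length in [1, time_limit] */
    DUR_ELAPSED  = 2,  /* time already run; pending jobs use time_limit */
    DUR_EXPECTED = 3   /* the time limit (default) */
};

/* 'rnd' is any non-negative random number, e.g. from rand(). */
static long snapshot_duration(const struct snap_job *j,
                              enum duration_method m, unsigned long rnd)
{
    switch (m) {
    case DUR_RANDOM:
        if (j->time_limit <= 0)
            return 1;
        return 1 + (long)(rnd % (unsigned long)j->time_limit);
    case DUR_ELAPSED:
        return j->running ? j->elapsed : j->time_limit;
    case DUR_EXPECTED:
    default:
        return j->time_limit;
    }
}
```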
Viewing a Workload Trace File:
------------------------------
Because the workload file is in a binary format, the list_trace command is provided to quickly
display its contents.
list_trace [OPTIONS]
  -w, --wrkldfile filename  The name of the trace file to view
  -u, --unixtime            Display submit time in Unix epoch format
                            (default)
  -r, --humantime           Display submit time in a human-readable
                            format (YYYY-MM-DD hh:mm:ss)
  -h, --help                This help message.
Example 1:
Display the contents of a workload file called "my.trace".
$ list_trace -w my.trace
Example 2:
Display the contents of a workload file called "my.trace" with the submission time in human-readable
format.
$ list_trace -w my.trace -r
Editing a Workload Trace File:
------------------------------
Once you have a trace file, it is often useful to edit it in various ways in order to experiment
with slightly different workloads. As the trace file is in a binary format, it would be difficult to edit
directly. Therefore, a tool called "edit_trace" is available. It works by allowing the user to select one or
more records, each corresponding to a single job, and then to apply one or more modifications to these records.
It also allows specified records to be deleted or inserted, and all records to be sorted in ascending
chronological order.
edit_trace [OPTIONS]
  -j, --job_id jobid            Select records with job_id=jobid
  -u, --username name           Select records with username=name
  -s, --submit time             Select records with submit=time
  -d, --duration secs           Select records with duration=secs
  -w, --wclimit timeh           Select records with wclimit=timeh
  -t, --tasks num               Select records with tasks=num
  -q, --qosname qos             Select records with qosname=qos
  -p, --partition par           Select records with partition=par
  -a, --account acc             Select records with account=acc
  -c, --cpus_per_task cpt       Select records with cpus_per_task=cpt
  -n, --tasks_per_node tpn      Select records with tasks_per_node=tpn
  -r, --reservation res         Select records with reservation=res
  -e, --dependency dep          Select records with dependency=dep
  -x, --index idx               Select record number idx
  -J, --new_job_id jobid        Set job_id to jobid in all matched records
  -U, --new_username name       Set username to name in all matched records
  -S, --new_submit time         Set submit to time in all matched records
  -D, --new_duration secs       Set duration to secs in all matched records
  -W, --new_wclimit timeh       Set wclimit to timeh in all matched records
  -T, --new_tasks num           Set tasks to num in all matched records
  -Q, --new_qosname qos         Set qosname to qos in all matched records
  -P, --new_partition par       Set partition to par in all matched records
  -A, --new_account acc         Set account to acc in all matched records
  -C, --new_cpus_per_task cpt   Set cpus_per_task to cpt in all matched
                                records
  -N, --new_tasks_per_node tpn  Set tasks_per_node to tpn in all matched
                                records
  -R, --new_reservation res     Set reservation to res in all matched records
  -E, --new_dependency dep      Set dependency to dep in all matched records
  -X, --remove_jobs             Delete all matched records
  -h, --help                    This help message
  -i, --wrkldfile name          Name of the trace file to edit
  -I, --insert                  Insert a record after each matched record
  -O, --sort                    Sort all records in ascending
                                chronological order
Notes: The edit_trace utility consists of two general sets of options.
       The first is the set of all options used to specify which existing
       records to select. These have lower-case short options. The second
       is the set of all options used to specify what the new values should
       be. These have upper-case short options and their long forms are
       prefixed with 'new_'.
       If sorting, all other edits are performed first and then the result
       is sorted.
       Job ids may be out of order after a chronological sort.
As can be seen, records can be selected based upon any of their fields, and any field can
be modified. In general, lower-case options select records based on a particular field,
and the corresponding capital letter sets a new value for that field on all matched
records. Exceptions include:
  -x  selects a single record based upon its index within the file
  -X  deletes all records that are matched
  -i  specifies the name of the trace file to use
  -I  inserts a record (initially a duplicate) after each record that is matched
  -O  sorts all records of the file
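The select-then-modify model and the -O sort can be sketched over an in-memory array of records. The record layout below is hypothetical, since the real binary format is defined by the simulator sources. Combining several selectors with AND is an assumption of this sketch. Only one selector pair and one setter are shown. The sort compares submit times only, which is why job ids can come out of order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, heavily simplified trace record (not the real format). */
struct trace_rec {
    long   job_id;
    char   partition[32];
    time_t submit;
    long   duration;
};

/* Selection: NULL / -1 means "do not filter on this field".
 * In this sketch several criteria are combined with AND. */
static int rec_matches(const struct trace_rec *r, const char *partition,
                       long job_id)
{
    if (partition != NULL && strcmp(r->partition, partition) != 0)
        return 0;
    if (job_id >= 0 && r->job_id != job_id)
        return 0;
    return 1;
}

/* Like "-p old -P new": set the partition on every matched record.
 * Returns the number of records changed. */
static size_t set_partition(struct trace_rec *recs, size_t n,
                            const char *old_part, const char *new_part)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!rec_matches(&recs[i], old_part, -1))
            continue;
        snprintf(recs[i].partition, sizeof(recs[i].partition), "%s", new_part);
        changed++;
    }
    return changed;
}

/* qsort comparator on submit time only. */
static int by_submit(const void *a, const void *b)
{
    const struct trace_rec *x = a, *y = b;
    return (x->submit > y->submit) - (x->submit < y->submit);
}

/* Like "-O": ascending chronological order; job ids may end up unordered. */
static void sort_records(struct trace_rec *recs, size_t n)
{
    qsort(recs, n, sizeof(*recs), by_submit);
}
```

Calling set_partition() before sort_records() mirrors the documented rule that all other edits are applied first and the result is sorted afterwards.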
Please be aware that the edit_trace command is "destructive" in that it overwrites the contents of
the original input file once done. Therefore, if the original file is still needed, please MAKE A COPY
before executing the edit_trace command!!!
Example 1:
Select all jobs with a partition value of "short" and change it to a partition called "long".
$ edit_trace -p "short" -P "long"
Example 2:
Same as above but we will also set the duration to 1000.
$ edit_trace -p "short" -P "long" -D 1000
Example 3:
Sort all records of the file.
$ edit_trace -O
Example 4:
Insert a new record after record #42 in file "/home/user1/workload.trace", setting the job id to 12345
and the user to "user2".
$ edit_trace -i /home/user1/workload.trace -x 42 -I -J 12345 -U user2
NOTE: If only the target jobid is supplied for a dependency, the dependency is treated as "afterany".
Therefore, to specify another dependency type, write out the full dependency,
e.g. "edit_trace -E afterok:12345".
NOTE: To remove a dependency, reservation, partition or account, provide a value that starts with a space,
e.g. 'edit_trace -E " "' clears the dependency field.
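The two dependency notes above amount to a small normalisation rule. A bare job id becomes "afterany:<id>", a value that already names a dependency type is kept as written, and a value starting with a space clears the field. The helper below is a hypothetical illustration of that rule, not the edit_trace code. Rejecting an empty argument is an extra assumption of the sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical normalisation of an edit_trace -E argument into 'out':
 *   "12345"         -> "afterany:12345"  (bare job id)
 *   "afterok:12345" -> "afterok:12345"   (type given, kept as written)
 *   " "             -> ""                (leading space clears the field)
 * Returns 0 on success, -1 for an empty argument or if 'out' is too small.
 */
static int normalize_dependency(const char *arg, char *out, size_t len)
{
    int n, all_digits = 1;

    if (len == 0 || arg[0] == '\0')
        return -1;
    if (arg[0] == ' ') {
        out[0] = '\0';
        return 0;
    }
    for (const char *p = arg; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p)) {
            all_digits = 0;
            break;
        }
    }
    n = all_digits ? snprintf(out, len, "afterany:%s", arg)
                   : snprintf(out, len, "%s", arg);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```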
In addition to edit_trace, there is an older and much more restrictive command, "update_trace".
This command only works on files called test.trace and only allows a dependency or reservation to be
set for a given job (it only operates on single jobs). The edit_trace command above provides the
same functionality and much more; hence, this command has been superseded.
Usage: [This command is deprecated. Use edit_trace]
update_trace [OPTIONS]
  -R, --reservation     States to perform a reservation update
  -D, --dependency      States to perform a dependency update
  -n, --rsv_name name   Name of reservation to use
  -j, --jobid jid       Select job 'jid' to modify
  -r, --ref_jobid rjid  Set 'rjid' as the target dependency
  -a, --account         Account to use for a reservation update
  -h, --help            This help message
Notes: There are two general formats, one for a dependency update and one
       for reservation updates.
           update_trace [-D | --dependency] [-j | --jobid] [-r | --ref_jobid]
           -- Or --
           update_trace [-R | --reservation] [-n | --rsv_name] [-j | --jobid]
                        [-a | --account]
       The command must specify either a reservation or a dependency action.
Example 1:
To update jobid 538330 to be dependent upon jobid 538321.
$ update_trace --dependency --jobid=538330 --ref_jobid=538321
NOTE: All job dependencies are currently treated as being of type "afterany".
Example 2:
To update the job record of test.trace with jobid 538330 to belong to the "maint_reservation" reservation using account "test":
$ update_trace --reservation --jobid=538330 --rsv_name=maint_reservation --account=test
Preparing the Slurm source code:
--------------------------------
* Download Slurm 15.08.6
* Using the quilt command, apply the simulator patch
* Copy the new files into the appropriate locations:
  * Copy sim_events.h to .../src/slurmd/slurmd
  * Copy sim_funcs.h, sim_funcs.c and slurm_sim.h to .../src/common
  * Copy directory "simulator" to .../contribs
Building the Slurm Simulator Source Code:
-----------------------------------------
Assuming that you have already patched Slurm and placed the new files in the appropriate directories,
the build process is essentially the same as usual with the following additions:
* export LIBS=-lrt
* export CFLAGS="-D SLURM_SIMULATOR"
If running multiple slurmd's on a single node, as always, remember to use the "--enable-multiple-slurmd" option
to the Slurm configure script.
* Run the Slurm configure script (with all appropriate options as usual).
* Run make.
* Run make install.
* cd to the .../contribs/simulator directory.
* Run make.
* Run make install.
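Defining SLURM_SIMULATOR through CFLAGS makes the macro visible to every compiled source file, so simulator-only code paths can be selected with the preprocessor. LIBS=-lrt links librt, which on older glibc versions provides POSIX real-time and shared-memory calls; presumably the patch needs it for that reason. The sketch below shows only the general #ifdef pattern. The function names and the sim_clock variable are hypothetical stand-ins for the simulated time kept in shared memory, not the actual patch's hooks.

```c
#include <time.h>

#ifdef SLURM_SIMULATOR
/* Hypothetical stand-in for the simulated clock kept in shared memory. */
static time_t sim_clock = 1459000000;
#endif

/* Report whether this translation unit was built with -D SLURM_SIMULATOR. */
static int built_for_simulator(void)
{
#ifdef SLURM_SIMULATOR
    return 1;
#else
    return 0;
#endif
}

/* Time source: simulated time in a simulator build, wall-clock time otherwise. */
static time_t current_time(void)
{
#ifdef SLURM_SIMULATOR
    return sim_clock;
#else
    return time(NULL);
#endif
}
```

Compiled without the flag, the sketch falls back to the real clock; adding "-D SLURM_SIMULATOR" to CFLAGS switches it to the simulated one.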
Preamble:
---------
The Slurm Workload Simulator's aim is to provide a means of executing a set of jobs, a workload, in the Slurm system
without executing actual programs. The idea is to see how Slurm handles and schedules various workloads under different
configurations. For instance, an administrator may be interested to know how a job of a given size from a particular group
will get scheduled given the current workload on that system. He would be able to set up a simulator environment and with
a workload representing that of the real system plus the hypothetical job, he could submit it to the simulator and see the
working of Slurm, in this simulated environment, in faster than real-time.
The approach that had been initially taken to achieve this type of functionality and which has been kept and expanded
upon is to speed up time and to allow just for the specifications of some of the Slurm job attributes but not any actual job
steps. From a user's perspective, he would create a trace (workload) file that contains the specifications for each job that
is to be simulated and then start the simulator using this file as input.
Entities:
---------
slurmctld [Modified]
slurmd [Modified]
slurmdbd [Modified]
sim_mgr [NEW]
Running the Simulator:
----------------------
The sim_mgr is the driver of the simulation. It maintains the concept of simulated time and other
pertinent values in shared memory. It also, by default, will launch the slurmctld and slurmd's.
sim_mgr [endtime] [OPTIONS]
Valid OPTIONS are:
-c, --compath cpath 'cpath' is the path to the slurmctld and slurmd
(applicable only if launching daemons).
Specification of this option supersedes any
setting of SIM_DAEMONS_PATH. If neither is
specified then the sim_mgr looks in a sibling
directory of where it resides called sbin.
Finally, if still not found then the default
is /sbin.
-n, --nofork Do NOT fork the controller and daemons. The
user will be responsible for starting them
separately.
-a, --accelerator secs 'secs' is the interval, in simulated seconds,
to increment the simulated time after each
cycle instead of merely one.
-w, --wrkldfile filename 'filename' is the name of the trace file
containing the information of the jobs to
simulate.
-s, --nodenames nodeexpr 'nodeexpr' is an expression representing all
of the slurmd names to use when launching the
daemons--should correspond exactly with what
is defined in the slurm.conf.
-h, --help This help message.
Notes:
'endtime' is specified as seconds since Unix epoch. If 0 is specified
then the simulator will run indefinitely.
The debug level can be increased by sending the SIGUSR1 signal to
the sim_mgr.
$ kill -SIGUSR1 `pidof sim_mgr`
The debug level will be incremented by one with each signal sent,
wrapping back to zero after eight.
Example 1:
To run a simulation with the default trace file name (test.trace) where the Slurm configuration simply specifies
a single slurmd:
$ sim_mgr
Example 2:
To run a simulation with a trace file in some directory called /home/someuser/workload.trace where the Slurm
configuration simply specifies a single slurmd:
$ sim_mgr -w /home/someuser/workload.trace
Example 3:
To launch the simulation with a trace file called "workload.trace" and with a Slurm configuration that specifies
five front-ends named "node1,""node2,""node3,""node4" and "node5;" one possible command line would be:
$ sim_mgr -w workload.trace -s node[1-5]
Example 4:
To run the same job as above but with a speed-up factor of 100:
$ sim_mgr -w workload.trace -s node[1-5] -a 100
Monitoring Status of a Running Simulation:
------------------------------------------
Being that as part of the simulation, an instance of Slurm is running (slurmctld), the normal Slurm commands such
as "squeue" and "scontrol" can be used to see the status of the queue and the jobs that are in it. Additionally, the
Simulator-specific command "simdate" is provided as a means to view the date/time stamp of the simulation. The original
intention of this command was to be the Simulator's equivalent of the "date" command so that, at any given moment, the user
would know at what "time" the simulator is currently at. However, the command has evolved to also include the ability
to view all the fields of the special shared memory segment and to even allow the altering of a few of them.
simdate [OPTIONS]
-s, --showmem Display contents of shared memory
-i, --timeincr seconds Increment simulated time by 'seconds'
-f, --flag 1-n Set the global synchronization flag
-h, --help This help message
Notes: The simdate command's primary function is to
display the current simulated time. If, no arguments are
given, then this is the output. However, it also serves to
both display the synchronization semaphore value
and the contents of the shared memory segment.
Furthermore, it can increment the simulated time and set
the global sync flag.
Example 1:
To simply display the current date/time stamp of the Simulator:
$ simdate
Example 2:
Display the entire contents of the shared memory segment:
$ simdate -s
Example 3:
Increase the simulator's time by three minutes and display memory:
$ simdate -s -i 180
Example 4:
Manually set the global synchronization flag to three:
$ simdate -f 3
Note: Manually altering the contents of the shared memory could lead to unexpected results and should be done
with caution.
* The time can actually accept negative values and consequently be set backwards. However, this does
not roll any jobs back. It is recommended not to set the time backwards.
* The global sync flag pertains to how the slurmctld and slurmd's coordinate with the sim_mgr. It is
currently a simple mechanism by which a global flag is maintained in the shared memory. When the value is 1,
it is the turn of the sim_mgr to do what it needs in the given cycle. Once it is finished with its work for
the cycle it increments the value to 2 and at this point, the slurmctld and slurmd's can all function as they
otherwise normally would, each incrementing the flag once it is done with its work for the given cycle until
it reaches 1 + slurmd_count. The final daemon to increment will set the value back to 1, indicating that once
again it is the sim_mgr's turn. The precise values that are acceptable depend upon how many daemons are
running. If there is only one slurmd, then it should be 1-3. If there are five slurmd's running, as in
some of the above examples, then it would be 1-6. If the sim_mgr had been previously run, then the help
command for simdate will actually display the acceptable range based upon the currently configured number of
slurmd's according to the shared memory segment. This field should only be manually modified if there is a
problem where the simulator gets stuck and the user of the simulator wants to experiment to see what would
happen if he forces a change in state. It is not recommended for normal simulation.
* The slurmd_pid field in the shared memory segment currently only shows the pid of one slurmd. Thus,
if there are more than one slurmd executing than only one would be displayed. Use normal "pidof" or "ps"
commands to see the pid's of the other slurmd's. This field is only for informational purposes and is non-essential
to the execution of the simulator.
Finally, normal system monitoring commands such as "ps" can always be used to see how the system is
behaving during the simulation. For our part, we like to use the "pidof" command to list the pids of the various
entities and to script the use of the "ps -eLF" command so that we can see all of the threads of the various
entities running at any given time.
Creating a Workload Trace File:
-------------------------------
There are three general methods of creating a workload trace file. The original method is to create
a completely "synthetic" workload. The second method is to take a "snapshot" of actual running Slurm system
and the third is to take historical job information from the Slurm DB.
trace_builder: Builds a generic workload trace file.
Usage: trace_builder [OPTIONS]
  -c <cpus_per_task>,       The number of CPUs per task
  -d <duration>,            The number of seconds that each job will run
  -i <id_start>,            The first jobid. Each subsequent jobid will
                            be one greater than the previous.
  -l <number_of_jobs>,      The number of jobs to create in the trace file
  -n <tasks_per_node>,      The maximum number of tasks that a job is to
                            have per node.
  -o <output_file>,         Specifies a different output file name.
                            Default is simple.trace
  -r,                       The random argument. This specifies to use
                            random values for duration, tasks_per_node,
                            cpus_per_task and tasks. This option takes
                            no argument.
  -s <submission_step>,     The amount of time, in seconds, between
                            submission times of jobs.
  -S <initial timestamp>,   The initial time stamp to use for the first
                            job of the workload in lieu of the default.
                            This must be in Unix epoch time format.
  -t <tasks>,               The number of tasks that a job is to have.
  -u <user.sim_file>,       The name of the text file containing a list
                            of users and their system IDs.
NOTES: If not specified in conjunction with the -r (random) option, the
following options will have the following default values used in the
computation of the upper bound of the random number range:
duration = 1
tasks = 10
tasks_per_node = 8
cpus_per_task = 10
If not specified but still using the random option, the upper bounds will
be based upon arbitrary default values.
... The wall clock value will always be set greater than the duration.
... If using the random option, there is no guarantee that the combination
of cpus_per_task and tasks_per_node will be valid for the system being
emulated. Therefore, it is up to the user either to use the -c and -n options
with values whose product can never exceed the number of processors on the
emulated nodes, or to edit the trace file after creating it.
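The constraint in the last note can be checked before running trace_builder -r. A minimal sketch, assuming (as the note implies) that -c and -n act as upper bounds, so the largest per-node CPU request is their product. The function name fits_on_node and the node CPU count passed to it are hypothetical.

```shell
# Hypothetical pre-check for "trace_builder -r -c C -n N":
# succeeds only if C * N cannot exceed the CPUs of one emulated node.
fits_on_node() {
    cpus_per_task=$1
    tasks_per_node=$2
    node_cpus=$3
    [ $((cpus_per_task * tasks_per_node)) -le "$node_cpus" ]
}
```

For example, with 16-CPU emulated nodes: "fits_on_node 2 8 16 && trace_builder -r -c 2 -n 8 -l 100 -u /slurm_install/slurm_conf/users.sim".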
Example:
Build a workload trace named "new.trace" with ten jobs, all beginning at 1459000000
(26-March-2016 13:46:40 UTC, i.e. 14:46:40 in UTC+1).
$ trace_builder -u /slurm_install/slurm_conf/users.sim -o new.trace -l 10 -S 1459000000
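Since -S must be given in Unix epoch format, a sketch of converting in both directions with GNU date may help (the -d option is a GNU coreutils extension, assumed to be available). The function names are hypothetical. Note that 1459000000 is 2016-03-26 13:46:40 in UTC, which is 14:46:40 in a UTC+1 time zone.

```shell
# Hypothetical helpers; require GNU date (-d is a GNU extension).
# Human-readable UTC time -> epoch seconds, e.g. for trace_builder -S.
to_epoch() { date -u -d "$1" +%s; }
# Epoch seconds -> human-readable UTC time, e.g. to check an existing -S value.
from_epoch() { date -u -d "@$1" "+%Y-%m-%d %H:%M:%S"; }
```

Usage: "trace_builder -u /slurm_install/slurm_conf/users.sim -o new.trace -l 10 -S "$(to_epoch '2016-03-26 13:46:40')"" reproduces the example above.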
mysql_trace_builder: Builds the workload trace file from the Slurm database (when using MySQL as the database).
Usage: mysql_trace_builder [OPTIONS]
  -s, --starttime time      Start selecting jobs from this time
                            format: "yyyy-MM-DD hh:mm:ss"
  -e, --endtime time        Stop selecting jobs at this time
                            format: "yyyy-MM-DD hh:mm:ss"
  -h, --host db_hostname    Name of machine hosting MySQL DB
  -u, --user dbuser         Name of user with which to establish a
                            connection to the DB
  -t, --table db_table      Name of the MySQL table to query
  -v, --verbose             Increase verbosity of the messages
  -f, --file filename       Name of the output trace file being created
  -p, --help                This help message
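The -s and -e values must use the "yyyy-MM-DD hh:mm:ss" format. A minimal sketch that produces a window covering the last N days with GNU date (assumed available); the helper names are hypothetical, and the host, user and table names in the usage line are placeholders. The table name assumes slurmdbd's per-cluster "<cluster>_job_table" naming.

```shell
# Hypothetical helpers; GNU date's relative-date parsing is assumed.
# Start of a window N days back, in mysql_trace_builder's time format.
window_start() { date -d "$1 days ago" "+%Y-%m-%d %H:%M:%S"; }
# Current time, in the same format.
window_end() { date "+%Y-%m-%d %H:%M:%S"; }
```

Usage (placeholder names): "mysql_trace_builder -s "$(window_start 7)" -e "$(window_end)" -h dbhost -u slurm -t mycluster_job_table -f lastweek.trace".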
simqsnap: Builds a workload trace file from a currently running Slurm system. All jobs in the queue are recorded.
The end time is currently assumed to be the time limit as it is not known how long a currently running or
pending job will actually take. As with all trace files, this field can be edited with the "edit_trace"
command.
Usage: simqsnap [OPTIONS]
  -o, --output file              The name of the workload trace file to produce.
  -d, --duration_method num      Number representing the method of determining
                                 the expected end time of the job.
                                   1 = random time length (1-max time for job)
                                   2 = amount of time that the job had been
                                       running on the real system at snapshot
                                       time. If pending, then use expected time.
                                   3 = expected time (the time limit or wclimit)
                                       if not finished (default)
                                       (Note: finished jobs don't typically
                                       linger long in the queue)
  -h, --help                     This help message
Example 1:
Create a workload file (test.trace) based upon a currently running Slurm system using the expected
time of each job as its duration.
$ simqsnap
Example 2:
Create a workload file (new.trace) based upon a currently running Slurm system using a random
value between 1 and its expected time for each job's duration.
$ simqsnap -o new.trace -d 1
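To randomize the duration of individual jobs after a snapshot has been taken, a value in the same range as duration method 1 (1 to the job's time limit) can be drawn in the shell and passed to edit_trace -D. This is a sketch of the range only, not simqsnap's own code; random_duration is a hypothetical name.

```shell
# Hypothetical helper: print a random integer duration in [1, limit] seconds.
# rand() is in [0, 1), so int(rand() * max) is in [0, max - 1].
# srand() seeds from the current time, so calls within the same second
# return the same value.
random_duration() {
    awk -v max="$1" 'BEGIN { srand(); print int(rand() * max) + 1 }'
}
```

For example: "edit_trace -i new.trace -j 1234 -D "$(random_duration 3600)"", where job 1234 and its 3600-second limit are placeholders.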
Viewing a Workload Trace File:
------------------------------
Since the workload file is in binary format, the list_trace command is provided to quickly
display its contents.
Usage: list_trace [OPTIONS]
  -w, --wrkldfile filename    The name of the trace file to view
  -u, --unixtime              Display submit time in Unix epoch format
                              (default)
  -r, --humantime             Display submit time in a human-readable
                              format (YYYY-MM-DD hh:mm:ss)
  -h, --help                  This help message.
Example:
Display the contents of a workload file called "my.trace".
$ list_trace -w my.trace
Example:
Display the contents of a workload file called "my.trace" with the submission time in human-readable
format.
$ list_trace -w my.trace -r
Editing a Workload Trace File:
------------------------------
Once you have a trace file, it is often useful to edit it in various ways in order to experiment
with slightly different workloads. As the trace file is in a binary format, it would be difficult to edit
directly. Therefore, a tool called "edit_trace" is available. It works by allowing the user to select one or
more records, each corresponding to a single job, and then to apply one or more modifications to these records.
It also allows for the deletion and insertion of specified records and for the sorting (in ascending
chronological order) of all records.
Usage: edit_trace [OPTIONS]
  -j, --job_id jobid                Select records with job_id=jobid
  -u, --username name               Select records with username=name
  -s, --submit time                 Select records with submit=time
  -d, --duration secs               Select records with duration=secs
  -w, --wclimit timeh               Select records with wclimit=timeh
  -t, --tasks num                   Select records with tasks=num
  -q, --qosname qos                 Select records with qosname=qos
  -p, --partition par               Select records with partition=par
  -a, --account acc                 Select records with account=acc
  -c, --cpus_per_task cpt           Select records with cpus_per_task=cpt
  -n, --tasks_per_node tpn          Select records with tasks_per_node=tpn
  -r, --reservation res             Select records with reservation=res
  -e, --dependency dep              Select records with dependency=dep
  -x, --index idx                   Select record number idx
  -J, --new_job_id jobid            Set job_id to jobid in all matched records
  -U, --new_username name           Set username to name in all matched records
  -S, --new_submit time             Set submit to time in all matched records
  -D, --new_duration secs           Set duration to secs in all matched records
  -W, --new_wclimit timeh           Set wclimit to timeh in all matched records
  -T, --new_tasks num               Set tasks to num in all matched records
  -Q, --new_qosname qos             Set qosname to qos in all matched records
  -P, --new_partition par           Set partition to par in all matched records
  -A, --new_account acc             Set account to acc in all matched records
  -C, --new_cpus_per_task cpt       Set cpus_per_task to cpt in all matched
                                    records
  -N, --new_tasks_per_node tpn      Set tasks_per_node to tpn in all matched
                                    records
  -R, --new_reservation res         Set reservation to res in all matched records
  -E, --new_dependency dep          Set dependency to dep in all matched records
  -X, --remove_jobs                 Delete all matched records
  -h, --help                        This help message
  -i, --wrkldfile name              Name of the trace file to edit
  -I, --insert                      Insert a record after each matched record
  -O, --sort                        Sort all records in ascending
                                    chronological order
Notes: The edit_trace utility has two general sets of options.
       The first set specifies which existing records to select;
       these have lower-case short options. The second set specifies
       what the new values should be; these have upper-case short
       options, and their long forms are prefixed with 'new_'.
       If sorting, all other edits are performed first and then the
       result is sorted.
       Job IDs may be out of order after a chronological sort.
As can be seen, records can be selected based upon any of their fields, and any field can
be modified. In general, a lower-case option selects records based on a particular field,
and the corresponding upper-case option sets a new value for that field in all matched
records. Exceptions include:
   -x  selects a single record based upon its index within the file
   -X  deletes all matched records
   -i  specifies the name of the trace file to use
   -I  inserts, after each matched record, a new record that is initially a duplicate of it
   -O  sorts all records of the file.
Beware that the edit_trace command is "destructive": it overwrites the contents of the original
input file once done. Therefore, if the original file is still needed, MAKE A COPY before
running edit_trace!
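One way to make the copy a habit is a small wrapper. This is a minimal sketch, not part of the simulator: safe_edit_trace is a hypothetical name, it expects the trace file as its first argument (passed to edit_trace with -i), and it forwards all remaining options unchanged.

```shell
# Hypothetical wrapper: back up the trace file, then run edit_trace on it.
# Usage: safe_edit_trace FILE [edit_trace options...]
safe_edit_trace() {
    trace=$1
    shift
    # Timestamped backup name; two runs in the same second reuse the name.
    backup="$trace.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$trace" "$backup" || return 1
    echo "backup written to $backup"
    edit_trace -i "$trace" "$@"
}
```

For example, "safe_edit_trace /home/user1/workload.trace -p short -P long" keeps the unedited file next to the edited one.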
Example 1:
Select all jobs with a partition value of "short" and change it to a partition called "long".
$ edit_trace -p "short" -P "long"
Example 2:
Same as above but we will also set the duration to 1000.
$ edit_trace -p "short" -P "long" -D 1000
Example 3:
Sort all records of the file.
$ edit_trace -O
Example 4:
Insert a new record after record #42 in file "/home/user1/workload.trace", setting the job id to 12345
and the user to "user2".
$ edit_trace -i /home/user1/workload.trace -x 42 -I -J 12345 -U user2
NOTE: If only the target jobid is supplied for a dependency, the dependency will be treated as "afterany".
To specify other dependency types, write out the full dependency,
e.g. "edit_trace -E afterok:12345".
NOTE: To remove a dependency, reservation, partition or an account, provide a value that starts with a space,
e.g. 'edit_trace -E " "' will clear the dependency field.
In addition to edit_trace, there is an older and much more restrictive command, "update_trace".
This command only works on a file called test.trace and only allows setting a dependency or
reservation for a single given job. The edit_trace command above provides the same functionality
and much more; hence, update_trace is superseded.
Usage: [This command is deprecated. Use edit_trace.]
update_trace [OPTIONS]
  -R, --reservation          States to perform a reservation update
  -D, --dependency           States to perform a dependency update
  -n, --rsv_name name        Name of reservation to use
  -j, --jobid jid            Select job 'jid' to modify
  -r, --ref_jobid rjid       Set 'rjid' as the target dependency
  -a, --account              Account to use (reservation updates only)
  -h, --help                 This help message
Notes: There are two general formats, one for dependency updates and one
       for reservation updates.
         update_trace [-D | --dependency] [-j | --jobid] [-r | --ref_jobid]
           -- Or --
         update_trace [-R | --reservation] [-n | --rsv_name] [-j | --jobid]
                      [-a | --account]
       The command must specify either a reservation or a dependency action.
Example 1:
To update jobid 538330 to be dependent upon jobid 538321.
$ update_trace --dependency --jobid=538330 --ref_jobid=538321
NOTE: All job dependencies are currently treated as being of type "afterany".
Example 2:
To update the job record in test.trace with jobid 538330 to belong to the "maint_reservation" reservation using account "test".
$ update_trace --reservation --jobid=538330 --rsv_name=maint_reservation --account=test
Preparing the Slurm source code:
--------------------------------
* Download Slurm 15.08.6
* Using the quilt command, apply the simulator patch
* Copy the new files into the appropriate locations:
* Copy sim_events.h to .../src/slurmd/slurmd
* Copy sim_funcs.h, sim_funcs.c and slurm_sim.h to .../src/common
* Copy directory "simulator" to .../contribs
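The copy step above can be scripted. A minimal sketch, assuming the simulator's new files sit in one directory (with "simulator" as a subdirectory) and that the quilt patch has already been applied, for example with "quilt import" followed by "quilt push -a" (the patch file name is not given here). install_sim_files and both directory arguments are hypothetical.

```shell
# Hypothetical helper for the "copy the new files" step.
# $1 = directory holding the simulator's new files (assumed flat layout)
# $2 = top of the patched Slurm 15.08.6 source tree (the "..." above)
# Note: if contribs/simulator already exists, cp -R nests the copy inside it.
install_sim_files() {
    sim=$1
    slurm=$2
    cp "$sim/sim_events.h" "$slurm/src/slurmd/slurmd/" &&
    cp "$sim/sim_funcs.h" "$sim/sim_funcs.c" "$sim/slurm_sim.h" "$slurm/src/common/" &&
    cp -R "$sim/simulator" "$slurm/contribs/"
}
```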
Building the Slurm Simulator Source Code:
-----------------------------------------
Assuming that you have already patched Slurm and placed the new files in the appropriate directories,
the build process is essentially the same as usual with the following additions:
* export LIBS=-lrt
* export CFLAGS="-D SLURM_SIMULATOR"
If running multiple slurmd's on a single node, as always, remember to pass the "--enable-multiple-slurmd"
option to the Slurm configure script.
* Run the Slurm configure script (with all appropriate options as usual).
* Run make.
* Run make install.
* cd into the .../contribs/simulator directory.
* Run make.
* Run make install.
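The build steps can be collected into one script. This is a sketch under stated assumptions: build_slurm_sim is a hypothetical name, its first argument is the top of the patched source tree, and any further arguments are passed straight to configure (Slurm documents the multiple-slurmd option as "--enable-multiple-slurmd"; the --prefix shown in the usage line is a placeholder).

```shell
# Environment additions required by the simulator build.
export LIBS=-lrt
export CFLAGS="-D SLURM_SIMULATOR"

# Hypothetical helper: $1 = patched Slurm source tree, rest = configure args.
# Runs in a subshell so the caller's working directory is unchanged;
# stops at the first failing step.
build_slurm_sim() {
    src=$1
    shift
    (
        cd "$src" &&
        ./configure "$@" &&
        make &&
        make install &&
        cd contribs/simulator &&
        make &&
        make install
    )
}
```

Usage (placeholder paths): "build_slurm_sim "$HOME/slurm-15.08.6" --enable-multiple-slurmd --prefix=/slurm_install".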