%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chapter: Process Management
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Process Management}
\label{chap:api_proc_mgmt}
This chapter defines functionality used by clients to create and destroy/abort processes in the \ac{PMIx} universe.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Abort}
\label{chap:api_proc_mgmt:abort}
\ac{PMIx} provides a dedicated API by which an application can request that specified processes be aborted by the system.
%%%%%%%%%%%
\subsection{\code{PMIx_Abort}}
\declareapi{PMIx_Abort}
%%%%
\summary
Abort the specified processes
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Abort(int status, const char msg[],
pmix_proc_t procs[], size_t nprocs)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{status}{Error code to return to invoking environment (integer)}
\argin{msg}{String message to be returned to user (string)}
\argin{procs}{Array of \refstruct{pmix_proc_t} structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\end{arglist}
Returns \refconst{PMIX_SUCCESS} or a negative value corresponding to a PMIx error constant.
%%%%
\descr
Request that the host resource manager print the provided message and abort the provided array of \refarg{procs}.
A Unix or POSIX environment should handle the provided status as a return error code from the main program that launched the application.
A \code{NULL} for the \refarg{procs} array indicates that all processes in the caller's namespace are to be aborted, including the caller itself.
Passing a \code{NULL} \refarg{msg} parameter is allowed.
\adviceuserstart
The response to this request is somewhat dependent on the specific \acl{RM} and its configuration (e.g., some resource managers will not abort the application if the provided status is zero unless specifically configured to do so, and some cannot abort subsets of processes in an application), and thus lies outside the control of PMIx itself.
However, the PMIx client library shall inform the \ac{RM} of the request that the specified \refarg{procs} be aborted, regardless of the value of the provided status.
Note that race conditions caused by multiple processes calling \refapi{PMIx_Abort} are left to the server implementation to resolve with regard to which status is returned and what messages (if any) are printed.
\adviceuserend
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Process Creation}
\label{chap:api_proc_mgmt:spawn}
The \refapi{PMIx_Spawn} functions spawn new processes and/or applications in the \ac{PMIx} universe. This may include requests to extend the existing resource allocation or to obtain a new one, depending upon the provided and supported attributes.
%%%%%%%%%%%
\subsection{\code{PMIx_Spawn}}
\declareapi{PMIx_Spawn}
%%%%
\summary
Spawn a new job.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Spawn(const pmix_info_t job_info[], size_t ninfo,
const pmix_app_t apps[], size_t napps,
char nspace[])
\end{codepar}
\cspecificend
\begin{arglist}
\argin{job_info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{job_info} array (integer)}
\argin{apps}{Array of \refstruct{pmix_app_t} structures (array of handles)}
\argin{napps}{Number of elements in the \refarg{apps} array (integer)}
\argout{nspace}{Namespace of the new job (string)}
\end{arglist}
Returns \refconst{PMIX_SUCCESS} or a negative value corresponding to a PMIx error constant.
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing, and the \ac{PMIx} library is required to add the following attributes to those provided before passing the request to the host:
\pastePRIAttributeItem{PMIX_SPAWNED}
\pastePRIAttributeItem{PMIX_PARENT_ID}
\pastePRIAttributeItem{PMIX_REQUESTOR_IS_CLIENT}
\pastePRIAttributeItem{PMIX_REQUESTOR_IS_TOOL}
\divider
Host environments that implement support for \refapi{PMIx_Spawn} are required to pass the \refattr{PMIX_SPAWNED} and \refattr{PMIX_PARENT_ID} attributes to all \ac{PMIx} servers launching new child processes so those values can be returned to clients upon connection to the \ac{PMIx} server. In addition, they are required to support the following attributes when present in either the \refarg{job_info} or the \textit{info} array of an element of the \refarg{apps} array:
\pastePRRTEAttributeItem{PMIX_WDIR}
\pastePRRTEAttributeItem{PMIX_SET_SESSION_CWD}
\pastePRRTEAttributeItem{PMIX_PREFIX}
\pastePRRTEAttributeItem{PMIX_HOST}
\pastePRRTEAttributeItem{PMIX_HOSTFILE}
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_ADD_HOSTFILE}
\pastePRRTEAttributeItem{PMIX_ADD_HOST}
\pastePRRTEAttributeItem{PMIX_PRELOAD_BIN}
\pastePRRTEAttributeItem{PMIX_PRELOAD_FILES}
\pastePRRTEAttributeItem{PMIX_PERSONALITY}
\pastePRRTEAttributeItem{PMIX_MAPPER}
\pastePRRTEAttributeItem{PMIX_DISPLAY_MAP}
\pastePRRTEAttributeItem{PMIX_PPR}
\pastePRRTEAttributeItem{PMIX_MAPBY}
\pastePRRTEAttributeItem{PMIX_RANKBY}
\pastePRRTEAttributeItem{PMIX_BINDTO}
\pastePRRTEAttributeItem{PMIX_NON_PMI}
\pastePRRTEAttributeItem{PMIX_STDIN_TGT}
\pastePRRTEAttributeItem{PMIX_FWD_STDIN}
\pastePRRTEAttributeItem{PMIX_FWD_STDOUT}
\pastePRRTEAttributeItem{PMIX_FWD_STDERR}
\pastePRRTEAttributeItem{PMIX_DEBUGGER_DAEMONS}
\pastePRRTEAttributeItem{PMIX_TAG_OUTPUT}
\pastePRRTEAttributeItem{PMIX_TIMESTAMP_OUTPUT}
\pastePRRTEAttributeItem{PMIX_MERGE_STDERR_STDOUT}
\pastePRRTEAttributeItem{PMIX_OUTPUT_TO_FILE}
\pastePRRTEAttributeItem{PMIX_INDEX_ARGV}
\pastePRRTEAttributeItem{PMIX_CPUS_PER_PROC}
\pastePRRTEAttributeItem{PMIX_NO_PROCS_ON_HEAD}
\pastePRRTEAttributeItem{PMIX_NO_OVERSUBSCRIBE}
\pastePRRTEAttributeItem{PMIX_REPORT_BINDINGS}
\pastePRRTEAttributeItem{PMIX_CPU_LIST}
\pastePRRTEAttributeItem{PMIX_JOB_RECOVERABLE}
\pastePRRTEAttributeItem{PMIX_JOB_CONTINUOUS}
\pastePRRTEAttributeItem{PMIX_MAX_RESTARTS}
\pastePRRTEAttributeItem{PMIX_NOTIFY_COMPLETION}
\optattrend
%%%%
\descr
Spawn a new job.
The assigned namespace of the spawned applications is returned in the \refarg{nspace} parameter.
A \code{NULL} value in that location indicates that the caller does not wish to have the namespace returned.
Otherwise, the \refarg{nspace} array must be at least one character longer than \refconst{PMIX_MAX_NSLEN} so that it can hold a namespace of maximum length plus its terminating null character.
By default, the spawned processes will be PMIx ``connected'' to the parent process upon successful launch (see the \refapi{PMIx_Connect} description for details).
Note that this only means that (a) the parent process will be given a copy of the new job's information so it can query job-level info without incurring any communication penalties, (b) newly spawned child processes will receive a copy of the parent process's job-level info, and (c) both the parent process and members of the child job will receive notification of errors from processes in their combined assemblage.
\adviceuserstart
Behavior of individual resource managers may differ, but it is expected that failure of any application process to start will result in termination/cleanup of all processes in the newly spawned job and return of an error code to the caller.
\adviceuserend
%%%%%%%%%%%
\subsection{\code{PMIx_Spawn_nb}}
\declareapi{PMIx_Spawn_nb}
%%%%
\summary
Nonblocking version of the \refapi{PMIx_Spawn} routine.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Spawn_nb(const pmix_info_t job_info[], size_t ninfo,
const pmix_app_t apps[], size_t napps,
pmix_spawn_cbfunc_t cbfunc, void *cbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{job_info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{job_info} array (integer)}
\argin{apps}{Array of \refstruct{pmix_app_t} structures (array of handles)}
\argin{napps}{Number of elements in the \refarg{apps} array (integer)}
\argin{cbfunc}{Callback function \refapi{pmix_spawn_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the callback function (memory reference)}
\end{arglist}
Returns one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed by the host environment - result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item a PMIx error constant indicating an error in the request - the \refarg{cbfunc} will \textit{not} be called
\end{itemize}
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing, and the \ac{PMIx} library is required to add the following attributes to those provided before passing the request to the host:
\pastePRIAttributeItem{PMIX_SPAWNED}
\pastePRIAttributeItem{PMIX_PARENT_ID}
\pastePRIAttributeItem{PMIX_REQUESTOR_IS_CLIENT}
\pastePRIAttributeItem{PMIX_REQUESTOR_IS_TOOL}
\divider
Host environments that implement support for \refapi{PMIx_Spawn} are required to pass the \refattr{PMIX_SPAWNED} and \refattr{PMIX_PARENT_ID} attributes to all \ac{PMIx} servers launching new child processes so those values can be returned to clients upon connection to the \ac{PMIx} server. In addition, they are required to support the following attributes when present in either the \refarg{job_info} or the \textit{info} array of an element of the \refarg{apps} array:
\pastePRRTEAttributeItem{PMIX_WDIR}
\pastePRRTEAttributeItem{PMIX_SET_SESSION_CWD}
\pastePRRTEAttributeItem{PMIX_PREFIX}
\pastePRRTEAttributeItem{PMIX_HOST}
\pastePRRTEAttributeItem{PMIX_HOSTFILE}
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_ADD_HOSTFILE}
\pastePRRTEAttributeItem{PMIX_ADD_HOST}
\pastePRRTEAttributeItem{PMIX_PRELOAD_BIN}
\pastePRRTEAttributeItem{PMIX_PRELOAD_FILES}
\pastePRRTEAttributeItem{PMIX_PERSONALITY}
\pastePRRTEAttributeItem{PMIX_MAPPER}
\pastePRRTEAttributeItem{PMIX_DISPLAY_MAP}
\pastePRRTEAttributeItem{PMIX_PPR}
\pastePRRTEAttributeItem{PMIX_MAPBY}
\pastePRRTEAttributeItem{PMIX_RANKBY}
\pastePRRTEAttributeItem{PMIX_BINDTO}
\pastePRRTEAttributeItem{PMIX_NON_PMI}
\pastePRRTEAttributeItem{PMIX_STDIN_TGT}
\pastePRRTEAttributeItem{PMIX_FWD_STDIN}
\pastePRRTEAttributeItem{PMIX_FWD_STDOUT}
\pastePRRTEAttributeItem{PMIX_FWD_STDERR}
\pastePRRTEAttributeItem{PMIX_DEBUGGER_DAEMONS}
\pastePRRTEAttributeItem{PMIX_TAG_OUTPUT}
\pastePRRTEAttributeItem{PMIX_TIMESTAMP_OUTPUT}
\pastePRRTEAttributeItem{PMIX_MERGE_STDERR_STDOUT}
\pastePRRTEAttributeItem{PMIX_OUTPUT_TO_FILE}
\pastePRRTEAttributeItem{PMIX_INDEX_ARGV}
\pastePRRTEAttributeItem{PMIX_CPUS_PER_PROC}
\pastePRRTEAttributeItem{PMIX_NO_PROCS_ON_HEAD}
\pastePRRTEAttributeItem{PMIX_NO_OVERSUBSCRIBE}
\pastePRRTEAttributeItem{PMIX_REPORT_BINDINGS}
\pastePRRTEAttributeItem{PMIX_CPU_LIST}
\pastePRRTEAttributeItem{PMIX_JOB_RECOVERABLE}
\pastePRRTEAttributeItem{PMIX_JOB_CONTINUOUS}
\pastePRRTEAttributeItem{PMIX_MAX_RESTARTS}
\optattrend
%%%%
\descr
Nonblocking version of the \refapi{PMIx_Spawn} routine. The provided callback function will be executed upon successful start of \textit{all} specified application processes.
\adviceuserstart
Behavior of individual resource managers may differ, but it is expected that failure of any application process to start will result in termination/cleanup of all processes in the newly spawned job and return of an error code to the caller.
\adviceuserend
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Connecting and Disconnecting Processes}
\label{chap:api_proc_mgmt:connect}
This section defines functions to connect and disconnect processes in two or more separate \ac{PMIx} namespaces. The \ac{PMIx} definition of \textit{connected} solely implies that the host environment should treat the failure of any process in the assemblage as a reportable event, taking action on the assemblage as if it were a single application. For example, if the environment defaults (in the absence of any application directives) to terminating an application upon failure of any process in that application, then the environment should terminate all processes in the connected assemblage upon failure of any member.
\advicermstart
The host environment may choose to assign a new namespace to the connected assemblage and/or assign new ranks for its members for its own internal tracking purposes. However, it is not required to communicate such assignments to the participants (e.g., in response to an appropriate call to \refapi{PMIx_Query_info_nb}). The host environment is required to generate a \refconst{PMIX_ERR_INVALID_TERMINATION} event should any process in the assemblage terminate or call \refapi{PMIx_Finalize} without first \textit{disconnecting} from the assemblage.
The \textit{connect} operation does not require the exchange of job-level information nor the inclusion of information posted by participating processes via \refapi{PMIx_Put}. Indeed, the callback function utilized in \refapi{pmix_server_connect_fn_t} cannot pass information back into the \ac{PMIx} server library. However, host environments are advised that collecting such information at the participating daemons represents an optimization opportunity as participating processes are likely to request such information after the connect operation completes.
\advicermend
\adviceuserstart
Attempting to \textit{connect} processes solely within the same namespace is essentially a \textit{no-op}. While not explicitly prohibited, users are advised that a \ac{PMIx} implementation or host environment may return an error in such cases.
Neither the \ac{PMIx} implementation nor the host environment is required to provide any tracking support for the assemblage. Thus, the application is responsible for maintaining the membership list of the assemblage.
\adviceuserend
%%%%%%%%%%%
\subsection{\code{PMIx_Connect}}
\declareapi{PMIx_Connect}
%%%%
\summary
Connect namespaces.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Connect(const pmix_proc_t procs[], size_t nprocs,
const pmix_info_t info[], size_t ninfo)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{procs}{Array of proc structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\end{arglist}
Returns \refconst{PMIX_SUCCESS} or a negative value corresponding to a PMIx error constant.
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing.
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_TIMEOUT}
\pasteAttributeItem{PMIX_COLLECTIVE_ALGO}
\pasteAttributeItem{PMIX_COLLECTIVE_ALGO_REQD}
\optattrend
\adviceimplstart
We recommend that implementation of the \refattr{PMIX_TIMEOUT} attribute be left to the host environment due to race condition considerations between completion of the operation versus internal timeout in the \ac{PMIx} server library. Implementers that choose to support \refattr{PMIX_TIMEOUT} directly in the \ac{PMIx} server library must take care to resolve the race condition and should avoid passing \refattr{PMIX_TIMEOUT} to the host environment so that multiple competing timeouts are not created.
\adviceimplend
%%%%
\descr
Record the processes specified by the \refarg{procs} array as \textit{connected} as per the \ac{PMIx} definition. The function will return once all processes identified in \refarg{procs} have called either \refapi{PMIx_Connect} or its non-blocking version, \textit{and} the host environment has completed any supporting operations required to meet the terms of the \ac{PMIx} definition of \textit{connected} processes.
\adviceuserstart
All processes engaged in a given \refapi{PMIx_Connect} operation must provide an identical \refarg{procs} array, because the ordering of entries in the array and the method by which those processes are identified (e.g., use of \refconst{PMIX_RANK_WILDCARD} versus listing the individual processes) \textit{may} impact the host environment's algorithm for uniquely identifying an operation.
\adviceuserend
\adviceimplstart
\refapi{PMIx_Connect} and its non-blocking form are both \emph{collective} operations. Accordingly, the \ac{PMIx} server library is required to aggregate participation by local clients, passing the request to the host environment once all local participants have executed the \ac{API}.
\adviceimplend
\advicermstart
The host will receive a single call for each collective operation. It is the responsibility of the host to identify the nodes containing participating processes, execute the collective across all participating nodes, and notify the local \ac{PMIx} server library upon completion of the global collective.
\advicermend
Processes that combine via \refapi{PMIx_Connect} must call \refapi{PMIx_Disconnect} prior to finalizing and/or terminating - any process in the assemblage failing to meet this requirement will cause a \refconst{PMIX_ERR_INVALID_TERMINATION} event to be generated.
A process can only engage in one connect operation involving the identical \refarg{procs} array at a time.
However, a process can be simultaneously engaged in multiple connect operations, each involving a different \refarg{procs} array.
As in the case of the \refapi{PMIx_Fence} operation, the \refarg{info} array can be used to pass user-level directives regarding the algorithm to be used for any collective required by the connect operation, timeout constraints, and other options available from the host \ac{RM}.
%%%%%%%%%%%
\subsection{\code{PMIx_Connect_nb}}
\declareapi{PMIx_Connect_nb}
%%%%
\summary
Nonblocking \refapi{PMIx_Connect} routine.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Connect_nb(const pmix_proc_t procs[], size_t nprocs,
const pmix_info_t info[], size_t ninfo,
pmix_op_cbfunc_t cbfunc, void *cbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{procs}{Array of proc structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\argin{cbfunc}{Callback function \refapi{pmix_op_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the callback function (memory reference)}
\end{arglist}
Returns one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed by the host environment - result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success} - the \refarg{cbfunc} will \textit{not} be called
\item a PMIx error constant indicating either an error in the input or that the request was immediately processed and failed - the \refarg{cbfunc} will \textit{not} be called
\end{itemize}
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing.
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_TIMEOUT}
\pasteAttributeItem{PMIX_COLLECTIVE_ALGO}
\pasteAttributeItem{PMIX_COLLECTIVE_ALGO_REQD}
\optattrend
\adviceimplstart
We recommend that implementation of the \refattr{PMIX_TIMEOUT} attribute be left to the host environment due to race condition considerations between completion of the operation versus internal timeout in the \ac{PMIx} server library. Implementers that choose to support \refattr{PMIX_TIMEOUT} directly in the \ac{PMIx} server library must take care to resolve the race condition and should avoid passing \refattr{PMIX_TIMEOUT} to the host environment so that multiple competing timeouts are not created.
\adviceimplend
%%%%
\descr
Nonblocking version of \refapi{PMIx_Connect}. The callback function is called once all processes identified in \refarg{procs} have called either \refapi{PMIx_Connect} or its non-blocking version, \textit{and} the host environment has completed any supporting operations required to meet the terms of the \ac{PMIx} definition of \textit{connected} processes. See the advice provided in the description for \refapi{PMIx_Connect} for more information.
%%%%%%%%%%%
\subsection{\code{PMIx_Disconnect}}
\declareapi{PMIx_Disconnect}
%%%%
\summary
Disconnect a previously connected set of processes.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Disconnect(const pmix_proc_t procs[], size_t nprocs,
const pmix_info_t info[], size_t ninfo)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{procs}{Array of proc structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\end{arglist}
Returns \refconst{PMIX_SUCCESS} or a negative value corresponding to a PMIx error constant.
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing.
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_TIMEOUT}
\optattrend
\adviceimplstart
We recommend that implementation of the \refattr{PMIX_TIMEOUT} attribute be left to the host environment due to race condition considerations between completion of the operation versus internal timeout in the \ac{PMIx} server library. Implementers that choose to support \refattr{PMIX_TIMEOUT} directly in the \ac{PMIx} server library must take care to resolve the race condition and should avoid passing \refattr{PMIX_TIMEOUT} to the host environment so that multiple competing timeouts are not created.
\adviceimplend
%%%%
\descr
Disconnect a previously connected set of processes.
A \refconst{PMIX_ERR_INVALID_OPERATION} error will be returned if the specified set of \refarg{procs} was not previously \textit{connected} via a call to \refapi{PMIx_Connect} or its non-blocking form. The function will return once all processes identified in \refarg{procs} have called either \refapi{PMIx_Disconnect} or its non-blocking version, \textit{and} the host environment has completed any required supporting operations.
\adviceuserstart
All processes engaged in a given \refapi{PMIx_Disconnect} operation must provide an identical \refarg{procs} array, because the ordering of entries in the array and the method by which those processes are identified (e.g., use of \refconst{PMIX_RANK_WILDCARD} versus listing the individual processes) \textit{may} impact the host environment's algorithm for uniquely identifying an operation.
\adviceuserend
\adviceimplstart
\refapi{PMIx_Disconnect} and its non-blocking form are both \emph{collective} operations. Accordingly, the \ac{PMIx} server library is required to aggregate participation by local clients, passing the request to the host environment once all local participants have executed the \ac{API}.
\adviceimplend
\advicermstart
The host will receive a single call for each collective operation. It is the responsibility of the host to identify the nodes containing participating processes, execute the collective across all participating nodes, and notify the local \ac{PMIx} server library upon completion of the global collective.
\advicermend
A process can only engage in one disconnect operation involving the identical \refarg{procs} array at a time.
However, a process can be simultaneously engaged in multiple disconnect operations, each involving a different \refarg{procs} array.
As in the case of the \refapi{PMIx_Fence} operation, the \refarg{info} array can be used to pass user-level directives regarding the algorithm to be used for any collective required by the disconnect operation, timeout constraints, and other options available from the host \ac{RM}.
%%%%%%%%%%%
\subsection{\code{PMIx_Disconnect_nb}}
\declareapi{PMIx_Disconnect_nb}
%%%%
\summary
Nonblocking \refapi{PMIx_Disconnect} routine.
%%%%
\format
\versionMarker{1.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Disconnect_nb(const pmix_proc_t procs[], size_t nprocs,
const pmix_info_t info[], size_t ninfo,
pmix_op_cbfunc_t cbfunc, void *cbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{procs}{Array of proc structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\argin{cbfunc}{Callback function \refapi{pmix_op_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the callback function (memory reference)}
\end{arglist}
Returns one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed by the host environment - result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success} - the \refarg{cbfunc} will \textit{not} be called
\item a PMIx error constant indicating either an error in the input or that the request was immediately processed and failed - the \refarg{cbfunc} will \textit{not} be called
\end{itemize}
\reqattrstart
\ac{PMIx} libraries are not required to directly support any attributes for this function. However, any provided attributes must be passed to the host \ac{SMS} daemon for processing.
\reqattrend
\optattrstart
The following attributes are optional for host environments that support this operation:
\pastePRRTEAttributeItem{PMIX_TIMEOUT}
\optattrend
\adviceimplstart
We recommend that implementation of the \refattr{PMIX_TIMEOUT} attribute be left to the host environment due to race condition considerations between completion of the operation versus internal timeout in the \ac{PMIx} server library. Implementers that choose to support \refattr{PMIX_TIMEOUT} directly in the \ac{PMIx} server library must take care to resolve the race condition and should avoid passing \refattr{PMIX_TIMEOUT} to the host environment so that multiple competing timeouts are not created.
\adviceimplend
%%%%
\descr
Nonblocking \refapi{PMIx_Disconnect} routine. The callback function is called once all processes identified in \refarg{procs} have called either \refapi{PMIx_Disconnect_nb} or its blocking version, \textit{and} the host environment has completed any required supporting operations. See the advice provided in the description for \refapi{PMIx_Disconnect} for more information.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{IO Forwarding}
\label{chap:api_proc_mgmt:iof}
This section defines functions by which tools (e.g., debuggers) can request forwarding of input/output to/from other processes. The term ``tool'' generally refers to non-computational programs executed by the user or system administrator to monitor or control a principal computational program. Tools almost always interact with the host environment, user applications, or both to perform administrative and support functions. For example, a debugger tool might be used to remotely control the processes of a parallel application, monitoring their behavior on a step-by-step basis.
Underlying the operation of many tools is a common need to forward stdin from the tool to targeted processes, and to return stdout/stderr from those processes for display on the user's console. Historically, each tool developer was responsible for creating their own \ac{IO} forwarding subsystem. However, with the introduction of \ac{PMIx} as a standard mechanism for interacting between applications and the host environment, it has become possible to relieve tool developers of this burden.
\advicermstart
The host environment's responsibilities in forwarding \ac{IO} fall into the following areas:
\begin{itemize}
\item Capturing output from specified child processes
\item Forwarding that output to the host of the \ac{PMIx} server library that requested it
\item Delivering that payload to the \ac{PMIx} server library via the \refapi{PMIx_server_IOF_deliver} \ac{API} for final dispatch
\end{itemize}
It is the responsibility of the \ac{PMIx} library to buffer, format, and deliver the payload to the requesting client.
\advicermend
\adviceuserstart
The forwarding of \ac{IO} via \ac{PMIx} requires that both the host environment and the tool support \ac{PMIx}, but does not impose any similar requirements on the application itself.
\adviceuserend
%%%%%%%%%%%
\subsection{\code{PMIx_IOF_pull}}
\declareapi{PMIx_IOF_pull}
%%%%
\summary
Register to receive output forwarded from a set of remote processes.
%%%%
\format
\versionMarker{3.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_IOF_pull(const pmix_proc_t procs[], size_t nprocs,
const pmix_info_t directives[], size_t ndirs,
pmix_iof_channel_t channel, pmix_iof_cbfunc_t cbfunc,
pmix_hdlr_reg_cbfunc_t regcbfunc, void *regcbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{procs}{Array of proc structures identifying desired source processes (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{directives}{Array of \refstruct{pmix_info_t} structures (array of handles)}
\argin{ndirs}{Number of elements in the \refarg{directives} array (integer)}
\argin{channel}{Bitmask of \ac{IO} channels included in the request (\refstruct{pmix_iof_channel_t})}
\argin{cbfunc}{Callback function for delivering relevant output (\refapi{pmix_iof_cbfunc_t} function reference)}
\argin{regcbfunc}{Function to be called when registration is completed (\refapi{pmix_hdlr_reg_cbfunc_t} function reference)}
\argin{regcbdata}{Data to be passed to the \refarg{regcbfunc} callback function (memory reference)}
\end{arglist}
If \refarg{regcbfunc} is \code{NULL}, the function call will be treated as a \emph{blocking} call. In this case, the returned status will be either (a) the IOF handler reference identifier if the value is greater than or equal to zero, or (b) a negative error code indicative of the reason for the failure.
If the \refarg{regcbfunc} is non-\code{NULL}, the function call will be treated as a \emph{non-blocking} call and will return the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} indicating that the request has been accepted for processing and the provided callback function will be executed upon completion of the operation. Note that the library must not invoke the callback function prior to returning from the \ac{API}. The IOF handler identifier will be returned in the \refarg{regcbfunc} callback.
\item a non-zero \ac{PMIx} error constant indicating the reason the request was rejected. In this case, the provided callback function will not be executed.
\end{constantdesc}
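In blocking mode a single return value carries either the IOF handler reference or an error, distinguished by sign. The sketch below shows one way a caller might split that value before saving the reference for a later \refapi{PMIx_IOF_deregister} call. It uses a local \code{int} stand-in for \refstruct{pmix_status_t} instead of including the \ac{PMIx} headers, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for pmix_status_t (an integer type in the PMIx headers). */
typedef int example_status_t;

/* Hypothetical result holder; not part of the PMIx API. */
typedef struct {
    bool ok;              /* true if registration succeeded */
    size_t iofhdlr;       /* handler reference, valid only if ok */
    example_status_t err; /* negative error code, valid only if !ok */
} pull_result_t;

/* Interpret the value returned by a blocking PMIx_IOF_pull call,
 * i.e., one made with regcbfunc == NULL. */
pull_result_t interpret_blocking_pull(example_status_t rc)
{
    pull_result_t r = { .ok = false, .iofhdlr = 0, .err = 0 };
    if (rc >= 0) {
        r.ok = true;
        r.iofhdlr = (size_t)rc; /* non-negative values are handler refs */
    } else {
        r.err = rc;             /* negative values are error codes */
    }
    return r;
}
```

Note that zero is a valid handler reference in blocking mode, so a caller cannot test the return value against \refconst{PMIX_SUCCESS} to detect failure.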
\reqattrstart
The following attributes are required for \ac{PMIx} libraries that support \ac{IO} forwarding:
\pastePRIAttributeItem{PMIX_IOF_CACHE_SIZE}
\pastePRIAttributeItem{PMIX_IOF_DROP_OLDEST}
\pastePRIAttributeItem{PMIX_IOF_DROP_NEWEST}
\reqattrend
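The required cache attributes govern what happens when forwarded output arrives faster than it can be delivered. A fixed-size byte ring buffer is one plausible way to honor \refattr{PMIX_IOF_CACHE_SIZE} together with either overflow policy: with \refattr{PMIX_IOF_DROP_NEWEST} new bytes are refused once the cache is full, while with \refattr{PMIX_IOF_DROP_OLDEST} the oldest bytes are overwritten to make room. This is a sketch only; it assumes the cache size is counted in bytes, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical policy selector mirroring the two drop attributes. */
typedef enum { DROP_NEWEST, DROP_OLDEST } overflow_policy_t;

/* Hypothetical per-channel byte cache; not a PMIx type. */
typedef struct {
    char *buf;
    size_t cap;  /* cache size in bytes */
    size_t head; /* index of the oldest cached byte */
    size_t len;  /* number of bytes currently cached */
    overflow_policy_t policy;
} iof_cache_t;

bool iof_cache_init(iof_cache_t *c, size_t cap, overflow_policy_t policy)
{
    assert(cap > 0);
    c->buf = malloc(cap);
    c->cap = cap;
    c->head = 0;
    c->len = 0;
    c->policy = policy;
    return c->buf != NULL;
}

void iof_cache_free(iof_cache_t *c)
{
    free(c->buf);
    c->buf = NULL;
}

/* Append n bytes; returns how many bytes (new or old) were dropped. */
size_t iof_cache_put(iof_cache_t *c, const char *data, size_t n)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (c->len == c->cap) {
            if (c->policy == DROP_NEWEST) {
                dropped += n - i; /* discard the rest of the new data */
                break;
            }
            /* DROP_OLDEST: advance head past the oldest byte */
            c->head = (c->head + 1) % c->cap;
            c->len--;
            dropped++;
        }
        c->buf[(c->head + c->len) % c->cap] = data[i];
        c->len++;
    }
    return dropped;
}

/* Copy out and remove up to max bytes, oldest first. */
size_t iof_cache_take(iof_cache_t *c, char *out, size_t max)
{
    size_t n = c->len < max ? c->len : max;
    for (size_t i = 0; i < n; i++) {
        out[i] = c->buf[(c->head + i) % c->cap];
    }
    c->head = (c->head + n) % c->cap;
    c->len -= n;
    return n;
}
```

With a 4-byte cache and the input \code{"abcdef"}, drop-newest retains \code{"abcd"} while drop-oldest retains \code{"cdef"}; both report two dropped bytes.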
\optattrstart
The following attributes are optional for \ac{PMIx} libraries that support \ac{IO} forwarding:
\pastePRIAttributeItem{PMIX_IOF_BUFFERING_SIZE}
\pastePRIAttributeItem{PMIX_IOF_BUFFERING_TIME}
\pastePRIAttributeItem{PMIX_IOF_TAG_OUTPUT}
\pastePRIAttributeItem{PMIX_IOF_TIMESTAMP_OUTPUT}
\pastePRIAttributeItem{PMIX_IOF_XML_OUTPUT}
\optattrend
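\refattr{PMIX_IOF_BUFFERING_SIZE} and \refattr{PMIX_IOF_BUFFERING_TIME} let the caller trade latency for fewer callback invocations. A library might combine them as sketched below: deliver once the requested number of bytes has accumulated, or once the oldest buffered byte has waited past the time limit, whichever comes first. The function name, the use of seconds as a \code{double}, and the treatment of zero as ``not requested'' are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flush decision; not part of the PMIx API.
 * size_limit == 0 or time_limit_sec <= 0.0 means that limit
 * was not requested by the caller. */
bool should_deliver(size_t buffered, size_t size_limit,
                    double oldest_age_sec, double time_limit_sec)
{
    if (buffered == 0) {
        return false; /* nothing to deliver */
    }
    if (size_limit == 0 && time_limit_sec <= 0.0) {
        return true;  /* no buffering requested: deliver immediately */
    }
    if (size_limit > 0 && buffered >= size_limit) {
        return true;  /* size threshold reached */
    }
    if (time_limit_sec > 0.0 && oldest_age_sec >= time_limit_sec) {
        return true;  /* data has waited long enough */
    }
    return false;
}
```

Setting only a size limit means a trickle of output may wait indefinitely, which is one reason the two attributes are useful together.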
%%%%
\descr
Register to receive output forwarded from a set of remote processes.
\adviceuserstart
Providing a \code{NULL} function pointer for the \refarg{cbfunc} parameter will cause output for the indicated channels to be written to their corresponding stdout/stderr file descriptors. \refconst{PMIX_RANK_WILDCARD} may be used to specify all processes in a given namespace, but should be used with care due to bandwidth considerations.
\adviceuserend
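When \refarg{cbfunc} is \code{NULL}, each forwarded channel is written to its corresponding standard descriptor. The mapping a library might apply is sketched below using POSIX \code{write}. The channel bit values are local stand-ins chosen for illustration (real code should use the constants from the \ac{PMIx} headers), and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-ins for pmix_iof_channel_t bits; use the
 * constants defined by the PMIx headers in real code. */
typedef uint16_t example_channel_t;
#define EX_STDOUT_CHANNEL ((example_channel_t)0x02)
#define EX_STDERR_CHANNEL ((example_channel_t)0x04)

/* Return the descriptor a single-channel delivery should go to, or -1
 * if the channel has no default destination in this sketch. If both
 * bits are set, stdout is chosen. */
int default_fd_for_channel(example_channel_t channel)
{
    if (channel & EX_STDOUT_CHANNEL) {
        return STDOUT_FILENO;
    }
    if (channel & EX_STDERR_CHANNEL) {
        return STDERR_FILENO;
    }
    return -1;
}

/* Write a payload to the default descriptor for its channel, handling
 * partial writes and EINTR. Returns 0 on success, -1 on failure. */
int default_deliver(example_channel_t channel, const char *data, size_t len)
{
    int fd = default_fd_for_channel(channel);
    if (fd < 0) {
        return -1;
    }
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing: retry */
            }
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    return 0;
}
```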
%%%%%%%%%%%
\subsection{\code{PMIx_IOF_deregister}}
\declareapi{PMIx_IOF_deregister}
%%%%
\summary
Deregister from output forwarded from a set of remote processes.
%%%%
\format
\versionMarker{3.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_IOF_deregister(size_t iofhdlr,
const pmix_info_t directives[], size_t ndirs,
pmix_op_cbfunc_t cbfunc, void *cbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{iofhdlr}{IOF handler reference obtained from \refapi{PMIx_IOF_pull}, either as its return value (blocking call) or via the \refapi{pmix_hdlr_reg_cbfunc_t} callback (non-blocking call) (\code{size_t})}
\argin{directives}{Array of \refstruct{pmix_info_t} structures (array of handles)}
\argin{ndirs}{Number of elements in the \refarg{directives} array (integer)}
\argin{cbfunc}{Callback function to be called when deregistration has been completed (\refapi{pmix_op_cbfunc_t} function reference)}
\argin{cbdata}{Data to be passed to the \refarg{cbfunc} callback function (memory reference)}
\end{arglist}
If \refarg{cbfunc} is \code{NULL}, the function will be treated as a \emph{blocking} call and the result of the operation returned in the status code.
If \refarg{cbfunc} is non-\code{NULL}, the function will be treated as a \emph{non-blocking} call and return one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed; the result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success}; the \refarg{cbfunc} will \textit{not} be called.
\item a \ac{PMIx} error constant indicating either an error in the input or that the request was immediately processed and failed; the \refarg{cbfunc} will \textit{not} be called.
\end{itemize}
The returned status code will be one of the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} The IOF handler was successfully deregistered.
\item \refconst{PMIX_ERR_BAD_PARAM} The provided \refarg{iofhdlr} was unrecognized.
\item \refconst{PMIX_ERR_NOT_SUPPORTED} The \ac{PMIx} implementation does not support this function.
\end{constantdesc}
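The non-blocking return rules above require the caller to distinguish three outcomes before deciding whether to wait for \refarg{cbfunc}. The sketch below encodes that decision. \refconst{PMIX_SUCCESS} is zero; because the numeric value of \refconst{PMIX_OPERATION_SUCCEEDED} comes from the \ac{PMIx} headers, it is passed in as a parameter rather than hard-coded. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical classification of a non-blocking return value. */
typedef enum {
    WAIT_FOR_CALLBACK, /* PMIX_SUCCESS: cbfunc will report the result */
    COMPLETED_OK,      /* PMIX_OPERATION_SUCCEEDED: cbfunc will not run */
    COMPLETED_ERROR    /* any other value: cbfunc will not run */
} nb_outcome_t;

/* rc is the status returned by the API call; op_succeeded is the value
 * of PMIX_OPERATION_SUCCEEDED taken from the PMIx headers. */
nb_outcome_t classify_nb_return(int rc, int op_succeeded)
{
    if (rc == 0) { /* PMIX_SUCCESS */
        return WAIT_FOR_CALLBACK;
    }
    if (rc == op_succeeded) {
        return COMPLETED_OK;
    }
    return COMPLETED_ERROR;
}
```

In the two completed cases the callback never runs, so any \refarg{cbdata} the caller allocated for it must be released by the caller directly.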
%%%%
\descr
Deregister from output forwarded from a set of remote processes.
\adviceimplstart
Any currently buffered \ac{IO} should be flushed upon receipt of a deregistration request. All \ac{IO} received after receipt of the request shall be discarded.
\adviceimplend
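The flush-then-discard rule can be modeled with a small state flag: pending output is delivered once when the deregistration request arrives, and anything received afterward is dropped. The types, the fixed 64-byte buffer, and the \code{deliver} hook below are hypothetical stand-ins for a library's internal handler state and its invocation of the registered output callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EX_BUF_MAX 64

/* Hypothetical per-handler state; not a PMIx type. */
typedef struct {
    char buf[EX_BUF_MAX];
    size_t len;
    bool deregistered;
    size_t delivered; /* total bytes handed to the deliver hook */
} iof_handler_t;

/* Stand-in for invoking the registered output callback. */
static void deliver(iof_handler_t *h, const char *data, size_t n)
{
    (void)data; /* a real library would pass data to cbfunc */
    h->delivered += n;
}

void handler_init(iof_handler_t *h)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
}

/* Buffer incoming output; returns false if it was discarded. */
bool handler_receive(iof_handler_t *h, const char *data, size_t n)
{
    if (h->deregistered) {
        return false; /* arrived after deregistration: discard */
    }
    if (n > EX_BUF_MAX - h->len) {
        deliver(h, h->buf, h->len); /* make room by flushing early */
        h->len = 0;
        if (n > EX_BUF_MAX) {
            deliver(h, data, n);    /* too large to buffer: pass through */
            return true;
        }
    }
    memcpy(h->buf + h->len, data, n);
    h->len += n;
    return true;
}

/* Handle a deregistration request: flush, then refuse further output. */
void handler_deregister(iof_handler_t *h)
{
    if (h->len > 0) {
        deliver(h, h->buf, h->len);
        h->len = 0;
    }
    h->deregistered = true;
}
```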
%%%%%%%%%%%
\subsection{\code{PMIx_IOF_push}}
\declareapi{PMIx_IOF_push}
%%%%
\summary
Push data collected locally (typically from stdin or a file) to stdin of the target recipients.
%%%%
\format
\versionMarker{3.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_IOF_push(const pmix_proc_t targets[], size_t ntargets,
pmix_byte_object_t *bo,
const pmix_info_t directives[], size_t ndirs,
pmix_op_cbfunc_t cbfunc, void *cbdata)
\end{codepar}
\cspecificend
\begin{arglist}
\argin{targets}{Array of proc structures identifying desired target processes (array of handles)}
\argin{ntargets}{Number of elements in the \refarg{targets} array (integer)}
\argin{bo}{Pointer to \refstruct{pmix_byte_object_t} containing the payload to be delivered (handle)}
\argin{directives}{Array of \refstruct{pmix_info_t} structures (array of handles)}
\argin{ndirs}{Number of elements in the \refarg{directives} array (integer)}
\argin{cbfunc}{Callback function to be called when the operation has been completed (\refapi{pmix_op_cbfunc_t} function reference)}
\argin{cbdata}{Data to be passed to the \refarg{cbfunc} callback function (memory reference)}
\end{arglist}
If \refarg{cbfunc} is \code{NULL}, the function will be treated as a \emph{blocking} call and the result of the operation returned in the status code.
If \refarg{cbfunc} is non-\code{NULL}, the function will be treated as a \emph{non-blocking} call and return one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed; the result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success}; the \refarg{cbfunc} will \textit{not} be called.
\item a \ac{PMIx} error constant indicating either an error in the input or that the request was immediately processed and failed; the \refarg{cbfunc} will \textit{not} be called.
\end{itemize}
The returned status code will be one of the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} The provided data has been accepted for transmission. This does not indicate that the payload has been delivered to any member of the provided \refarg{targets}.
\item \refconst{PMIX_ERR_NOT_SUPPORTED} The \ac{PMIx} implementation does not support this function.
\item a \ac{PMIx} error constant indicating the nature of the error.
\end{constantdesc}
\reqattrstart
The following attributes are required for \ac{PMIx} libraries that support \ac{IO} forwarding:
\pastePRIAttributeItem{PMIX_IOF_CACHE_SIZE}
\pastePRIAttributeItem{PMIX_IOF_DROP_OLDEST}
\pastePRIAttributeItem{PMIX_IOF_DROP_NEWEST}
\reqattrend
\optattrstart
The following attributes are optional for \ac{PMIx} libraries that support \ac{IO} forwarding:
\pastePRIAttributeItem{PMIX_IOF_BUFFERING_SIZE}
\pastePRIAttributeItem{PMIX_IOF_BUFFERING_TIME}
\optattrend
%%%%
\descr
Push data collected locally (typically from stdin or a file) to stdin of the target recipients.
\adviceuserstart
Execution of the \refarg{cbfunc} callback function serves as notice that the \ac{PMIx} library no longer requires the caller to maintain the \refarg{bo} data object; it does \textit{not} indicate delivery of the payload to the targets. \refconst{PMIX_RANK_WILDCARD} may be used to specify all processes in a given namespace, but should be used with care due to bandwidth considerations.
\adviceuserend
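Because the callback only signals that the library has released \refarg{bo}, a caller that allocates the payload dynamically would typically free it from inside \refarg{cbfunc}. The sketch below models that ownership hand-off without calling \ac{PMIx}: \code{example_byte_object_t} is a local stand-in with the same \code{bytes}/\code{size} fields as \refstruct{pmix_byte_object_t}, \code{push_done} has the shape of \refapi{pmix_op_cbfunc_t} with \code{int} standing in for \refstruct{pmix_status_t}, and the context type and helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in with the same fields as pmix_byte_object_t. */
typedef struct {
    char *bytes;
    size_t size;
} example_byte_object_t;

/* Hypothetical cbdata: owns the payload until the library releases it. */
typedef struct {
    example_byte_object_t bo;
    int released; /* set once the payload has been freed */
} push_ctx_t;

/* Allocate a context holding a private copy of the data to push. */
push_ctx_t *push_ctx_create(const char *data, size_t size)
{
    push_ctx_t *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->bo.bytes = malloc(size > 0 ? size : 1);
    if (ctx->bo.bytes == NULL) {
        free(ctx);
        return NULL;
    }
    memcpy(ctx->bo.bytes, data, size);
    ctx->bo.size = size;
    return ctx;
}

/* Completion callback: the library no longer needs bo, so free it.
 * This says nothing about whether the targets received the data. */
void push_done(int status, void *cbdata)
{
    push_ctx_t *ctx = cbdata;
    (void)status;
    free(ctx->bo.bytes);
    ctx->bo.bytes = NULL;
    ctx->bo.size = 0;
    ctx->released = 1; /* the waiting caller may now free ctx itself */
}
```

If the non-blocking call instead returns \refconst{PMIX_OPERATION_SUCCEEDED} or an error, \refarg{cbfunc} will not run, so the caller would perform the same cleanup (here, calling \code{push_done} directly) itself.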