/*************************************************************************
* Copyright (C) [2022] by Cambricon, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*************************************************************************/
#ifndef MLUOP_EXAMPLE_H_
#define MLUOP_EXAMPLE_H_
/******************************************************************************
* MLU-OPS: Cambricon Open Source operator library for Network
******************************************************************************/
#define MLUOP_MAJOR 1
#define MLUOP_MINOR 4
#define MLUOP_PATCHLEVEL 1
/*********************************************************************************
* MLUOP_VERSION is deprecated and not recommended. To get the version of MLUOP, use
* MLUOP_MAJOR, MLUOP_MINOR and MLUOP_PATCHLEVEL.
********************************************************************************/
#define MLUOP_VERSION (MLUOP_MAJOR * 1000 + MLUOP_MINOR * 100 + MLUOP_PATCHLEVEL)
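/*
Illustrative sketch, not part of the MLU-OPS API. It shows the arithmetic behind
the deprecated packed MLUOP_VERSION value and a field-by-field comparison built
on the three separate version numbers. The EX_ and ex_ names are hypothetical
helpers introduced only for this example, and the collision shown for a two-digit
minor version is one plausible reason to prefer the separate macros, not a
rationale stated by the library.

```c
#include <assert.h>
#include <stdbool.h>

// Local stand-ins for MLUOP_MAJOR, MLUOP_MINOR and MLUOP_PATCHLEVEL (values copied from this header).
#define EX_MLUOP_MAJOR 1
#define EX_MLUOP_MINOR 4
#define EX_MLUOP_PATCHLEVEL 1

// Packed encoding used by the deprecated MLUOP_VERSION macro.
int ex_packed_version(int major, int minor, int patch) {
  return major * 1000 + minor * 100 + patch;
}

// Returns true when (major, minor, patch) is at least (req_major, req_minor, req_patch).
bool ex_version_at_least(int major, int minor, int patch,
                         int req_major, int req_minor, int req_patch) {
  if (major != req_major) return major > req_major;
  if (minor != req_minor) return minor > req_minor;
  return patch >= req_patch;
}

// Usage examples with the values they are expected to produce.
void ex_version_examples(void) {
  assert(ex_packed_version(EX_MLUOP_MAJOR, EX_MLUOP_MINOR, EX_MLUOP_PATCHLEVEL) == 1401);
  // The packed form is ambiguous once a field needs two digits: 1.10.0 and 2.0.0 both give 2000.
  assert(ex_packed_version(1, 10, 0) == ex_packed_version(2, 0, 0));
  assert(ex_version_at_least(1, 4, 1, 1, 4, 0));
  assert(!ex_version_at_least(1, 4, 1, 1, 10, 0));
}
```

In real code the same comparison could be written directly against MLUOP_MAJOR,
MLUOP_MINOR and MLUOP_PATCHLEVEL in a preprocessor condition.
*/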
#define MLUOP_DIM_MAX 8
#include <stdint.h>
#include "cn_api.h"
#include "cnrt.h"
#ifndef MLUOP_WIN_API
#ifdef _WIN32
#define MLUOP_WIN_API __stdcall
#else
#define MLUOP_WIN_API
#endif
#endif
#if defined(__cplusplus)
extern "C" {
#endif
/******************************************************************************
* MLU-OPS Return Status
******************************************************************************/
/*! @brief Describes function return status.
*/
typedef enum {
MLUOP_STATUS_SUCCESS = 0, /*!< The operation is successfully completed. */
MLUOP_STATUS_NOT_INITIALIZED = 1,
/*!< MLU-OPS library is not initialized properly, which is usually caused by failing
to call ::mluOpCreate, ::mluOpCreateTensorDescriptor or ::mluOpSetTensorDescriptor.
Such error is usually due to incompatible MLU device or invalid driver environment.
Notice that ::mluOpCreate should be called prior to any other MLU-OPS function. */
MLUOP_STATUS_ALLOC_FAILED = 2,
/*!< This error occurs when the resource allocation fails, which is usually caused by
failing to call cnMallocHost due to exceeded memory usage. Make sure that
the memory allocated previously is deallocated as much as possible. */
MLUOP_STATUS_BAD_PARAM = 3,
/*!< Invalid value or parameters are passed to the function, including data type, layout,
dimensions, etc. */
MLUOP_STATUS_INTERNAL_ERROR = 4,
/*!< An error occurs inside of the function, which may indicate an internal error or bug in
the library. This error is usually caused by failing to call cnrtMemcpyAsync.
Check whether the memory passed to the function is deallocated before the completion
of the routine. */
MLUOP_STATUS_ARCH_MISMATCH = 5,
/*!< Invalid MLU device which is not supported by current function. */
MLUOP_STATUS_EXECUTION_FAILED = 6,
/*!< An error occurs when the function fails to be executed on MLU device due to multiple reasons.
You can check whether the hardware environment, driver version and other prerequisite
libraries are correctly installed. */
MLUOP_STATUS_NOT_SUPPORTED = 7,
/*!< An error occurs when the requested functionality is not supported in this version but would
be supported in the future. */
MLUOP_STATUS_NUMERICAL_OVERFLOW = 8,
/*!< A numerical overflow occurs when executing the function, which is usually due to large scale
or inappropriate range of value of input tensor. */
} mluOpStatus_t;
/******************************************************************************
* MLU-OPS Tensor Layout
******************************************************************************/
/*!
* @brief Describes the data layouts in MLU-OPS.
*
* The data can be defined in three, four, or five dimensions.
*
* Take images for example, the format of the data layout can be NCHW:
* - N: The number of images
* - C: The number of image channels
* - H: The height of images
* - W: The width of images
*
* Take sequence for example, the format of the data layout can be TNC:
* - T: The timing steps of sequence
* - N: The batch size of sequence
* - C: The alphabet size of sequence
*/
typedef enum {
MLUOP_LAYOUT_NCHW = 0,
/*!< The data layout is in the following order: batch size, channel, height, and width. */
MLUOP_LAYOUT_NHWC = 1,
/*!< The data layout is in the following order: batch size, height, width, and channel. */
MLUOP_LAYOUT_HWCN = 2,
/*!< The data layout is in the following order: height, width, channel and batch size. */
MLUOP_LAYOUT_NDHWC = 3,
/*!< The data layout is in the following order: batch size, depth, height, width, and
* channel. */
MLUOP_LAYOUT_ARRAY = 4,
/*!< The data is a multi-dimensional tensor. */
MLUOP_LAYOUT_NCDHW = 5,
/*!< The data layout is in the following order: batch size, channel, depth, height, and
* width. */
MLUOP_LAYOUT_TNC = 6,
/*!< The data layout is in the following order: timing steps, batch size, alphabet size. */
MLUOP_LAYOUT_NTC = 7,
/*!< The data layout is in the following order: batch size, timing steps, alphabet size. */
MLUOP_LAYOUT_NC = 8,
/*!< The data layout is in the following order: batch size, channel. */
MLUOP_LAYOUT_NLC = 9,
/*!< The data layout is in the following order: batch size, length, channel. */
MLUOP_LAYOUT_NCL = 10,
/*!< The data layout is in the following order: batch size, channel, length. */
} mluOpTensorLayout_t;
/******************************************************************************
* Cambricon MLU-OPS sequence data Layout
******************************************************************************/
/*!
* @brief Enumeration variables describing the sequence data layouts.
* N represents batch, B represents beam, T represents sequence length,
* and C represents embedding size.
*/
typedef enum {
MLUOP_SEQDATA_TNC = 0, /*!< Sequence data layout order: TNC. */
MLUOP_SEQDATA_TNC_PACKED = 1, /*!< Sequence data layout order: TNC_PACKED. */
MLUOP_SEQDATA_NTC = 2, /*!< Sequence data layout order: NTC. */
MLUOP_SEQDATA_NC = 3, /*!< Sequence data layout order: NC. */
MLUOP_SEQDATA_TNBC = 4, /*!< Sequence data layout order: TNBC. */
MLUOP_SEQDATA_TBNC = 5, /*!< Sequence data layout order: TBNC. */
MLUOP_SEQDATA_NBTC = 6, /*!< Sequence data layout order: NBTC. */
MLUOP_SEQDATA_NTBC = 7, /*!< Sequence data layout order: NTBC. */
MLUOP_SEQDATA_BNTC = 8, /*!< Sequence data layout order: BNTC. */
MLUOP_SEQDATA_BTNC = 9, /*!< Sequence data layout order: BTNC. */
MLUOP_SEQDATA_TN = 10, /*!< Sequence data layout order: TN. */
MLUOP_SEQDATA_NT = 11, /*!< Sequence data layout order: NT. */
} mluOpSeqDataLayout_t;
/******************************************************************************
* MLU-OPS Data Type
******************************************************************************/
/*! @brief Describes the data types in MLU-OPS. */
typedef enum {
MLUOP_DTYPE_INVALID = 0, /*!< An invalid data type. */
MLUOP_DTYPE_HALF = 1, /*!< A 16-bit floating-point data type. */
MLUOP_DTYPE_FLOAT = 2, /*!< A 32-bit floating-point data type. */
MLUOP_DTYPE_DOUBLE = 14, /*!< A 64-bit floating-point data type. */
MLUOP_DTYPE_INT8 = 3, /*!< An 8-bit signed integer data type. */
MLUOP_DTYPE_INT16 = 4, /*!< A 16-bit signed integer data type. */
MLUOP_DTYPE_INT31 = 5, /*!< A 31-bit signed integer data type. */
MLUOP_DTYPE_INT32 = 6, /*!< A 32-bit signed integer data type. */
MLUOP_DTYPE_INT64 = 9, /*!< A 64-bit signed integer data type. */
MLUOP_DTYPE_UINT8 = 7, /*!< An 8-bit unsigned integer data type. */
MLUOP_DTYPE_UINT16 = 13, /*!< A 16-bit unsigned integer data type. */
MLUOP_DTYPE_UINT32 = 11, /*!< A 32-bit unsigned integer data type. */
MLUOP_DTYPE_UINT64 = 12, /*!< A 64-bit unsigned integer data type. */
MLUOP_DTYPE_BOOL = 8, /*!< A boolean data type. */
MLUOP_DTYPE_COMPLEX_HALF = 15, /*!< A 32-bit complex number of two fp16. */
MLUOP_DTYPE_COMPLEX_FLOAT = 16, /*!< A 64-bit complex number of two fp32. */
MLUOP_DTYPE_BFLOAT16 = 17,
/*!< A 16-bit floating-point data type with one bit for sign,
* 8 bits for exponent and 7 bits for fraction. */
} mluOpDataType_t;
/*!
* @brief Describes whether to propagate NaN numbers.
*/
typedef enum {
MLUOP_NOT_PROPAGATE_NAN = 0, /*!< The NaN numbers are not propagated. */
MLUOP_PROPAGATE_NAN = 1, /*!< The NaN numbers are propagated. */
} mluOpNanPropagation_t;
/*!
* @brief Describes the options that can help choose the best suited algorithm used for
* implementation of the activation and accumulation operations.
**/
typedef enum {
MLUOP_COMPUTATION_FAST = 0,
/*!< Implementation with the fastest algorithm and lower precision. */
MLUOP_COMPUTATION_HIGH_PRECISION = 1,
/*!< Implementation with the high-precision algorithm regardless of the performance. */
MLUOP_COMPUTATION_ULTRAHIGH_PRECISION = 2,
/*!< Implementation with the ultrahigh-precision algorithm regardless of the performance. */
} mluOpComputationPreference_t;
/*!
* @brief Describes the atomics modes in MLU-OPS.
*/
typedef enum {
MLUOP_ATOMICS_NOT_ALLOWED = 1,
/*!< Atomic operations are not allowed to accumulate results. */
MLUOP_ATOMICS_ALLOWED = 2,
/*!< Atomic operations are allowed to accumulate results. */
} mluOpAtomicsMode_t;
/*!
* @brief Describes the rounding modes of quantization conversion.
*/
typedef enum {
MLUOP_ROUND_HALF_TO_EVEN = 0,
/*!< Rounds to the nearest neighbor, with ties rounded to the nearest even
* neighbor, for quantization conversion. */
MLUOP_ROUND_HALF_UP = 1,
/*!< Rounds to the nearest neighbor, with ties rounded up, for
* quantization conversion. */
MLUOP_ROUND_HALF_OFF_ZERO = 2,
/*!< Rounds to the nearest neighbor, with ties rounded away from zero, for
* quantization conversion. */
} mluOpQuantizeRoundMode_t;
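/*
Illustrative sketch, not part of the MLU-OPS API. It shows how the three
rounding modes differ on tie values such as 2.5 and -2.5. The ex_ functions are
hypothetical host-side helpers, and the sketch assumes that "half up" resolves
ties toward positive infinity. The exact behavior of the device kernels is not
specified in this header.

```c
#include <assert.h>

// Largest integer not greater than x, for values that fit in long long.
long long ex_floor(double x) {
  long long t = (long long)x;  // the cast truncates toward zero
  return ((double)t > x) ? t - 1 : t;
}

// Ties go toward positive infinity (assumed meaning of MLUOP_ROUND_HALF_UP).
long long ex_round_half_up(double x) {
  return ex_floor(x + 0.5);
}

// Ties go away from zero (MLUOP_ROUND_HALF_OFF_ZERO).
long long ex_round_half_off_zero(double x) {
  return (x >= 0.0) ? ex_floor(x + 0.5) : -ex_floor(-x + 0.5);
}

// Ties go to the even neighbor (MLUOP_ROUND_HALF_TO_EVEN).
long long ex_round_half_to_even(double x) {
  long long f = ex_floor(x);
  double frac = x - (double)f;
  if (frac > 0.5) return f + 1;
  if (frac < 0.5) return f;
  return (f % 2 == 0) ? f : f + 1;  // exact tie: pick the even neighbor
}

// Usage examples with the values they are expected to produce.
void ex_rounding_examples(void) {
  assert(ex_round_half_to_even(2.5) == 2);
  assert(ex_round_half_to_even(3.5) == 4);
  assert(ex_round_half_to_even(-2.5) == -2);
  assert(ex_round_half_up(2.5) == 3);
  assert(ex_round_half_up(-2.5) == -2);
  assert(ex_round_half_off_zero(2.5) == 3);
  assert(ex_round_half_off_zero(-2.5) == -3);
}
```

The three modes agree on every non-tie input, so the choice only matters for
values that lie exactly halfway between two integers.
*/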
/*!
* @brief Describes the modes of quantization method.
*/
typedef enum {
MLUOP_QUANTIZE_POSITION = 0,
/*!< Quantization method with position factor and without scale factor. */
MLUOP_QUANTIZE_POSITION_SCALE = 1,
/*!< Quantization method with position and scale factors. */
MLUOP_QUANTIZE_POSITION_SCALE_OFFSET = 2,
/*!< Asymmetric quantization method with position, scale, and offset factors. */
} mluOpQuantizeMode_t;
/*!
* @brief Describes the bases that are used in the implementation of the log function.
*/
typedef enum {
MLUOP_LOG_E = 0, /*!< The base e is used. */
MLUOP_LOG_2 = 1, /*!< The base 2 is used. */
MLUOP_LOG_10 = 2, /*!< The base 10 is used. */
} mluOpLogBase_t;
/*!
* @brief Describes the pointer modes that are used in the implementation of the fill function.
*/
typedef enum {
MLUOP_POINTER_MODE_HOST = 0,
/*!< A host pointer, which means that the values passed by reference are on the host. */
MLUOP_POINTER_MODE_DEVICE = 1,
/*!< A device pointer, which means that the values passed by reference are on the device. */
} mluOpPointerMode_t;
/*!
* @brief Describes the input box modes that can be used to implement the Nms operation.
*/
typedef enum {
MLUOP_NMS_BOX_DIAGONAL = 0, /*!< The box mode is [x1, y1, x2, y2]. */
MLUOP_NMS_BOX_CENTER = 1,
/*!< The box mode is [x_center, y_center, width, height] where width > 0 and height > 0. */
} mluOpNmsBoxPointMode_t;
/*!
* @brief Describes the output modes that can be used to implement the Nms operation.
*/
typedef enum {
MLUOP_NMS_OUTPUT_TARGET_INDICES = 0,
/*!< Returns target indices, which are sorted in decreasing order of confidences. */
MLUOP_NMS_OUTPUT_TARGET_CONFIDENCE_AND_POS_1 = 1,
/*!< Returns target confidences and positions with the order of confidence_0, x_01, y_01, x_02, y_02,
* confidence_1, x_11, y_11, x_12, y_12, ... ,
* confidence_n, x_n1, y_n1, x_n2, and y_n2. The (x_01, y_01) and (x_02, y_02) represent the top left corner
* and bottom right corner coordinates of the first box, respectively.
*/
MLUOP_NMS_OUTPUT_TARGET_CONFIDENCE_AND_POS_2 = 2,
/*!< Returns target confidences and positions with the order of confidence_0,
* confidence_1, ... , confidence_n, x_01, x_11, ... , x_n1, y_01, y_11, ... , y_n1, x_02, x_12, ... , x_n2, y_02,
* y_12, ... , and y_n2. The (x_01, y_01) and (x_02, y_02) represent the top left corner and
* bottom right corner coordinates of the first box, respectively.
*/
MLUOP_NMS_OUTPUT_TARGET_BATCH_AND_CLASS = 3,
/*!< Returns batch indices, class indices, and positions with the order of batch_0, class_0, box_0,
* ... , batch_0, class_0, box_m, batch_0, class_1, box_0, ... , batch_0, class_1, box_m, ... , ... ,
* batch_s, class_n, and box_m.
*/
} mluOpNmsOutputMode_t;
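/*
Illustrative sketch, not part of the MLU-OPS API. It contrasts the two
confidence-and-position output orders described above:
MLUOP_NMS_OUTPUT_TARGET_CONFIDENCE_AND_POS_1 stores the five values of each box
together, while MLUOP_NMS_OUTPUT_TARGET_CONFIDENCE_AND_POS_2 stores all
confidences first, then all x1 values, and so on. The ex_ names are hypothetical,
and the sketch assumes a tightly packed output buffer with no padding, which this
header does not state.

```c
#include <assert.h>
#include <stddef.h>

// Field order within one box, as listed in the enum documentation.
typedef enum {
  EX_FIELD_CONFIDENCE = 0,
  EX_FIELD_X1 = 1,
  EX_FIELD_Y1 = 2,
  EX_FIELD_X2 = 3,
  EX_FIELD_Y2 = 4
} ex_nms_field_t;

enum { EX_FIELDS_PER_BOX = 5 };

// Element offset for the interleaved order (..._CONFIDENCE_AND_POS_1).
size_t ex_pos1_index(size_t box, ex_nms_field_t field) {
  return box * EX_FIELDS_PER_BOX + (size_t)field;
}

// Element offset for the planar order (..._CONFIDENCE_AND_POS_2).
size_t ex_pos2_index(size_t box, ex_nms_field_t field, size_t box_count) {
  return (size_t)field * box_count + box;
}

// Usage examples for an output that holds three boxes.
void ex_nms_layout_examples(void) {
  assert(ex_pos1_index(1, EX_FIELD_X1) == 6);
  assert(ex_pos2_index(1, EX_FIELD_X1, 3) == 4);
  assert(ex_pos1_index(2, EX_FIELD_Y2) == 14);
  assert(ex_pos2_index(0, EX_FIELD_Y2, 3) == 12);
}
```
*/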
/*!
* @brief Describes the algorithms that can be used to implement the Nms operation.
*/
typedef enum {
MLUOP_NMS_HARD_NMS = 0,
/*!< A type of algorithm which updates confidence using hard Nms, for example
* confidence = IOU < IOU_threshold ? confidence : 0.
*/
MLUOP_NMS_SOFT_NMS_LINEAR = 1,
/*!< A type of algorithm which updates confidence using linear method, for example
* confidence = IOU < IOU_threshold ? confidence : confidence * (1 - IOU).
*/
MLUOP_NMS_SOFT_NMS_GAUSSIAN = 2,
/*!< A type of algorithm which updates confidence using Gaussian method, for example
* confidence = confidence * exp{- \f$IOU^2\f$ / (2 * sigma)}.
*/
} mluOpNmsMethodMode_t;
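/*
Illustrative sketch, not part of the MLU-OPS API. It writes the hard and linear
confidence updates from the enum documentation as plain host functions so the
difference is easy to see. The ex_ names are hypothetical. The Gaussian variant
follows the formula given above and is left out only because it needs exp()
from math.h.

```c
#include <assert.h>

// MLUOP_NMS_HARD_NMS: an overlapping box loses its confidence entirely.
float ex_hard_nms(float confidence, float iou, float iou_threshold) {
  return (iou < iou_threshold) ? confidence : 0.0f;
}

// MLUOP_NMS_SOFT_NMS_LINEAR: an overlapping box is scaled down by (1 - IOU).
float ex_soft_nms_linear(float confidence, float iou, float iou_threshold) {
  return (iou < iou_threshold) ? confidence : confidence * (1.0f - iou);
}

// Usage examples with values that are exactly representable in float.
void ex_nms_method_examples(void) {
  assert(ex_hard_nms(0.5f, 0.25f, 0.5f) == 0.5f);
  assert(ex_hard_nms(0.5f, 0.75f, 0.5f) == 0.0f);
  assert(ex_soft_nms_linear(0.5f, 0.25f, 0.5f) == 0.5f);
  assert(ex_soft_nms_linear(0.5f, 0.75f, 0.5f) == 0.125f);
}
```

Both methods leave boxes below the IOU threshold untouched. They differ only in
how strongly they penalize boxes at or above it.
*/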
/*!
* @brief Describes whether the box boundary is included when the Nms operation
* computes the height and width of boxes.
*/
typedef enum {
MLUOP_NMS_ALGO_EXCLUDE_BOUNDARY = 0,
/*!< Implements Nms with boundary excluded. In this mode,
* the height or width of boxes is ``(x2 - x1)``.
*/
MLUOP_NMS_ALGO_INCLUDE_BOUNDARY = 1,
/*!< Implements Nms with boundary included. In this mode,
* the height or width of boxes is ``(x2 - x1 + offset)``.
*/
} mluOpNmsAlgo_t;
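/*
Illustrative sketch, not part of the MLU-OPS API. It shows how including the
boundary changes the box extent, and therefore the area that feeds the IOU
computation. The ex_ names and the offset argument are hypothetical. This header
names an offset but does not give its value, so the example simply uses 1.

```c
#include <assert.h>
#include <stdbool.h>

// Extent along one axis: (hi - lo) without the boundary, (hi - lo + offset) with it.
float ex_box_extent(float lo, float hi, bool include_boundary, float offset) {
  return include_boundary ? (hi - lo + offset) : (hi - lo);
}

// Area of a diagonal-mode box [x1, y1, x2, y2].
float ex_box_area(float x1, float y1, float x2, float y2,
                  bool include_boundary, float offset) {
  return ex_box_extent(x1, x2, include_boundary, offset) *
         ex_box_extent(y1, y2, include_boundary, offset);
}

// Usage examples for the box [0, 0, 3, 1].
void ex_nms_algo_examples(void) {
  assert(ex_box_area(0.0f, 0.0f, 3.0f, 1.0f, false, 1.0f) == 3.0f);
  assert(ex_box_area(0.0f, 0.0f, 3.0f, 1.0f, true, 1.0f) == 8.0f);
}
```
*/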
/******************************************************************************
* MLU-OPS Data Structure: Customized Operation
******************************************************************************/
/*!
* @brief Describes the data type of indices used in the reduce function.
*/
typedef enum {
MLUOP_32BIT_INDICES = 0, /*!< The data type of indices is unsigned int. */
MLUOP_16BIT_INDICES = 1, /*!< The data type of indices is unsigned short. */
} mluOpIndicesType_t;
/*!
* @brief Describes the reduction applied to the output in the implementation of the loss function.
*/
typedef enum {
MLUOP_LOSS_REDUCTION_NONE = 0,
/*!< No reduction is applied in the operation.*/
MLUOP_LOSS_REDUCTION_SUM = 1,
/*!< The elements of output are summed in the operation.*/
MLUOP_LOSS_REDUCTION_MEAN = 2,
/*!< The weighted mean of the output is applied in the operation.*/
} mluOpLossReduction_t;
/*!
* @brief Describes the modes that are used in the Reduce function.
*/
typedef enum {
MLUOP_REDUCE_DSUM = 0, /*!< Computes the sum value. */
MLUOP_REDUCE_DMEAN = 1, /*!< Computes the mean value. */
MLUOP_REDUCE_DMAX = 2, /*!< Computes the maximum value. */
} mluOpReduceMode_t;
/*!
* @brief Enumeration variables describing the pooling modes that can be used to
* implement the pooling operation.
*/
typedef enum {
MLUOP_POOLING_MAX = 0, /*!< The max pooling mode is implemented.*/
MLUOP_POOLING_AVERAGE_COUNT_INCLUDE_PADDING = 1,
/*!< The average pooling with padding mode is implemented.*/
MLUOP_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING = 2,
/*!< The average pooling without padding mode is implemented.*/
MLUOP_POOLING_FIXED = 3,
/*!< The fixed mode is implemented. This mode is used in the unpool operation.
* In this mode, each input pixel will be put to the center of the pooling kernel
* regardless of the index.*/
} mluOpPoolingMode_t;
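/*
Illustrative sketch, not part of the MLU-OPS API. It shows the usual difference
between the two average pooling modes on a one-dimensional window that overlaps
the padding. The ex_ function is hypothetical, and the sketch assumes zero
padding and the common convention that "include padding" divides by the kernel
size while "exclude padding" divides by the number of in-range elements. This
header does not spell out that convention.

```c
#include <assert.h>
#include <stdbool.h>

// Averages the window [start, start + kernel) over an input of length len.
// Out-of-range positions are treated as zero padding.
float ex_avg_pool_window(const float *input, int len, int start, int kernel,
                         bool count_include_padding) {
  float sum = 0.0f;
  int valid = 0;
  for (int k = 0; k < kernel; ++k) {
    int idx = start + k;
    if (idx >= 0 && idx < len) {
      sum += input[idx];
      ++valid;
    }
  }
  int divisor = count_include_padding ? kernel : valid;
  return (divisor > 0) ? sum / (float)divisor : 0.0f;
}

// Usage examples: a window of size 3 that starts one element before the input.
void ex_pooling_examples(void) {
  const float input[3] = {2.0f, 4.0f, 6.0f};
  assert(ex_avg_pool_window(input, 3, -1, 3, true) == 2.0f);   // (0 + 2 + 4) / 3
  assert(ex_avg_pool_window(input, 3, -1, 3, false) == 3.0f);  // (2 + 4) / 2
}
```

For windows that lie entirely inside the input the two modes give the same
result.
*/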
/******************************************************************************
* MLU-OPS Runtime Management
******************************************************************************/
/*!
* @struct mluOpContext
* @brief Describes the Cambricon MLU-OPS context.
*/
struct mluOpContext;
/*!
* Pointer to ::mluOpContext struct that holds the Cambricon MLU-OPS context.
*
* MLU device resources cannot be accessed directly, so MLU-OPS uses a
* handle to manage the MLU-OPS context, including MLU device information
* and queues.
*
* The MLU-OPS context is created with ::mluOpCreate and the returned
* handle should be passed to all the subsequent function calls.
* You need to destroy the MLU-OPS context at the end with ::mluOpDestroy.
*/
typedef struct mluOpContext *mluOpHandle_t;
/*!
* The descriptor of a collection of tensors used in the RNN operation, such as weights
* and biases.
* You need to call ::mluOpCreateTensorSetDescriptor to create a descriptor, and
* call ::mluOpInitTensorSetMemberDescriptor to set the information about each tensor in
* the tensor set. If a tensor in the tensor set uses a fixed-point data type,
* call ::mluOpInitTensorSetMemberDescriptorPositionAndScale to set quantization
* parameters.
* Finally, you need to destroy the descriptor with
* ::mluOpDestroyTensorSetDescriptor.
*/
typedef struct mluOpTensorSetStruct *mluOpTensorSetDescriptor_t;
// Group: Runtime Management
/*!
* @brief Initializes the MLU-OPS library and creates a handle \b handle to a struct
* that holds the MLU-OPS library context. It allocates hardware resources on the host
* and device. You need to call this function before any other MLU-OPS function.
*
* You need to call ::mluOpDestroy to release the resources later.
*
* @param[out] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices and queues.
* For detailed information, see ::mluOpHandle_t.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpCreate(mluOpHandle_t *handle);
// Group: Runtime Management
/*!
* @brief Updates the MLU-OPS context information that is held by \b handle. This function
* should be called if you call CNDrv API cnSetCtxConfigParam to set the context information.
* The related context information will be synchronized to MLU-OPS with this function. For
* detailed information, see "Cambricon CNDrv Developer Guide".
*
* @param[in] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices. For detailed information,
* see ::mluOpHandle_t.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpUpdateContextInformation(mluOpHandle_t handle);
// Group: Runtime Management
/*!
* @brief Releases the resources of the specified MLU-OPS handle \b handle that was
* created by ::mluOpCreate. It is usually the last MLU-OPS call made with
* the handle.
*
* @param[in] handle
* Handle to the MLU-OPS context to be destroyed. For detailed information,
* see ::mluOpHandle_t.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpDestroy(mluOpHandle_t handle);
// Group: Runtime Management
/*!
* @brief Sets the runtime queue \b queue in the handle \b handle. The queue is used to
* launch kernels or to synchronize to this queue.
*
* Before setting a queue \b queue, you need to call ::mluOpCreate to initialize
* MLU-OPS library, and call cnrtCreateQueue to create a queue \b queue.
*
* @param[in] handle
* Handle to a Cambricon MLU-OPS context that is used to manage MLU devices and
* queues. For detailed information, see ::mluOpHandle_t.
* @param[in] queue
* The runtime queue to be set to the MLU-OPS handle.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpSetQueue(mluOpHandle_t handle, cnrtQueue_t queue);
// Group: Runtime Management
/*!
* @brief Retrieves the queue \b queue that was previously set to the handle \b handle.
*
* @param[in] handle
* Handle to a Cambricon MLU-OPS context that is used to manage MLU devices and queues. For
* detailed information, see ::mluOpHandle_t.
* @param[out] queue
* Pointer to the queue that was previously set to the specified handle.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpGetQueue(mluOpHandle_t handle, cnrtQueue_t *queue);
// Group: Runtime Management
/*!
* @brief Converts the MLU-OPS enumerated status code to a static, null-terminated
* (ASCIIZ) string and returns a pointer to that string, which holds
* the status name. For example, when the input argument is ::MLUOP_STATUS_SUCCESS, the
* returned string is "MLUOP_STATUS_SUCCESS". When an invalid status value is passed to
* the function, the returned string is "MLUOP_STATUS_BAD_PARAM".
*
* @param[in] status
* The MLU-OPS enumerated status code.
*
* @par Return
* - A pointer to the static string that holds the status name.
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
const char *
mluOpGetErrorString(mluOpStatus_t status);
// Group: Tensor
/*!
* @brief Gets the size of a data type in ::mluOpDataType_t.
*
* @param[in] data_type
* For detailed information, see ::mluOpDataType_t.
* @param[out] size
* Host pointer to the size of the data type.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpGetSizeOfDataType(mluOpDataType_t data_type, size_t *size);
// Group: Version Management
/*!
* @brief Retrieves the version of MLU-OPS library. The version of MLU-OPS
* is composed of \b major, \b minor, and \b patch. For instance, major = 1,
* minor = 2, patch = 3, the version of MLU-OPS library is 1.2.3.
*
* @param[out] major
* Pointer to an integer that receives the major version of MLU-OPS library.
* @param[out] minor
* Pointer to an integer that receives the minor version of MLU-OPS library.
* @param[out] patch
* Pointer to an integer that receives the patch version of MLU-OPS library.
*
* @par Return
* - None.
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
void
mluOpGetLibVersion(int *major, int *minor, int *patch);
// Group: QuantizeRoundMode
/*!
* @brief Updates the rounding mode in the MLU-OPS context information that is held by
* \b handle. This function should be called if you want to change the rounding mode that
* MLU-OPS uses for quantization conversion. For detailed information, see "Cambricon CNDrv
* Developer Guide".
*
* @param[in] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices and queues. For detailed
* information, see ::mluOpHandle_t.
* @param[in] round_mode
* The rounding mode of quantization conversion to be set to the MLU-OPS handle.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpSetQuantizeRoundMode(mluOpHandle_t handle, mluOpQuantizeRoundMode_t round_mode);
// Group: QuantizeRoundMode
/*!
* @brief Retrieves the rounding mode of a specific MLU-OPS context.
*
* @param[in] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices and queues. For detailed
* information, see ::mluOpHandle_t.
* @param[out] round_mode
* The rounding mode of quantization conversion that was previously set to the specified handle.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - The default rounding mode of a newly initialized ::mluOpHandle_t is ::MLUOP_ROUND_HALF_TO_EVEN.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpGetQuantizeRoundMode(mluOpHandle_t handle, mluOpQuantizeRoundMode_t *round_mode);
// Group: Runtime Management
/*!
* @brief Updates the atomics mode in the MLU-OPS context information that is held by
* \b handle. This function should be called if you want to change the atomics mode that is
* used to accumulate the results. For detailed information, see "Cambricon CNDrv Developer Guide".
*
* @param[in] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices and queues. For detailed
* information, see ::mluOpHandle_t.
* @param[in] atomics_mode
* The atomics mode.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpSetAtomicsMode(mluOpHandle_t handle, mluOpAtomicsMode_t atomics_mode);
// Group: Runtime Management
/*!
* @brief Retrieves the atomics mode of a specific MLU-OPS context.
*
* @param[in] handle
* Pointer to a Cambricon MLU-OPS context that is used to manage MLU devices and queues. For
* detailed information, see ::mluOpHandle_t.
* @param[out] atomics_mode
* The atomics mode.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par Data Type
* - None.
*
* @par Data Layout
* - None.
*
* @par Scale Limitation
* - None.
*
* @par API Dependency
* - None.
*
* @par Note
* - The default atomics mode of a newly initialized ::mluOpHandle_t is ::MLUOP_ATOMICS_NOT_ALLOWED.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpGetAtomicsMode(mluOpHandle_t handle, mluOpAtomicsMode_t *atomics_mode);
/******************************************************************************
* MLU-OPS Data Structure: Descriptor
* The structs represent nodes, weights, and AI network layers
******************************************************************************/
/*!
* The descriptor of a tensor that holds the information including tensor
* layout, data type, the number of dimensions, shape and strides.
*
* You need to call ::mluOpCreateTensorDescriptor to create a descriptor,
* and call ::mluOpSetTensorDescriptor or ::mluOpSetTensorDescriptorEx
* to set the tensor information to the descriptor. Also, you need to destroy
* the descriptor at the end with ::mluOpDestroyTensorDescriptor.
*/
typedef struct mluOpTensorStruct *mluOpTensorDescriptor_t;
/*! The descriptor of sequence data that holds the dimensions,
* layout, data type, sequence length, padding fill, position, and scale.
* The sequence data described by the descriptor supports up to 2 Giga elements in total.
* Call ::mluOpCreateSeqDataDescriptor to create a descriptor, and
* call ::mluOpSetSeqDataDescriptor_v2 to set the sequence data information to the descriptor.
* If the sequence data is in fixed-point data type, call ::mluOpSetSeqDataDescriptorPositionAndScale
* to set the position and scale of the sequence data.
* To destroy the descriptor, call ::mluOpDestroySeqDataDescriptor.
*/
typedef struct mluOpSeqDataStruct *mluOpSeqDataDescriptor_t;
// Group: SeqData
/*!
* @brief Creates a sequence data descriptor \p seq_data_desc that holds the dimensions, data type,
* sequence lengths, padding fill, and layout of sequence data in the host memory.
*
* Use ::mluOpSetSeqDataDescriptor_v2 to configure the descriptor and
* ::mluOpDestroySeqDataDescriptor to destroy it.
*
* @param[out] seq_data_desc
* Pointer to the host memory that holds information about
* the struct of the sequence data descriptor.
*
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @note
* - None.
*
* @par Requirements
* - None.
*
* @par Example
* - None.
*
* @par Reference
* - None.
*/
mluOpStatus_t MLUOP_WIN_API
mluOpCreateSeqDataDescriptor(mluOpSeqDataDescriptor_t *seq_data_desc);
// Group: SeqData
/*!
* @brief Sets the sequence data descriptor \p seq_data_desc that holds the dimensions,
* data type, sequence lengths, padding fill and layout of the sequence data.
*
* The number of dimensions in the \p dimSize[] is defined by \p dimNb. For example,
* if the layout of the sequence data is set to ::MLUOP_SEQDATA_NC, the \p dimNb is 2,
* with \p dimSize={batch, embedding}.
*
* The ::mluOpSeqDataDescriptor_t container is a collection of fixed-length sequential
* vectors, similar to the words constructing sentences. The T dimension described in
* ::mluOpSeqDataLayout_t is the time dimension. Different sequences are bundled together into a
* batch. The beam dimension described in ::mluOpSeqDataLayout_t represents different
* candidates that present a similar meaning in a typical translation task. The original
* sentence can be translated into several versions before the optimal one is picked, and the
* number of candidates is the beam size.
*
* Note that different sentences have different sequence lengths, even inside a beam.
* \p seqLengthArray records the real sequence lengths before padding to the maximum sequence
* length. The values in \p seqLengthArray should follow the batch-beam order, regardless of the
* sequence data layout. For example, with batch=3 and beam=2, the elements of \p seqLengthArray
* should be ordered as follows:
@verbatim
{batch_idx = 0, beam_idx = 0}
{batch_idx = 0, beam_idx = 1}
{batch_idx = 1, beam_idx = 0}
{batch_idx = 1, beam_idx = 1}
{batch_idx = 2, beam_idx = 0}
{batch_idx = 2, beam_idx = 1}
@endverbatim
* If the real sequence lengths are not required, pass NULL to \p seqLengthArray in this function.
*
* The \p seqLengthArraySize should be batch * beam, which is 6 in the example above.
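*
* As a hedged illustration of the batch-beam order described above, the sketch below computes
* the size of \p seqLengthArray and the position of one sequence length inside it. The helper
* names seq_length_count and seq_length_index are hypothetical and are not part of MLU-OPS;
* the code assumes only the ordering listed above, in which beam_idx varies fastest.
*
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not an MLU-OPS API): number of entries expected in
// seqLengthArray, which is batch * beam.
size_t seq_length_count(size_t batch, size_t beam) {
  return batch * beam;
}

// Hypothetical helper (not an MLU-OPS API): position of the sequence
// {batch_idx, beam_idx} in seqLengthArray when the array follows the
// batch-beam order, in which beam_idx varies fastest.
size_t seq_length_index(size_t batch_idx, size_t beam_idx, size_t beam) {
  assert(beam_idx < beam);
  return batch_idx * beam + beam_idx;
}
```
*
* With batch=3 and beam=2, seq_length_count returns 6, and seq_length_index(1, 0, 2) returns 2,
* which corresponds to the third entry listed above.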
*
* The \p paddingFill describes whether the sequence data needs to be padded using a
* specified value. In the multi-head attention operation, the padding part should be zero before
* entering the attention part to ensure the validity of the result. If the sequence data is
* already padded with zeros, pass NULL to \p paddingFill in this function. Otherwise, pass a
* pointer to the padding value (e.g. float a = 0, &a) to \p paddingFill to indicate that extra
* padding is needed.
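*
* To make the notion of padding concrete, the following host-side sketch overwrites the vectors
* beyond the real length of one sequence with a fill value. It is an illustration under stated
* assumptions only: the helper name fill_sequence_padding and the contiguous [time][embedding]
* float buffer are hypothetical, and the sketch does not show how MLU-OPS applies the padding
* value internally.
*
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not an MLU-OPS API). seq points to one sequence stored
// as max_len vectors of `embedding` floats each (assumed layout). Vectors at
// time steps [real_len, max_len) are padding and are set to fill_value.
void fill_sequence_padding(float *seq, size_t max_len, size_t real_len,
                           size_t embedding, float fill_value) {
  assert(seq != NULL);
  assert(real_len <= max_len);
  for (size_t t = real_len; t < max_len; ++t) {
    for (size_t e = 0; e < embedding; ++e) {
      seq[t * embedding + e] = fill_value;
    }
  }
}
```
*
* For a sequence with max_len=4 and real_len=2, only the last two vectors are overwritten and
* the first two keep their values.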
*
* @param[in,out] seq_data_desc
* Input/output. The descriptor of the sequence data. For detailed information,
* see ::mluOpSeqDataDescriptor_t.
* @param[in] layout
* The layout of the sequence data. See ::mluOpSeqDataLayout_t for the description of the
* enumeration type.
* @param[in] dtype
* The data type of the sequence data. See ::mluOpDataType_t for the description of the
* enumeration type.
* @param[in] dimNb
* The number of dimensions of the sequence data.
* @param[in] dimSize
* An array that contains the size of the sequence data for each dimension.
* @param[in] seqLengthArraySize
* Number of elements in sequence length array, \p seqLengthArray[]. It should be
* batch * beam. The batch and beam are described in the ::mluOpSeqDataLayout_t.
* @param[in] seqLengthArray
* An integer array recording the length of all sequences. Note that the array should be
* set in the batch-beam order, regardless of the sequence data layout. Set this parameter to NULL
* when the sequence length array is not required.
* @param[in] paddingFill
* A host pointer to a value of the data type \p dtype that is used to fill up the padding vectors
* beyond the valid length of each sequence. Use NULL when extra padding is not required.
* @par Return
* - ::MLUOP_STATUS_SUCCESS, ::MLUOP_STATUS_BAD_PARAM
*
* @par API Dependency
* - Before calling this function, ::mluOpCreateSeqDataDescriptor should be called.