-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Add new operator ml_nms #325
base: master
Are you sure you want to change the base?
Conversation
Add files via upload
return false; | ||
} | ||
|
||
mluOpStatus_t MlNmsParamCheck( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mluOpStatus_t MlNmsParamCheck( | |
mluOpStatus_t mlNmsParamCheck( |
@@ -25,6 +25,7 @@ | |||
|
|||
#include <stdint.h> | |||
#include "cnrt.h" | |||
#include "mlu_op.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要包含该头文件吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要mlu_op.h中的结构体
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为mluOpDataType_t的使用才包含的mlu_op.h,实际上mluOpKernelMlNmsFloatFast中的Float已经指明了数据类型,这个参数可以删除。
} | ||
|
||
__mlu_func__ void getSegNumMlNmsFast(int input_boxes_num, int* seg) { | ||
if (taskDim > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为使用的是U1任务,这里taskDim一定会大于1。建议将该判断去除,默认使用taskDim > 1时的计算。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为后续block任务预留
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果taskDim = 1,uint32_t((input_boxes_num % taskDim) > taskId)结果为0,此时*seg通过(input_boxes_num / taskDim) + uint32_t((input_boxes_num % taskDim) > taskId)得到的值和input_boxes_num相同。
bangc-ops/kernels/ml_nms/ml_nms.mlu
Outdated
__nram__ char worke_space[MAX_NRAM_SIZE / 16]; | ||
__memcpy((T*)worke_space, | ||
boxes_data_ptr + (offset * 4), | ||
seg * 4 * sizeof(T), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seg是通过所有box的个数在taskDim上进行拆分得到的,这里考虑不严谨(不应该认为计算得到的seg数量能够完全保存在nram上),这里实际需要的是片上单次处理的数据量,应该和内存大小相关。
bangc-ops/test/mlu_op_gtest/pb_gtest/src/zoo/ml_nms/test_case/case_0.prototxt
Outdated
Show resolved
Hide resolved
#include "mlu_op_kernel.h" | ||
#include "kernels/kernel.h" | ||
|
||
#define NRAM_SIZE 2 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个宏没有被使用
if (boxes_data_ptr_desc->dtype == MLUOP_DTYPE_HALF) { | ||
mluOpFuncKernel = mluOpKernelMlNmsHalfFast; | ||
apply_nram_size = (input_boxes_num * 6 * 2) + (input_boxes_num * 14 * 2); | ||
} else { | ||
mluOpFuncKernel = mluOpKernelMlNmsFloatFast; | ||
apply_nram_size = (input_boxes_num * 6 * 4) + (input_boxes_num * 14 * 4); | ||
} | ||
if (apply_nram_size > MAX_NRAM_SIZE) { | ||
if ((apply_nram_size % MAX_NRAM_SIZE) !=0) { | ||
loop_num = (apply_nram_size / MAX_NRAM_SIZE) + 1; | ||
} else { | ||
loop_num = apply_nram_size / MAX_NRAM_SIZE; | ||
} | ||
} | ||
if (loop_num > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这部分循环放在 kernel 内(mluOpFuncKernel)处理更合适,减少kernl launch的开销
__mlu_device__ void unionImple(T* boxes_data_ptr, | ||
T nms_thres, int offset, int seg, int input_boxes_num, | ||
int boxes_start_position, uint8_t* output_boxes_index) { | ||
__nram__ char worke_space[MAX_NRAM_SIZE / 16]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__nram__ char worke_space[MAX_NRAM_SIZE / 16]; | |
__nram__ char work_space[MAX_NRAM_SIZE / 16]; |
x1 = worke_space + data_len; | ||
y1 = worke_space + (data_len + compute_len); | ||
x2 = worke_space + (data_len + (compute_len * 2)); | ||
y2 = worke_space + (data_len + (compute_len * 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x1 = worke_space + data_len; | |
y1 = worke_space + (data_len + compute_len); | |
x2 = worke_space + (data_len + (compute_len * 2)); | |
y2 = worke_space + (data_len + (compute_len * 3)); | |
x1 = worke_space + data_len; | |
y1 = x1 + compute_len; | |
x2 = y1 + compute_len; | |
y2 = x2 + compute_len; |
result = (uint8_t*)worke_space + ((data_len + (compute_len * 8)) * | ||
(sizeof(T) / sizeof(uint8_t)) + seg); | ||
} | ||
for (i = 0, j = 0; i < seg * 6; i+=6, j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (i = 0, j = 0; i < seg * 6; i+=6, j++) { | |
for (i = 0, j = 0; i < seg * 6; i += 6, j++) { |
__bang_sub(h, y1, y2, compute_len); | ||
__bang_sub(w, x2, x1, compute_len); | ||
__bang_mul(area_ptr, h, w, compute_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
面积计算公式为:
area = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
/是否少了 + 1操作?
|
||
// max x1 | ||
__bang_write_value(scores_max_boxes_ptr, compute_len, scores_max_boxes[0]); | ||
__bang_cycle_sub(x1, x1, scores_max_boxes_ptr, compute_len, compute_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__bang_cycle_sub(x1, x1, scores_max_boxes_ptr, compute_len, compute_len); | |
__bang_sub(x1, x1, scores_max_boxes_ptr); |
__bang_cycle_add(tem, area_ptr, scores_max_boxes_area_ptr, | ||
compute_len, compute_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__bang_cycle_add(tem, area_ptr, scores_max_boxes_area_ptr, | |
compute_len, compute_len); | |
__bang_add(tem, area_ptr, scores_max_boxes_area_ptr); |
综合代码来看,tem这块内存并不需要,其他位置可以使用scores_max_boxes_area_ptr进行操作,这里可以将结果保存在area_ptr中。
} else { | ||
result[i] = 0; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
该for循环是否可以使用 bang_mul 处理?
auto output_ptr = parser_->getMetaTensor(1).cpu_ptr; | ||
int input_boxes_num = input_desc->dims[0]; | ||
std::vector<std::vector<float>> boxes_data_ptr; | ||
for (int i = 0; i < input_boxes_num * 4; i+=4) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int i = 0; i < input_boxes_num * 4; i+=4) { | |
for (int i = 0; i < input_boxes_num * 6; i += 6) { |
单个box含有6个数据,这里应该修改为6
|
||
float area1 = abs(box1[0] - box1[2]) * abs(box1[1] - box1[3]); | ||
float area2 = abs(box2[0] - box2[2]) * abs(box2[1] - box2[3]); | ||
float inter = abs(x1 - x2) * abs(y1 - y2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
没有考虑两个box不相交的情况
boxes_data_ptr.push_back(data_ptr); | ||
} | ||
for (int i = 0; i < input_boxes_num ; i++) { | ||
float iou = iouCompute(boxes_data_ptr[0], boxes_data_ptr[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
只需要和第一个box计算交并比就可以了吗?
@wenzhengyin 有下面几个内容需要确认一下: |
Thanks for your contribution and we appreciate it a lot.
1. Motivation
Add new operator ml_nms
2. Modification
Code to add a new operator
Add file
bangc-ops\kernels\ml_nms\ml_nms.cpp
bangc-ops\kernels\ml_nms\ml_nms.mlu
bangc-ops\kernels\ml_nms\ml_nms.h
bangc-ops\test\mlu_op_gtest\pb_gtest\src\zoo\ml_nms\ml_nms.cpp
bangc-ops\test\mlu_op_gtest\pb_gtest\src\zoo\ml_nms\ml_nms.h
bangc-ops\test\mlu_op_gtest\pb_gtest\src\zoo\ml_nms\test_case\case_0.prototxt
Update file
bangc-ops\mlu_op.h
bangc-ops\mlu_op_kernel.h
bangc-ops\test\mlu_op_gtest\pb_gtest\mlu_op_test_proto\mlu_op_test.proto
3. Test Report
If you want to know how to do operator testing, you can see GTest-User-Guide-zh.
3.1 Modification Details
3.1.1 Accuracy Acceptance Standard
For static threshold standard details, see: MLU-OPS Accuracy Acceptance Standard.
3.1.2 Operator Scheme checklist
MLU290
MLU370
U1
U4
3.1.3 New Feature Test
If you have checked the following items, please tick the relevant box.
3.1.4 Parameter Check
When a new operator is submitted, the test points are given and the test results are stated.
3.2 Performance Test
See MLU-OPS Performance Acceptance Standard for details.
Platform :MLU270
Platform :MLU290
Platform:MLU370
3.3 Summary Analysis
Please give a brief overview here, if you want to note and summarize the content.