[Feature](mlu-ops): add dcn_forward and dcn_backward_weight op #912

Merged 1 commit on Jan 17, 2024

@SangChengC SangChengC commented Jan 12, 2024

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

Add the dcn_forward and dcn_backward_weight operators.

2. Modification

    modified:   docs/user_guide/9_operators/index.rst
    modified:   docs/MLU-OPS-OpList.md
    modified:   mlu_op.h
    modified:   test/mlu_op_gtest/pb_gtest/mlu_op_test_proto
    new file:   kernels/dcn_forward/dcn_forward.cpp
    new file:   kernels/dcn_forward/dcn_common.cpp
    new file:   kernels/dcn_backward_weight/dcn_backward_weight.cpp
    new file:   test/mlu_op_gtest/pb_gtest/src/internal_kernel/transpose_cpu/transpose_cpu.cpp
    new file:   test/mlu_op_gtest/pb_gtest/src/internal_kernel/transpose_cpu/transpose_cpu.h
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_backward_weight/dcn_backward_weight.cpp
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_backward_weight/dcn_backward_weight.h
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_backward_weight/test_case/case_hi_16.prototxt
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_forward/dcn_forward.cpp
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_forward/dcn_forward.h
    new file:   test/mlu_op_gtest/pb_gtest/src/zoo/dcn_forward/test_case/case_hi_16.prototxt

3. Test Report

For instructions on operator testing, see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS Accuracy Acceptance Standard.

  • static threshold
    • diff1
      • float32 mlu diff1 <= 1e-5
      • float32 mlu diff1 <= 3e-3
      • float16 mlu diff1 <= 3e-3
    • diff2
      • float32 mlu diff2 <= 1e-5
      • float32 mlu diff2 <= 3e-3
      • float16 mlu diff2 <= 3e-3
    • diff3
      • mlu diff3 == 0
      • mlu diff3_1 == 0
      • mlu diff3_2 == 0
  • dynamic threshold
    • diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
    • diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
    • diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
      • float32, threshold = 1e-5
      • float16, threshold = 1e-3
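
The dynamic-threshold rule above can be sketched as a small check. This is only an illustration of the quoted formula `max(baseline diff * 10, static threshold)`; the function names `dynamicThreshold` and `passesDynamicThreshold` are hypothetical and are not part of MLU-OPS or its GTest framework.

```cpp
#include <algorithm>

// Hypothetical helper (not an MLU-OPS API): upper bound allowed for an MLU
// diff under the dynamic-threshold rule
//   mlu diff <= max(baseline diff * 10, static threshold)
inline double dynamicThreshold(double baseline_diff, double static_threshold) {
  return std::max(baseline_diff * 10.0, static_threshold);
}

// Returns true when the measured MLU diff satisfies the dynamic threshold.
inline bool passesDynamicThreshold(double mlu_diff, double baseline_diff,
                                   double static_threshold) {
  return mlu_diff <= dynamicThreshold(baseline_diff, static_threshold);
}
```

With a float32 static threshold of 1e-5, a baseline diff of 1e-5 raises the bound to 1e-4, while a baseline diff of 0 leaves the static threshold as the bound.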

3.1.2 Operator Scheme checklist

  • Supported hardware
    • MLU370
    • MLU590
  • Job types
    • BLOCK
    • UNION1
    • UNION2
    • UNION4
    • The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

  • Data type test (e.g. float32/int8)
  • Multi-dimensional tensor test
  • Layout test
  • Different sizes / integer-remainder tail segment / aligned and misaligned test
  • Zero dimensional tensor test/zero element test
  • Stability test
  • Multiple platform test
  • Gen_case module test, see: Gencase-User-Guide-zh
  • NaN/INF tests
  • Bug fix tests
  • For memory leak check details, see: GTest-User-Guide-zh
  • For code coverage check details, see: GTest-User-Guide-zh
  • For I/O calculation efficiency check details, see: MLU-OPS-Performance-Acceptance-Standard

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (error message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS Performance Acceptance Standard for details.

Platform: MLU370

# The test results should contain Op name, Shape, Data type,
#   MLU Hardware Time(us), MLU Interface Time(us), MLU IO Efficiency,
#   MLU Compute Efficiency, and MLU Workspace Size(Bytes)
# 
# for example: dcn_forward
#
# ----------- case0 -----------
[MLU Hardware Time      ]: 187 (us)
[MLU Interface Time     ]: 135.677 (us)
[MLU IO Efficiency      ]: 0.00457182
[MLU Compute Efficiency ]: 0.0293672
[MLU Workspace Size     ]: 5.52968e+06 (Bytes)
[MLU TheoryOps          ]: 1.26528e+07 (Ops)
[MLU TheoryIOs          ]: 1.7509e+06 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 3.491183e-05
DIFF2: 3.435694e-05
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/dcn_forward/test_case/case_hi_16.prototxt
[       OK ] dcn_forward/TestSuite.mluOp/0 (128 ms)
[----------] 1 test from dcn_forward/TestSuite (128 ms total)
# 
# for example: dcn_backward_weight
# ----------- case0 -----------
[MLU Hardware Time      ]: 229 (us)
[MLU Interface Time     ]: 32051.5 (us)
[MLU IO Efficiency      ]: 0.00373332
[MLU Compute Efficiency ]: 0.0239811
[MLU Workspace Size     ]: 5.54402e+06 (Bytes)
[MLU TheoryOps          ]: 1.26528e+07 (Ops)
[MLU TheoryIOs          ]: 1.7509e+06 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[grad_weight]
DIFF1: 4.231463e-05
DIFF2: 4.225033e-05
[grad_bias]
DIFF1: 2.666710e-07
DIFF2: 3.025384e-07
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/dcn_backward_weight/test_case/case_hi_16.prototxt
[       OK ] dcn_backward_weight/TestSuite.mluOp/0 (173 ms)
[----------] 1 test from dcn_backward_weight/TestSuite (173 ms total)
# ...
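
The efficiency figures in the log above appear to follow from the other logged fields. The sketch below is inferred from the numbers only, not from the MLU-OPS documentation: it assumes efficiency is the theoretical work divided by what the hardware could do in the measured hardware time, and that `GB/s` means 1e9 bytes per second. The struct `PerfSample` and both functions are hypothetical names.

```cpp
#include <cmath>

// Hypothetical container for the fields printed in the gtest log.
struct PerfSample {
  double hardware_time_us;         // [MLU Hardware Time] in microseconds
  double theory_ops;               // [MLU TheoryOps]
  double theory_io_bytes;          // [MLU TheoryIOs] in bytes
  double compute_force_ops_per_s;  // [MLU ComputeForce] in op/s
  double io_bandwidth_gb_per_s;    // [MLU IoBandWidth] in GB/s (assumed 1e9 B/s)
};

// Assumed relation: theoretical ops / (peak op/s * measured time in seconds).
inline double computeEfficiency(const PerfSample& s) {
  return s.theory_ops / (s.hardware_time_us * 1e-6 * s.compute_force_ops_per_s);
}

// Assumed relation: theoretical bytes / (peak bytes/s * measured time in seconds).
inline double ioEfficiency(const PerfSample& s) {
  return s.theory_io_bytes /
         (s.hardware_time_us * 1e-6 * s.io_bandwidth_gb_per_s * 1e9);
}
```

Feeding in the dcn_forward values (187 us, 1.26528e+07 ops, 1.7509e+06 bytes, 2.304e+12 op/s, 2048 GB/s) gives roughly 0.02937 and 0.004572, which agrees with the logged compute and IO efficiency to the printed precision. The dcn_backward_weight case at 229 us agrees the same way.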

Platform: MLU590

# for example: dcn_forward
#
# ----------- case0 -----------
[MLU Hardware Time      ]: 187 (us)
[MLU Interface Time     ]: 135.677 (us)
[MLU IO Efficiency      ]: 0.00457182
[MLU Compute Efficiency ]: 0.0293672
[MLU Workspace Size     ]: 5.52968e+06 (Bytes)
[MLU TheoryOps          ]: 1.26528e+07 (Ops)
[MLU TheoryIOs          ]: 1.7509e+06 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[output]
DIFF1: 3.491183e-05
DIFF2: 3.435694e-05
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/dcn_forward/test_case/case_hi_16.prototxt
[       OK ] dcn_forward/TestSuite.mluOp/0 (128 ms)
[----------] 1 test from dcn_forward/TestSuite (128 ms total)
# 
# for example: dcn_backward_weight
# ----------- case0 -----------
[MLU Hardware Time      ]: 229 (us)
[MLU Interface Time     ]: 32051.5 (us)
[MLU IO Efficiency      ]: 0.00373332
[MLU Compute Efficiency ]: 0.0239811
[MLU Workspace Size     ]: 5.54402e+06 (Bytes)
[MLU TheoryOps          ]: 1.26528e+07 (Ops)
[MLU TheoryIOs          ]: 1.7509e+06 (Bytes)
[MLU ComputeForce       ]: 2.304e+12 (op/s)
[MLU IoBandWidth        ]: 2048 (GB/s)
[GPU Hardware Time      ]: -1 (us)
[GPU IO Efficiency      ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size     ]: -1 (Bytes)
[Diffs]:
[grad_weight]
DIFF1: 4.231463e-05
DIFF2: 4.225033e-05
[grad_bias]
DIFF1: 2.666710e-07
DIFF2: 3.025384e-07
[^      OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/dcn_backward_weight/test_case/case_hi_16.prototxt
[       OK ] dcn_backward_weight/TestSuite.mluOp/0 (173 ms)
[----------] 1 test from dcn_backward_weight/TestSuite (173 ms total)
# ...
# ...

3.4 Summary Analysis

Give a brief overview here if you want to note or summarize anything.

@SangChengC SangChengC force-pushed the binary-dcn-sangcm branch 9 times, most recently from c0229c5 to cea9549 Compare January 15, 2024 09:58
@SangChengC SangChengC force-pushed the binary-dcn-sangcm branch 2 times, most recently from 3a4e9ee to 9657a62 Compare January 15, 2024 12:05
@PetrelYy PetrelYy added this to the v1.0.0 milestone Jan 15, 2024
@SangChengC SangChengC force-pushed the binary-dcn-sangcm branch 2 times, most recently from 13b0564 to f0ce388 Compare January 15, 2024 12:53
@SangChengC SangChengC force-pushed the binary-dcn-sangcm branch 2 times, most recently from 462167e to e7bf9e6 Compare January 16, 2024 08:18
@SangChengC SangChengC force-pushed the binary-dcn-sangcm branch 2 times, most recently from c56aef1 to 4ed9c02 Compare January 16, 2024 12:03
@duzekunKTH duzekunKTH merged commit daa3939 into Cambricon:master Jan 17, 2024
1 check passed
7 participants