the calculation of the FullyConnected layer takes a lot of time #7273
Comments
My bad: the GEMM macro could be set in version 2021.1, but now I'm using 2021.4. |
Hi, |
but in 2021.4 I cannot find this macro. Why was it removed? |
@Iffa-Meah can you help me? |
Hi @zhaohb I see GEMM was removed starting with the OpenVINO 2021.2 release; I would have to check with the development team. Could you provide your model? I want to reproduce the behavior and get the development team's input as well. Regards, |
OK, I will share my model with you later, but I want to know why GEMM was removed. Is the performance similar between the different implementations? |
@jgespino I have tried to add GEMM back in 2021.4 but failed, so I hope you can add GEMM and test my model to see whether the performance of these methods is the same. Thank you very much. |
@jgespino Is there any progress now? |
@jgespino How can I tell if gemm=MKL has compiled successfully? Before the change, benchmark_app.py -pc showed:
dense_1/BiasAdd EXECUTED layerType: FullyConnected realTime: 3055 cpu: 3055 execType: jit_gemm_FP32
I added gemm=MKL and compiled successfully, but benchmark_app.py -pc still shows:
dense_1/BiasAdd EXECUTED layerType: FullyConnected realTime: 3055 cpu: 3035 execType: jit_gemm_FP32
Should execType change after switching to MKL? The execution time also seems unchanged. |
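(For context, a minimal sketch of how the per-layer counters quoted above are collected; the model path is a placeholder.)

```sh
# -pc makes benchmark_app report per-layer performance counters; the
# execType column names the kernel that actually ran for each layer
# (jit_gemm_FP32 indicates a JIT GEMM kernel).
python3 benchmark_app.py -m model.xml -d CPU -pc
```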
Who can give me some advice? |
Hi @zhaohb I appreciate your patience, I've reached out to the development team for additional assistance. Regards, |
@jgespino Thank you very much. Let me know if you find anything! |
@jgespino How is the progress now? I really need a solution to this problem. |
Hi, who can help me? |
Hi @zhaohb. |
@dmitry-gorokhov Thank you for your reply. This is part of my model; there are many combinations of H x W, such as 1490x256, 1490x4, and 256x256, and the slowest one should be 1490x256. |
@dmitry-gorokhov This part of the model is fairly wide. How can we increase parallelism in this part? I think that should improve performance. |
@zhaohb By HW I actually meant hardware :). It is important to know which system you are using for benchmarking because it affects the possible ways to improve performance. |
@zhaohb Glad to hear that. I expect the PR to be merged within 2 weeks. |
@dmitry-gorokhov OK, I will try to find the optimal number of nstreams. Thank you very much. |
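(A minimal sketch of such a sweep with benchmark_app, assuming a placeholder model path; the tool prints throughput for each run.)

```sh
# Compare throughput across several CPU stream counts.
for n in 1 2 4 8; do
  echo "== nstreams=$n =="
  python3 benchmark_app.py -m model.xml -d CPU -nstreams "$n"
done
```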
@zhaohb There shouldn't be much difference in terms of performance, so you can use the feature branch for benchmarking. |
@dmitry-gorokhov It's about more than benchmarking: I want to build this branch into the Model Server so that the Model Server gets the best inference performance. |
@dmitry-gorokhov I compiled the reduce_node_extension branch but found that I could not generate the OpenCV library. This is my compile command (note it passes both -DENABLE_OPENCV=OFF and -DENABLE_OPENCV=ON):

cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_CLDNN=OFF -DENABLE_OPENCV=OFF -DTHREADING=TBB -DENABLE_GNA=OFF -DENABLE_VPU=OFF -DENABLE_PYTHON=ON -DNGRAPH_ONNX_FRONTEND_ENABLE=ON -DENABLE_OPENCV=ON -DCMAKE_INSTALL_PREFIX=/work/6686_openvino/out_6686_opencv/

but the output was:
-- OpenVINO version is 2022.1.0
-- CMAKE_BUILD_TYPE: Release
CMake Warning at cmake/developer_package/clang_format/clang_format.cmake:21 (message):
Supported clang-format version is not found!
Call Stack (most recent call first):
cmake/developer_package/IEDevScriptsConfig.cmake:294 (include)
CMakeLists.txt:11 (find_package)
CMake Warning at cmake/developer_package/ncc_naming_style/ncc_naming_style.cmake:26 (message):
Please, install libclang-[N]-dev package (required for ncc naming style
check)
Call Stack (most recent call first):
cmake/developer_package/IEDevScriptsConfig.cmake:295 (include)
CMakeLists.txt:11 (find_package)
-- clang package is installed, but may have different version (5.0). Please use "/usr/bin/python3 -m pip install clang==9.0".
-- Inference Engine enabled features:
--
-- CI_BUILD_NUMBER: custom_reduce_node_extension_24b77d73c44f7058f4b0d05b59e079a7b80ab467
-- ENABLE_LTO = OFF
-- OS_FOLDER = OFF
-- USE_BUILD_TYPE_SUBFOLDER = ON
-- TREAT_WARNING_AS_ERROR = ON
-- ENABLE_INTEGRITYCHECK = OFF
-- ENABLE_SANITIZER = OFF
-- ENABLE_UB_SANITIZER = OFF
-- ENABLE_THREAD_SANITIZER = OFF
-- ENABLE_COVERAGE = OFF
-- ENABLE_SSE42 = ON
-- ENABLE_AVX2 = ON
-- ENABLE_AVX512F = ON
-- BUILD_SHARED_LIBS = ON
-- ENABLE_FASTER_BUILD = OFF
-- ENABLE_CPPLINT = ON
-- ENABLE_CPPLINT_REPORT = OFF
-- ENABLE_CLANG_FORMAT = OFF
-- ENABLE_NCC_STYLE = OFF
-- VERBOSE_BUILD = OFF
-- ENABLE_UNSAFE_LOCATIONS = OFF
-- ENABLE_FUZZING = OFF
-- ENABLE_MKL_DNN = ON
-- ENABLE_TESTS = OFF
-- ENABLE_STRICT_DEPENDENCIES = ON
-- ENABLE_CLDNN = OFF
-- ENABLE_PROFILING_ITT = OFF
-- ENABLE_PROFILING_FILTER = ALL
-- ENABLE_PROFILING_FIRST_INFERENCE = ON
-- SELECTIVE_BUILD = OFF
-- ENABLE_ERROR_HIGHLIGHT = OFF
-- ENABLE_PYTHON = ON
-- ENABLE_DOCS = OFF
-- ENABLE_GNA = OFF
-- ENABLE_CLDNN_TESTS = OFF
-- THREADING = TBB
-- ENABLE_VPU = OFF
-- ENABLE_MYRIAD = OFF
-- ENABLE_MYRIAD_NO_BOOT = OFF
-- ENABLE_GAPI_TESTS = OFF
-- GAPI_TEST_PERF = OFF
-- ENABLE_MYRIAD_MVNC_TESTS = OFF
-- ENABLE_DATA = OFF
-- ENABLE_BEH_TESTS = OFF
-- ENABLE_FUNCTIONAL_TESTS = OFF
-- ENABLE_SAMPLES = 0
-- ENABLE_OPENCV = ON
-- ENABLE_V7_SERIALIZE = OFF
-- ENABLE_TBB_RELEASE_ONLY = ON
-- ENABLE_SYSTEM_PUGIXML = OFF
-- ENABLE_DEBUG_CAPS = OFF
-- ENABLE_GPU_DEBUG_CAPS = OFF
-- ENABLE_CPU_DEBUG_CAPS = OFF
-- NGRAPH_ONNX_FRONTEND_ENABLE = ON
-- NGRAPH_PDPD_FRONTEND_ENABLE = ON
-- NGRAPH_IR_FRONTEND_ENABLE = ON
-- NGRAPH_USE_PROTOBUF_LITE = ON
-- NGRAPH_USE_SYSTEM_PROTOBUF = OFF
-- OPENVINO_DEBUG_ENABLE = OFF
-- ENABLE_REQUIREMENTS_INSTALL = ON
--
-- MODELS_PATH=
-- PROJECT ............................... OpenVINO
-- CMAKE_BINARY_DIR ...................... /work/6686_openvino/openvino/build
-- OpenVINO_SOURCE_DIR ................... /work/6686_openvino/openvino
-- CMAKE_GENERATOR ....................... Unix Makefiles
-- CMAKE_C_COMPILER_ID ................... GNU
-- CMAKE_BUILD_TYPE ...................... Release
-- The name pugixml::static is an ALIAS for pugixml-static. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- The name gflags is an ALIAS for gflags_nothreads_static. It will be exported to the InferenceEngineDeveloperPackage with the original name.
--
-- 3.9.2.0
-- Found PythonInterp: /usr/bin/python3 (found version "3.8.10")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.8.so (found version "3.8.10")
Generated: /work/6686_openvino/openvino/build/thirdparty/onnx/onnx/onnx/onnx_ngraph_onnx-ml.proto
Generated: /work/6686_openvino/openvino/build/thirdparty/onnx/onnx/onnx/onnx-operators_ngraph_onnx-ml.proto
Generated: /work/6686_openvino/openvino/build/thirdparty/onnx/onnx/onnx/onnx-data_ngraph_onnx.proto
--
-- ******** Summary ********
-- CMake version : 3.16.3
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 9.3.0
-- CXX flags : -Wsuggest-override -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-error=parentheses -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong -s -fsigned-char -Werror -ffunction-sections -fdata-sections -fdiagnostics-show-option -Wundef -Wreturn-type -Wunused-variable -Wuninitialized -Winit-self -Wmaybe-uninitialized -Wno-suggest-override -Wnon-virtual-dtor
-- Build type : Release
-- Compile definitions : IE_BUILD_POSTFIX="";ENABLE_MKL_DNN=1
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : /work/6686_openvino/out_6686_opencv
-- CMAKE_MODULE_PATH :
--
-- ONNX version : 1.9.0
-- ONNX NAMESPACE : ngraph_onnx
-- ONNX_USE_LITE_PROTO : ON
-- USE_PROTOBUF_SHARED_LIBS : OFF
-- ONNX_DISABLE_EXCEPTIONS : OFF
-- ONNX_WERROR : OFF
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNXIFI_DUMMY_BACKEND : OFF
-- ONNXIFI_ENABLE_EXT : OFF
--
-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_ONNX_PYTHON : OFF
-- The name openvino::pp is an ALIAS for openvino_preprocessor. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- The name openvino::itt is an ALIAS for itt. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- The name openvino::conditional_compilation is an ALIAS for conditional_compilation. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- The name ngraph::builder is an ALIAS for ngraph_builders. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- The name ngraph::reference is an ALIAS for ngraph_reference. It will be exported to the InferenceEngineDeveloperPackage with the original name.
-- nGraph unit tests disabled
-- pybind11 v2.8.0 dev1
-- Python version=python3.8
-- TBB: /work/6686_openvino/openvino/inference-engine/temp/tbb
-- GPU support is disabled
-- Primitive cache is disabled
-- Static tbbbind_2_4 package was found
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.8.so (found suitable version "3.8.10", minimum required is "3")
-- Found Cython version 0.29.24
CMake Warning at inference-engine/samples/common/format_reader/CMakeLists.txt:21 (message):
OPENCV is disabled or not found, format_reader will be built without OPENCV
support
-- Register template_plugin to be built in build-modules/template_plugin
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.8.10", minimum required is "3")
-- Configuring done
-- Generating done
-- Build files have been written to: /work/6686_openvino/openvino/build
The directory structure of the output has changed; it now looks like this: install_dependencies, python, runtime, samples, setupvars.sh, tools. The previous directory structure looked like this: bin, deployment_tools, inference_engine, licensing, python, data_processing, documentation, install_dependencies, opencv. The new directory structure was problematic when I recompiled the Model Server. What should I do? |
Maybe I should use the Model Server develop branch. |
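(A hypothetical workaround, assuming the failure is in the OpenCV download step and not confirmed in this thread: point the build at an existing OpenCV install via the standard CMake variable OpenCV_DIR; the path below is a placeholder.)

```sh
# Skip OpenVINO's bundled OpenCV and reuse a pre-built one; the directory
# given to OpenCV_DIR must contain OpenCVConfig.cmake.
cmake .. -DENABLE_OPENCV=OFF -DOpenCV_DIR=/opt/opencv/cmake
```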
@dmitry-gorokhov It's my fault: the width of the model is not the bottleneck for OpenVINO. The root problem is FC; if you have a lot of FC layers, performance degrades a lot. |
@dmitry-gorokhov Which file contains the operator implementation of FC? I want to try to optimize it. |
@zhaohb Just following up on this discussion, is this something you are still working on? |
@jgespino Yes, I am trying, but I also need some help on how to optimize FC; I do not have a particularly good method right now. |
@dmitry-gorokhov @zhaohb Could you provide some guidance on possible approaches to optimizing FC? |
@zhaohb Apologies for the delay in our response. Could you please grant me access to the original model that was converted to IR format? Is it included in the link below? https://drive.google.com/drive/folders/10FfO_AgJtJMJx5bcSEd-p0S6oeDWI1k-?usp=sharing |
Yes, of course. The [email protected] mailbox has been whitelisted to access the model file. |
@zhaohb Received the invite, thank you! I don't see the original model, though. |
@jgespino Yes, it can be shared. I've uploaded it. |
@zhaohb Thanks! Yes, I want to test it on the latest OpenVINO release and see if the performance has improved. I'll need to find a system with a processor similar to the Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz. We have a pre-release version of the OpenVINO 2022.2 release on PyPI in case you want to try it from your side as well. Regards, |
@jgespino What MatMul/GEMM optimizations does 2022.2 have compared to earlier versions? I have tested 2022.1, but there was no improvement over the previous version. |
@zhaohb I don't have access to the same Xeon Skylake processor as you do, but testing on an Ice Lake Intel® Xeon® Platinum 8368 CPU I can see some improvement in FullyConnected layers between OpenVINO versions (2021.4.1 vs 2022.2). This test used the model you shared with us. In the 2022.2 release, 5 of 19 FullyConnected layers run as brgemm_avx512_FP32 and 14 of 19 as jit_gemm_FP32, whereas in the 2021.4.1 release all 19 FC layers execute as jit_gemm_FP32. The cumulative time, roughly, for all FullyConnected layers is 11.72 ms in 2021.4.1 versus 1.40e-5 ms in 2022.2. Not sure this type of improvement is expected in your environment/configuration, but it might be worthwhile trying it out with 2022.2. Note: in the table below, jit_gemm_FP32;1.215 represents exec_type and exec_time in ms.
|
Closing this, I hope previous responses were sufficient to help you proceed. Feel free to reopen and ask additional questions related to this topic. |
Hi @avitial, |
Hi @akote123. |
@dmitry-gorokhov Continuing from @akote123's question: is ACL used through oneDNN or independently? I noticed that nodes of type FullyConnected are not being executed through oneDNN (calls to those nodes don't show up in ONEDNN_VERBOSE logs) but MatMul nodes are. |
It depends on the operation, but for Convolution, MatMul and FullyConnected we use oneDNN, which falls back on ACL internally. This actually gives us the ability to leverage SVE kernels as well: https://github.com/openvinotoolkit/oneDNN/blob/v3.3_for_ie_master/src/cpu/cpu_convolution_list.cpp#L118-L120 I cannot say for sure why FC is not visible in the VERBOSE log. |
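(A minimal sketch of inspecting the chosen oneDNN implementations via the standard oneDNN verbose switch; the binary name and model path are placeholders.)

```sh
# oneDNN prints one line per primitive created/executed to stdout; filter
# for the primitive kinds of interest (FC typically maps to inner_product
# or matmul).
ONEDNN_VERBOSE=1 ./benchmark_app -m model.xml -d CPU -niter 1 | grep -E "inner_product|matmul"
```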
@dmitry-gorokhov I see... I should probably give a bigger picture of what I'm trying to do. I've been trying to run this script for LLaMA-2 from openvino.genai and determine the backend path followed on aarch64. I found an issue on the same repo which demonstrated how to collect profiling information with the primitives used for each operation during inference, and I transferred that script over. For aarch64, I observed that FullyConnected layers fell back to a reference implementation (the ref_any_f16 primitive) while MatMul layers used GEMM kernels from ACL. When I override this function, all FullyConnected layers remain MatMul layers and are executed using [...]. A bit more investigation into the ONEDNN_VERBOSE logs revealed that matmuls in benchDNN were using the blocked implementation ([...]). My question: do you know why [...]? Please note: [...] |
@NishantPrabhuFujitsu The issue with FC (which falls back on the reference implementation) was caused by a bug on the ACL side. The team has already shared the related patches with us and @alvoron incorporated them into the OV runtime: openvinotoolkit/openvino.genai#438 (comment). So with that custom OV version I would expect the correct oneDNN/ACL implementations to be chosen. |
@dmitry-gorokhov Thank you for providing clarity on how MatMul and FullyConnected nodes work. I tried out the patches shipped by @alvoron, and the issue I was facing has been resolved. Thanks again for your support. |
System information (version)
I have a model, and after profiling it with benchmark_app I found that the FullyConnected layer takes a lot of time, about 25% of the total inference time:
In this link https://toscode.gitee.com/vinsonSpace/openvino/blob/master/build-instruction.md I saw that GEMM can be accelerated through OpenBLAS or MKL:
I want to use MKL:
but I am told that this GEMM macro is not available.
Does OpenVINO not support this macro anymore? How can I accelerate the FullyConnected layer?
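(For reference, a sketch of the 2021.1-era configure options described in the linked build instructions; the MKL root path is a placeholder. As discussed above, this GEMM switch was removed starting with the 2021.2 release.)

```sh
# Pre-2021.2 OpenVINO builds selected the GEMM backend at configure time
# (JIT, OPENBLAS, or MKL); later releases dropped the option.
cmake .. -DCMAKE_BUILD_TYPE=Release -DGEMM=MKL -DMKLROOT=/opt/intel/mkl
make -j"$(nproc)"
```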