From 7cde80d58d38ac16f20313a8a79a845051c7ad84 Mon Sep 17 00:00:00 2001
From: "man.lu"
Date: Fri, 23 Feb 2024 18:36:48 +0800
Subject: [PATCH] [doc] update release note

Change-Id: I435b1959a4a8ce4025e6c24564b94b9af62c5740
---
 build.sh                                       |  1 +
 .../source_en/03_user_interface.rst            | 17 +++++++++------
 .../source_zh/03_user_interface.rst            |  2 +-
 docs/quick_start/source_en/00_disclaimer.rst   | 12 +++++++++++
 .../quick_start/source_en/07_quantization.rst  | 21 ++++++++++++++-----
 .../source_en/Appx.01_to_onnx_convert.rst      |  3 ++-
 docs/quick_start/source_zh/00_disclaimer.rst   | 12 +++++++++++
 docs/quick_start/source_zh/03_onnx.rst         | 11 +++++-----
 docs/quick_start/source_zh/06_tflite.rst       |  2 +-
 .../quick_start/source_zh/07_quantization.rst  | 17 +++++++++------
 .../source_zh/Appx.01_to_onnx_convert.rst      |  6 ++++--
 .../source_zh/Appx.02_cv18xx_guide.rst         | 11 +++++-----
 12 files changed, 83 insertions(+), 32 deletions(-)

diff --git a/build.sh b/build.sh
index 597478932..055581fcb 100755
--- a/build.sh
+++ b/build.sh
@@ -29,6 +29,7 @@ function config_release()
   cmake -G Ninja \
     -B ${BUILD_PATH} \
     -DCMAKE_C_COMPILER=clang \
+    -DCMAKE_BUILD_TYPE="" \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CXX_FLAGS=-O2 \
     -DTPUMLIR_USE_LLD=ON \
diff --git a/docs/developer_manual/source_en/03_user_interface.rst b/docs/developer_manual/source_en/03_user_interface.rst
index 1f4a3e8ce..f1f04ecbf 100644
--- a/docs/developer_manual/source_en/03_user_interface.rst
+++ b/docs/developer_manual/source_en/03_user_interface.rst
@@ -375,7 +375,7 @@ Convert the mlir file into the corresponding model; the parameters are as follows
    * - core
      - N
      - When the target is selected as bm1688 or cv186x, it is used to select the number of tpu cores for parallel computing, and the default setting is 1 tpu core
-   * - asymmetric 
+   * - asymmetric
      - N
      - Do INT8 asymmetric quantization
    * - dynamic
@@ -592,15 +592,20 @@ Supported functions:
      - The input mlir file name (including path)
    * - img
      - N
-     - Used for CV tasks to generate random images, otherwise generate npz files. The default image value range is [0,255], the data type is 'uint8', and cannot be changed.
+     - Used for CV tasks to generate random images; otherwise an npz file is
+       generated. The default image value range is [0,255] and the data type
+       is 'uint8'; neither can be changed.
    * - ranges
      - N
-     - Set the value ranges of the model inputs, expressed in list form, such as [[0,300],[0,0]]. If you want to generate a picture, you do not need to specify the value range, the default is [0,255].
-       In other cases, value ranges need to be specified.
+     - Set the value ranges of the model inputs, expressed in list form, such
+       as [[0,300],[0,0]]. If you want to generate a picture, you do not need
+       to specify the value range; the default is [0,255]. In all other
+       cases, value ranges need to be specified.
    * - input_types
      - N
-     - Set the model input types, such as 'si32,f32'. 'si32' and 'f32' types are supported. False by default, and it will be read from mlir. If you generate an image
-       , you do not need to specify the data type, the default is 'uint8'.
+     - Set the model input types, such as 'si32,f32'. The 'si32' and 'f32'
+       types are supported. If not specified, the types are read from the
+       mlir file. If you generate an image, you do not need to specify the
+       data type; the default is 'uint8'.
    * - output
      - Y
      - The names of the outputs.
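The random-input parameters above map directly onto a command-line invocation. Below is a minimal sketch, assuming the function is exposed as a ``gen_rand_input.py`` script whose flags are spelled like the parameter names in the table; the script name and exact flag spellings are assumptions, not something this patch confirms.

.. code-block:: shell

   # Sketch only: script name and flag spellings are assumed.
   # CV model: generate a random image; range [0,255] and 'uint8' are fixed.
   gen_rand_input.py --mlir yolov5s.mlir --img --output yolov5s_rand_in.jpg

   # Non-CV model: generate an npz, giving each input a value range and a type.
   gen_rand_input.py \
       --mlir bert.mlir \
       --ranges "[[0,300],[0,0]]" \
       --input_types "si32,f32" \
       --output bert_rand_in.npz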
diff --git a/docs/developer_manual/source_zh/03_user_interface.rst b/docs/developer_manual/source_zh/03_user_interface.rst
index bbd5b6388..e51b84156 100644
--- a/docs/developer_manual/source_zh/03_user_interface.rst
+++ b/docs/developer_manual/source_zh/03_user_interface.rst
@@ -463,7 +463,7 @@ model_deploy.py
    * - core
      - 否
      - 当target选择为bm1688或cv186x时,用于选择并行计算的tpu核心数量,默认设置为1个tpu核心
-   * - asymmetric 
+   * - asymmetric
      - 否
      - 指定做int8非对称量化
    * - dynamic
diff --git a/docs/quick_start/source_en/00_disclaimer.rst b/docs/quick_start/source_en/00_disclaimer.rst
index 406acb9c6..2c1151585 100644
--- a/docs/quick_start/source_en/00_disclaimer.rst
+++ b/docs/quick_start/source_en/00_disclaimer.rst
@@ -34,6 +34,18 @@
    * - Version
      - Release date
      - Explanation
+   * - v1.6.0
+     - 2024.02.23
+     - Added PyPI as a release channel;
+       Added support for user-defined Global operators;
+       Added support for the CV186X processor platform
+   * - v1.5.0
+     - 2023.11.03
+     - Enabled multi-core parallelism for more Global Layers
+   * - v1.4.0
+     - 2023.09.27
+     - Upgraded system dependencies to Ubuntu 22.04;
+       Added BM1684 Winograd support
    * - v1.3.0
      - 2023.07.27
      - Add the function to manually specify operations computing with floating-point;
diff --git a/docs/quick_start/source_en/07_quantization.rst b/docs/quick_start/source_en/07_quantization.rst
index 6c7bd72bb..fe8cba7b0 100644
--- a/docs/quick_start/source_en/07_quantization.rst
+++ b/docs/quick_start/source_en/07_quantization.rst
@@ -596,11 +596,22 @@ Also a log file named ``SensitiveLayerSearch`` is generated, its content is as below:
     INFO:root:outputs_cos_los = 0.008808857469573828
     INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828

-This log file records the cosine losses between the outputs of mix model and float model when setting each op to int8 with different quantize methods(MAX/Percentile9999/KL).
-It also contains the loss information printed in the screen and the cosine similarity of mix model and float model.
-The qtable generated by this program can be modified according to the loss information.
-The best thresholds of each op are recorded in a new cali table named new_cali_table. This table is restored in current workspace and need to be used when generating mix model.
-In this example, the loss of input3.1 is larger than other ops, thus you can only set input3.1 as float in qtable.
+The log file records the threshold obtained for each operation under each
+quantization method (MAX/Percentile9999/KL) and, for every such threshold, the
+similarity loss (1 - cosine similarity) between the original float model and a
+mixed-precision model in which only that operation is computed in int8 with
+the corresponding threshold. It also contains the per-operation loss
+information printed on screen and the cosine similarity between the final
+mixed-precision model and the original float model. Users can take the qtable
+output by the program as-is, or modify it based on the loss information, and
+then generate the mixed-precision model. After the sensitive-layer search
+finishes, the optimal thresholds are written to a new calibration table,
+new_cali_table.txt, stored in the current working directory; this table must
+be used when generating the mixed-precision model. In this example, the loss
+of input3.1 is observed to be much higher than that of other operations, so
+input3.1 alone can be set to FP32 in the qtable.
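To make that qtable workflow concrete before Step 2, here is a minimal sketch. The model names are placeholders; the flag spellings follow the parameter tables in this patch (``--processor`` in particular), so verify them against the installed release.

.. code-block:: shell

   # Keep only input3.1 in float, per the loss analysis above.
   # A qtable holds one "op_name quantize_mode" pair per line.
   echo "input3.1 F32" > model_qtable

   # Build the mixed-precision model, using the new calibration table
   # written by the sensitive-layer search.
   model_deploy.py \
       --mlir model.mlir \
       --quantize INT8 \
       --calibration_table new_cali_table.txt \
       --quantize_table model_qtable \
       --processor bm1684x \
       --model model_mix.bmodel

Step 2 below walks through the same flow for the actual model.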
+
 Step 2: Gen mix precision model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst b/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
index b96bfd068..6670897a1 100644
--- a/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
+++ b/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
@@ -123,8 +123,10 @@ This section requires additional installation of openssl-1.1.1o (ubuntu 22.04 provides openssl-3.0.2 by default).

 Step 0: Install openssl-1.1.1o
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. code-block:: shell
    :linenos:
+
    wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
    sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

@@ -183,4 +185,3 @@ Install the paddle2onnx tool through the following commands, and use this tool to export the paddle model to onnx
       --save_file squeezenet1_1.onnx

 After running all the above commands we will get an onnx model named squeezenet1_1.onnx.
-
diff --git a/docs/quick_start/source_zh/00_disclaimer.rst b/docs/quick_start/source_zh/00_disclaimer.rst
index a058af3d7..1c12c2ab1 100644
--- a/docs/quick_start/source_zh/00_disclaimer.rst
+++ b/docs/quick_start/source_zh/00_disclaimer.rst
@@ -35,6 +35,18 @@
    * - 版本
      - 发布日期
      - 说明
+   * - v1.6.0
+     - 2024.02.23
+     - 添加了PyPI发布形式;
+       支持用户自定义Global算子;
+       支持了CV186X处理器平台
+   * - v1.5.0
+     - 2023.11.03
+     - 更多Global Layer支持多核并行
+   * - v1.4.0
+     - 2023.09.27
+     - 系统依赖升级到Ubuntu 22.04;
+       支持了BM1684 Winograd
    * - v1.3.0
      - 2023.07.27
      - 增加手动指定浮点运算区域功能;
diff --git a/docs/quick_start/source_zh/03_onnx.rst b/docs/quick_start/source_zh/03_onnx.rst
index 57fb259c2..de1751b13 100644
--- a/docs/quick_start/source_zh/03_onnx.rst
+++ b/docs/quick_start/source_zh/03_onnx.rst
@@ -65,7 +65,7 @@ ONNX转MLIR

     y = (x - mean) \times scale

-官网yolov5的图片是rgb, 每个值会乘以 ``1/255`` , 转换成mean和scale对应为
+官网yolov5的图片是rgb格式, 每个值会乘以 ``1/255`` , 转换成mean和scale对应为
 ``0.0,0.0,0.0`` 和 ``0.0039216,0.0039216,0.0039216`` 。

 模型转换命令如下:
@@ -124,7 +124,7 @@ ONNX转MLIR
      - 图片每个通道的比值, 默认为1.0,1.0,1.0
    * - pixel_format
      - 否
-     - 图片类型, 可以是rgb、bgr、gray、rgbd四种情况, 默认为bgr
+     - 图片类型, 可以是rgb、bgr、gray、rgbd四种格式, 默认为bgr
    * - channel_format
      - 否
      - 通道类型, 对于图片输入可以是nhwc或nchw, 非图片输入则为none, 默认为nchw
@@ -133,7 +133,7 @@ ONNX转MLIR
      - 指定输出的名称, 如果不指定, 则用模型的输出; 指定后用该指定名称做输出
    * - test_input
      - 否
-     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会正确性验证
+     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会进行正确性验证
    * - test_result
      - 否
      - 指定验证后的输出文件
@@ -183,7 +183,8 @@ MLIR转F16模型
      - 指定默认量化类型, 支持F32/F16/BF16/INT8
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - calibration_table
      - 否
      - 指定校准表路径, 当存在INT8量化的时候需要校准表
@@ -192,7 +193,7 @@ MLIR转F16模型
      - 表示 MLIR 量化后的结果与 MLIR fp32推理结果相似度的误差容忍度
    * - test_input
      - 否
-     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会正确性验证
+     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会进行正确性验证
    * - test_reference
      - 否
      - 用于验证模型正确性的参考数据(使用npz格式)。其为各算子的计算结果
diff --git a/docs/quick_start/source_zh/06_tflite.rst b/docs/quick_start/source_zh/06_tflite.rst
index 7edb6a7f7..7f1bfe3de 100644
--- a/docs/quick_start/source_zh/06_tflite.rst
+++ b/docs/quick_start/source_zh/06_tflite.rst
@@ -55,7 +55,7 @@ TFLite转MLIR

 转成mlir文件后, 会生成一个 ``mobilebert_tf_in_f32.npz`` 文件, 该文件是模型的输入文件。

-MLIR转模型
+MLIR转INT8模型
 ------------------

 该模型是tflite int8模型, 可以按如下参数转成模型:
diff --git a/docs/quick_start/source_zh/07_quantization.rst b/docs/quick_start/source_zh/07_quantization.rst
index 2444fc88c..6529b7db4 100644
--- a/docs/quick_start/source_zh/07_quantization.rst
+++ b/docs/quick_start/source_zh/07_quantization.rst
@@ -177,7 +177,8 @@
    - 输入校准表
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
@@ -482,7 +483,8 @@ INT8对称量化模型:
    - 输入校准表
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
@@ -599,10 +601,12 @@ INT8对称量化模型:
     INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828

-日志文件记录了每个op在每种量化方法(MAX/Percentile9999/KL)得到的threshold下,设置为int8后,混精度模型与原始float模型输出的相似度的loss(1-余弦相似度)。
-同时也包含了屏幕端输出的每个op的loss信息以及最后的混精度模型与原始float模型的余弦相似度。
+日志文件记录了每个op在不同量化方法(MAX/Percentile9999/KL)下得到的threshold,
+同时给出了在只对该op使用对应threshold做int8计算后的混精度模型与原始float模型输出的相似度的loss(1-余弦相似度)。
+此外,日志还包含了屏幕端输出的每个op的loss信息以及最后的混精度模型与原始float模型的余弦相似度。
 用户可以使用程序输出的qtable,也可以根据loss信息对qtable进行修改,然后生成混精度模型。
-在敏感层搜索结束后,最优的threshold会被更新到一个新的量化表new_cali_table.txt,该量化表存储在当前工程目录下,在生成混精度模型时需要调用新量化表。
+在敏感层搜索结束后,最优的threshold会被更新到一个新的量化表new_cali_table.txt,
+该量化表存储在当前工程目录下,在生成混精度模型时需要调用新量化表。
 在本例中,根据输出的loss信息,观察到input3.1的loss比其他op高很多,可以在qtable中只设置input3.1为FP32。

 第二步: 生成混精度量化模型
@@ -742,7 +746,8 @@ INT8模型mAP为: 34.70%
      - 指定起点和终点之间的层不执行量化,起点和终点之间用:间隔,多个block之间用空格间隔
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
diff --git a/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst b/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
index 6ead4e07c..0056c5229 100644
--- a/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
+++ b/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
@@ -122,12 +122,15 @@ PaddlePaddle模型转ONNX
 本节需要额外安装openssl-1.1.1o(ubuntu 22.04默认提供openssl-3.0.2)。

 步骤0:安装openssl-1.1.1o
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. code-block:: shell
    :linenos:
+
    wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
    sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

+如果上述链接失效,请参考 http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/?C=M;O=D 更换有效链接。
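After installing the package above, it is worth confirming that dpkg registered it before moving on to paddle2onnx. This optional check uses standard dpkg queries and is not part of the original instructions.

.. code-block:: shell

   # Optional: confirm libssl1.1 is installed and its status is healthy.
   dpkg -s libssl1.1 | grep -E '^(Status|Version)'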
 步骤1:创建工作目录

@@ -183,4 +186,3 @@ PaddlePaddle模型转ONNX
       --save_file squeezenet1_1.onnx

 运行完以上所有命令后我们将获得一个名为squeezenet1_1.onnx的onnx模型。
-
diff --git a/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst b/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
index 1bf21349b..f8859d913 100644
--- a/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
+++ b/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
@@ -336,8 +336,8 @@ INT8 cvimodel的执行方式如下, 得到 ``dog_int8.jpg`` :

 需要如下文件:

-* cvitek_tpu_sdk_[cv186x|cv183x|cv182x|cv182x_uclibc|cv181x_glibc32|cv181x_musl_riscv64_rvv|cv180x_musl_riscv64_rvv|cv181x_glibc_riscv64].tar.gz
-* cvimodel_samples_[cv186x|cv183x|cv182x|cv181x|cv180x].tar.gz
+* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv | cv181x_glibc_riscv64].tar.gz
+* cvimodel_samples_[cv186x | cv183x | cv182x | cv181x | cv180x].tar.gz

 根据处理器类型选择所需文件,将其加载至EVB的文件系统,并在EVB上的linux console执行。以cv183x为例:

@@ -463,7 +463,7 @@ INT8 cvimodel的执行方式如下, 得到 ``dog_int8.jpg`` :

 本节需要如下文件:

-* cvitek_tpu_sdk_[cv186x|cv183x|cv182x|cv182x_uclibc|cv181x_glibc32|cv181x_musl_riscv64_rvv|cv180x_musl_riscv64_rvv].tar.gz
+* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv].tar.gz
 * cvitek_tpu_samples.tar.gz

 aarch 64位 (如cv183x aarch64位平台)
@@ -836,7 +836,7 @@ FAQ
 支持多线程, 但是多个模型在深度学习处理器上推理时是串行进行的。

 5 填充input tensor相关接口区别
-```````````````````````````````
+``````````````````````````````

 ``CVI_NN_SetTensorPtr`` : 设置input tensor的虚拟地址,原本的tensor内存不会释放。推理时从用户设置的虚拟地址 **拷贝数据** 到原本的tensor内存上。

 ``CVI_NN_SetTensorWithVideoFrame`` : 通过VideoFrame结构体来填充Input Tensor。注意VideoFrame的地址为物理地址。如果转模型设置 ``--fuse_preprocess --aligned_input`` ,则等同于 ``CVI_NN_SetTensorPhysicalAddr`` ,否则会将VideoFrame的数据拷贝到Input Tensor。

-``CVI_NN_SetTensorWithAlignedFrames`` : 支持多batch,与 ``CVI_NN_SetTensorWithVideoFrame`` 类似。
+``CVI_NN_SetTensorWithAlignedFrames`` : 与 ``CVI_NN_SetTensorWithVideoFrame`` 类似, 支持多batch。

 ``CVI_NN_FeedTensorWithFrames`` : 与 ``CVI_NN_SetTensorWithVideoFrame`` 类似。
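For orientation, a typical EVB session built from the files listed in the cv18xx appendix might look like the sketch below. The environment script and sample binary names (``envs_tpu_sdk.sh``, ``cvi_sample_classifier``) are assumptions based on common cvitek SDK layouts; check the README shipped inside the actual tarballs before relying on them.

.. code-block:: shell

   # Sketch of an EVB session (cv183x shown; file names are assumptions).
   tar zxf cvitek_tpu_sdk_cv183x.tar.gz
   tar zxf cvimodel_samples_cv183x.tar.gz
   export MODEL_PATH=$PWD/cvimodel_samples
   cd cvitek_tpu_sdk && source ./envs_tpu_sdk.sh && cd ..

   # Run a bundled classifier sample against a sample cvimodel.
   cd cvitek_tpu_sdk/samples
   ./bin/cvi_sample_classifier \
       $MODEL_PATH/mobilenet_v2.cvimodel \
       ./data/cat.jpg \
       ./data/synset_words.txt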