From 7cde80d58d38ac16f20313a8a79a845051c7ad84 Mon Sep 17 00:00:00 2001
From: "man.lu"
Date: Fri, 23 Feb 2024 18:36:48 +0800
Subject: [PATCH] [doc] update release note

Change-Id: I435b1959a4a8ce4025e6c24564b94b9af62c5740
---
 build.sh                                       |  1 +
 .../source_en/03_user_interface.rst            | 17 +++++++++------
 .../source_zh/03_user_interface.rst            |  2 +-
 docs/quick_start/source_en/00_disclaimer.rst   | 12 +++++++++++
 .../quick_start/source_en/07_quantization.rst  | 21 ++++++++++++++-----
 .../source_en/Appx.01_to_onnx_convert.rst      |  3 ++-
 docs/quick_start/source_zh/00_disclaimer.rst   | 12 +++++++++++
 docs/quick_start/source_zh/03_onnx.rst         | 11 +++++-----
 docs/quick_start/source_zh/06_tflite.rst       |  2 +-
 .../quick_start/source_zh/07_quantization.rst  | 17 +++++++++------
 .../source_zh/Appx.01_to_onnx_convert.rst      |  6 ++++--
 .../source_zh/Appx.02_cv18xx_guide.rst         | 11 +++++-----
 12 files changed, 83 insertions(+), 32 deletions(-)

diff --git a/build.sh b/build.sh
index 597478932..055581fcb 100755
--- a/build.sh
+++ b/build.sh
@@ -29,6 +29,7 @@ function config_release()
   cmake -G Ninja \
     -B ${BUILD_PATH} \
     -DCMAKE_C_COMPILER=clang \
+    -DCMAKE_BUILD_TYPE="" \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CXX_FLAGS=-O2 \
     -DTPUMLIR_USE_LLD=ON \
diff --git a/docs/developer_manual/source_en/03_user_interface.rst b/docs/developer_manual/source_en/03_user_interface.rst
index 1f4a3e8ce..f1f04ecbf 100644
--- a/docs/developer_manual/source_en/03_user_interface.rst
+++ b/docs/developer_manual/source_en/03_user_interface.rst
@@ -375,7 +375,7 @@ Convert the mlir file into the corresponding model; the parameters are as follows
    * - core
      - N
      - When the target is selected as bm1688 or cv186x, it is used to select the number of tpu cores for parallel computing, and the default setting is 1 tpu core
-   * - asymmetric 
+   * - asymmetric
      - N
      - Do INT8 asymmetric quantization
    * - dynamic
@@ -592,15 +592,20 @@ Supported functions:
      - The input mlir file name (including path)
    * - img
      - N
-     - Used for CV tasks to generate random images, otherwise generate npz files. The default image value range is [0,255], the data type is 'uint8', and cannot be changed.
+     - Used for CV tasks to generate random images; otherwise an npz file is
+       generated. The default image value range is [0,255] and the data type
+       is 'uint8'; neither can be changed.
    * - ranges
      - N
-     - Set the value ranges of the model inputs, expressed in list form, such as [[0,300],[0,0]]. If you want to generate a picture, you do not need to specify the value range, the default is [0,255].
-       In other cases, value ranges need to be specified.
+     - Set the value ranges of the model inputs, expressed in list form, such
+       as [[0,300],[0,0]]. If you want to generate a picture, you do not need
+       to specify the value range; the default is [0,255]. In all other
+       cases, value ranges need to be specified.
    * - input_types
      - N
-     - Set the model input types, such as 'si32,f32'. 'si32' and 'f32' types are supported. False by default, and it will be read from mlir. If you generate an image
-       , you do not need to specify the data type, the default is 'uint8'.
+     - Set the model input types, such as 'si32,f32'. The 'si32' and 'f32'
+       types are supported. If not specified, the types are read from the
+       mlir file. If you generate an image, you do not need to specify the
+       data type; the default is 'uint8'.
    * - output
      - Y
      - The names of the outputs.
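The random-input parameters above map directly onto a command-line invocation. Below is a minimal sketch, assuming the function is exposed as a ``gen_rand_input.py`` script whose flags are spelled like the parameter names in the table; the script name and exact flag spellings are assumptions, not something this patch confirms.

.. code-block:: shell

   # Sketch only: script name and flag spellings are assumed.
   # CV model: generate a random image; range [0,255] and 'uint8' are fixed.
   gen_rand_input.py --mlir yolov5s.mlir --img --output yolov5s_rand_in.jpg

   # Non-CV model: generate an npz, giving each input a value range and a type.
   gen_rand_input.py \
       --mlir bert.mlir \
       --ranges "[[0,300],[0,0]]" \
       --input_types "si32,f32" \
       --output bert_rand_in.npz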
diff --git a/docs/developer_manual/source_zh/03_user_interface.rst b/docs/developer_manual/source_zh/03_user_interface.rst
index bbd5b6388..e51b84156 100644
--- a/docs/developer_manual/source_zh/03_user_interface.rst
+++ b/docs/developer_manual/source_zh/03_user_interface.rst
@@ -463,7 +463,7 @@ model_deploy.py
    * - core
      - 否
      - 当target选择为bm1688或cv186x时,用于选择并行计算的tpu核心数量,默认设置为1个tpu核心
-   * - asymmetric 
+   * - asymmetric
      - 否
      - 指定做int8非对称量化
    * - dynamic
diff --git a/docs/quick_start/source_en/00_disclaimer.rst b/docs/quick_start/source_en/00_disclaimer.rst
index 406acb9c6..2c1151585 100644
--- a/docs/quick_start/source_en/00_disclaimer.rst
+++ b/docs/quick_start/source_en/00_disclaimer.rst
@@ -34,6 +34,18 @@
    * - Version
      - Release date
      - Explanation
+   * - v1.6.0
+     - 2024.02.23
+     - Added PyPI as a release channel;
+       Added support for user-defined Global operators;
+       Added support for the CV186X processor platform
+   * - v1.5.0
+     - 2023.11.03
+     - Enabled multi-core parallelism for more Global Layers
+   * - v1.4.0
+     - 2023.09.27
+     - Upgraded system dependencies to Ubuntu 22.04;
+       Added BM1684 Winograd support
    * - v1.3.0
      - 2023.07.27
      - Add the function to manually specify operations computing with floating-point;
diff --git a/docs/quick_start/source_en/07_quantization.rst b/docs/quick_start/source_en/07_quantization.rst
index 6c7bd72bb..fe8cba7b0 100644
--- a/docs/quick_start/source_en/07_quantization.rst
+++ b/docs/quick_start/source_en/07_quantization.rst
@@ -596,11 +596,22 @@ Also a log file named ``SensitiveLayerSearch`` is generated, its content is as below:
     INFO:root:outputs_cos_los = 0.008808857469573828
     INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828

-This log file records the cosine losses between the outputs of mix model and float model when setting each op to int8 with different quantize methods(MAX/Percentile9999/KL).
-It also contains the loss information printed in the screen and the cosine similarity of mix model and float model.
-The qtable generated by this program can be modified according to the loss information.
-The best thresholds of each op are recorded in a new cali table named new_cali_table. This table is restored in current workspace and need to be used when generating mix model.
-In this example, the loss of input3.1 is larger than other ops, thus you can only set input3.1 as float in qtable.
+The log file records the threshold obtained for each operation under each
+quantization method (MAX/Percentile9999/KL) and, for every such threshold, the
+similarity loss (1 - cosine similarity) between the original float model and a
+mixed-precision model in which only that operation is computed in int8 with
+the corresponding threshold. It also contains the per-operation loss
+information printed on screen and the cosine similarity between the final
+mixed-precision model and the original float model. Users can take the qtable
+output by the program as-is, or modify it based on the loss information, and
+then generate the mixed-precision model. After the sensitive-layer search
+finishes, the optimal thresholds are written to a new calibration table,
+new_cali_table.txt, stored in the current working directory; this table must
+be used when generating the mixed-precision model. In this example, the loss
+of input3.1 is observed to be much higher than that of other operations, so
+input3.1 alone can be set to FP32 in the qtable.
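To make that qtable workflow concrete before Step 2, here is a minimal sketch. The model names are placeholders; the flag spellings follow the parameter tables in this patch (``--processor`` in particular), so verify them against the installed release.

.. code-block:: shell

   # Keep only input3.1 in float, per the loss analysis above.
   # A qtable holds one "op_name quantize_mode" pair per line.
   echo "input3.1 F32" > model_qtable

   # Build the mixed-precision model, using the new calibration table
   # written by the sensitive-layer search.
   model_deploy.py \
       --mlir model.mlir \
       --quantize INT8 \
       --calibration_table new_cali_table.txt \
       --quantize_table model_qtable \
       --processor bm1684x \
       --model model_mix.bmodel

Step 2 below walks through the same flow for the actual model.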
+
 Step 2: Gen mix precision model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst b/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
index b96bfd068..6670897a1 100644
--- a/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
+++ b/docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
@@ -123,8 +123,10 @@ This section requires additional installation of openssl-1.1.1o (ubuntu 22.04 provides openssl-3.0.2 by default).

 Step 0: Install openssl-1.1.1o
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. code-block:: shell
    :linenos:
+
    wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
    sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

@@ -183,4 +185,3 @@ Install the paddle2onnx tool through the following commands, and use this tool to export the paddle model to onnx
       --save_file squeezenet1_1.onnx

 After running all the above commands we will get an onnx model named squeezenet1_1.onnx.
-
diff --git a/docs/quick_start/source_zh/00_disclaimer.rst b/docs/quick_start/source_zh/00_disclaimer.rst
index a058af3d7..1c12c2ab1 100644
--- a/docs/quick_start/source_zh/00_disclaimer.rst
+++ b/docs/quick_start/source_zh/00_disclaimer.rst
@@ -35,6 +35,18 @@
    * - 版本
      - 发布日期
      - 说明
+   * - v1.6.0
+     - 2024.02.23
+     - 添加了PyPI发布形式;
+       支持用户自定义Global算子;
+       支持了CV186X处理器平台
+   * - v1.5.0
+     - 2023.11.03
+     - 更多Global Layer支持多核并行
+   * - v1.4.0
+     - 2023.09.27
+     - 系统依赖升级到Ubuntu 22.04;
+       支持了BM1684 Winograd
    * - v1.3.0
      - 2023.07.27
      - 增加手动指定浮点运算区域功能;
diff --git a/docs/quick_start/source_zh/03_onnx.rst b/docs/quick_start/source_zh/03_onnx.rst
index 57fb259c2..de1751b13 100644
--- a/docs/quick_start/source_zh/03_onnx.rst
+++ b/docs/quick_start/source_zh/03_onnx.rst
@@ -65,7 +65,7 @@ ONNX转MLIR

     y = (x - mean) \times scale

-官网yolov5的图片是rgb, 每个值会乘以 ``1/255`` , 转换成mean和scale对应为
+官网yolov5的图片是rgb格式, 每个值会乘以 ``1/255`` , 转换成mean和scale对应为
 ``0.0,0.0,0.0`` 和 ``0.0039216,0.0039216,0.0039216`` 。

 模型转换命令如下:
@@ -124,7 +124,7 @@ ONNX转MLIR
      - 图片每个通道的比值, 默认为1.0,1.0,1.0
    * - pixel_format
      - 否
-     - 图片类型, 可以是rgb、bgr、gray、rgbd四种情况, 默认为bgr
+     - 图片类型, 可以是rgb、bgr、gray、rgbd四种格式, 默认为bgr
    * - channel_format
      - 否
      - 通道类型, 对于图片输入可以是nhwc或nchw, 非图片输入则为none, 默认为nchw
@@ -133,7 +133,7 @@ ONNX转MLIR
      - 指定输出的名称, 如果不指定, 则用模型的输出; 指定后用该指定名称做输出
    * - test_input
      - 否
-     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会正确性验证
+     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会进行正确性验证
    * - test_result
      - 否
      - 指定验证后的输出文件
@@ -183,7 +183,8 @@ MLIR转F16模型
      - 指定默认量化类型, 支持F32/F16/BF16/INT8
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - calibration_table
      - 否
      - 指定校准表路径, 当存在INT8量化的时候需要校准表
@@ -192,7 +193,7 @@ MLIR转F16模型
      - 表示 MLIR 量化后的结果与 MLIR fp32推理结果相似度的误差容忍度
    * - test_input
      - 否
-     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会正确性验证
+     - 指定输入文件用于验证, 可以是图片或npy或npz; 可以不指定, 则不会进行正确性验证
    * - test_reference
      - 否
      - 用于验证模型正确性的参考数据(使用npz格式)。其为各算子的计算结果
diff --git a/docs/quick_start/source_zh/06_tflite.rst b/docs/quick_start/source_zh/06_tflite.rst
index 7edb6a7f7..7f1bfe3de 100644
--- a/docs/quick_start/source_zh/06_tflite.rst
+++ b/docs/quick_start/source_zh/06_tflite.rst
@@ -55,7 +55,7 @@ TFLite转MLIR

 转成mlir文件后, 会生成一个 ``mobilebert_tf_in_f32.npz`` 文件, 该文件是模型的输入文件。

-MLIR转模型
+MLIR转INT8模型
 ------------------

 该模型是tflite int8模型, 可以按如下参数转成模型:
diff --git a/docs/quick_start/source_zh/07_quantization.rst b/docs/quick_start/source_zh/07_quantization.rst
index 2444fc88c..6529b7db4 100644
--- a/docs/quick_start/source_zh/07_quantization.rst
+++ b/docs/quick_start/source_zh/07_quantization.rst
@@ -177,7 +177,8 @@
    - 输入校准表
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
@@ -482,7 +483,8 @@ INT8对称量化模型:
    - 输入校准表
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
@@ -599,10 +601,12 @@ INT8对称量化模型:
     INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828

-日志文件记录了每个op在每种量化方法(MAX/Percentile9999/KL)得到的threshold下,设置为int8后,混精度模型与原始float模型输出的相似度的loss(1-余弦相似度)。
-同时也包含了屏幕端输出的每个op的loss信息以及最后的混精度模型与原始float模型的余弦相似度。
+日志文件记录了每个op在不同量化方法(MAX/Percentile9999/KL)下得到的threshold,
+同时给出了在只对该op使用对应threshold做int8计算后的混精度模型与原始float模型输出的相似度的loss(1-余弦相似度)。
+此外,日志还包含了屏幕端输出的每个op的loss信息以及最后的混精度模型与原始float模型的余弦相似度。
 用户可以使用程序输出的qtable,也可以根据loss信息对qtable进行修改,然后生成混精度模型。
-在敏感层搜索结束后,最优的threshold会被更新到一个新的量化表new_cali_table.txt,该量化表存储在当前工程目录下,在生成混精度模型时需要调用新量化表。
+在敏感层搜索结束后,最优的threshold会被更新到一个新的量化表new_cali_table.txt,
+该量化表存储在当前工程目录下,在生成混精度模型时需要调用新量化表。
 在本例中,根据输出的loss信息,观察到input3.1的loss比其他op高很多,可以在qtable中只设置input3.1为FP32。

 第二步: 生成混精度量化模型
@@ -742,7 +746,8 @@ INT8模型mAP为: 34.70%
      - 指定起点和终点之间的层不执行量化,起点和终点之间用:间隔,多个block之间用空格间隔
    * - processor
      - 是
-     - 指定模型将要用到的平台, 支持bm1688/bm1684x/bm1684/cv186x/cv183x/cv182x/cv181x/cv180x
+     - 指定模型将要用到的平台,
+       支持bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x
    * - fp_type
      - 否
      - 指定混精度使用的float类型, 支持auto,F16,F32,BF16,默认为auto,表示由程序内部自动选择
diff --git a/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst b/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
index 6ead4e07c..0056c5229 100644
--- a/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
+++ b/docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
@@ -122,12 +122,15 @@ PaddlePaddle模型转ONNX
 本节需要额外安装openssl-1.1.1o(ubuntu 22.04默认提供openssl-3.0.2)。

 步骤0:安装openssl-1.1.1o
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. code-block:: shell
    :linenos:
+
    wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
    sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb

+如果上述链接失效,请参考 http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/?C=M;O=D 更换有效链接。
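After installing the package above, it is worth confirming that dpkg registered it before moving on to paddle2onnx. This optional check uses standard dpkg queries and is not part of the original instructions.

.. code-block:: shell

   # Optional: confirm libssl1.1 is installed and its status is healthy.
   dpkg -s libssl1.1 | grep -E '^(Status|Version)'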
 步骤1:创建工作目录

@@ -183,4 +186,3 @@ PaddlePaddle模型转ONNX
       --save_file squeezenet1_1.onnx

 运行完以上所有命令后我们将获得一个名为squeezenet1_1.onnx的onnx模型。
-
diff --git a/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst b/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
index 1bf21349b..f8859d913 100644
--- a/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
+++ b/docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
@@ -336,8 +336,8 @@ INT8 cvimodel的执行方式如下, 得到 ``dog_int8.jpg`` :

 需要如下文件:

-* cvitek_tpu_sdk_[cv186x|cv183x|cv182x|cv182x_uclibc|cv181x_glibc32|cv181x_musl_riscv64_rvv|cv180x_musl_riscv64_rvv|cv181x_glibc_riscv64].tar.gz
-* cvimodel_samples_[cv186x|cv183x|cv182x|cv181x|cv180x].tar.gz
+* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv | cv181x_glibc_riscv64].tar.gz
+* cvimodel_samples_[cv186x | cv183x | cv182x | cv181x | cv180x].tar.gz

 根据处理器类型选择所需文件,将其加载至EVB的文件系统,并在EVB上的linux console执行。以cv183x为例:

@@ -463,7 +463,7 @@ INT8 cvimodel的执行方式如下, 得到 ``dog_int8.jpg`` :

 本节需要如下文件:

-* cvitek_tpu_sdk_[cv186x|cv183x|cv182x|cv182x_uclibc|cv181x_glibc32|cv181x_musl_riscv64_rvv|cv180x_musl_riscv64_rvv].tar.gz
+* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv].tar.gz
 * cvitek_tpu_samples.tar.gz

 aarch 64位 (如cv183x aarch64位平台)
@@ -836,7 +836,7 @@ FAQ
 支持多线程, 但是多个模型在深度学习处理器上推理时是串行进行的。

 5 填充input tensor相关接口区别
-```````````````````````````````
+``````````````````````````````

 ``CVI_NN_SetTensorPtr`` : 设置input tensor的虚拟地址,原本的tensor内存不会释放。推理时从用户设置的虚拟地址 **拷贝数据** 到原本的tensor内存上。

 ``CVI_NN_SetTensorWithVideoFrame`` : 通过VideoFrame结构体来填充Input Tensor。注意VideoFrame的地址为物理地址。如果转模型设置 ``--fuse_preprocess --aligned_input`` ,则等同于 ``CVI_NN_SetTensorPhysicalAddr`` ,否则会将VideoFrame的数据拷贝到Input Tensor。

-``CVI_NN_SetTensorWithAlignedFrames`` : 支持多batch,与 ``CVI_NN_SetTensorWithVideoFrame`` 类似。
+``CVI_NN_SetTensorWithAlignedFrames`` : 与 ``CVI_NN_SetTensorWithVideoFrame`` 类似, 支持多batch。

 ``CVI_NN_FeedTensorWithFrames`` : 与 ``CVI_NN_SetTensorWithVideoFrame`` 类似。
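For orientation, a typical EVB session built from the files listed in the cv18xx appendix might look like the sketch below. The environment script and sample binary names (``envs_tpu_sdk.sh``, ``cvi_sample_classifier``) are assumptions based on common cvitek SDK layouts; check the README shipped inside the actual tarballs before relying on them.

.. code-block:: shell

   # Sketch of an EVB session (cv183x shown; file names are assumptions).
   tar zxf cvitek_tpu_sdk_cv183x.tar.gz
   tar zxf cvimodel_samples_cv183x.tar.gz
   export MODEL_PATH=$PWD/cvimodel_samples
   cd cvitek_tpu_sdk && source ./envs_tpu_sdk.sh && cd ..

   # Run a bundled classifier sample against a sample cvimodel.
   cd cvitek_tpu_sdk/samples
   ./bin/cvi_sample_classifier \
       $MODEL_PATH/mobilenet_v2.cvimodel \
       ./data/cat.jpg \
       ./data/synset_words.txt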