
Commit

[doc] update release note
Change-Id: I435b1959a4a8ce4025e6c24564b94b9af62c5740
luluman committed Feb 23, 2024
1 parent ae9ec43 commit 7cde80d
Showing 12 changed files with 83 additions and 32 deletions.
1 change: 1 addition & 0 deletions build.sh
@@ -29,6 +29,7 @@ function config_release()
cmake -G Ninja \
-B ${BUILD_PATH} \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_BUILD_TYPE="" \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS=-O2 \
-DTPUMLIR_USE_LLD=ON \
17 changes: 11 additions & 6 deletions docs/developer_manual/source_en/03_user_interface.rst
@@ -375,7 +375,7 @@ Convert the mlir file into the corresponding model; the parameters are as follows
* - core
- N
- When the target is bm1688 or cv186x, selects the number of tpu cores used for parallel computing; the default is 1 tpu core
* - asymmetric
- N
- Do INT8 asymmetric quantization
* - dynamic
@@ -592,15 +592,20 @@ Supported functions:
- The input mlir file name (including path)
* - img
- N
- Used for CV tasks to generate random images; otherwise npz files are generated. The default image value range is [0,255] and the data type is 'uint8'; neither can be changed.
* - ranges
- N
- Set the value ranges of the model inputs as a list, such as [[0,300],[0,0]]. When generating a picture the range may be omitted and defaults to [0,255]; in all other cases it must be specified.
* - input_types
- N
- Set the model input types, such as 'si32,f32'; 'si32' and 'f32' are supported. Unset by default, in which case the types are read from the mlir file. When generating an image the type may be omitted and defaults to 'uint8'.
* - output
- Y
- The names of the outputs.
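The interaction of the ``ranges`` and ``input_types`` parameters described above can be sketched in a few lines of plain Python. This is an illustration only: ``gen_rand_input`` is a hypothetical helper name, and the real tool writes npz files (or a uint8 image in [0,255]) rather than Python lists.

```python
import random

# Hypothetical sketch of the random-input generation described above.
# Each input gets a value range [lo, hi] and a type ('si32' or 'f32').
def gen_rand_input(ranges, input_types):
    values = []
    for (lo, hi), ty in zip(ranges, input_types):
        if ty == "si32":                       # signed 32-bit integer input
            values.append(random.randint(int(lo), int(hi)))
        else:                                  # "f32": float input
            values.append(random.uniform(lo, hi))
    return values

# ranges=[[0,300],[0,0]] and input_types='si32,f32' as in the table above
vals = gen_rand_input([[0, 300], [0, 0]], ["si32", "f32"])
print(vals)
```

A range of [0,0], as in the second input above, pins that input to a constant value.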
2 changes: 1 addition & 1 deletion docs/developer_manual/source_zh/03_user_interface.rst
@@ -463,7 +463,7 @@ model_deploy.py
* - core
- N
- When target is bm1688 or cv186x, selects the number of tpu cores used for parallel computing; the default is 1 tpu core
* - asymmetric
- N
- Do INT8 asymmetric quantization
* - dynamic
12 changes: 12 additions & 0 deletions docs/quick_start/source_en/00_disclaimer.rst
@@ -34,6 +34,18 @@
* - Version
- Release date
- Explanation
* - v1.6.0
- 2024.02.23
- Added a PyPI release channel;
Added support for user-defined Global operators;
Added support for the CV186X processor platform
* - v1.5.0
- 2023.11.03
- Enhanced Global Layer support for multicore parallelism
* - v1.4.0
- 2023.09.27
- System dependencies upgraded to Ubuntu 22.04;
Supported BM1684 Winograd
* - v1.3.0
- 2023.07.27
- Added the ability to manually specify operations computed in floating point;
21 changes: 16 additions & 5 deletions docs/quick_start/source_en/07_quantization.rst
@@ -596,11 +596,22 @@ Also a log file named ``SensitiveLayerSearch`` is generated; its content is as below
INFO:root:outputs_cos_los = 0.008808857469573828
INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828
The log file records the threshold obtained for each operation under each
quantization method (MAX/Percentile9999/KL), together with the similarity loss
(1 - cosine similarity) between the float model and a mixed-precision model in
which only that operation runs in int8 with the corresponding threshold. It
also contains the per-operation loss information printed on screen and the
cosine similarity between the final mixed-precision model and the float model.
You can use the qtable emitted by the program as-is, or edit it based on the
loss information, and then generate the mixed-precision model. When the
sensitive-layer search finishes, the best thresholds are written to a new
calibration table named new_cali_table, stored in the current workspace; this
table must be used when generating the mixed-precision model. In this example,
the loss of input3.1 is much higher than that of the other operations, so you
can set only input3.1 as float in the qtable.
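The workflow above (read the losses from the log, then force lossy ops to float in the qtable) can be sketched as follows. The log-line format and the plain ``op_name quantize_type`` qtable syntax follow the example above; the parsing helper and the 1e-3 cut-off are illustrative assumptions, not something the tool prescribes.

```python
import re

# Parse a SensitiveLayerSearch log line (format as in the example above)
# and emit a qtable entry that forces a high-loss op to float.
log_line = ("INFO:root:layer input3.1, layer type is top.Conv, "
            "best_th = 2.6186381997094728, best_method = KL, "
            "best_cos_loss = 0.008808857469573828")

m = re.search(r"layer (\S+), .*best_cos_loss = ([0-9.eE+-]+)", log_line)
name, loss = m.group(1), float(m.group(2))

# qtable lines take the form "op_name quantize_type"; the 1e-3 cut-off
# below is an illustrative choice for "much higher loss than other ops".
qtable_entry = f"{name} F32" if loss > 1e-3 else None
print(qtable_entry)
```

Applied to every log line, this yields exactly the kind of hand-edited qtable the paragraph above describes.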


Step 2: Gen mix precision model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 changes: 2 additions & 1 deletion docs/quick_start/source_en/Appx.01_to_onnx_convert.rst
@@ -123,8 +123,10 @@ This section requires additional installation of openssl-1.1.1o (ubuntu 22.04 provides openssl-3.0.2 by default)

Step 0: Install openssl-1.1.1o
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: shell
   :linenos:

   wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
   sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
@@ -183,4 +185,3 @@ Install the paddle2onnx tool through the following commands, and use this tool to convert the model
--save_file squeezenet1_1.onnx
After running all the above commands we will get an onnx model named squeezenet1_1.onnx.

12 changes: 12 additions & 0 deletions docs/quick_start/source_zh/00_disclaimer.rst
@@ -35,6 +35,18 @@
* - Version
- Release date
- Explanation
* - v1.6.0
- 2024.02.23
- Added a PyPI release channel;
Added support for user-defined Global operators;
Added support for the CV186X processor platform
* - v1.5.0
- 2023.11.03
- More Global Layers support multicore parallelism
* - v1.4.0
- 2023.09.27
- System dependencies upgraded to Ubuntu 22.04;
Supported BM1684 Winograd
* - v1.3.0
- 2023.07.27
- Added the ability to manually specify regions computed in floating point;
11 changes: 6 additions & 5 deletions docs/quick_start/source_zh/03_onnx.rst
@@ -65,7 +65,7 @@ ONNX to MLIR
y = (x - mean) \times scale
The official yolov5 images are in rgb format; each value is multiplied by ``1/255``, which corresponds to a mean and scale of
``0.0,0.0,0.0`` and ``0.0039216,0.0039216,0.0039216`` respectively.
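As a quick check of the arithmetic above (a sketch only; ``preprocess`` is a hypothetical helper, not part of the toolchain):

```python
# y = (x - mean) * scale, with the yolov5 values quoted above.
mean = [0.0, 0.0, 0.0]
scale = [round(1 / 255, 7)] * 3   # 1/255 rounded to 7 digits -> 0.0039216

def preprocess(x, c):
    """Map a raw pixel value x in channel c into the model input range."""
    return (x - mean[c]) * scale[c]

print(scale[0])            # 0.0039216
print(preprocess(255, 0))  # close to 1.0, i.e. 255 * 0.0039216
```

This is why the full 8-bit pixel range [0,255] maps (almost exactly) onto [0,1] for the model input.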

The model conversion command is as follows:
@@ -124,7 +124,7 @@ ONNX to MLIR
- Per-channel scale value of the image, default 1.0,1.0,1.0
* - pixel_format
- N
- Image format; can be rgb, bgr, gray, or rgbd; default bgr
* - channel_format
- N
- Channel layout; nhwc or nchw for image inputs, none for non-image inputs; default nchw
@@ -133,7 +133,7 @@
- Specify the output names; if not specified, the model's own outputs are used, otherwise the specified names are used as the outputs
* - test_input
- N
- Input file used for validation; can be an image, npy, or npz file; if omitted, no correctness validation is performed
* - test_result
- N
- Output file for the validation results
@@ -183,7 +183,8 @@ MLIR to F16 model
- Specify the default quantization type; F32/F16/BF16/INT8 are supported
* - processor
- Y
- Specify the platform the model will run on; bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x are supported
* - calibration_table
- N
- Path to the calibration table; required when INT8 quantization is used
@@ -192,7 +193,7 @@
- Error tolerance for the similarity between the quantized MLIR result and the MLIR fp32 inference result
* - test_input
- N
- Input file used for validation; can be an image, npy, or npz file; if omitted, no correctness validation is performed
* - test_reference
- N
- Reference data (in npz format) for validating model correctness; it holds the computed results of each operator
2 changes: 1 addition & 1 deletion docs/quick_start/source_zh/06_tflite.rst
@@ -55,7 +55,7 @@ TFLite to MLIR
After converting to an mlir file, a ``mobilebert_tf_in_f32.npz`` file is generated; it is the model's input file.


MLIR to INT8 model
------------------

This model is a tflite int8 model and can be converted with the following parameters:
17 changes: 11 additions & 6 deletions docs/quick_start/source_zh/07_quantization.rst
@@ -177,7 +177,8 @@
- Input calibration table
* - processor
- Y
- Specify the platform the model will run on; bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x are supported
* - fp_type
- N
- Float type used for mixed precision; auto, F16, F32, BF16 are supported; default is auto, meaning the program chooses automatically
@@ -482,7 +483,8 @@ INT8 symmetric quantized model:
- Input calibration table
* - processor
- Y
- Specify the platform the model will run on; bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x are supported
* - fp_type
- N
- Float type used for mixed precision; auto, F16, F32, BF16 are supported; default is auto, meaning the program chooses automatically
@@ -599,10 +601,12 @@ INT8 symmetric quantized model:
INFO:root:layer input3.1, layer type is top.Conv, best_th = 2.6186381997094728, best_method = KL, best_cos_loss = 0.008808857469573828
The log file records the threshold obtained for each op under each quantization
method (MAX/Percentile9999/KL), together with the similarity loss (1 - cosine
similarity) between the original float model and the mixed-precision model in
which only that op runs in int8 with the corresponding threshold.
It also contains the per-op loss information printed on screen and the cosine
similarity between the final mixed-precision model and the original float model.
You can use the qtable emitted by the program, or edit it based on the loss
information, and then generate the mixed-precision model.
When the sensitive-layer search finishes, the best thresholds are written to a
new calibration table, new_cali_table.txt, stored in the current project
directory; this table must be used when generating the mixed-precision model.
In this example, the loss of input3.1 is much higher than that of the other
ops, so only input3.1 needs to be set to FP32 in the qtable.

Step 2: Generate the mixed-precision quantized model
@@ -742,7 +746,8 @@ The INT8 model mAP is 34.70%
- Layers between the specified start and end points are not quantized; separate start and end with a colon, and separate multiple blocks with spaces
* - processor
- Y
- Specify the platform the model will run on; bm1688, bm1684x, bm1684, cv186x, cv183x, cv182x, cv181x, cv180x are supported
* - fp_type
- N
- Float type used for mixed precision; auto, F16, F32, BF16 are supported; default is auto, meaning the program chooses automatically
6 changes: 4 additions & 2 deletions docs/quick_start/source_zh/Appx.01_to_onnx_convert.rst
@@ -122,12 +122,15 @@ PaddlePaddle model to ONNX
This section requires additional installation of openssl-1.1.1o (ubuntu 22.04 provides openssl-3.0.2 by default).

Step 0: Install openssl-1.1.1o
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: shell
   :linenos:

   wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
   sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.19_amd64.deb
If the link above becomes invalid, refer to http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/?C=M;O=D for a valid link.

Step 1: Create a working directory
@@ -183,4 +186,3 @@ PaddlePaddle model to ONNX
--save_file squeezenet1_1.onnx
After running all the commands above, we obtain an onnx model named squeezenet1_1.onnx.

11 changes: 6 additions & 5 deletions docs/quick_start/source_zh/Appx.02_cv18xx_guide.rst
@@ -336,8 +336,8 @@ The INT8 cvimodel is run as follows, producing ``dog_int8.jpg``:

The following files are required:

* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv | cv181x_glibc_riscv64].tar.gz
* cvimodel_samples_[cv186x | cv183x | cv182x | cv181x | cv180x].tar.gz

Select the required files according to the processor type, load them onto the EVB's file system, and run in the linux console on the EVB; cv183x is used as an example:

@@ -463,7 +463,7 @@ The INT8 cvimodel is run as follows, producing ``dog_int8.jpg``:

This section requires the following files:

* cvitek_tpu_sdk_[cv186x | cv183x | cv182x | cv182x_uclibc | cv181x_glibc32 | cv181x_musl_riscv64_rvv | cv180x_musl_riscv64_rvv].tar.gz
* cvitek_tpu_samples.tar.gz

64-bit aarch (e.g. the cv183x aarch64 platform)
@@ -836,15 +836,16 @@ FAQ
Multithreading is supported, but multiple models are inferred serially on the deep-learning processor.

5 Differences between the interfaces for filling input tensors
``````````````````````````````````````````````````````````````

``CVI_NN_SetTensorPtr`` : Sets the virtual address of the input tensor; the tensor's original memory is not freed. During inference, data is **copied** from the user-supplied virtual address into the original tensor memory.

``CVI_NN_SetTensorPhysicalAddr`` : Sets the physical address of the input tensor; the tensor's original memory is freed. During inference, data is read directly from the newly set physical address, with **no data copying** . Frames obtained from VPSS can use this interface by passing the frame's start address. Note that the model must have been converted with the ``model_deploy`` options ``--fuse_preprocess --aligned_input`` for this interface to be usable.

``CVI_NN_SetTensorWithVideoFrame`` : Fills the input tensor from a VideoFrame structure. Note that the VideoFrame addresses are physical addresses. If the model was converted with ``--fuse_preprocess --aligned_input`` , this is equivalent to ``CVI_NN_SetTensorPhysicalAddr`` ; otherwise the VideoFrame data is copied into the input tensor.

``CVI_NN_SetTensorWithAlignedFrames`` : Similar to ``CVI_NN_SetTensorWithVideoFrame`` ; supports multiple batches.

``CVI_NN_FeedTensorWithFrames`` : Similar to ``CVI_NN_SetTensorWithVideoFrame`` .

