R329 Development Board Tutorial Series

This tutorial series covers deploying AI models on the R329 (MaixSense) board. It is planned in the following parts:

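Zhouyi Compass deployment and simulation (required reading before applying for a sample board)
Running AIPU executables on the R329 board
Running models in real time with the camera and screen on the R329 board
Running models with Python on the R329 board
First look at the Debian system on the R329 board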

This article is the first part. It covers deploying Zhouyi Compass and running simulations with it.

Zhouyi Compass Deployment and Simulation

Zhouyi Compass is the tool suite for the Zhouyi NPU. This article focuses on using the AIPU NN compiler and the simulator.

0. NN compiler workflow overview

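The NN compiler converts a neural network model into an AIPU executable.
Internally it works in three stages:

Model parser: converts a pretrained model to IR (Intermediate Representation). Supported input formats:
1. pb (tf1.0~1.15)
2. tflite (tf1.0~1.15)
3. caffemodel (version 1)
4. onnx (up to opset 15)
Quantization module: converts the float IR to an int8 IR
Generation module: uses the int8 IR to produce an AIPU executable that can run on the real chip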
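
To make the quantization stage more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. It is a generic illustration and not the NN compiler's actual algorithm: the helper names and the choice of scale = 127 / max|x| are assumptions. The convention (multiply by the scale to quantize, divide by it to dequantize) is chosen to match how output_scale is used in section 5.

```python
import numpy as np


def quantize_symmetric_int8(x):
    """Quantize a float array to int8 with one shared scale (hypothetical helper)."""
    qmax = 127
    max_abs = float(np.max(np.abs(x)))
    # Assumed rule: map the largest magnitude onto 127
    scale = qmax / max_abs if max_abs > 0 else 1.0
    q = np.clip(np.round(x * scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values by dividing by the scale."""
    return q.astype(np.float32) / scale


x = np.array([-1.0, -0.25, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_symmetric_int8(x)
x_hat = dequantize(q, scale)
print("int8 values:", q)
print("scale:", scale)
print("max abs error:", float(np.max(np.abs(x - x_hat))))
```

The rounding error per element is at most half a quantization step, which is why a representative calibration dataset matters: it determines the value range and therefore the step size.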

1. Download the environment

Use the Docker environment provided by Sipeed for development:

Note: make sure you have at least 20 GB of free disk space.

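# Option 1: pull from Docker Hub (requires a proxy or VPN from mainland China)
sudo docker pull zepan/zhouyi
# Option 2: download the image file from Baidu Cloud (archive is about 2.9 GB, about 5.3 GB after extraction)
# Link: https://pan.baidu.com/s/16AgLIn-6W5bIDofrcdG-YA
# Extraction code: mden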

gunzip zhouyi_docker.tar.gz
sudo docker load --input zhouyi_docker.tar

Once the Docker image is available, run the bundled example to check that the environment works:

sudo docker run -i -t zepan/zhouyi  /bin/bash

cd ~/demos/tflite
./run_sim.sh
python3 quant_predict.py

2. Generate the model file

The NN compiler currently supports the pb, tflite, caffemodel, and onnx formats, so you first need to convert your own model to one of them.

Common pretrained model files can be downloaded from GitHub:
https://github.com/tensorflow...

After downloading the pretrained ckpt file, convert it to a frozen pb file. A TensorFlow version between 1.13 and 1.15 is recommended here.

# Export the inference graph
# --labels_offset=1 is specific to resnet_50; the default is 0
python3 export_inference_graph.py \
    --alsologtostderr \
    --model_name=resnet_v1_50 \
    --image_size=224 \
    --labels_offset=1 \
    --output_file=/tmp/resnet_v1_50_inf.pb
# Freeze the graph with the pretrained weights
python3 freeze_graph.py \
    --input_graph=/tmp/resnet_v1_50_inf.pb \
    --input_checkpoint=/tmp/resnet_v1.ckpt \
    --input_binary=true --output_graph=/tmp/resnet_v1_50_frozen.pb \
    --output_node_names=resnet_v1_50/predictions/Reshape_1

3. Prepare the quantization calibration dataset

The quantization calibration dataset consists of two parts:

Data file: the preprocessed data, i.e. exactly what is fed to the model input, e.g. [image.numpy(), ...]
Label file: the labels, e.g. np.array(label_list)

Both files must be stored in NumPy file format. Example snippet:

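import os

import numpy as np
import tensorflow as tf  # TF 1.x API (tf.gfile, tf.image.resize_images)

tf.enable_eager_execution()  # needed in TF 1.x so that tensors expose .numpy()

# label_file, img_dir, input_height, input_width, input_channel, mean and var,
# as well as aspect_preserving_resize() and central_crop(), are not defined in
# this snippet and must be provided by the surrounding script.

# Each line of the label file is "<filename> <label>"
filename_list = []
label_list = []
with open(label_file) as label_data:
    for line in label_data:
        fields = line.rstrip('\n').split(' ')
        filename_list.append(fields[0])
        label_list.append(int(fields[1]))
img_num = len(label_list)

images = np.zeros([img_num, input_height, input_width, input_channel], np.float32)
for img_idx, file_name in enumerate(filename_list):
    image_file = os.path.join(img_dir, file_name)
    img_s = tf.gfile.GFile(image_file, 'rb').read()
    image = tf.image.decode_jpeg(img_s)
    image = tf.cast(image, tf.float32)
    image = tf.clip_by_value(image, 0., 255.)
    image = aspect_preserving_resize(image, min(input_height, input_width), input_channel)
    image = central_crop(image, input_height, input_width)
    image = tf.image.resize_images(image, [input_height, input_width])
    image = (image - mean) / var
    image = image.numpy()
    _, _, ch = image.shape
    if ch == 1:
        # Grayscale image: repeat the single channel to get 3 channels
        image = tf.tile(image, multiples=[1, 1, 3])
        image = image.numpy()
    images[img_idx] = image

# Preprocessed inputs, shape [N, H, W, C]
np.save('dataset.npy', images)

# Matching labels, shape [N]
labels = np.array(label_list)
np.save('label.npy', labels)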
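
Before handing the two files to the NN compiler, it can help to confirm that they agree with each other. The following is a minimal sketch of such a check; the helper name check_calibration_files is hypothetical, and the demo uses small synthetic arrays in a temporary directory rather than real images.

```python
import os
import tempfile

import numpy as np


def check_calibration_files(data_path, label_path):
    """Load both .npy files and verify they describe the same number of samples."""
    data = np.load(data_path)
    labels = np.load(label_path)
    if data.ndim != 4:
        raise ValueError("expected data of shape [N, H, W, C], got %s" % (data.shape,))
    if labels.ndim != 1 or labels.shape[0] != data.shape[0]:
        raise ValueError("label shape %s does not match %d images"
                         % (labels.shape, data.shape[0]))
    return data.shape, labels.shape


# Demo with synthetic stand-ins for dataset.npy and label.npy
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "dataset.npy")
    label_path = os.path.join(tmp, "label.npy")
    np.save(data_path, np.zeros([4, 224, 224, 3], np.float32))
    np.save(label_path, np.array([1, 2, 3, 4]))
    shapes = check_calibration_files(data_path, label_path)
    print(shapes)
```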

4. Edit the NN compiler configuration file

With the pb file and the calibration dataset ready, we can edit the NN compiler configuration file to generate the AIPU executable.

[Common]
mode=build   # build generates the AIPU executable; run simulates execution with the simulator

[Parser]
model_name = resnet_50
detection_postprocess = 
model_domain = image_classification
output = resnet_v1_50/predictions/Reshape
input_model = ./resnet_50_model/frozen.pb
input = Placeholder
input_shape = [1,224,224,3]

[AutoQuantizationTool]
model_name = resnet_50
quantize_method = SYMMETRIC
ops_per_channel = DepthwiseConv
calibration_data = ./preprocess_resnet_50_dataset/dataset.npy
calibration_label = ./preprocess_resnet_50_dataset/label.npy
preprocess_mode = normalize
quant_precision=int8
reverse_rgb = False
label_id_offset = 0

# [GBuilder] section for build mode
[GBuilder]  
target=Z1_0701
outputs=./resnet_50_model/aipu_resnet_50.bin
profile= True

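# [GBuilder] section for run mode
[GBuilder]
inputs=./resnet_50_model/input.bin  # binary file of the input image, stored in HWC order
outputs=output_resnet_50.bin   # output result
simulator=./aipu_simulator_z1  # path to the simulator, placed in the same directory here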
profile= True
target=Z1_0701
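
When several models need to be built, the cfg file can be generated from a script instead of being edited by hand. Below is a minimal sketch using Python's standard configparser. The helper name build_cfg_text and its parameters are hypothetical; the section and key names are copied from the example above. Whether aipubuild accepts every formatting detail that configparser emits (such as spaces around "=") is an assumption that should be checked against the demo configs in the Docker image.

```python
import configparser
import io


def build_cfg_text(model_name, input_model, input_node, output_node,
                   input_shape, calibration_dir, output_bin, target="Z1_0701"):
    """Render a build-mode cfg as text (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg["Common"] = {"mode": "build"}
    cfg["Parser"] = {
        "model_name": model_name,
        "detection_postprocess": "",
        "model_domain": "image_classification",
        "output": output_node,
        "input_model": input_model,
        "input": input_node,
        "input_shape": input_shape,
    }
    cfg["AutoQuantizationTool"] = {
        "model_name": model_name,
        "quantize_method": "SYMMETRIC",
        "ops_per_channel": "DepthwiseConv",
        "calibration_data": calibration_dir + "/dataset.npy",
        "calibration_label": calibration_dir + "/label.npy",
        "preprocess_mode": "normalize",
        "quant_precision": "int8",
        "reverse_rgb": "False",
        "label_id_offset": "0",
    }
    cfg["GBuilder"] = {"target": target, "outputs": output_bin, "profile": "True"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


cfg_text = build_cfg_text(
    model_name="resnet_50",
    input_model="./resnet_50_model/frozen.pb",
    input_node="Placeholder",
    output_node="resnet_v1_50/predictions/Reshape",
    input_shape="[1,224,224,3]",
    calibration_dir="./preprocess_resnet_50_dataset",
    output_bin="./resnet_50_model/aipu_resnet_50.bin",
)
print(cfg_text)
```

The returned text can be written to a file and passed to aipubuild in the same way as a hand-edited cfg.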

5. Simulate AIPU execution

Once the cfg file is edited, run it to get the result:

aipubuild config/resnet_50_build_run.cfg

Execution produces the result file output_resnet_50.bin.
The log printed during execution also shows the dequantization coefficient of the last layer:

 [I]         layer_id: 76, layer_top:resnet_v1_50/predictions/Reshape_0, output_scale:[7.5395403]
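
If the scale is needed in a script, it can be pulled out of the captured log text. This is a minimal sketch; the regular expression is modeled only on the single log line shown above, so other aipubuild versions may need a different pattern, and the helper name parse_output_scale is hypothetical.

```python
import re

# Pattern modeled on the log line shown in this article (an assumption for other versions)
SCALE_RE = re.compile(
    r"layer_top:(?P<name>[^,\s]+),\s*output_scale:\[(?P<scales>[^\]]*)\]"
)


def parse_output_scale(log_text):
    """Return (layer_top, [scales...]) for the last matching line, or None."""
    matches = list(SCALE_RE.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    scales = [float(s) for s in last.group("scales").replace(",", " ").split()]
    return last.group("name"), scales


line = (" [I]         layer_id: 76, "
        "layer_top:resnet_v1_50/predictions/Reshape_0, output_scale:[7.5395403]")
print(parse_output_scale(line))
```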

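The demo here is a 1000-class classifier, so output_resnet_50.bin holds 1000 bytes of int8 results; dividing them by this output_scale gives the actual float outputs.
Here we simply parse the file as int8 and take the class with the highest score, which matches the actual class of the image.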

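import numpy as np

outputfile = './output_resnet_50.bin'
npyoutput = np.fromfile(outputfile, dtype=np.int8)
# Dividing by a positive scale does not change the argmax, so the raw int8 values suffice here
outputclass = npyoutput.argmax()
print("Predict Class is %d" % outputclass)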
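
When the float scores themselves are of interest, the int8 values can be divided by output_scale as described above. The sketch below lists the top classes with their dequantized scores. The helper name top_k_from_int8 is hypothetical, and the input is a synthetic array standing in for the contents of output_resnet_50.bin.

```python
import numpy as np


def top_k_from_int8(raw, output_scale, k=5):
    """Dequantize int8 scores and return the k best (class_index, float_score) pairs."""
    scores = raw.astype(np.float32) / output_scale
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Synthetic stand-in for np.fromfile('./output_resnet_50.bin', dtype=np.int8)
raw = np.zeros(1000, dtype=np.int8)
raw[282] = 90
raw[281] = 30

top2 = top_k_from_int8(raw, output_scale=7.5395403, k=2)
for idx, score in top2:
    print("class %d: %.4f" % (idx, score))
```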

6. Simulation test materials required when applying for a development board

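Original model file (optional)
The calibration set's data.npy (saved as dataset.npy in the example above) and label.npy
The NN compiler cfg file
The simulator's input and output results, with a comparison of the quantization error
For the detailed application process, see https://aijishu.com/e/1120000000214336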
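
For the quantization error comparison in the list above, one possible approach is to dequantize the simulator output and compare it with the float model's output for the same input. The sketch below is an assumption about how such a comparison might look, not an official requirement: the metrics (cosine similarity, maximum absolute error, top-1 agreement) and the helper name compare_outputs are illustrative, and the data is synthetic.

```python
import numpy as np


def compare_outputs(sim_int8, output_scale, float_ref):
    """Compare dequantized simulator output with a float reference (hypothetical helper)."""
    sim = sim_int8.astype(np.float32).ravel() / output_scale
    ref = np.asarray(float_ref, dtype=np.float32).ravel()
    denom = float(np.linalg.norm(sim)) * float(np.linalg.norm(ref))
    cosine = float(np.dot(sim, ref)) / denom if denom > 0 else 0.0
    return {
        "cosine": cosine,
        "max_abs_err": float(np.max(np.abs(sim - ref))),
        "top1_match": int(sim.argmax()) == int(ref.argmax()),
    }


# Synthetic float reference and a simulated int8 output derived from it
float_ref = np.zeros(1000, dtype=np.float32)
float_ref[282] = 0.9
float_ref[281] = 0.1
scale = 127.0
sim_int8 = np.round(float_ref * scale).astype(np.int8)

report = compare_outputs(sim_int8, scale, float_ref)
print(report)
```

In a real comparison, sim_int8 would come from the simulator's output bin, scale from the output_scale printed in the log, and float_ref from running the original float model on the same preprocessed input.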
