[Zhouyi AIPU Simulation] ShuffleNet model on Win10 + WSL2 + Ubuntu + Docker

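Workflow

Configure WSL2 + Ubuntu + Docker
Tips for deploying zepan/zhouyi with Docker
Prepare the model
Prepare the calibration dataset
Prepare the input sample and the output reference
Edit the configuration file
Verify the results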

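1. Configure WSL2 + Ubuntu + Docker

For setting up WSL2 and Ubuntu on Windows, see the linked tutorial.
To access the Ubuntu files from Win10 (adding the folder as a shortcut makes it easier to find later),
enter the following in the Ubuntu shell once Ubuntu is set up: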
```
explorer.exe .
```
For setting up Docker, see the linked tutorial.
Note!
Inside WSL, Docker should be started with:
```
sudo service docker start
```

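2. Tips for deploying zepan/zhouyi with Docker

Deploy the container by following the link in the official tutorial.
After exiting the container, run docker ps -a to look up the container ID.

Copy the main files from the container to the host:

docker cp [container ID]:/root/demos/tflite [host directory]

Start a new Zhouyi container with that directory mounted, so that files are shared between the container and the host in real time: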
```
sudo docker run -i -t --privileged=true -v [host directory]:/root/demos/tflite zepan/zhouyi  /bin/bash
```
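
Docker treats the host side of `-v` as a bind mount only when it is an absolute path; a bare relative name is interpreted as a named volume instead. As a minimal sketch, assuming you want to script the command above, the helper below expands the host directory to an absolute path and builds the same argument list. The function name `zhouyi_run_command` and the example directory are hypothetical.

```python
import os
import shlex

def zhouyi_run_command(host_dir, container_dir="/root/demos/tflite", image="zepan/zhouyi"):
    """Build the `docker run` argument list used above for a given host directory."""
    # Expand ~ and relative paths: a bind mount needs an absolute host path.
    host_dir = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "sudo", "docker", "run", "-i", "-t", "--privileged=true",
        "-v", f"{host_dir}:{container_dir}",
        image, "/bin/bash",
    ]

if __name__ == "__main__":
    # Print the command so it can be pasted into the WSL shell.
    print(shlex.join(zhouyi_run_command("~/zhouyi/tflite")))
```

Printing the command rather than executing it keeps the interactive `-i -t` session in your own terminal.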

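3. Prepare the model

The ONNX models shufflenet-9 and shufflenet-v2-10 can be downloaded directly from GitHub.
Note!
Use the linked website to check the model's inputs and outputs; they are needed later when editing the configuration file.
Inputs and outputs of shufflenet-9
Inputs and outputs of shufflenet-v2-10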
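
The input and output names can also be read programmatically. The sketch below is one possible way to do it with onnxruntime, which this walkthrough already uses later: `InferenceSession.get_inputs()` and `get_outputs()` return nodes with `name`, `shape` and `type` attributes. The helper names and the model path in the comment are assumptions.

```python
def describe_io(nodes):
    """Format name, shape and type for each input or output node of a session."""
    return [f"{n.name}: shape={list(n.shape)}, type={n.type}" for n in nodes]

def print_model_io(model_path):
    """Print the inputs and outputs of an ONNX model file."""
    # Imported here so the formatting helper above has no dependencies.
    import onnxruntime as ort
    session = ort.InferenceSession(model_path)
    print("inputs:")
    for line in describe_io(session.get_inputs()):
        print("  " + line)
    print("outputs:")
    for line in describe_io(session.get_outputs()):
        print("  " + line)

# Example call (the path is an assumption):
# print_model_io("./preprocess/shufflenet-v2-10.onnx")
```

The input name and shape printed here are the values that go into the `input` and `input_shape` fields of the configuration file.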

4. Prepare the calibration dataset

In /root/demos/tflite/, create preprocess_dataset_onnx.py with the following code and run it:

import os
from torchvision import transforms
from PIL import Image
import numpy as np

imgs_path = './img/'
# Sort the file names so the image order is deterministic
# (assuming label.txt lists the images in file-name order).
imgs_path_list = [os.path.join(imgs_path, name) for name in sorted(os.listdir(imgs_path))]
# Convert to RGB so grayscale files also yield three channels.
imgs_list = [Image.open(path).convert('RGB') for path in imgs_path_list]

# Named preprocess so the torchvision transforms module is not shadowed.
preprocess = transforms.Compose([
  transforms.Resize(224),
  transforms.CenterCrop(224),
  transforms.ToTensor(),
  transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
imgs_list = [np.array(preprocess(img)) for img in imgs_list]
# ToTensor returns CHW; transpose each image to HWC.
imgs_list = [np.transpose(img, (1, 2, 0)) for img in imgs_list]
imgs_list = np.array(imgs_list)
print(imgs_list.shape)
np.save('./preprocess/data.npy', imgs_list)

# Save the labels
label_array = []
with open('label.txt') as f:
  line = f.readlines()
  for i in range(imgs_list.shape[0]):
      # The label is read from a fixed position: from column 29 on, minus the last two characters.
      label_array = label_array + [line[i][29:-2]]
label_array = [int(i) for i in label_array]
label_array = np.array(label_array)
print(label_array.shape)
np.save('./preprocess/label.npy', label_array)
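
Before handing the two .npy files to the quantizer, it can help to confirm that they agree with each other. The check below is a minimal sketch, assuming the layout the script above writes: images as an (N, 224, 224, 3) array and one integer label per image. The function name `check_calibration` is hypothetical.

```python
import numpy as np

def check_calibration(data, labels, height=224, width=224, channels=3):
    """Return a list of problems found in the calibration arrays (empty if none)."""
    problems = []
    if data.ndim != 4 or data.shape[1:] != (height, width, channels):
        problems.append(f"data shape {data.shape} is not (N, {height}, {width}, {channels})")
    if labels.ndim != 1:
        problems.append(f"labels shape {labels.shape} is not 1-D")
    if data.ndim >= 1 and labels.ndim >= 1 and data.shape[0] != labels.shape[0]:
        problems.append(f"{data.shape[0]} images but {labels.shape[0]} labels")
    if not np.issubdtype(labels.dtype, np.integer):
        problems.append(f"labels dtype {labels.dtype} is not an integer type")
    return problems

# Usage, with the paths written by the script above:
# problems = check_calibration(np.load('./preprocess/data.npy'),
#                              np.load('./preprocess/label.npy'))
# print(problems or "calibration files look consistent")
```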

5. Prepare the input sample and the output reference

In /root/demos/tflite/, create preprocess.py with the following code and run it:

import cv2
import numpy as np
import onnxruntime as ort

input_height=224
input_width=224
input_channel=3

img_path = "./img/ILSVRC2012_val_00000004.JPEG"

orig_image = cv2.imread(img_path)
image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (input_width, input_height))
# Shift the 0..255 pixel values so that they fit into int8.
image = (image - 127.5) / 1
image_1 = np.expand_dims(image, axis=0)
# int8 NHWC copy that is written to input.bin for the simulator.
image = image_1.astype(np.int8)

# float32 NCHW copy that is fed to onnxruntime.
image_1 = image_1.astype(np.float32)
image_1 = image_1.transpose([0,3,1,2])
# Same model file that input_model points to in the cfg file.
ort_session = ort.InferenceSession("./preprocess/shufflenet-v2-10.onnx")
outputs = ort_session.run(None, {'input':image_1})

print("onnx result:",outputs[0])

# Scale the reference output and store it as raw bytes.
pred = 255 * outputs[0]
pred = pred.astype(np.uint8)
with open('./preprocess/output_ref.bin', 'wb') as fw:
  fw.write(pred)

image.tofile("./preprocess/input.bin")
print("save to input.bin OK")
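
The int8 conversion in preprocess.py is not exactly reversible. The sketch below reproduces the same arithmetic on every possible pixel value, then adds 128 back in the way quant_predict.py does when it rebuilds the picture, and reports the largest difference. The helper names are hypothetical.

```python
import numpy as np

def to_int8(pixels):
    """Same arithmetic as preprocess.py: subtract 127.5, then cast to int8."""
    # The cast truncates toward zero, so 127.5 becomes 127 and -127.5 becomes -127.
    return (pixels.astype(np.float64) - 127.5).astype(np.int8)

def to_uint8(values):
    """Inverse step used for viewing: add 128 and clip to the range 0..255."""
    return np.clip(values.astype(np.int16) + 128, 0, 255).astype(np.uint8)

pixels = np.arange(256, dtype=np.uint8)
restored = to_uint8(to_int8(pixels))
max_error = int(np.abs(restored.astype(np.int16) - pixels.astype(np.int16)).max())
print("largest round-trip error:", max_error)
```

Because the cast truncates instead of rounding, pixels below 128 come back one step brighter, which is invisible in result.jpeg but worth knowing when comparing bytes.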

Also modify quant_predict.py in /root/demos/tflite/ so that it is ready for the test later on:

from PIL import Image
import cv2
from matplotlib import pyplot as plt
import matplotlib.patches as patches
import numpy as np
import os
import imagenet_classes as class_name

current_dir = os.getcwd()
label_offset = 1
outputfile = current_dir + '/preprocess/output.bin'
npyoutput = np.fromfile(outputfile, dtype=np.uint8)
outputclass = npyoutput.argmax()
head5p = npyoutput.argsort()[-5:][::-1]

labelfile = current_dir + '/preprocess/output_ref.bin'
npylabel = np.fromfile(labelfile, dtype=np.int8)
labelclass = npylabel.argmax()
head5t = npylabel.argsort()[-5:][::-1]

print("predict first 5 label:")
for i in head5p:
  print("    index %4d, prob %3d, name: %s"%(i, npyoutput[i], class_name.class_names[i-label_offset]))
  
print("true first 5 label:")
for i in head5t:
  print("    index %4d, prob %3d, name: %s"%(i, npylabel[i], class_name.class_names[i-label_offset]))

# Show input picture
print('Detect picture save to result.jpeg')

input_path = './preprocess/input.bin'
npyinput = np.fromfile(input_path, dtype=np.int8)
image = np.clip(np.round(npyinput)+128, 0, 255).astype(np.uint8)
image = np.reshape(image, (224, 224, 3))
im = Image.fromarray(image)
im.save('result.jpeg')
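
The expression `argsort()[-5:][::-1]` in the script is what picks the five highest scores. The sketch below isolates that step on made-up numbers and also shows why the dtype passed to `np.fromfile` matters: the same bytes can rank differently when read as int8 instead of uint8. The function name `top_k` and the sample values are illustrative only.

```python
import numpy as np

def top_k(scores, k=5):
    """Indices of the k largest scores, highest first."""
    # argsort is ascending, so take the last k entries and reverse them.
    return np.argsort(scores)[-k:][::-1]

scores = np.array([3, 150, 46, 143, 0, 31, 45], dtype=np.uint8)
print(top_k(scores, 3))

raw = np.array([10, 200, 120], dtype=np.uint8)
print(top_k(raw, 1))                # unsigned: 200 is the largest value
print(top_k(raw.view(np.int8), 1))  # signed: 200 reads as -56, so 120 wins
```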

6. Edit the configuration file

Using tflite_mobilenet_v2_run.cfg as a reference, create your own onnx_shufflenet_run.cfg with the following content:

[Common]
mode = run

[Parser]
model_type = onnx
input_data_format = NCHW
model_name = shufflenet
detection_postprocess = 
model_domain = image_classification
input_model = ./preprocess/shufflenet-v2-10.onnx
input = input
input_shape = [1, 3, 224, 224]
output = 

[AutoQuantizationTool]
quantize_method = SYMMETRIC
ops_per_channel = DepthwiseConv
reverse_rgb = False
calibration_data = ./preprocess/data.npy
calibration_label = ./preprocess/label.npy
label_id_offset = 0
preprocess_mode = normalize
quant_precision = int8

[GBuilder]
inputs=./preprocess/input.bin
simulator=aipu_simulator_z1
outputs=./preprocess/output.bin
profile= True
target=Z1_0701
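
The file above uses INI-style sections and key/value pairs, so Python's standard `configparser` can read it for a quick consistency check before running the build. This is only a sketch of such a check, not part of the Zhouyi toolchain; the function names are hypothetical.

```python
import configparser
import json

def read_build_config(text):
    """Parse the cfg text and return the settings that must agree with the model."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "model": parser.get("Parser", "input_model"),
        "input_name": parser.get("Parser", "input"),
        "input_shape": json.loads(parser.get("Parser", "input_shape")),
        "calibration_data": parser.get("AutoQuantizationTool", "calibration_data"),
        "simulator_input": parser.get("GBuilder", "inputs"),
    }

def input_bin_size(shape):
    """Number of bytes an int8 input of this shape occupies."""
    size = 1
    for dim in shape:
        size *= dim
    return size

# Usage (the cfg path is the one used in the next step):
# with open("config/onnx_shufflenet_run.cfg") as f:
#     settings = read_build_config(f.read())
# print(settings, input_bin_size(settings["input_shape"]))
```

For the shape `[1, 3, 224, 224]` the expected size is 150528 bytes, which can be compared with the size of ./preprocess/input.bin on disk.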

7. Verify the results

Run the command:

aipubuild config/onnx_shufflenet_run.cfg

This produces the following output:

root@f4e7a897f777:~/demos/tflite# aipubuild config/onnx_shufflenet_run.cfg
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

[I] Parsing model....
[I] [Parser]: Begin to parse onnx model shufflenet...
2021-07-27 15:46:56.578841: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-27 15:46:56.591916: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2021-07-27 15:46:56.602401: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7bee6d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-27 15:46:56.602450: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[I] [Parser]: Parser done!
[I] Parse model complete
[I] Quantizing model....
[I] AQT start: model_name:shufflenet, calibration_method:MEAN, batch_size:1
[I] ==== read ir ================
[I]     float32 ir txt: /tmp/AIPUBuilder_1627400815.3939884/shufflenet.txt
[I]     float32 ir bin2: /tmp/AIPUBuilder_1627400815.3939884/shufflenet.bin
[I] ==== read ir DONE.===========
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

[I] ==== auto-quantization ======
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:Entity <bound method ImageNet.data_transform_fn of <AIPUBuilder.AutoQuantizationTool.auto_quantization.data_set.ImageNet object at 0x7f64b4cef588>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: <cyfunction ImageNet.data_transform_fn at 0x7f65cc8a7d38> is not a module, class, method, function, traceback, frame, or code object
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:330: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:330: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py:915: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING:tensorflow:From /usr/local/bin/aipubuild:8: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.


[I]     step1: get max/min statistic value DONE
[W] shift value is discrete in Depthwise, layer Conv_12, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_42, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_76, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_91, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_106, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_151, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_181, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_200, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_215, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_230, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_245, fixed by constraining shift value, may lead to acc drop
[I]     step2: quantization each op DONE
[I]     step3: build quantization forward DONE
[I]     step4: show output scale of end node:
[I]             layer_id:137, layer_top:Gemm_260, output_scale:[9.276978]
[I] ==== auto-quantization DONE =
[I] Quantize model complete
[I] Building ...
[I] [common_options.h: 276] BuildTool version: 4.0.175. Build for target Z1_0701 at frequency 800MHz
[I] [common_options.h: 297] using default profile events to profile AIFF

[I] [IRChecker] Start to check IR: /tmp/AIPUBuilder_1627400815.3939884/shufflenet_int8.txt
[I] [IRChecker] model_name: shufflenet
[I] [IRChecker] IRChecker: All IR pass
[I] [graph.cpp : 846] loading graph weight: /tmp/AIPUBuilder_1627400815.3939884/shufflenet_int8.bin size: 0x2322f0
[I] [builder.cpp:1059] Total memory for this graph: 0x90f940 Bytes
[I] [builder.cpp:1060] Text   section:  0x000b61c0 Bytes
[I] [builder.cpp:1061] RO     section:  0x00007300 Bytes
[I] [builder.cpp:1062] Desc   section:  0x00010900 Bytes
[I] [builder.cpp:1063] Data   section:  0x00286f80 Bytes
[I] [builder.cpp:1064] BSS    section:  0x0057a800 Bytes
[I] [builder.cpp:1065] Stack         :  0x00040400 Bytes
[I] [builder.cpp:1066] Workspace(BSS):  0x00049800 Bytes
[I] [main.cpp  : 467] # autogenrated by aipurun, do NOT modify!
LOG_FILE=log_default
FAST_FWD_INST=0
INPUT_INST_CNT=1
INPUT_DATA_CNT=2
CONFIG=Z1-0701
LOG_LEVEL=0
INPUT_INST_FILE0=/tmp/temp_3bbac3852f594e475c81cc8abeccf.text
INPUT_INST_BASE0=0x0
INPUT_INST_STARTPC0=0x0
INPUT_DATA_FILE0=/tmp/temp_3bbac3852f594e475c81cc8abeccf.ro
INPUT_DATA_BASE0=0x10000000
INPUT_DATA_FILE1=/tmp/temp_3bbac3852f594e475c81cc8abeccf.data
INPUT_DATA_BASE1=0x20000000
OUTPUT_DATA_CNT=2
OUTPUT_DATA_FILE0=output.bin
OUTPUT_DATA_BASE0=0x20a05a00
OUTPUT_DATA_SIZE0=0x3e8
OUTPUT_DATA_FILE1=profile_data.bin
OUTPUT_DATA_BASE1=0x20410b80
OUTPUT_DATA_SIZE1=0xf00
RUN_DESCRIPTOR=BIN[0]

[I] [main.cpp  : 118] run simulator:
aipu_simulator_z1 /tmp/temp_3bbac3852f594e475c81cc8abeccf.cfg
[INFO]:SIMULATOR START!
[INFO]:========================================================================
[INFO]:                             STATIC CHECK
[INFO]:========================================================================
[INFO]:  INST START ADDR : 0x0(0)
[INFO]:  INST END ADDR   : 0xb61bf(745919)
[INFO]:  INST SIZE       : 0xb61c0(745920)
[INFO]:  PACKET CNT      : 0xb61c(46620)
[INFO]:  INST CNT        : 0x2d870(186480)
[INFO]:------------------------------------------------------------------------
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x3f41: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x3f41(16193) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x46b0: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x46b0(18096) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4ad4: 0x472021b(POP R27,Rc7) vs 0x5f00000(MVI R0,0x0,Rc7), PACKET:0x4ad4(19156) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4ae1: 0x472021b(POP R27,Rc7) vs 0x5f00000(MVI R0,0x0,Rc7), PACKET:0x4ae1(19169) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4c46: 0x472021b(POP R27,Rc7) vs 0x9f80020(ADD.S R0,R0,0x1,Rc7), PACKET:0x4c46(19526) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4de3: 0x4520180(BRL R0) vs 0x47a03e5(ADD R5,R0,R31,Rc7), PACKET:0x4de3(19939) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x57ab: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x57ab(22443) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x6c16: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x6c16(27670) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x7535: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x7535(30005) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x808b: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x808b(32907) SLOT:0 vs 3
[INFO]:========================================================================
[INFO]:                             STATIC CHECK END
[INFO]:========================================================================

[INFO]:AIPU START RUNNING: BIN[0]
[INFO]:TOTAL TIME: 23.970811s.
[INFO]:SIMULATOR EXIT!
[I] [main.cpp  : 135] Simulator finished.
Total errors: 0,  warnings: 0

The simulator reports a total run time of 23.97 s, which looks good.
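
The run time and the memory footprint are buried in a long log. As a rough sketch, assuming the aipubuild output was saved to a text file, the function below pulls out the `TOTAL TIME` value, the `Total memory for this graph` value, and the number of lines that start with `[W]` or `[WARN]`. The function name and the file name in the comment are hypothetical.

```python
import re

def summarize_log(log_text):
    """Extract run time, total graph memory and warning count from an aipubuild log."""
    summary = {}
    time_match = re.search(r"TOTAL TIME:\s*([0-9.]+)s", log_text)
    if time_match:
        summary["total_time_s"] = float(time_match.group(1))
    mem_match = re.search(r"Total memory for this graph:\s*(0x[0-9a-fA-F]+) Bytes", log_text)
    if mem_match:
        # The builder prints the size in hexadecimal.
        summary["total_memory_bytes"] = int(mem_match.group(1), 16)
    summary["warnings"] = len(re.findall(r"^\[(?:W|WARN)\]", log_text, flags=re.MULTILINE))
    return summary

# Usage, assuming the log was saved as build.log:
# with open("build.log") as f:
#     print(summarize_log(f.read()))
```
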
Run the test script:
```
python quant_predict.py
```

The final result:

root@f4e7a897f777:~/demos/tflite# python quant_predict.py
predict first 5 label:
  index  231, prob 150, name: Shetland sheepdog, Shetland sheep dog, Shetland
  index  232, prob  143, name: collie
  index  158, prob   46, name: papillon
  index  342, prob   45, name: hog, pig, grunter, squealer, Sus scrofa
  index  340, prob   31, name: sorrel
true first 5 label:
  index  232, prob  123, name: collie
  index  231, prob  109, name: Shetland sheepdog, Shetland sheep dog, Shetland
  index  158, prob  41, name: papillon
  index  170, prob  36, name: borzoi, Russian wolfhound
  index  161, prob  34, name: Afghan hound, Afghan
Detect picture save to result.jpeg
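
In the output above, the quantized model and the ONNX reference share three of their top five classes, with the first two swapped. A small helper makes that comparison explicit; the function name is hypothetical and the index lists are copied from the output above.

```python
def top_k_agreement(predicted, reference):
    """Compare two ranked index lists: same top-1, and which indices they share."""
    reference_set = set(reference)
    shared = [i for i in predicted if i in reference_set]
    return {"top1_match": predicted[0] == reference[0], "shared": shared}

# Indices copied from the quant_predict.py output above.
predicted = [231, 232, 158, 342, 340]
reference = [232, 231, 158, 170, 161]
print(top_k_agreement(predicted, reference))
```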
