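Workflow
- Set up WSL2 + Ubuntu + Docker
- Tips for deploying zepan/zhouyi with Docker
- Prepare the model
- Prepare the calibration dataset
- Prepare the input sample and output reference
- Edit the configuration file
- Verify the results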
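1. Set up WSL2 + Ubuntu + Docker
- To set up WSL2 + Ubuntu on Windows, see the linked tutorial.
- To access Ubuntu files from Win10 (adding a shortcut makes the folder easier to find later), run the following in the Ubuntu shell once Ubuntu is set up:
explorer.exe .
- To set up Docker, see the linked tutorial.
- Note!!
To start Docker inside WSL, run:
sudo service docker start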
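2. Tips for deploying zepan/zhouyi with Docker
- Deploy by following the linked official tutorial.
- After exiting the container, run
docker ps -a
to look up the container ID, then copy the container's main files to the host:
docker cp [container ID]:/root/demos/tflite [host directory]
Start a new Zhouyi container with that directory mounted, so that files are shared between the container and the host in real time:
sudo docker run -i -t --privileged=true -v [host directory]:/root/demos/tflite zepan/zhouyi /bin/bash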
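3. Prepare the model
- The ONNX models shufflenet-9 and shufflenet-v2-10 can be downloaded directly from GitHub.
- Note!!
- Use the linked website to inspect the model's inputs and outputs; they are needed later when editing the configuration file.
- Inputs and outputs of shufflenet-9
- Inputs and outputs of shufflenet-v2-10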
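As a complement to checking the graph on the website, the input and output tensor names can also be read programmatically. The following is a minimal sketch, assuming the `onnx` Python package is installed in the container; the helper names (`tensor_shape`, `describe_io`, `describe_model_file`) are illustrative and not part of the Zhouyi toolchain.

```python
def tensor_shape(value_info):
    """Return the shape of an ONNX ValueInfoProto as a list of ints or symbolic names."""
    dims = value_info.type.tensor_type.shape.dim
    # A symbolic dimension has dim_value == 0 and carries its name in dim_param.
    return [d.dim_value if d.dim_value > 0 else (d.dim_param or "?") for d in dims]


def describe_io(graph):
    """Collect (name, shape) pairs for the real inputs and for the outputs of a graph."""
    # Older exports also list weights in graph.input, so filter out initializers.
    weights = {init.name for init in graph.initializer}
    inputs = [(i.name, tensor_shape(i)) for i in graph.input if i.name not in weights]
    outputs = [(o.name, tensor_shape(o)) for o in graph.output]
    return inputs, outputs


def describe_model_file(path):
    """Load an ONNX file and print its input and output tensors."""
    import onnx  # imported lazily so the helpers above work without onnx installed

    model = onnx.load(path)
    inputs, outputs = describe_io(model.graph)
    for name, shape in inputs:
        print("input :", name, shape)
    for name, shape in outputs:
        print("output:", name, shape)


# Example call (the path is an assumption, adjust it to where the model is stored):
# describe_model_file("./preprocess/shufflenet-v2-10.onnx")
```

The printed input name and shape are the values that go into the `input` and `input_shape` fields of the configuration file.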
4. Prepare the calibration dataset
In /root/demos/tflite/, create a new file
preprocess_dataset_onnx.py
with the following code, then run it:

import os
from torchvision import transforms
from PIL import Image
import numpy as np

imgs_path = './img/'
# Sort the file names so the image order is deterministic and lines up
# with label.txt (assumed to list the images in file-name order).
imgs_name_list = sorted(os.listdir(imgs_path))
# Convert to RGB so grayscale images also have 3 channels.
imgs_list = [Image.open(imgs_path + name).convert('RGB') for name in imgs_name_list]

# Use a separate name so the torchvision `transforms` module is not shadowed.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# ToTensor yields CHW; transpose each image to HWC before stacking.
imgs_list = [np.array(preprocess(img)) for img in imgs_list]
imgs_list = [np.transpose(img, (1, 2, 0)) for img in imgs_list]
imgs_list = np.array(imgs_list)
print(imgs_list.shape)
np.save('./preprocess/data.npy', imgs_list)

# Save the labels
with open('label.txt') as f:
    lines = f.readlines()
# The label starts at offset 29, after the 28-character file name and a space;
# strip() removes the trailing whitespace and line ending.
label_array = [int(lines[i][29:].strip()) for i in range(imgs_list.shape[0])]
label_array = np.array(label_array)
print(label_array.shape)
np.save('./preprocess/label.npy', label_array)
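Slicing each label line at a fixed offset only works while every file name has the same length and the labels appear in the same order as the images. A more defensive variant is sketched below, assuming each line of label.txt has the form `<file name> <integer label>`; the function names are hypothetical.

```python
def parse_label_lines(lines):
    """Map image file name -> integer class id from '<file name> <label>' lines."""
    labels = {}
    for raw in lines:
        parts = raw.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        labels[parts[0]] = int(parts[-1])
    return labels


def labels_for(image_names, labels):
    """Return labels in the same order as image_names, failing loudly on a missing file."""
    missing = [name for name in image_names if name not in labels]
    if missing:
        raise KeyError(f"no label for: {missing}")
    return [labels[name] for name in image_names]


# Possible use inside the dataset script (paths as in the script above):
# with open('label.txt') as f:
#     table = parse_label_lines(f)
# label_array = labels_for(sorted(os.listdir('./img/')), table)
```

Looking labels up by file name keeps data.npy and label.npy aligned even when only a subset of the images is present in ./img/.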
5. Prepare the input sample and output reference
In /root/demos/tflite/, create a new file
preprocess.py
with the following code, then run it:

import cv2
import numpy as np
import onnxruntime as ort

input_height = 224
input_width = 224

img_path = "./img/ILSVRC2012_val_00000004.JPEG"

# Load the image, convert BGR -> RGB and resize to the model input size.
orig_image = cv2.imread(img_path)
image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (input_width, input_height))
# Shift the pixel values from [0, 255] to [-127.5, 127.5].
image = (image - 127.5) / 1
image_1 = np.expand_dims(image, axis=0)

# int8 NHWC copy for the simulator; float32 NCHW copy for onnxruntime.
image = image_1.astype(np.int8)
image_1 = image_1.astype(np.float32)
image_1 = image_1.transpose([0, 3, 1, 2])

# Same model path as input_model in the cfg of section 6; adjust it if the
# model is stored elsewhere.
ort_session = ort.InferenceSession("./preprocess/shufflenet-v2-10.onnx")
outputs = ort_session.run(None, {'input': image_1})
print("onnx result:", outputs[0])

# Scale the reference output and store it as raw uint8 bytes.
pred = 255 * outputs[0]
pred = pred.astype(np.uint8)
with open('./preprocess/output_ref.bin', 'wb') as fw:
    fw.write(pred)

image.tofile("./preprocess/input.bin")
print("save to input.bin OK")
Also modify
quant_predict.py
in /root/demos/tflite/ so that it can be used for the test later:

from PIL import Image
import numpy as np
import os
import imagenet_classes as class_name

current_dir = os.getcwd()
label_offset = 1

# Quantized output produced by the simulator
outputfile = current_dir + '/preprocess/output.bin'
npyoutput = np.fromfile(outputfile, dtype=np.uint8)
outputclass = npyoutput.argmax()
head5p = npyoutput.argsort()[-5:][::-1]

# Reference output; preprocess.py writes it as uint8, so read it as uint8
labelfile = current_dir + '/preprocess/output_ref.bin'
npylabel = np.fromfile(labelfile, dtype=np.uint8)
labelclass = npylabel.argmax()
head5t = npylabel.argsort()[-5:][::-1]

print("predict first 5 label:")
for i in head5p:
    print("    index %4d, prob %3d, name: %s" % (i, npyoutput[i], class_name.class_names[i - label_offset]))

print("true first 5 label:")
for i in head5t:
    print("    index %4d, prob %3d, name: %s" % (i, npylabel[i], class_name.class_names[i - label_offset]))

# Show input picture
print('Detect picture save to result.jpeg')

input_path = './preprocess/input.bin'
npyinput = np.fromfile(input_path, dtype=np.int8)
# Undo the int8 shift to recover displayable pixel values.
image = np.clip(np.round(npyinput) + 128, 0, 255).astype(np.uint8)
image = np.reshape(image, (224, 224, 3))
im = Image.fromarray(image)
im.save('result.jpeg')
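Beyond printing the two top-5 lists, it can help to reduce the comparison to a single number. The sketch below counts how many class indices the simulator output and the reference output share in their top-k lists. It uses only the standard library and assumes both .bin files hold one unsigned byte per class; the function names are hypothetical.

```python
def read_uint8(path):
    """Read a raw .bin file as a list of unsigned 8-bit scores."""
    with open(path, "rb") as f:
        return list(f.read())


def top_k(scores, k=5):
    """Indices of the k largest scores, highest first (ties keep the lower index first)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def top_k_overlap(pred, ref, k=5):
    """Number of class indices that appear in both top-k lists."""
    return len(set(top_k(pred, k)) & set(top_k(ref, k)))


# Possible use once output.bin exists (file names as in quant_predict.py):
# pred = read_uint8("./preprocess/output.bin")
# ref = read_uint8("./preprocess/output_ref.bin")
# print("top-5 overlap:", top_k_overlap(pred, ref))
```

An overlap of 5 means both top-5 lists contain the same classes, possibly in a different order; a low overlap suggests the quantized model has drifted from the float reference.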
6. Edit the configuration file
Using
tflite_mobilenet_v2_run.cfg
as a reference, create your own onnx_shufflenet_run.cfg
with the following content:

[Common]
mode = run

[Parser]
model_type = onnx
input_data_format = NCHW
model_name = shufflenet
detection_postprocess =
model_domain = image_classification
input_model = ./preprocess/shufflenet-v2-10.onnx
input = input
input_shape = [1, 3, 224, 224]
output =

[AutoQuantizationTool]
quantize_method = SYMMETRIC
ops_per_channel = DepthwiseConv
reverse_rgb = False
calibration_data = ./preprocess/data.npy
calibration_label = ./preprocess/label.npy
label_id_offset = 0
preprocess_mode = normalize
quant_precision = int8

[GBuilder]
inputs=./preprocess/input.bin
simulator=aipu_simulator_z1
outputs=./preprocess/output.bin
profile= True
target=Z1_0701
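Before running the build, it is worth checking that every input file named in the cfg actually exists, since a wrong path is an easy mistake to make here. The following is a minimal sketch using Python's standard `configparser`, under the assumption that the cfg is plain INI syntax as shown above; `referenced_files` and `missing_files` are hypothetical helper names, not part of aipubuild.

```python
import configparser
import os

# (section, key) pairs whose values are files that must exist before the build.
INPUT_KEYS = [
    ("Parser", "input_model"),
    ("AutoQuantizationTool", "calibration_data"),
    ("AutoQuantizationTool", "calibration_label"),
    ("GBuilder", "inputs"),
]


def referenced_files(cfg_text):
    """Return the input-side file paths named in the cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return [parser.get(s, k) for s, k in INPUT_KEYS if parser.has_option(s, k)]


def missing_files(cfg_text, base_dir="."):
    """Return the referenced paths that do not exist relative to base_dir."""
    return [p for p in referenced_files(cfg_text)
            if not os.path.exists(os.path.join(base_dir, p))]


# Possible use from /root/demos/tflite/:
# with open("config/onnx_shufflenet_run.cfg") as f:
#     print("missing:", missing_files(f.read()))
```

An empty list means the model, the calibration arrays and the input sample are all in place.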
7. Verify the results
Run the command
aipubuild config/onnx_shufflenet_run.cfg
which produces the following output:
root@f4e7a897f777:~/demos/tflite# aipubuild config/onnx_shufflenet_run.cfg
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
[I] Parsing model....
[I] [Parser]: Begin to parse onnx model shufflenet...
2021-07-27 15:46:56.578841: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-27 15:46:56.591916: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2021-07-27 15:46:56.602401: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7bee6d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-27 15:46:56.602450: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[I] [Parser]: Parser done!
[I] Parse model complete
[I] Quantizing model....
[I] AQT start: model_name:shufflenet, calibration_method:MEAN, batch_size:1
[I] ==== read ir ================
[I] float32 ir txt: /tmp/AIPUBuilder_1627400815.3939884/shufflenet.txt
[I] float32 ir bin2: /tmp/AIPUBuilder_1627400815.3939884/shufflenet.bin
[I] ==== read ir DONE.===========
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
[I] ==== auto-quantization ======
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:Entity <bound method ImageNet.data_transform_fn of <AIPUBuilder.AutoQuantizationTool.auto_quantization.data_set.ImageNet object at 0x7f64b4cef588>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: <cyfunction ImageNet.data_transform_fn at 0x7f65cc8a7d38> is not a module, class, method, function, traceback, frame, or code object
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:330: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:330: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py:915: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
WARNING:tensorflow:From /usr/local/bin/aipubuild:8: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.
[I] step1: get max/min statistic value DONE
[W] shift value is discrete in Depthwise, layer Conv_12, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_42, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_76, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_91, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_106, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_151, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_181, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_200, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_215, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_230, fixed by constraining shift value, may lead to acc drop
[W] shift value is discrete in Depthwise, layer Conv_245, fixed by constraining shift value, may lead to acc drop
[I] step2: quantization each op DONE
[I] step3: build quantization forward DONE
[I] step4: show output scale of end node:
[I] layer_id:137, layer_top:Gemm_260, output_scale:[9.276978]
[I] ==== auto-quantization DONE =
[I] Quantize model complete
[I] Building ...
[I] [common_options.h: 276] BuildTool version: 4.0.175. Build for target Z1_0701 at frequency 800MHz
[I] [common_options.h: 297] using default profile events to profile AIFF
[I] [IRChecker] Start to check IR: /tmp/AIPUBuilder_1627400815.3939884/shufflenet_int8.txt
[I] [IRChecker] model_name: shufflenet
[I] [IRChecker] IRChecker: All IR pass
[I] [graph.cpp : 846] loading graph weight: /tmp/AIPUBuilder_1627400815.3939884/shufflenet_int8.bin size: 0x2322f0
[I] [builder.cpp:1059] Total memory for this graph: 0x90f940 Bytes
[I] [builder.cpp:1060] Text section: 0x000b61c0 Bytes
[I] [builder.cpp:1061] RO section: 0x00007300 Bytes
[I] [builder.cpp:1062] Desc section: 0x00010900 Bytes
[I] [builder.cpp:1063] Data section: 0x00286f80 Bytes
[I] [builder.cpp:1064] BSS section: 0x0057a800 Bytes
[I] [builder.cpp:1065] Stack : 0x00040400 Bytes
[I] [builder.cpp:1066] Workspace(BSS): 0x00049800 Bytes
[I] [main.cpp : 467] # autogenrated by aipurun, do NOT modify!
LOG_FILE=log_default
FAST_FWD_INST=0
INPUT_INST_CNT=1
INPUT_DATA_CNT=2
CONFIG=Z1-0701
LOG_LEVEL=0
INPUT_INST_FILE0=/tmp/temp_3bbac3852f594e475c81cc8abeccf.text
INPUT_INST_BASE0=0x0
INPUT_INST_STARTPC0=0x0
INPUT_DATA_FILE0=/tmp/temp_3bbac3852f594e475c81cc8abeccf.ro
INPUT_DATA_BASE0=0x10000000
INPUT_DATA_FILE1=/tmp/temp_3bbac3852f594e475c81cc8abeccf.data
INPUT_DATA_BASE1=0x20000000
OUTPUT_DATA_CNT=2
OUTPUT_DATA_FILE0=output.bin
OUTPUT_DATA_BASE0=0x20a05a00
OUTPUT_DATA_SIZE0=0x3e8
OUTPUT_DATA_FILE1=profile_data.bin
OUTPUT_DATA_BASE1=0x20410b80
OUTPUT_DATA_SIZE1=0xf00
RUN_DESCRIPTOR=BIN[0]
[I] [main.cpp : 118] run simulator: aipu_simulator_z1 /tmp/temp_3bbac3852f594e475c81cc8abeccf.cfg
[INFO]:SIMULATOR START!
[INFO]:========================================================================
[INFO]: STATIC CHECK
[INFO]:========================================================================
[INFO]: INST START ADDR : 0x0(0)
[INFO]: INST END ADDR : 0xb61bf(745919)
[INFO]: INST SIZE : 0xb61c0(745920)
[INFO]: PACKET CNT : 0xb61c(46620)
[INFO]: INST CNT : 0x2d870(186480)
[INFO]:------------------------------------------------------------------------
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x3f41: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x3f41(16193) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x46b0: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x46b0(18096) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4ad4: 0x472021b(POP R27,Rc7) vs 0x5f00000(MVI R0,0x0,Rc7), PACKET:0x4ad4(19156) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4ae1: 0x472021b(POP R27,Rc7) vs 0x5f00000(MVI R0,0x0,Rc7), PACKET:0x4ae1(19169) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4c46: 0x472021b(POP R27,Rc7) vs 0x9f80020(ADD.S R0,R0,0x1,Rc7), PACKET:0x4c46(19526) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x4de3: 0x4520180(BRL R0) vs 0x47a03e5(ADD R5,R0,R31,Rc7), PACKET:0x4de3(19939) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x57ab: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x57ab(22443) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x6c16: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x6c16(27670) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x7535: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x7535(30005) SLOT:0 vs 3
[WARN]:[0803] INST WR/RD REG CONFLICT! PACKET 0x808b: 0x4720204(POP R4,Rc7) vs 0x47a1be0(ADD R0,R6,R31,Rc7), PACKET:0x808b(32907) SLOT:0 vs 3
[INFO]:========================================================================
[INFO]: STATIC CHECK END
[INFO]:========================================================================
[INFO]:AIPU START RUNNING: BIN[0]
[INFO]:TOTAL TIME: 23.970811s.
[INFO]:SIMULATOR EXIT!
[I] [main.cpp : 135] Simulator finished. Total errors: 0, warnings: 0
- The simulator reports a total time of 23.97 s, which looks good.
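When comparing several models or configurations, reading the total time out of a long log by eye becomes tedious. The sketch below pulls the summary values out of a saved log with regular expressions. It assumes the log contains the `TOTAL TIME: ...s` and `Total errors: ..., warnings: ...` lines in the wording shown above; `parse_simulator_summary` is a hypothetical helper name.

```python
import re


def parse_simulator_summary(log_text):
    """Extract total simulator time (seconds) and the error/warning counts from a log."""
    time_match = re.search(r"TOTAL TIME:\s*([0-9]+(?:\.[0-9]+)?)s", log_text)
    count_match = re.search(r"Total errors:\s*(\d+),\s*warnings:\s*(\d+)", log_text)
    return {
        "total_time_s": float(time_match.group(1)) if time_match else None,
        "errors": int(count_match.group(1)) if count_match else None,
        "warnings": int(count_match.group(2)) if count_match else None,
    }


# Possible use, assuming the build output was redirected to a file named build.log:
# with open("build.log") as f:
#     print(parse_simulator_summary(f.read()))
```

Missing lines yield `None` rather than an exception, so a failed build is easy to tell apart from a slow one.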
Run the test script
python quant_predict.py
to get the final result:
root@f4e7a897f777:~/demos/tflite# python quant_predict.py
predict first 5 label:
    index  231, prob 150, name: Shetland sheepdog, Shetland sheep dog, Shetland
    index  232, prob 143, name: collie
    index  158, prob  46, name: papillon
    index  342, prob  45, name: hog, pig, grunter, squealer, Sus scrofa
    index  340, prob  31, name: sorrel
true first 5 label:
    index  232, prob 123, name: collie
    index  231, prob 109, name: Shetland sheepdog, Shetland sheep dog, Shetland
    index  158, prob  41, name: papillon
    index  170, prob  36, name: borzoi, Russian wolfhound
    index  161, prob  34, name: Afghan hound, Afghan
Detect picture save to result.jpeg