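[ListenAI CSK6 Vision AI Development Kit Trial] Gesture Recognition and TinyMaix Neural Network Inference Experiments
CSK6 is ListenAI's new generation of AI SoC products. It uses a multi-core heterogeneous architecture that integrates an ARM Star MCU, a HiFi4 DSP, and ListenAI's newly designed AI neural network processing core (NPU), with compute of up to 128 GOPS. The multi-core heterogeneous design lets the chip meet the AI needs of audio, image, and video applications at relatively low power.
The chips in this series integrate SRAM and PSRAM, support built-in or external Flash, and provide an Audio Codec with up to 4 inputs and 2 outputs, a DVP camera interface with VGA resolution, up to 6 touch-detection channels, and peripheral interfaces such as SPI, UART, USB, SDIO, I2C, and I2S. This rich set of interfaces supports the development of all kinds of application solutions.
CSK6011-NanoKit V1 is a NanoKit development board that carries a CSK6011A fully offline module.
0x0 Environment setup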
https://docs.listenai.com/chi...
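Native Linux support is a big plus. These are the environment setup steps on Fedora 36.
Install snapd, then create a symlink by hand; otherwise the later snap install fails with: classic confinement requires snaps under /snap or symlink from /snap to /var/lib/snapd/snap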
dnf install snapd
ln -s /var/lib/snapd/snap /snap
Download the offline installer package, unpack it, and run the steps listed in install.sh
cp -f 99-lisa.rules /etc/udev/rules.d/99-lisa.rules
udevadm control --reload-rules
udevadm trigger
snap install ./csk6_integration_installer_linux_v1.6.5.snap --classic --dangerous
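Open a new terminal and run lisa info zep
This shows the state of the installed environment. If it reports that the lisa tool has an update, type Y and press Enter to update
0x1 Gesture recognition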
https://docs.listenai.com/chi...
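Following the development guide in the docs, download the SDK and the sample project. This takes quite a while; going through a proxy speeds it up
- Change in prj.conf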
CONFIG_WEBUSB=y
lisa zep build -b csk6011a_nano
lisa zep flash
Everything went smoothly up to this point, but I hit a small snag when flashing the resources
lisa zep exec cskburn -s /dev/ttyACM0 -C 6 0x400000 ./resource/cp.bin -b 748800
lisa zep exec cskburn -s /dev/ttyACM0 -C 6 0x500000 ./resource/res.bin -b 748800
It reported an error
Partition 1: 0x00400000 (751.35 KB) - ./resource/cp.bin
Waiting for device...
Entering update mode...
Detected flash size: 16 MB
Burning partition 1/1... (0x00400000, 751.35 KB)
ERROR: Failed burning partition 1
✖ Command failed : cskburn -s /dev/ttyACM0 -C 6 0x400000 ./resource/cp.bin -b 748800
› Error: Command failed : cskburn -s /dev/ttyACM0 -C 6 0x400000 ./resource/cp.bin -b 748800
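Following the error help, I quit the tio serial program, but flashing still failed, always at around 25% of the update
After some trial and error I finally found the problem: the USB cable was plugged into a blue USB port on the front of the PC case. With the cable moved to a red USB port on the back of the case, flashing completed without trouble
After flashing, press the reset button on the board to restart it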
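Since the two resource partitions are flashed with the same command and only the address and file differ, a small host-side helper can build both command lines from one table. This is only a sketch: it reuses exactly the port, baud rate, addresses and flags shown above, and the function names are made up for illustration.

```python
import shlex

# Values copied from the two commands in this post.
PORT = "/dev/ttyACM0"
BAUD = 748800
PARTITIONS = [
    (0x400000, "./resource/cp.bin"),
    (0x500000, "./resource/res.bin"),
]

def cskburn_command(addr, path, port=PORT, baud=BAUD):
    """Return the argv list for flashing one resource partition."""
    return [
        "lisa", "zep", "exec", "cskburn",
        "-s", port, "-C", "6",
        f"0x{addr:x}", path,
        "-b", str(baud),
    ]

def all_commands():
    """One argv list per partition, in flashing order."""
    return [cskburn_command(addr, path) for addr, path in PARTITIONS]

if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for argv in all_commands():
        print(shlex.join(argv))
```

To actually flash, each argv list could be passed to subprocess.run(argv, check=True), which stops at the first failing partition.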
https://docs.listenai.com/chi...
As the docs describe, connect a second Type-C cable to the computer
git clone https://cloud.listenai.com/zephyr/applications/csk_view_finder_spd.git
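Open csk_view_finder_spd/src/index.html in the open-source Chromium browser and click to select and connect the device; the camera detection results then show up
0x2 Some small modifications
The default image resolution is very low
Near the top of app_algo_hsd_sample_for_csk6/src/webusb_render.c there is a #define WEBUSB_IMAGE_DOWNSAMPLING (6)
Changing it to 4 improves the quality slightly, but the fps drops further. Changing it to 2 causes tearing, probably because the larger image cannot be transferred in time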
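To get a feel for why a smaller factor hurts the frame rate, the pixel count per frame can be estimated on the host. The sketch below rests on two assumptions that I have not checked against the sample source: that the camera frame is VGA (640x480, the size the DVP interface is described as supporting), and that the macro divides both width and height by the given factor.

```python
# Assumption: the source frame is VGA (640x480); the sample may use another size.
SRC_W, SRC_H = 640, 480

def downsampled_size(factor, w=SRC_W, h=SRC_H):
    """Frame size if both dimensions are divided by `factor` (assumed behavior)."""
    return w // factor, h // factor

def report(factors=(6, 4, 2)):
    """Rows of (factor, width, height, pixels, ratio to the default factor 6)."""
    base_w, base_h = downsampled_size(6)
    rows = []
    for f in factors:
        w, h = downsampled_size(f)
        ratio = (w * h) / (base_w * base_h)
        rows.append((f, w, h, w * h, round(ratio, 2)))
    return rows

if __name__ == "__main__":
    for f, w, h, pixels, ratio in report():
        print(f"factor {f}: {w}x{h} = {pixels} pixels ({ratio}x the default)")
```

Under these assumptions, factor 4 sends a bit more than twice the pixels of the default, and factor 2 about nine times as many, which fits the guess that the link cannot keep up.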
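Printing LOG_INF output
Add CONFIG_LOG_DEFAULT_LEVEL=3 to prj.conf, then rebuild and flash; the LOG_INF messages are then printed on the serial port
app_algo_hsd_sample_for_csk6/src/main.c has a main callback function, on_receive_hsd_result
Output with LOG_INF enabled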
[00:05:04.368,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.435,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.501,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.568,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.635,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.702,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.769,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.836,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.903,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:04.971,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:05.038,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:05.105,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:05.172,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:05.239,000] <inf> main: gesture result id: 142 ,state: 4
[00:05:05.306,000] <inf> main: gesture result id: 142 ,state: 4
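The timestamps show that the gesture recognition callback fires about every 67 ms, roughly 15 times per second. The LOG_INF output is buffered, though, so in practice the lines usually appear four at a time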
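The interval can be checked by averaging the gaps between the log timestamps. A minimal sketch, with the timestamps copied from the log above and the hh:mm:ss.mmm,uuu layout taken from how they are printed:

```python
import re

# Timestamps copied from the 15 log lines above.
TIMESTAMPS = [
    "00:05:04.368,000", "00:05:04.435,000", "00:05:04.501,000",
    "00:05:04.568,000", "00:05:04.635,000", "00:05:04.702,000",
    "00:05:04.769,000", "00:05:04.836,000", "00:05:04.903,000",
    "00:05:04.971,000", "00:05:05.038,000", "00:05:05.105,000",
    "00:05:05.172,000", "00:05:05.239,000", "00:05:05.306,000",
]

STAMP = re.compile(r"(\d+):(\d+):(\d+)\.(\d+),(\d+)")

def to_ms(stamp):
    """Convert 'hh:mm:ss.mmm,uuu' to milliseconds."""
    h, m, s, ms, us = map(int, STAMP.fullmatch(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms + us / 1000

def mean_interval_ms(stamps):
    """Average gap between consecutive timestamps, in milliseconds."""
    times = [to_ms(s) for s in stamps]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    interval = mean_interval_ms(TIMESTAMPS)
    print(f"mean interval: {interval:.1f} ms, about {1000 / interval:.1f} callbacks/s")
```

Because the timestamps are taken on the device when each message is logged, the buffering of the serial output does not affect this estimate.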
SDK path on the system
By default it is installed in $HOME/snap/lisa/x1/.listenai/csk-sdk/zephyr/include/zephyr/
The toolchain is in $HOME/snap/lisa/x1/.listenai/lisa-zephyr/packages/node_modules/@binary/gcc-arm-none-eabi-10.3/binary/
These paths are also shown in the output of lisa zep build
0x3 TinyMaix experiment
https://github.com/sipeed/Tin...
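TinyMaix is an ultra-lightweight neural network inference library for microcontrollers, i.e. a TinyML inference library, that lets you run lightweight deep learning models on any microcontroller.
ARM Star MCU: clock frequency up to 300 MHz
Starting from the hello_world project, copy the entire TinyMaix source tree into it and modify the CMake file to pull in the code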
cmake_minimum_required(VERSION 3.20.0)
find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(hello_world)
aux_source_directory(${CMAKE_CURRENT_SOURCE_DIR}/src/TinyMaix/src lib_tinymaix)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/src/TinyMaix/include)
target_sources(app PRIVATE src/TinyMaix/examples/mnist/main.c ${lib_tinymaix})
Modify TinyMaix/include/tm_port.h to port the CSK-specific settings, and stub out the debug-timing macros that do not compile as-is
#define TM_LOCAL_MATH (1) //use local math func (like exp()) to avoid libm
#define tm_malloc(x) csk_malloc(x)
#define tm_free(x) csk_free(x)
#define TM_GET_US()
#define TM_DBGT_INIT()
#define TM_DBGT_START()
#define TM_DBGT(x)
Modify prj.conf to set a larger heap size
CONFIG_HEAP_MEM_POOL_SIZE=300000
CONFIG_CSK_HEAP=y
CONFIG_CSK_HEAP_MEM_POOL_SIZE=842736
Same routine as before: build and flash
lisa zep build -b csk6011a_nano
lisa zep flash
The serial port prints the mnist inference result; the input is correctly recognized as the digit 2
*** Booting Zephyr OS build v1.1.1-alpha.2 ***
mnist demo
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,116,125,171,255,255,150, 93, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,169,253,253,253,253,253,253,218, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,169,253,253,253,213,142,176,253,253,122, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 52,250,253,210, 32, 12, 0, 6,206,253,140, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 77,251,210, 25, 0, 0, 0,122,248,253, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 31, 18, 0, 0, 0, 0,209,253,253, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,117,247,253,198, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 76,247,253,231, 63, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,128,253,253,144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,176,246,253,159, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25,234,253,233, 35, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,198,253,253,141, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 78,248,253,189, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 19,200,253,253,141, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,134,253,253,173, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,248,253,253, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,248,253,253, 43, 20, 20, 20, 20, 5, 0, 5, 20, 20, 37,150,150,150,147, 10, 0,
0, 0, 0, 0, 0, 0, 0, 0,248,253,253,253,253,253,253,253,168,143,166,253,253,253,253,253,253,253,123, 0,
0, 0, 0, 0, 0, 0, 0, 0,174,253,253,253,253,253,253,253,253,253,253,253,249,247,247,169,117,117, 57, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,118,123,123,123,166,253,253,253,155,123,123, 41, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
================================ model stat ================================
mdl_type=0 (int8))
out_deq=1
input_cnt=1, output_cnt=1, layer_cnt=6
input 3dims: (28, 28, 1)
output 1dims: (1, 1, 10)
main buf size 1464; sub buf size 0
//Note: PARAM is layer param size, include align padding
Idx Layer outshape inoft outoft PARAM MEMOUT OPS
--- Input 28, 28, 1 - 0 0 784 0
###L71: body oft = 64
###L72: type=0, is_out=0, size=152, in_oft=0, out_oft=784, in_dims=[3,28,28,1], out_dims=[3,13,13,4], in_s=0.004, in_zp=-128, out_s=0.016, out_zp=-128
###L85: Conv2d: kw=3, kh=3, sw=2, sh=2, dw=1, dh=1, act=1, pad=[0,0,0,0], dmul=0, ws_oft=80, w_oft=96, b_oft=136
000 Conv2D 13, 13, 4 0 784 72 676 6084
###L71: body oft = 216
###L72: type=0, is_out=0, size=432, in_oft=784, out_oft=0, in_dims=[3,13,13,4], out_dims=[3,6,6,8], in_s=0.016, in_zp=-128, out_s=0.016, out_zp=-128
###L85: Conv2d: kw=3, kh=3, sw=2, sh=2, dw=1, dh=1, act=1, pad=[0,0,0,0], dmul=0, ws_oft=80, w_oft=112, b_oft=400
001 Conv2D 6, 6, 8 784 0 352 288 10368
###L71: body oft = 648
###L72: type=0, is_out=0, size=1360, in_oft=0, out_oft=1400, in_dims=[3,6,6,8], out_dims=[3,2,2,16], in_s=0.016, in_zp=-128, out_s=0.057, out_zp=-128
###L85: Conv2d: kw=3, kh=3, sw=2, sh=2, dw=1, dh=1, act=1, pad=[0,0,0,0], dmul=0, ws_oft=80, w_oft=144, b_oft=1296
002 Conv2D 2, 2, 16 0 1400 1280 64 4608
###L71: body oft = 2008
###L72: type=1, is_out=0, size=48, in_oft=1400, out_oft=0, in_dims=[3,2,2,16], out_dims=[1,1,1,16], in_s=0.057, in_zp=-128, out_s=0.022, out_zp=-128
003 GAP 1, 1, 16 1400 0 0 16 64
###L71: body oft = 2056
###L72: type=2, is_out=0, size=304, in_oft=0, out_oft=1448, in_dims=[1,1,1,16], out_dims=[1,1,1,10], in_s=0.022, in_zp=-128, out_s=0.151, out_zp=42
###L96: FC: ws_oft=64, w_oft=104, b_oft=264
004 FC 1, 1, 10 0 1448 240 10 160
###L71: body oft = 2360
###L72: type=3, is_out=1, size=48, in_oft=1448, out_oft=0, in_dims=[1,1,1,10], out_dims=[1,1,1,10], in_s=0.151, in_zp=42, out_s=0.004, out_zp=-128
005 Softmax 1, 1, 10 1448 0 0 10 60
Total param ~1.9 KB, OPS ~0.02 MOPS, buffer 1.4 KB
0: 0.004
1: 0.004
2: 0.996
3: 0.004
4: 0.000
5: 0.000
6: 0.004
7: 0.004
8: 0.004
9: 0.004
### Predict output is: Number 2, prob 0.996
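The numbers in the model stat table can be re-derived by hand, which helps when reading the log. The sketch below recomputes the Conv2D output shapes and the OPS column from the printed kernel and stride, and the probabilities from the int8 output quantization. Two assumptions are mine and not stated in the log: that OPS counts one operation per multiply-accumulate, and that the printed out_s=0.004 is 1/256 rounded to three decimals.

```python
def conv_out(size, kernel, stride, pad=0):
    """Output width/height of a convolution without dilation."""
    return (size + 2 * pad - kernel) // stride + 1

def conv_ops(out_hw, out_ch, kernel, in_ch):
    """Multiply-accumulates: one k*k*in_ch dot product per output value."""
    return out_hw * out_hw * out_ch * kernel * kernel * in_ch

# (input size, input channels, output channels) for the three Conv2D layers,
# all with kw=kh=3, sw=sh=2 and zero padding, as printed in the log.
LAYERS = [(28, 1, 4), (13, 4, 8), (6, 8, 16)]

def check_layers():
    """Rows of (output size, output channels, OPS) for each Conv2D layer."""
    rows = []
    for size, in_ch, out_ch in LAYERS:
        out = conv_out(size, kernel=3, stride=2)
        rows.append((out, out_ch, conv_ops(out, out_ch, 3, in_ch)))
    return rows

def dequant(q, scale, zero_point):
    """Affine int8 dequantization: real = (q - zero_point) * scale."""
    return (q - zero_point) * scale

# Assumption: the printed out_s=0.004 is 1/256 rounded to three decimals.
OUT_SCALE, OUT_ZP = 1 / 256, -128

if __name__ == "__main__":
    for out, ch, ops in check_layers():
        print(f"Conv2D out {out}x{out}x{ch}, OPS {ops}")
    for q in (-128, -127, 127):
        print(f"q={q}: {dequant(q, OUT_SCALE, OUT_ZP):.3f}")
```

With these assumptions the shapes and OPS values agree with the table (13x13x4 with 6084, 6x6x8 with 10368, 2x2x16 with 4608). The printed probabilities 0.000, 0.004 and 0.996 correspond to the int8 values -128, -127 and 127, which is why the scores only appear in steps of about 0.004.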
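0x4 Summary
- The development environment natively supports Windows/Linux/macOS, a big plus
- The CSK documentation is detailed, and the sample demos are easy to get started with and quite usable
- Looking forward to more low-level technical details being opened up, such as compiling custom models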