Alex_bG9Qau · December 4, 2022 · Zhejiang

[ListenAI CSK6 Vision AI Development Kit Trial] First Look at Head-Shoulder Detection & Multimodal Interaction

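Background

A while ago, through a channel I can no longer recall, I saw a trial campaign for the vision AI development kit built on ListenAI's CSK6 series chips. I happened to be researching vision AI solutions, so I applied right away. I was lucky enough to be picked in the first batch, but I have been busy lately and never found time to get started. The board was about to be reclaimed, so I spent part of the weekend trying it out. Thanks to the Jishu community (极术社区) for organizing such a good event that lets everyone try vision AI capabilities at low cost. 👍👍👍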

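Setting up the development environment

I used a laptop with a fresh Windows 10 install and followed "Environment Setup | ListenAI Documentation Center" (环境搭建 | 聆思文档中心). The first step is to download and install Git for Windows; I won't go through that here. The second step is to download the CSK6 one-click installer and run it.
Once the environment is set up, run the lisa info zep command. Output similar to the following means everything installed correctly: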

Operating System - Windows 10 Enterprise LTSC 2019, version 10.0.17763 x64

@listenai/lisa - 2.4.5

Account - 未登录或登录已过期

Node.js environment
Node.js - v16.14.0
npm - 8.3.1
yarn - 1.22.19

Global environment
git - git version 2.38.0.windows.1

Plugin info
zep - 1.6.5 (latest: 1.6.8)

Plugin environment
env - csk6
west - West version: v0.14.0
venv - Python 3.9.7
cmake - cmake version 3.21.4
dtc - Version: DTC 1.6.0-dirty
gperf - GNU gperf 3.1
mklfs - v1.0.0 (3640bfb)
ninja - 1.10.2
protoc - libprotoc 3.19.1
xz - xz (XZ Utils) 5.2.5
cskburn - v1.18.1 (265)
zephyr-sdk-0.14.2 - arm-zephyr-eabi-gcc (Zephyr SDK 0.14.2) 10.3.0
gcc-arm-none-eabi-10.3 - arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10) 10.3.1 20210824 (release)
jlink-venus - V7.58
ZEPHYR_BASE - D:\ListenAI\csk-sdk\zephyr (版本: v1.1.1-alpha.2, commit: 3e15ca75bd)
PLUGIN_HOME - D:\ListenAI\lisa-zephyr
VIRTUAL_ENV - D:\ListenAI\lisa-zephyr\venv
ZEPHYR_TOOLCHAIN_VARIANT - zephyr
ZEPHYR_SDK_INSTALL_DIR - D:\ListenAI\lisa-zephyr\packages\node_modules\@binary\zephyr-sdk-0.14.2\binary
GNUARMEMB_TOOLCHAIN_PATH - D:\ListenAI\lisa-zephyr\packages\node_modules\@binary\gcc-arm-none-eabi-10.3\binary                                   

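Running the demo

For this trial I chose head-shoulder & gesture recognition.

Downloading the sample

First, use git to clone the sample into a directory whose path contains no spaces or Chinese characters: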

git clone https://cloud.listenai.com/zephyr/applications/app_algo_hsd_sample_for_csk6.git

Modifying the code

I modified the on_receive_hsd_result(...) callback so the recognition results are easier to read:

/* Map a gesture code to a readable name for the serial log. */
const char *gesture_human_string(int gesture_state)
{
    switch (gesture_state) {
    case GESTURE_LIKE:
        return "LIKE";
    case GESTURE_OK:
        return "OK";
    case GESTURE_STOP:
        return "STOP";
    case GESTURE_YES:
        return "YES";
    case GESTURE_SIX:
        return "SIX";
    case GESTURE_OTHER:
    default:
        return "OTHER";
    }
}

void on_receive_hsd_result(hsd_t *hsd, hsd_event event, void *data, void *user_data)
{
    if (event == HSD_EVENT_HEAD_SHOULDER) {
        hsd_head_shoulder_detect *result = (hsd_head_shoulder_detect *)data;
        if (result->track_count > 0)
            printk("detectedhead shoulder cnt: %d\n", result->track_count);
#ifdef CONFIG_WEBUSB
        webusb_handle_hs_data(result);
#endif
    } else if (event == HSD_EVENT_GESTURE_RECOGNIZE) {
        head_shoulder_detect *result = (head_shoulder_detect *)data;
        if (result->gesture_state != GESTURE_OTHER) {
            printk("gesture result: %s\n", gesture_human_string(result->gesture_state));
        }
#ifdef CONFIG_WEBUSB
        webusb_handle_gesture_data(result);
#endif
    }
}

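There are two main changes:

  • Filter out prints for invalid results so the serial port is not flooded.
  • Add the gesture_human_string function to replace the numeric gesture code in the result with an English name, which makes the output easier to read.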
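The same code-to-name mapping can also be written as a lookup table. The sketch below is only an illustration: the demo_gesture enum and its values are hypothetical stand-ins for the SDK's GESTURE_* constants, whose real values are defined in the sample's headers and may differ.

```c
/* Hypothetical stand-in for the SDK's gesture enum; the real GESTURE_*
 * values come from the sample's headers and may differ. */
enum demo_gesture {
    DEMO_GESTURE_OTHER = 0,
    DEMO_GESTURE_LIKE,
    DEMO_GESTURE_OK,
    DEMO_GESTURE_STOP,
    DEMO_GESTURE_YES,
    DEMO_GESTURE_SIX,
    DEMO_GESTURE_COUNT
};

/* Name table indexed by gesture value (C99 designated initializers). */
static const char *const demo_gesture_names[DEMO_GESTURE_COUNT] = {
    [DEMO_GESTURE_OTHER] = "OTHER",
    [DEMO_GESTURE_LIKE]  = "LIKE",
    [DEMO_GESTURE_OK]    = "OK",
    [DEMO_GESTURE_STOP]  = "STOP",
    [DEMO_GESTURE_YES]   = "YES",
    [DEMO_GESTURE_SIX]   = "SIX",
};

/* Return a readable name; out-of-range codes fall back to "OTHER". */
const char *demo_gesture_name(int gesture_state)
{
    if (gesture_state < 0 || gesture_state >= DEMO_GESTURE_COUNT) {
        return "OTHER";
    }
    return demo_gesture_names[gesture_state];
}
```

A table keeps the names in one place, but it only works if the gesture codes are small, dense integers. The switch version makes no such assumption.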

Building the firmware

As long as the code changes contain no errors, running lisa zep build -b csk6011a_nano builds the final firmware.

[206/214] Linking C executable zephyr\zephyr_pre0.elf

[209/214] Linking C executable zephyr\zephyr_pre1.elf

[214/214] Linking C executable zephyr\zephyr.elf
Memory region         Used Size  Region Size  %age Used
FLASH:      214228 B        16 MB      1.28%
SRAM:      147856 B       320 KB     45.12%
ITCM:        5532 B        16 KB     33.76%
DTCM:          0 GB        16 KB      0.00%
PSRAMAP:     3108136 B      3968 KB     76.49%
IDT_LIST:          0 GB         2 KB      0.00%
    
√ 构建成功    

As shown above, there are no compile errors, and the final √ 构建成功 ("build succeeded") line means the build passed.
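The percentages in the memory report follow directly from used size divided by region size, with KB and MB read as binary units (1 KB = 1024 B). A minimal sketch of that arithmetic, using a hypothetical helper name that is not part of lisa or Zephyr:

```c
#include <stdint.h>

/* Percentage of a memory region in use, computed the same way the
 * linker's memory report does: used / size * 100. */
double region_usage_percent(uint64_t used_bytes, uint64_t region_bytes)
{
    if (region_bytes == 0) {
        return 0.0; /* avoid dividing by zero for an empty region */
    }
    return 100.0 * (double)used_bytes / (double)region_bytes;
}
```

For example, region_usage_percent(147856, 320 * 1024) gives about 45.12, matching the SRAM row, and 3108136 bytes out of 3968 KB gives about 76.49, matching PSRAMAP.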

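Hardware connection

According to "CSK6011-NanoKit V1 | ListenAI Documentation Center", the board has an onboard DAPLink debugger chip, and through the DAPLink USB port developers can flash firmware, debug code, and view serial output on the CSK6 chip. So I used the Type-C cable that ships with the kit, plugging the Type-C end into the board's DAPLINK USB port and the Type-A end into the PC. That is enough for firmware flashing and serial output.

Flashing the firmware

Use the lisa zep flash command to flash the firmware that was just built. The command shows the flashing progress, and a final √ 结束 ("finished") line means it succeeded. My run printed the errors below, but the flash still completed, so I did not dig into them.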

PS D:\ListenAI\project\app_algo_hsd_sample_for_csk6> lisa zep flash

The module for runner "csk" could not be imported (No module named 'termios'). This most likely means it is not handling its dependencies properly. Please report this to the zephyr developers.
-- west flash: rebuilding
ninja: no work to do.
-- west flash: using runner pyocd
-- runners.pyocd: Flashing file: D:\ListenAI\project\app_algo_hsd_sample_for_csk6\build\zephyr\zephyr.hex
Exception ignored in: <function Library.__del__ at 0x00000208C958DCA0>
Traceback (most recent call last):
  File "D:\ListenAI\lisa-zephyr\venv\lib\site-packages\pylink\library.py", line 299, in __del__
    self.unload()
  File "D:\ListenAI\lisa-zephyr\venv\lib\site-packages\pylink\library.py", line 458, in unload
    os.remove(self._temp.name)
PermissionError: [WinError 32] 另一个程序正在使用此文件,进程无法访问。: 'C:\\Users\\gyl\\AppData\\Local\\Temp\\tmpqlfp78ig.dll'
Exception ignored in: <function Library.__del__ at 0x00000208C958DCA0>
Traceback (most recent call last):
  File "D:\ListenAI\lisa-zephyr\venv\lib\site-packages\pylink\library.py", line 299, in __del__
    self.unload()
  File "D:\ListenAI\lisa-zephyr\venv\lib\site-packages\pylink\library.py", line 458, in unload
    os.remove(self._temp.name)
PermissionError: [WinError 32] 另一个程序正在使用此文件,进程无法访问。: 'C:\\Users\\gyl\\AppData\\Local\\Temp\\tmpuxg6k6nq.dll'
0001561 I Loading D:\ListenAI\project\app_algo_hsd_sample_for_csk6\build\zephyr\zephyr.hex at 0x18000000 [load_cmd]
[==================================================] 100%
0012743 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 214272 bytes (837 pages) at 18.96 kB/s [loader]

√ 结束   

Flashing the models

Use the commands below to flash the two model files (change COM3 to match your own serial port):

lisa zep exec cskburn -s \\.\COM3 -C 6 0x400000 .\resource\cp.bin -b 748800
lisa zep exec cskburn -s \\.\COM3 -C 6 0x500000 .\resource\res.bin -b 748800

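Trying it out

After the firmware and models are flashed, press the reset button next to the USB port to restart the board. The boot log looks like this: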

*** Booting Zephyr OS build e32aeccd52a1  ***
[00:00:00.009,000] <err> main: device:UART_1 is ready!
- Device name: DVPI
GET_PARAMS: 5001 0.400000[00:00:03.164,000] <inf> hsd: Setup resource [head_thinker] which in <0x18500060,0xa6ce0>
[00:00:03.164,000] <inf> hsd: Setup resource [gesture_thinker] which in <0x185a6d40,0x135d40>
[00:00:03.336,000] <inf> hsd: fmt: [VYUY] width [640] height [480]
[00:00:03.336,000] <inf> hsd: Alloc video buffer: 921600
[00:00:03.336,000] <inf> hsd: Alloc video buffer: 921600
[00:00:03.336,000] <inf> hsd: Alloc video buffer: 921600
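The resource addresses in this boot log line up with the offsets used when flashing the models. The sketch below works through that arithmetic. It assumes the external flash is memory-mapped at 0x18000000, which is inferred from the "Loading ... zephyr.hex at 0x18000000" line in the flashing log rather than taken from the chip documentation, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: flash is memory-mapped at this base, inferred from the
 * "at 0x18000000" line printed while flashing the firmware. */
#define DEMO_FLASH_MAPPED_BASE 0x18000000u

/* Convert a cskburn flash offset into the address the CPU would see. */
uint32_t demo_flash_offset_to_addr(uint32_t flash_offset)
{
    return DEMO_FLASH_MAPPED_BASE + flash_offset;
}

/* End address (exclusive) of a resource given its start and size. */
uint32_t demo_resource_end(uint32_t start, uint32_t size)
{
    return start + size;
}
```

With these helpers, demo_flash_offset_to_addr(0x500000) yields 0x18500000, which is 0x60 bytes below the logged head_thinker address (possibly a small header at the start of res.bin), and demo_resource_end(0x18500060, 0xa6ce0) yields 0x185a6d40, exactly the logged start of gesture_thinker. If the addresses in your own boot log do not match the offsets you flashed, the model was probably written to the wrong location.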

Once there are recognition results, the output looks like this:

detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
gesture result: LIKE
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
gesture result: SIX
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP
detectedhead shoulder cnt: 1
gesture result: STOP

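With my modified code, the callback only prints when a head-shoulder or a gesture is actually recognized. The log above shows that there was one head in front of the camera the whole time, and that I tried three gestures: LIKE, SIX, and STOP.
To preview the camera image in real time, you need a second Type-C cable to connect the board's other Type-C port to the PC, then follow the PC viewer tool guide. Note that this feature requires WEBUSB to be enabled (change CONFIG_WEBUSB=n to CONFIG_WEBUSB=y in prj.conf), followed by rebuilding and reflashing the firmware.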

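Multimodal interaction

At this point the basic trial was complete, but it felt like something was missing. Reading other people's trial posts, I saw that most of them had integrated the kit more deeply into a concrete scenario. I happen to have a voice-focused development board that handles voice wake-up, voice interaction, and music playback control, so I decided to combine the two into a basic multimodal interaction demo: the STOP gesture stops music playback, the LIKE gesture adds the song to favorites, and the OK gesture skips to the next track.

During debugging I followed the UART sample and, inside the head-shoulder & gesture sample, configured UART1 rx and tx on the PA9 and PA10 GPIO pair. With a serial adapter board wired to my voice board, no data came out of the port, and a serial tool on the PC received nothing either. Since time was tight, I used the log UART for the demo as a stopgap.

The result is fairly fun: it could give smart devices a new way to interact, making them more playful and improving user stickiness. The core code is simple: gesture de-duplication plus sending control commands over the serial port: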

void business_uart_action(int gesture_state)
{
    static int last_gesture_state = GESTURE_OTHER;
    if (last_gesture_state != gesture_state) {
        last_gesture_state = gesture_state;

        // The real UART command payloads are withheld for confidentiality
        switch (gesture_state) {
        case GESTURE_LIKE:
            send_control_data(business_uart_dev, favorite_music, sizeof(favorite_music), SYS_FOREVER_US);
            break;
        case GESTURE_OK:
            send_control_data(business_uart_dev, next_music, sizeof(next_music), SYS_FOREVER_US);
            break;
        case GESTURE_STOP:
            send_control_data(business_uart_dev, stop_music, sizeof(stop_music), SYS_FOREVER_US);
            break;
        case GESTURE_YES:
        case GESTURE_OTHER:
        case GESTURE_SIX:
        default:
            break;
        }
    }
}

void on_receive_hsd_result(hsd_t *hsd, hsd_event event, void *data, void *user_data)
{
    if (event == HSD_EVENT_HEAD_SHOULDER) {
        hsd_head_shoulder_detect *result = (hsd_head_shoulder_detect *)data;
        if (result->track_count > 0)
            printk("detectedhead shoulder cnt: %d\n", result->track_count);
#ifdef CONFIG_WEBUSB
        webusb_handle_hs_data(result);
#endif
    } else if (event == HSD_EVENT_GESTURE_RECOGNIZE) {
        head_shoulder_detect *result = (head_shoulder_detect *)data;
        if (result->gesture_state != GESTURE_OTHER) {
            printk("gesture result: %s\n", gesture_human_string(result->gesture_state));
            business_uart_action(result->gesture_state);
        }
#ifdef CONFIG_WEBUSB
        webusb_handle_gesture_data(result);
#endif
    }
}
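One detail in the listing above: business_uart_action is only called when the gesture is not GESTURE_OTHER, so last_gesture_state never returns to GESTURE_OTHER. As written, showing the same gesture twice with the hand lowered in between sends only one command. A possible variation, sketched below for host-side experimentation, feeds every frame (including "no gesture") into a small filter that also waits for a few consecutive identical frames before acting, which may help with occasional misrecognized frames. The FILT_* names, the struct, and the threshold of 3 frames are all hypothetical, and the sketch assumes the callback also fires with GESTURE_OTHER when no gesture is visible.

```c
#include <stdbool.h>

/* Hypothetical gesture codes for host-side testing; the SDK defines the
 * real GESTURE_* values. */
enum { FILT_GESTURE_OTHER = 0, FILT_GESTURE_LIKE, FILT_GESTURE_OK, FILT_GESTURE_STOP };

#define FILT_STABLE_FRAMES 3 /* assumed threshold; tune on real hardware */

struct gesture_filter {
    int candidate;  /* gesture seen in the most recent frames */
    int run_length; /* consecutive frames showing `candidate` (capped) */
    int reported;   /* last stable gesture that was accepted */
};

void gesture_filter_init(struct gesture_filter *f)
{
    f->candidate = FILT_GESTURE_OTHER;
    f->run_length = 0;
    f->reported = FILT_GESTURE_OTHER;
}

/* Feed one per-frame result, including FILT_GESTURE_OTHER. Returns true and
 * stores the gesture in *out when a new stable gesture should trigger an
 * action; returns false otherwise. */
bool gesture_filter_update(struct gesture_filter *f, int gesture, int *out)
{
    if (gesture == f->candidate) {
        if (f->run_length < FILT_STABLE_FRAMES) {
            f->run_length++;
        }
    } else {
        f->candidate = gesture;
        f->run_length = 1;
    }

    if (f->run_length < FILT_STABLE_FRAMES) {
        return false; /* not stable yet */
    }
    if (f->candidate == f->reported) {
        return false; /* same gesture still held: do not repeat */
    }

    f->reported = f->candidate;
    if (f->candidate == FILT_GESTURE_OTHER) {
        return false; /* hand lowered: re-arm without sending anything */
    }
    *out = f->candidate;
    return true;
}
```

In the callback, the idea would be to call gesture_filter_update for every gesture event and only invoke the UART action when it returns true. The trade-off is a delay of a few frames before each command is sent.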

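I have not yet looked further into why UART1 cannot send or receive data. I am leaving it as a TODO so I can publish this write-up first and track down the problem later.

Summary

Going through the whole trial, I found recognition to be very fast and responsive. I also found that the STOP and LIKE gestures are misrecognized quite often. Overall it is clearly at demo level and still some way from commercial use. I also look forward to the upcoming speech recognition applications. 👍👍👍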
