["Orion O6" AI PC] Deploying Ultra Fast Lane Detection V2 on the NPU

Original title: ["Orion O6" AI PC Development Kit Review] Deploying Ultra Fast Lane Detection V2 on the NPU

Model Details

Model Introduction

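Traditional lane detection methods usually rely on pixel-wise segmentation, which struggles with both efficiency and accuracy under difficult conditions such as severe occlusion or extreme lighting. Ultra-Fast-Lane-Detection-V2 takes a different approach inspired by human perception: it relies on context and global information. It formulates lane detection as an anchor-driven ordinal classification problem over global features, and it represents each lane with sparse coordinates on hybrid (row and column) anchors, which greatly reduces the computational cost and makes the model very fast. Its large receptive field also keeps detection robust in complex scenes, and it reaches state-of-the-art speed and accuracy.
The base model implementation is available here.

Paper: Ultra Fast Deep Lane Detection with Hybrid Anchor Driven Ordinal Classification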
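To make the "ordinal classification plus mathematical expectation" idea concrete, here is a minimal sketch (not the authors' code). It assumes that, on a single anchor, the network emits one logit per grid cell, and that the lane position is taken as the softmax-weighted mean of the cell indices. The function name `expected_location` and the sample logits are illustrative assumptions.

```python
import numpy as np

def expected_location(logits: np.ndarray) -> float:
    """Return the softmax-weighted mean of the grid-cell indices."""
    shifted = logits - logits.max()              # stabilize the exponentials
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return float((probs * np.arange(logits.size)).sum())

# Hypothetical logits for one anchor with 5 grid cells, peaked on cells 2 and 3.
logits = np.array([0.0, 0.0, 4.0, 4.0, 0.0])
position = expected_location(logits)
print(f"expected grid position: {position:.3f}")
```

Because the result is an expectation rather than an argmax, it can fall between two grid cells, which is what gives the method sub-cell localization.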

Ultra-Fast-Lane-Detection-V2 vs. V1

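Feature / Model	Ultra-Fast-Lane-Detection-V2	Ultra-Fast-Lane-Detection-V1
Lane representation	Hybrid anchor system (row anchors and column anchors); lane positions are modeled with sparse coordinates	Row-based selection; a lane is represented as a location selected on each predefined row
Classification and regression	Ordinal classification; lanes are localized using the ordinal relationship between classes and the mathematical expectation	Plain classification without ordinal classification; a structural loss optimizes lane continuity and shape
Datasets and performance	Evaluated on four datasets (TuSimple, CULane, CurveLanes, LLAMAS); SOTA accuracy and speed	Evaluated mainly on TuSimple and CULane; also SOTA accuracy and speed
Model complexity	Hybrid anchors and ordinal classification further reduce model complexity; suited to lightweight deployment	Row-based selection and the structural loss keep complexity relatively low, without hybrid anchors or ordinal classification
Main advantages	Hybrid anchors address the localization error that a single anchor system incurs on different lane types; ordinal classification improves localization accuracy	Global features and the structural loss give it a distinct advantage in modeling lane structure
Speed	Lightweight version reaches 300+ FPS	Lightweight version reaches 322.5 FPS
Use cases	Lane detection tasks that need both high accuracy and real-time performance, especially in complex scenes (severe occlusion, extreme lighting)	Lane detection tasks that need quick deployment and real-time processing, especially in scenes with no visual cues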

Basic Model Information

Domain: lane detection
Model source: ufldv2_culane_res34_320x1600
Binary model: [Ultra-Fast-Lane-Detection-V2.cix]()
Input: 1x3x320x1600
Outputs:
- loc_row: (1, 200, 72, 4)
- exist_row: (1, 2, 72, 4)
- loc_col: (1, 100, 81, 4)
- exist_col: (1, 2, 81, 4)
Parameters: 216.40 M
Model size: 826 MB
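The output shapes above can be read as (batch, grid cells, anchors, lanes) for `loc_row` and (batch, 2, anchors, lanes) for `exist_row`. The sketch below shows one way such tensors could be decoded into pixel coordinates under that reading; it is an illustration, not the post-processing used later in this article. The function name `decode_row_lanes`, the `local_width` window, the anchor spacing, and the 1640x590 frame size are all assumptions, and the tensors are random stand-ins rather than real model output.

```python
import numpy as np

def decode_row_lanes(loc_row, exist_row, row_anchor, img_w, img_h, local_width=1):
    """Turn row-anchor outputs into (x, y) pixel points per lane.

    loc_row:    (1, num_grid, num_anchor, num_lane) location logits
    exist_row:  (1, 2, num_anchor, num_lane) existence logits
    row_anchor: normalized y position (0..1) of each row anchor
    """
    _, num_grid, num_anchor, num_lane = loc_row.shape
    best_cell = loc_row.argmax(axis=1)       # (1, num_anchor, num_lane)
    exists = exist_row.argmax(axis=1)        # 1 where a lane point is predicted
    lanes = []
    for lane in range(num_lane):
        points = []
        for k in range(num_anchor):
            if not exists[0, k, lane]:
                continue
            # Expectation over a small window around the most likely cell.
            centre = int(best_cell[0, k, lane])
            lo = max(0, centre - local_width)
            hi = min(num_grid - 1, centre + local_width)
            idx = np.arange(lo, hi + 1)
            logits = loc_row[0, idx, k, lane]
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            cell = float((probs * idx).sum())
            x = cell / (num_grid - 1) * img_w    # grid index -> pixel column
            y = row_anchor[k] * img_h            # anchor -> pixel row
            points.append((int(round(x)), int(round(y))))
        lanes.append(points)
    return lanes

# Random tensors with the shapes listed above (not real model output).
rng = np.random.default_rng(0)
loc_row = rng.normal(size=(1, 200, 72, 4)).astype(np.float32)
exist_row = rng.normal(size=(1, 2, 72, 4)).astype(np.float32)
row_anchor = np.linspace(0.42, 1.0, 72)      # assumed anchor spacing
lanes = decode_row_lanes(loc_row, exist_row, row_anchor, img_w=1640, img_h=590)
print([len(points) for points in lanes])
```

The column-anchor outputs (`loc_col`, `exist_col`) would be handled the same way with the roles of x and y swapped.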

Quantize the Model and Export a Device Binary

cfg Configuration

[Common]
mode = build

[Parser]
model_type = onnx
model_name = Ultra-Fast-Lane-Detection-v2
detection_postprocess = 
model_domain = image_classification
input_model = model/Ultra-Fast-Lane-Detection-v2.onnx
output_dir = ./out_v2
input_shape = [1,3,320,1600]
input = input

[Optimizer]
calibration_data = datasets/cal_v2.npy
calibration_batch_size = 1
metric_batch_size = 1
output_dir = ./out_v2
dataset = numpydataset
quantize_method_for_activation = per_tensor_asymmetric
quantize_method_for_weight = per_channel_symmetric_restricted_range
dump_dir = ./
save_statistic_info = True
weight_bits = 8
bias_bits = 32
activation_bits = 8
cast_dtypes_for_lib = True

[GBuilder]
target = X2_1204MP3
outputs = Ultra-Fast-Lane-Detection-V2.cix
tiling = fps
profile = True
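The `[Optimizer]` settings `activation_bits = 8` and `quantize_method_for_activation = per_tensor_asymmetric` mean that each activation tensor is mapped to 8-bit integers with a single scale and zero point. The following is a generic sketch of that kind of affine quantization, written to illustrate the idea only; the toolchain's actual range calibration and rounding may differ, and the function names are assumptions.

```python
import numpy as np

def quantize_asymmetric(x: np.ndarray):
    """Per-tensor asymmetric (affine) quantization to unsigned 8-bit."""
    qmin, qmax = 0, 255
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0     # avoid a zero scale
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map 8-bit codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical activation values with an asymmetric range.
activations = np.linspace(-1.0, 3.0, 9, dtype=np.float32)
q, scale, zero_point = quantize_asymmetric(activations)
restored = dequantize(q, scale, zero_point)
max_error = float(np.abs(restored - activations).max())
print(scale, zero_point, max_error)
```

The calibration data referenced by `calibration_data` is what lets the optimizer estimate each tensor's range before choosing the scale and zero point.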

Compile

cixbuild cfg/Ultra-Fast-Lane-Detection-V2.cfg

The model compiles successfully.

Inference on the NPU

Run the binary model on the SoC.
First copy the compiled Ultra-Fast-Lane-Detection-V2.cix, test_data, and inference_npu_v2.py to the SoC, then run the inference_npu_v2.py script.

python3 inference_npu_v2.py --onnx_path ./Ultra-Fast-Lane-Detection-V2.cix

Code Excerpt

import os

import cv2
import numpy as np

# Preprocess: build the model input and keep the original image for drawing.
input_data, original_img = preprocess_image_ufld_v2(img_path, target_size=(IMG_WIDTH, IMG_HEIGHT), crop_ratio=CROP_RATIO)
ori_h, ori_w = original_img.shape[:2]
datas.append(input_data)  # `datas` is defined outside this excerpt

# Inference
input_data = [input_data]
outputs = model.forward(input_data)
print(f"NPU inference time: {model.get_cur_dur()*1000:.2f}ms")

# Reshape each output to the tensor shape listed in the model information.
# Output order here is loc_row, loc_col, exist_row, exist_col.
outputs_dict = {
    'loc_row': np.reshape(outputs[0], (1, 200, 72, 4)),
    'loc_col': np.reshape(outputs[1], (1, 100, 81, 4)),
    'exist_row': np.reshape(outputs[2], (1, 2, 72, 4)),
    'exist_col': np.reshape(outputs[3], (1, 2, 81, 4)),
}

# Post-process: decode lane coordinates from the raw outputs, using the
# original image size, the crop ratio, and the anchor settings.
lane_coords = post_process_ufld_v2(
    outputs_dict,
    (ori_h, ori_w),
    CROP_RATIO,
    NUM_ROW,
    NUM_COL,
    ROW_ANCHOR,
    COL_ANCHOR
)

# Draw results
result_img = draw_lanes_v2(original_img, lane_coords)

# Save output
output_filename = os.path.join(output_dir, "npu_v2_" + os.path.basename(img_path))
cv2.imwrite(output_filename, result_img)
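
The helper `preprocess_image_ufld_v2` is not shown in the excerpt above. As a rough illustration of what a preprocessing step with a `crop_ratio` might look like, the sketch below resizes the frame to a height of `target_h / crop_ratio`, keeps the bottom `target_h` rows, normalizes, and returns a 1x3x320x1600 tensor. This is an assumption about the general approach, not the article's implementation: the function name `preprocess_sketch`, the 0.6 crop ratio, the ImageNet mean and standard deviation, and the NumPy nearest-neighbour resize (used here only to avoid extra dependencies) are all illustrative choices.

```python
import numpy as np

# Assumed normalization constants (standard ImageNet statistics).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_sketch(img_rgb, target_size=(1600, 320), crop_ratio=0.6):
    """Resize, keep the bottom rows, normalize, and return a 1x3xHxW float32 tensor."""
    target_w, target_h = target_size
    resized_h = int(target_h / crop_ratio)       # taller than the network input
    src_h, src_w = img_rgb.shape[:2]
    # Nearest-neighbour resize with plain indexing.
    rows = np.arange(resized_h) * src_h // resized_h
    cols = np.arange(target_w) * src_w // target_w
    resized = img_rgb[rows][:, cols]
    cropped = resized[-target_h:]                # drop the top part of the frame
    norm = (cropped.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    return norm.transpose(2, 0, 1)[np.newaxis]   # HWC -> NCHW with batch axis

frame = np.zeros((590, 1640, 3), dtype=np.uint8) # blank stand-in for a camera frame
tensor = preprocess_sketch(frame)
print(tensor.shape, tensor.dtype)
```

Whatever the real helper does, its output has to match the `input_shape = [1,3,320,1600]` declared in the cfg, and the same crop ratio has to be passed to post-processing so that coordinates map back to the original image.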