Journal of Shandong University (Natural Science), 2023, Vol. 58, Issue 9: 59-70. doi: 10.6040/j.issn.1671-9352.0.2022.349

An object detection algorithm for aerial images

Cheng LI1,2, Wengang CHE1,2,*, Shengxiang GAO1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  2. Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2022-06-23 Online: 2023-09-20 Published: 2023-09-08
  • Contact: Wengang CHE E-mail: 841364941@qq.com; wgche@qq.com
  • About the author: Cheng LI (born 1993), male, master's student; research interest: computer vision. E-mail: 841364941@qq.com
  • Funding:
    National Natural Science Foundation of China (61972186); National Natural Science Foundation of China (U21B2027)

Abstract:

An object detection algorithm for aerial images, DSB-YOLO (depthwise separable convolutional backbone and YOLO), is proposed on the basis of YOLOv5s. First, starting from the receptive field of the feature maps extracted by the backbone, the sampling interval (stride) of the convolution kernels is changed to reduce the receptive field, so that information about small objects is extracted more effectively. Second, the feature fusion paths of the feature pyramid network (FPN) and the path aggregation network (PAN) in the Neck are improved, so that the abundant location information in the feature maps sampled by the shallow layers is better combined with the feature maps extracted by the deep layers, which effectively improves the detection accuracy for small objects. Third, the C3Transformer module is added to the backbone to integrate information from the whole image. Finally, the network is made lightweight: some of the backbone convolutions are replaced with depthwise separable convolutions, and the SE attention mechanism is integrated to focus on and select the information that is useful for object detection, which improves the detection efficiency of the model. Comparative experiments on the VisDrone dataset show that, at an input resolution of 1 280×1 280 pixels, DSB-YOLO achieves relative improvements of 11% in mAP50 and 17.5% in mAP0.5:0.95 over the original model. Deployed on the Jetson TX2 embedded platform, it runs at up to 21 FPS, so the model's performance meets the requirements for practical use.

Key words: computer vision, object detection in aerial images, deep learning

CLC number:

  • TP391.41

Fig. 1

YOLOv5s algorithm framework

Fig. 2

Feature map extraction process

Fig. 3

Scatter plot of object width and height

Table 1

Receptive field analysis of the backbone networks

YOLOv5s backbone (input image size 640×640 pixels)
Feature layer          Feature map size/pixels   Receptive field/pixels
P1 (F=6, S=2, P=2)     320×320                   2
P2 (F=3, S=2)          160×160                   6
P3 (F=3, S=2)          80×80                     14
P4 (F=3, S=2)          40×40                     30
P5 (F=3, S=2)          20×20                     62

DSB-YOLO backbone (input image size 640×640 pixels)
Feature layer          Feature map size/pixels   Receptive field/pixels
P1 (F=6, S=2, P=2)     320×320                   2
P2 (F=3, S=1, P=1)     320×320                   6
P3 (F=3, S=2, P=1)     160×160                   10
P4 (F=3, S=2, P=1)     80×80                     18
P5 (F=3, S=2, P=1)     40×40                     34
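
The values in Table 1 are consistent with the usual layer-by-layer recursion, in which each convolution widens the receptive field by (F - 1) times the product of the strides of all earlier layers, and the feature map size follows floor((N + 2P - F) / S) + 1. The sketch below reproduces both backbones under two assumptions that the table does not state: the P1 receptive field is taken as 2, as listed, rather than derived from its 6×6 kernel, and the YOLOv5s layers P2 to P5 use padding 1. The function name `analyse_backbone` is illustrative only.

```python
def analyse_backbone(layers, input_size=640, first_rf=2):
    """Return (feature map size, receptive field) for each backbone layer.

    layers: list of (kernel F, stride S, padding P), starting at P1.
    first_rf: receptive field assigned to P1 (Table 1 lists 2; assumption).
    """
    rows = []
    size, rf, jump = input_size, None, 1
    for index, (f, s, p) in enumerate(layers):
        # Standard convolution output size: floor((N + 2P - F) / S) + 1
        size = (size + 2 * p - f) // s + 1
        if index == 0:
            rf = first_rf
        else:
            # Each layer adds (F - 1) steps of the current jump
            rf += (f - 1) * jump
        jump *= s  # distance in input pixels between adjacent outputs
        rows.append((size, rf))
    return rows


# Padding of 1 for the YOLOv5s layers P2-P5 is an assumption.
yolov5s = analyse_backbone([(6, 2, 2)] + [(3, 2, 1)] * 4)
dsb_yolo = analyse_backbone([(6, 2, 2), (3, 1, 1)] + [(3, 2, 1)] * 3)

for name, rows in (("YOLOv5s", yolov5s), ("DSB-YOLO", dsb_yolo)):
    for level, (size, rf) in enumerate(rows, start=1):
        print(f"{name} P{level}: {size}x{size}, receptive field {rf}")
```

Keeping the stride at 1 in P2 leaves the jump at 2 pixels for one more layer, which is why every later DSB-YOLO layer ends up with a smaller receptive field and a feature map twice as large per side.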

Fig. 4

DSB-YOLO algorithm framework

Fig. 5

Structure of the Transformer encoder module

Fig. 6

C3Transformer structure

Fig. 7

Depthwise separable convolution
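
As a rough illustration of why depthwise separable convolution makes a network lighter, the sketch below compares weight counts for a standard convolution and for a depthwise convolution followed by a 1×1 pointwise convolution. The channel widths (128 in, 256 out) and the function names are illustrative assumptions, not values from the paper, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(128, 256, 3)
separable = depthwise_separable_params(128, 256, 3)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For this hypothetical layer the separable form needs roughly 11.5% of the weights of the standard convolution, matching the general ratio 1/c_out + 1/k².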

Fig. 8

SE channel attention module
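
The squeeze-and-excitation idea from reference [18] can be sketched in a few lines: average each channel to a single number (squeeze), pass the resulting vector through two small fully connected layers with ReLU and sigmoid (excitation), and rescale each channel by its weight. The version below is a simplified, dependency-free sketch rather than the paper's implementation: biases are omitted, and the weight matrices `w1` and `w2` are hypothetical inputs.

```python
import math


def se_block(x, w1, w2):
    """Apply a simplified SE block.

    x:  list of C channels, each an H x W nested list of floats.
    w1: (C // r) x C weight matrix of the reduction layer (hypothetical).
    w2: C x (C // r) weight matrix of the expansion layer (hypothetical).
    """
    # Squeeze: global average pooling per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid (biases omitted)
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in a channel by that channel's weight
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]


# Two 2 x 2 channels; all-zero weights give sigmoid(0) = 0.5 per channel.
x = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
halved = se_block(x, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])
```

With all-zero weights every channel is scaled by 0.5, which makes the example easy to check by hand; trained weights would give each channel a different factor between 0 and 1.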

Fig. 9

Depthwise separable convolution with the SE attention mechanism

Fig. 10

DSB-YOLO algorithm structure

Fig. 11

Dataset images

Table 2

Ablation experiments of the DSB-YOLO algorithm on the VisDrone dataset

No.   Algorithm                     mAP50/%   mAP/%   FPS    BFLOPs
A     YOLOv5s                       33.2      17.5    47.2   15.9
B     A + C3Transformer             36.0      18.1    43.0   16.7
C     B + backbone improvement      40.2      21.5    25.0   65.2
D     C + Neck improvement          41.2      22.3    25.0   67.5
-     D + lightweight improvement   42.3      24.8    28.4   51.4
E     A + backbone improvement      37.4      19.9    29.2   64.4
F     E + lightweight improvement   38.4      20.7    34.2   48.5
G     F + Neck improvement          41.4      21.5    34.2   50.8
H     G + C3Transformer             42.3      24.8    28.4   51.4

Table 3

Comparison of the DSB-YOLO algorithm and YOLOv5s on the VisDrone dataset

Input image size/pixels   Algorithm   mAP50/%   mAP/%
1 280×1 280               YOLOv5s     48.1      27.5
1 280×1 280               DSB-YOLO    53.4      32.3
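
The 11% and 17.5% improvements quoted in the abstract appear to be relative gains computed from the Table 3 values rather than differences in percentage points. A short check, with `relative_gain` as an illustrative helper name:

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Values taken from Table 3 (input size 1 280 x 1 280 pixels).
map50_gain = relative_gain(48.1, 53.4)
map_gain = relative_gain(27.5, 32.3)

print(f"mAP50: +{map50_gain:.1f}%  mAP: +{map_gain:.1f}%")
```

In absolute terms the same rows correspond to gains of 5.3 and 4.8 percentage points.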

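Fig. 12

Comparison of dense small-object detection

Fig. 13

Comparison of high-altitude small-object detection

Fig. 14

Comparison of occluded-object detection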

Table 4

Detection performance of different algorithms on the VisDrone dataset

Algorithm           mAP50/%   mAP/%
DSYolov3[11]        44.5      22.3
SyNet[21]           48.4      25.1
SAMFR[22]           40.0      20.2
SlimYOLOv3[12]      -         23.9
Zhang et al.[23]    45.2      22.6
DSB-YOLO            53.4      32.3

References:

1 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
2 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
3 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer-Verlag, 2016: 21-37.
4 ZHU P, WEN L, XIAO B, et al. Vision meets drones: a challenge[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.07437.
5 ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[J/OL]. arXiv, 2017. https://arxiv.org/pdf/1710.09412.pdf.
6 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J/OL]. arXiv, 2020. https://arxiv.org/abs/2004.10934.
7 HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J/OL]. arXiv, 2017. https://arxiv.org/abs/1704.04861v1.
8 TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
9 HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1580-1589.
10 REDMON J, FARHADI A. YOLOv3: an incremental improvement[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.02767.
11 LI Z, LIU X, ZHAO Y, et al. A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs[J]. Journal of Visual Communication and Image Representation, 2021, 77(1): 103058.
12 ZHANG P, ZHONG Y, LI X. SlimYOLOv3: narrower, faster and better for real-time UAV applications[C]//Proceedings of Computer Vision and Pattern Recognition (CVPR). Seoul: IEEE, 2019.
13 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J/OL]. arXiv, 2020. https://arxiv.org/pdf/2010.11929v2.pdf.
14 ZHA Junwei, ZHANG Hongyan. Dynamic receptive field feature selection dehazing network[J]. Electronic Science and Technology, 2023, 36(7): 1-8.
15 LI Cuiping, LI Zhongxue, YU Dongming. Ore grade interpolation based on Thiessen polygon method[J]. Journal of Liaoning Technical University, 2007, 26(4): 488-491.
doi: 10.3969/j.issn.1008-0562.2007.04.003
16 WANG C Y, LIAO H, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020.
17 CUI C, GAO T, WEI S, et al. PP-LCNet: a lightweight CPU convolutional neural network[J/OL]. arXiv, 2021. https://arxiv.org/abs/2109.15099v1.
18 HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
19 XIAO Shunliang, QIANG Zanxia, LIU Weiguang. An improved pedestrian detection algorithm for crowd based on CSP[J]. Computer Technology and Development, 2021, 31(7): 52-58.
20 JIANG J H, FU X J, QIN R, et al. High-speed lightweight ship detection algorithm based on YOLO-V4 for three-channels RGB SAR image[J]. Remote Sensing, 2021, 13(10): 1909.
doi: 10.3390/rs13101909
21 ALBABA B M, OZER S. SyNet: an ensemble network for object detection in UAV images[C]//Proceedings of Computer Vision and Pattern Recognition (CVPR). Milan: IEEE, 2020.
22 WANG H, WANG Z, JIA M, et al. Spatial attention for multi-scale feature refinement for object detection[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.
23 ZHANG X, IZQUIERDO E, CHANDRAMOULI K. Dense and small object detection in UAV vision based on cascade network[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.