An object detection algorithm for aerial images

doi:10.6040/j.issn.1671-9352.0.2022.349

Abstract

An object detection algorithm for aerial images, DSB-YOLO (depthwise separable convolutional backbone and YOLO), is proposed. Building on YOLOv5s, the algorithm first addresses the receptive field of the feature maps extracted by the backbone: by changing the sampling interval (stride) of the convolution kernels, the receptive field of the feature maps is reduced so that information about small objects is extracted more effectively. Second, the feature fusion paths of the feature pyramid network (FPN) and the path aggregation network (PAN) in the Neck are improved, so that the abundant location information in the shallow feature maps is better combined with the feature maps extracted by the deeper layers, which effectively improves the detection accuracy for small objects. Third, a C3Transformer module is added to the backbone to integrate information from the whole image. Finally, the network is made lightweight: some of the backbone convolutions are replaced with depthwise separable convolutions and the SE (squeeze-and-excitation) attention mechanism is integrated, so that the network focuses on and selects the information useful for the object detection task, which improves detection efficiency. Comparative experiments on the VisDrone dataset show that, at an input resolution of 1280×1280 pixels, DSB-YOLO improves mAP50 and mAP0.5:0.95 by 11% and 17.5%, respectively, relative to the original model (from 48.1% to 53.4% and from 27.5% to 32.3%). Deployed on the Jetson TX2 embedded platform, the model runs at 21 FPS, and its performance meets the standard for practical use.

Keywords: computer vision, object detection in aerial images, deep learning
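The lightweight step in the abstract rests on the well-known cost difference between a standard convolution and a depthwise separable one. The sketch below counts parameters and multiply-accumulates for both. The layer dimensions (128 to 256 channels, 3×3 kernel, 80×80 output map) are hypothetical values chosen for illustration and are not taken from the paper; bias terms are ignored.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates of a standard k x k convolution (no bias)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer, not taken from the paper.
std_params, std_macs = standard_conv_cost(128, 256, 3, 80, 80)
dws_params, dws_macs = depthwise_separable_cost(128, 256, 3, 80, 80)
ratio = dws_params / std_params  # equals 1/c_out + 1/k**2

print(std_params, dws_params, round(ratio, 3))
```

For this hypothetical layer the separable version needs roughly 11.5% of the parameters of the standard convolution, which is the general 1/c_out + 1/k² ratio.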

CLC number:

TP391.41

LI Cheng, CHE Wengang, GAO Shengxiang. An object detection algorithm for aerial images[J]. Journal of Shandong University (Natural Science), 2023, 58(9): 59-70.

Figures and tables (18): Figures 1-14, Tables 1-4

References (23)

1	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
2	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
3	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer-Verlag, 2016: 21-37.
4	ZHU P, WEN L, XIAO B, et al. Vision meets drones: a challenge[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.07437.
5	ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[J/OL]. arXiv, 2017. https://arxiv.org/pdf/1710.09412.pdf.
6	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J/OL]. arXiv, 2020. https://arxiv.org/abs/2004.10934.
7	HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J/OL]. arXiv, 2017. https://arxiv.org/abs/1704.04861v1.
8	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
9	HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1580-1589.
10	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.02767.
11	LI Z, LIU X, ZHAO Y, et al. A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs[J]. Journal of Visual Communication and Image Representation, 2021, 77(1): 103058.
12	ZHANG P, ZHONG Y, LI X. SlimYOLOv3: narrower, faster and better for real-time UAV applications[C]//Proceedings of Computer Vision and Pattern Recognition (CVPR). Seoul: IEEE, 2019.
13	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J/OL]. arXiv, 2020. https://arxiv.org/pdf/2010.11929v2.pdf.
14	ZHA Junwei, ZHANG Hongyan. Dynamic receptive field feature selection dehazing network[J]. Electronic Science and Technology, 2023, 36(7): 1-8.
15	LI Cuiping, LI Zhongxue, YU Dongming. Ore grade interpolation based on Thiessen polygon method[J]. Journal of Liaoning Technical University, 2007, 26(4): 488-491. doi: 10.3969/j.issn.1008-0562.2007.04.003
16	WANG C Y, LIAO H, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020.
17	CUI C, GAO T, WEI S, et al. PP-LCNet: a lightweight CPU convolutional neural network[J/OL]. arXiv, 2021. https://arxiv.org/abs/2109.15099v1.
18	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
19	XIAO Shunliang, QIANG Zanxia, LIU Weiguang. An improved pedestrian detection algorithm for crowd based on CSP[J]. Computer Technology and Development, 2021, 31(7): 52-58.
20	JIANG J H, FU X J, QIN R, et al. High-speed lightweight ship detection algorithm based on YOLO-V4 for three-channels RGB SAR image[J]. Remote Sensing, 2021, 13(10): 1909. doi: 10.3390/rs13101909
21	ALBABA B M, OZER S. SyNet: an ensemble network for object detection in UAV images[C]//Proceedings of Computer Vision and Pattern Recognition(CVPR). Milan: IEEE, 2020.
22	WANG H, WANG Z, JIA M, et al. Spatial attention for multi-scale feature refinement for object detection[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.
23	ZHANG X, IZQUIERDO E, CHANDRAMOULI K. Dense and small object detection in UAV vision based on cascade network[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.


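Backbone	Feature layer	Feature map size/pixels	Receptive field size/pixels	Backbone	Feature layer	Feature map size/pixels	Receptive field size/pixels
YOLOv5s backbone (input image size 640×640 pixels)	P1(F=6, S=2, P=2)	320×320	2	DSB-YOLO backbone (input image size 640×640 pixels)	P1(F=6, S=2, P=2)	320×320	2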
	P2(F=3, S=2)	160×160	6		P2(F=3, S=1, P=1)	320×320	6
	P3(F=3, S=2)	80×80	14		P3(F=3, S=2, P=1)	160×160	10
	P4(F=3, S=2)	40×40	30		P4(F=3, S=2, P=1)	80×80	18
	P5(F=3, S=2)	20×20	62		P5(F=3, S=2, P=1)	40×40	34
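The feature map sizes and receptive fields in the table above are consistent with the usual layer-by-layer recurrences, which can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' code: F, S and P are read as kernel size, stride and padding; the YOLOv5s layers P2 to P5, which list no padding, are assumed to use P=1; and the P1 receptive field of 2 is taken from the table as given rather than derived. The function name `trace_backbone` is hypothetical.

```python
def trace_backbone(layers, input_size=640, rf_p1=2):
    """Return [(feature_map_size, receptive_field), ...] for layers P1..P5.

    layers: list of (kernel F, stride S, padding P) tuples.
    rf_p1:  receptive field assigned to P1; 2 is the value printed in the
            table and is used as a starting point, not derived here.
    """
    results = []
    size, rf, jump = input_size, None, 1
    for index, (f, s, p) in enumerate(layers):
        size = (size + 2 * p - f) // s + 1               # conv output size
        rf = rf_p1 if index == 0 else rf + (f - 1) * jump
        jump *= s                                         # product of strides so far
        results.append((size, rf))
    return results


# P=1 for YOLOv5s P2..P5 is an assumption; the table omits it.
yolov5s = [(6, 2, 2), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
dsb_yolo = [(6, 2, 2), (3, 1, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

print(trace_backbone(yolov5s))
print(trace_backbone(dsb_yolo))
```

Setting the P2 stride to 1 keeps the accumulated stride at 2 for one more layer, so every later layer adds less to the receptive field and each feature map is twice as large per side as its YOLOv5s counterpart.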

No.	Algorithm	mAP50/%	mAP/%	FPS	BFLOPs
A	YOLOv5s	33.2	17.5	47.2	15.9
B	A + C3Transformer	36.0	18.1	43.0	16.7
C	B + backbone improvement	40.2	21.5	25.0	65.2
D	C + Neck improvement	41.2	22.3	25.0	67.5
	D + lightweight improvement	42.3	24.8	28.4	51.4
E	A + backbone improvement	37.4	19.9	29.2	64.4
F	E + lightweight improvement	38.4	20.7	34.2	48.5
G	F + Neck improvement	41.4	21.5	34.2	50.8
H	G + C3Transformer	42.3	24.8	28.4	51.4
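The trade-offs in the ablation table above can be read off with a little arithmetic. The sketch below copies a subset of rows from the table and computes the accuracy gain from the baseline (A) to the full model (H), plus the BFLOPs reduction attributable to the lightweight step in both orderings. The helper names are hypothetical, and the unlabeled row is keyed here as "D+lightweight" purely for illustration.

```python
# (mAP50 %, mAP %, FPS, BFLOPs) copied from the ablation table.
rows = {
    "A": (33.2, 17.5, 47.2, 15.9),
    "D": (41.2, 22.3, 25.0, 67.5),
    "D+lightweight": (42.3, 24.8, 28.4, 51.4),  # row without a letter in the table
    "E": (37.4, 19.9, 29.2, 64.4),
    "F": (38.4, 20.7, 34.2, 48.5),
    "H": (42.3, 24.8, 28.4, 51.4),
}


def map_gain_points(before, after):
    """Absolute gain in (mAP50, mAP), in percentage points."""
    return tuple(round(rows[after][i] - rows[before][i], 1) for i in (0, 1))


def bflops_reduction(before, after):
    """Relative BFLOPs reduction, in percent."""
    b, a = rows[before][3], rows[after][3]
    return (b - a) / b * 100


print(map_gain_points("A", "H"))                            # baseline to full model
print(round(bflops_reduction("D", "D+lightweight"), 1))     # lightweight step applied last
print(round(bflops_reduction("E", "F"), 1))                 # lightweight step applied early
```

In both orderings the lightweight step removes roughly a quarter of the BFLOPs while mAP50 and mAP do not drop.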

Input image size/pixels	Algorithm	mAP50/%	mAP/%
1280×1280	YOLOv5s	48.1	27.5
1280×1280	DSB-YOLO	53.4	32.3
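The gains of 11% and 17.5% quoted in the abstract appear to be relative improvements over YOLOv5s rather than percentage-point differences. A minimal check using only the two rows above (the function name `relative_gain` is hypothetical):

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


map50_gain = relative_gain(48.1, 53.4)  # mAP50: YOLOv5s vs DSB-YOLO
map_gain = relative_gain(27.5, 32.3)    # mAP0.5:0.95: YOLOv5s vs DSB-YOLO

print(f"mAP50: +{map50_gain:.1f}%  mAP: +{map_gain:.1f}%")
```

In absolute terms the same rows correspond to gains of 5.3 and 4.8 percentage points.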

Algorithm	mAP50/%	mAP/%
DSYolov3^[11]	44.5	22.3
SyNet^[21]	48.4	25.1
SAMFR^[22]	40.0	20.2
SlimYOLOv3^[12]	-	23.9
Zhang et al.^[23]	45.2	22.6
DSB-YOLO	53.4	32.3
