JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2023, Vol. 58, Issue (9): 59-70. doi: 10.6040/j.issn.1671-9352.0.2022.349


An object detection algorithm for aerial images

Cheng LI1,2, Wengang CHE1,2,*, Shengxiang GAO1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  2. Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2022-06-23  Online: 2023-09-20  Published: 2023-09-08
  • Contact: Wengang CHE  E-mail: 841364941@qq.com; wgche@qq.com

Abstract:

An object detection algorithm for aerial images, DSB-YOLO (depthwise separable convolutional backbone and YOLO), is proposed on the basis of YOLOv5s. First, with regard to the receptive field of the feature maps extracted by the backbone network, the receptive field is reduced by changing the sampling interval (stride) of the convolution kernels, so that information about small objects is extracted more effectively. Second, the feature pyramid network (FPN) and path aggregation network (PAN) feature fusion paths in the Neck of the network are improved, so that the abundant location information in the shallow feature maps is better combined with the deep feature maps, which effectively improves the detection accuracy for small objects. Third, a C3Transformer module is added to the backbone network to integrate global image information. Finally, the network is made lightweight by replacing some of the convolutions in the backbone with depthwise separable convolutions and integrating the SE attention mechanism, which focuses on and selects the information useful for the object detection task, thus improving the detection efficiency of the model. Comparative experiments on the VisDrone dataset show that, at an input image resolution of 1280×1280 pixels, the mAP50 and mAP0.5:0.95 of the proposed DSB-YOLO algorithm are 11% and 17.5% higher, respectively, than those of the original model. Deployed on the Jetson TX2 embedded platform, the model reaches up to 21 FPS, and its performance meets application requirements.

Key words: computer vision, object detection in aerial images, deep learning

CLC Number: 

  • TP391.41

Fig.1 YOLOv5s algorithm framework

Fig.2 Process of extracting feature maps

Fig.3 Object width and height scatter plot

Table 1 Backbone receptive field analysis table

YOLOv5s backbone (input image size 640×640 pixels)
Feature layer        Feature map size/pixels   Receptive field size/pixels
P1 (F=6, S=2, P=2)   320×320                   2
P2 (F=3, S=2)        160×160                   6
P3 (F=3, S=2)        80×80                     14
P4 (F=3, S=2)        40×40                     30
P5 (F=3, S=2)        20×20                     62

DSB-YOLO backbone (input image size 640×640 pixels)
Feature layer        Feature map size/pixels   Receptive field size/pixels
P1 (F=6, S=2, P=2)   320×320                   2
P2 (F=3, S=1, P=1)   320×320                   6
P3 (F=3, S=2, P=1)   160×160                   10
P4 (F=3, S=2, P=1)   80×80                     18
P5 (F=3, S=2, P=1)   40×40                     34
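
The P2 to P5 rows of Table 1 are consistent with the standard receptive-field recurrence r_l = r_(l-1) + (F - 1) × j_(l-1), where j is the cumulative stride, and with the usual output-size formula floor((n + 2P - F) / S) + 1. The Python sketch below reproduces both columns under stated assumptions: the P1 row (receptive field 2, cumulative stride 2) is taken from the table as the starting point rather than derived, padding P=1 is assumed for the YOLOv5s layers where the table does not list it, and the function names are illustrative only.

```python
def output_size(n, k, s, p):
    """Spatial size after a convolution with kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def trace(layers, size, rf, jump):
    """Follow feature-map size and receptive field through `layers`.

    layers: list of (kernel, stride, padding) for the layers after P1.
    size, rf, jump: feature-map size, receptive field and cumulative
    stride after P1 (seeded from the P1 row of Table 1).
    """
    rows = [(size, rf)]
    for k, s, p in layers:
        rf = rf + (k - 1) * jump      # receptive field grows by (k-1) * jump
        jump = jump * s               # cumulative stride after this layer
        size = output_size(size, k, s, p)
        rows.append((size, rf))
    return rows


p1_size = output_size(640, 6, 2, 2)   # P1: F=6, S=2, P=2 on a 640x640 input

# YOLOv5s column: P2..P5 all use F=3, S=2 (P=1 assumed).
yolov5s = trace([(3, 2, 1)] * 4, p1_size, rf=2, jump=2)

# DSB-YOLO column: P2 uses S=1, then P3..P5 use S=2.
dsb_yolo = trace([(3, 1, 1)] + [(3, 2, 1)] * 3, p1_size, rf=2, jump=2)

for name, rows in (("YOLOv5s", yolov5s), ("DSB-YOLO", dsb_yolo)):
    for level, (size, rf) in enumerate(rows, start=1):
        print(f"{name} P{level}: {size}x{size}, receptive field {rf}")
```

Setting the P2 stride to 1 delays every later doubling of the cumulative stride by one layer, which is why the receptive field at P5 drops from 62 to 34 while the feature map stays at twice the resolution.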

Fig.4 DSB-YOLO algorithm framework

Fig.5 Architecture of Transformer encoder

Fig.6 Architecture of C3Transformer

Fig.7 Depthwise separable convolution
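
To illustrate why depthwise separable convolution lightens a network, the following Python sketch compares weight counts for a standard convolution and its depthwise separable counterpart (a per-channel k×k depthwise step followed by a 1×1 pointwise step). The channel sizes are hypothetical, biases are ignored, and the function names are illustrative; this is not the layer configuration used in DSB-YOLO.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 128 -> 256 channels with a 3 x 3 kernel.
standard = conv_params(128, 256, 3)
separable = dw_separable_params(128, 256, 3)
ratio = separable / standard      # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 4))
```

For this hypothetical layer the separable form needs roughly 11.5% of the weights of the standard convolution, and the same ratio applies to multiply-accumulate operations at a fixed output size.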

Fig.8 SE channel attention module
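
As a rough illustration of the squeeze-and-excitation (SE) idea, the Python sketch below implements the three steps on nested lists: squeeze (global average pooling per channel), excitation (two fully connected layers with ReLU and sigmoid) and channel-wise rescaling. The weights, the input and the function name are hypothetical, biases are omitted, and this is a sketch of the general mechanism rather than the module as configured in DSB-YOLO.

```python
import math


def se_scale(x, w1, w2):
    """Apply SE channel attention to x.

    x:  feature map as nested lists with shape C x H x W.
    w1: first FC layer weights, shape (C // r) x C.
    w2: second FC layer weights, shape C x (C // r).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: average each channel down to a single scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one gate per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    y = [[[g * v for v in row] for row in ch] for g, ch in zip(s, x)]
    return y, s


# Hypothetical input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[1.0, 1.0]]
w2 = [[1.0], [-1.0]]

y, gates = se_scale(x, w1, w2)
print(gates)
```

With these made-up weights the first channel is passed through almost unchanged and the second is strongly suppressed, which is the selective emphasis the attention mechanism is meant to provide.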

Fig.9 Depthwise separable convolution with SE channel attention module

Fig.10 DSB-YOLO algorithm structure

Fig.11 Images of VisDrone dataset

Table 2 Ablation experiment results of DSB-YOLO algorithm on VisDrone dataset

No.  Algorithm                    mAP50/%  mAP/%  FPS   BFLOPs
A    YOLOv5s                      33.2     17.5   47.2  15.9
B    A + C3Transformer            36.0     18.1   43.0  16.7
C    B + backbone improvement     40.2     21.5   25.0  65.2
D    C + Neck improvement         41.2     22.3   25.0  67.5
     D + lightweight improvement  42.3     24.8   28.4  51.4
E    A + backbone improvement     37.4     19.9   29.2  64.4
F    E + lightweight improvement  38.4     20.7   34.2  48.5
G    F + Neck improvement         41.4     21.5   34.2  50.8
H    G + C3Transformer            42.3     24.8   28.4  51.4

Table 3 Comparative experiment of DSB-YOLO algorithm and YOLOv5s on VisDrone dataset

Input image size/pixels  Algorithm  mAP50/%  mAP/%
1280×1280                YOLOv5s    48.1     27.5
                         DSB-YOLO   53.4     32.3
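
The 11% and 17.5% improvements quoted in the abstract appear to be relative gains over the YOLOv5s baseline in Table 3 rather than percentage-point differences. A minimal Python check, using only the four values in the table (the function name is illustrative):

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Values from Table 3 (input size 1280x1280).
map50_gain = relative_gain(48.1, 53.4)   # mAP50
map_gain = relative_gain(27.5, 32.3)     # mAP0.5:0.95

print(round(map50_gain, 1), round(map_gain, 1))
```

In absolute terms the same rows differ by 5.3 and 4.8 percentage points, respectively.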

Fig.12 Dense small target detection comparison

Fig.13 High altitude small target detection comparison

Fig.14 Occlusion target detection comparison

Table 4 Detection performance of different algorithms on VisDrone dataset

Algorithm          mAP50/%  mAP/%
DSYolov3[11]       44.5     22.3
SyNet[21]          48.4     25.1
SAMFR[22]          40.0     20.2
SlimYOLOv3[12]     -        23.9
Zhang et al.[23]   45.2     22.6
DSB-YOLO           53.4     32.3
1 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
2 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
3 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer-Verlag, 2016: 21-37.
4 ZHU P, WEN L, XIAO B, et al. Vision meets drones: a challenge[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.07437.
5 ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[J/OL]. arXiv, 2017. https://arxiv.org/pdf/1710.09412.pdf.
6 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOV4: optimal speed and accuracy of object detection[J/OL]. arXiv, 2020. https://arxiv.org/abs/2004.10934.
7 HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J/OL]. arXiv, 2017. https://arxiv.org/abs/1704.04861v1.
8 TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
9 HAN K, WANG Y H, TIAN Q, et al. Ghostnet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1580-1589.
10 REDMON J, FARHADI A. YOLOV3: an incremental improvement[J/OL]. arXiv, 2018. https://arxiv.org/abs/1804.02767.
11 LI Z, LIU X, ZHAO Y, et al. A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs[J]. Journal of Visual Communication and Image Representation, 2021, 77(1): 103058.
12 ZHANG P, ZHONG Y, LI X. SlimYOLOv3: narrower, faster and better for real-time UAV applications[C]//Proceedings of Computer Vision and Pattern Recognition (CVPR). Seoul: IEEE, 2019.
13 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[J/OL]. arXiv, 2020. https://arxiv.org/pdf/2010.11929v2.pdf.
14 ZHA Junwei, ZHANG Hongyan. Dynamic receptive field feature selection dehazing network[J]. Electronic Science and Technology, 2023, 36(7): 1-8. (in Chinese)
15 LI Cuiping, LI Zhongxue, YU Dongming. Ore grade interpolation based on Thiessen polygon method[J]. Journal of Liaoning Technical University, 2007, 26(4): 488-491. doi: 10.3969/j.issn.1008-0562.2007.04.003 (in Chinese)
16 WANG C Y, LIAO H, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle: IEEE, 2020.
17 CUI C, GAO T, WEI S, et al. PP-LCNet: a lightweight CPU convolutional neural network[J/OL]. arXiv, 2021. https://arxiv.org/abs/2109.15099v1.
18 HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
19 XIAO Shunliang, QIANG Zanxia, LIU Weiguang. An improved pedestrian detection algorithm for crowd based on CSP[J]. Computer Technology and Development, 2021, 31(7): 52-58. (in Chinese)
20 JIANG J H, FU X J, QIN R, et al. High-speed lightweight ship detection algorithm based on YOLO-V4 for three-channels RGB SAR image[J]. Remote Sensing, 2021, 13(10): 1909. doi: 10.3390/rs13101909
21 ALBABA B M, OZER S. SyNet: an ensemble network for object detection in UAV images[C]//Proceedings of Computer Vision and Pattern Recognition(CVPR). Milan: IEEE, 2020.
22 WANG H, WANG Z, JIA M, et al. Spatial attention for multi-scale feature refinement for object detection[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.
23 ZHANG X, IZQUIERDO E, CHANDRAMOULI K. Dense and small object detection in UAV vision based on cascade network[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul: IEEE, 2019.