High-Order Target Detection Method Based on the YOLOv3 Model

doi:10.6040/j.issn.1671-9352.4.2021.034

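Abstract

Summary: Target detection is an important branch of computer vision. Current deep-learning-based detectors are somewhat better than traditional detection algorithms in both detection accuracy and detection time, yet they still struggle to deliver speed and accuracy at once. To address this problem, the paper proposes Mul-YOLO, a target detection network that improves on YOLOv3. Mul-YOLO uses the Haar wavelet for data preprocessing: the low-frequency features of the image are decomposed layer by layer at different resolutions to obtain high-frequency features in the horizontal, vertical and diagonal directions. The information recorded by these high-frequency features reduces the negative effect on detection accuracy of changes in the target's geometric state, in illumination and in background. High-order computation is incorporated into the upsampling, convolution and concatenation of the feature layers. This strengthens feature representation within a limited receptive field, makes the trained network attend more closely to the salient information in the mapped features, enhances image resolution, and effectively reduces the information loss caused by successive convolution and pooling during training. Experiments on the PASCAL VOC dataset show that the proposed Mul-YOLO model clearly improves on traditional target detection models: compared with feature extraction by Faster R-CNN ResNet, mAP rises by 8.97% and the detection time for a single image is shortened by 172 ms; compared with feature extraction by YOLOv3, mAP rises by 33.48%. Detection accuracy and detection time thus improve together, and, taken with the other comparisons, the results verify the effectiveness of the proposed method.

Keywords: target detection, Mul-YOLO, Haar wavelet, highly discriminative features, high-order computation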

Abstract: Target detection is an important branch of computer vision. Although current deep-learning-based approaches address the shortcomings of traditional detection methods in detection accuracy and detection time, it remains difficult for them to achieve high detection speed and high detection accuracy together. This paper therefore proposes Mul-YOLO, a target detection network that improves on YOLOv3. It uses the Haar wavelet for data preprocessing, decomposing the low-frequency features of the image layer by layer at different resolutions to obtain high-frequency features in the horizontal, vertical and diagonal directions. The information recorded by these high-frequency features reduces the negative effects on detection accuracy of changes in geometric state, illumination and background. Third-order calculation is integrated into the convolution and concatenation on the feature layers, strengthening feature extraction within the limited receptive field so that the trained network pays more attention to the significant information in the mapped features. This enhances the image resolution and compensates for the information loss caused by successive convolution and pooling during training. Experimental results on the PASCAL VOC dataset show that the proposed Mul-YOLO approach clearly improves on the previous generation of target detection models. For example, compared with the Faster R-CNN ResNet feature extraction method, mAP improves by 8.97% and the detection time for a single image decreases by 172 ms; compared with the YOLOv3 feature extraction method, mAP increases by 30.48%, achieving a balance between detection accuracy and detection time. Detection accuracy is therefore improved while detection time remains unchanged, which supports the effectiveness of the proposed approach.

Keywords: target detection, Mul-YOLO, Haar wavelet, highly discriminative features, high-order sampling
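
The Haar wavelet preprocessing described in the abstract (splitting an image into a low-frequency band plus horizontal, vertical and diagonal high-frequency bands, then decomposing the low-frequency band again) can be illustrated with a small sketch. This is not the authors' implementation: the function names `haar_dwt2` and `haar_pyramid`, the averaging normalisation (dividing by 4 rather than 2), and the choice to label a band by the edge orientation it responds to are all assumptions made for illustration.

```python
def haar_dwt2(image):
    """One level of a 2-D Haar decomposition (averaging normalisation).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution bands: (low, horizontal, vertical, diagonal).
    Band naming is an assumption; conventions differ between libraries.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    low, horizontal, vertical, diagonal = [], [], [], []
    for r in range(0, rows, 2):
        low_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block:  a b
            #                 d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_row.append((a + b + d + e) / 4)  # local average (low frequency)
            h_row.append((a + b - d - e) / 4)    # top minus bottom: horizontal edges
            v_row.append((a - b + d - e) / 4)    # left minus right: vertical edges
            d_row.append((a - b - d + e) / 4)    # diagonal contrast
        low.append(low_row)
        horizontal.append(h_row)
        vertical.append(v_row)
        diagonal.append(d_row)
    return low, horizontal, vertical, diagonal


def haar_pyramid(image, levels):
    """Repeatedly decompose the low-frequency band, level by level.

    Returns the final low-frequency band and a list with one
    (horizontal, vertical, diagonal) tuple per level, finest level first.
    """
    details = []
    low = image
    for _ in range(levels):
        low, h, v, d = haar_dwt2(low)
        details.append((h, v, d))
    return low, details
```

Calling `haar_pyramid(image, 2)` on a 4x4 image, for example, yields a 1x1 low-frequency band and two sets of detail bands (2x2 and 1x1). How such bands would be fed into the detection network is not specified in the abstract and is left out of the sketch.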

CLC number: TP30

严晨旭,邵海见,邓星. 基于YOLOv3模型的高阶目标检测方法[J]. 《山东大学学报(理学版)》, 2022, 57(3): 20-30.

YAN Chen-xu, SHAO Hai-jian, DENG Xing. Multihigh order target detection method based on YOLOv3 model[J]. Journal of Shandong University (Natural Science), 2022, 57(3): 20-30.

