JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2025, Vol. 60, Issue (1): 63-73. doi: 10.6040/j.issn.1671-9352.4.2023.0213

Monocular 3D object detection algorithm combining depth guidance and multi-scale channel attention mechanism

LIU Qing1, LI Wei1*, YU Shaoyong2, SONG Yuping3, ZHOU Qidi1, ZOU Weilin1   

  1. School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, Fujian, China;
  2. School of Mathematics and Information Engineering, Longyan University, Longyan 364012, Fujian, China;
  3. School of Mathematical Sciences, Xiamen University, Xiamen 361005, Fujian, China
  • Published: 2025-01-10

Abstract: Estimating 3D bounding boxes accurately from a single image is highly challenging because essential spatial structure signals are missing. To address this, a monocular 3D object detection algorithm is proposed that combines depth guidance with a multi-scale channel attention mechanism. To introduce 3D information and effectively capture spatial information at different feature-map scales, the depth maps and the monocular image feature maps are each pre-processed in the feature extraction module with a pyramid split algorithm; a channel-wise attention module then uses the resulting weights to recalibrate the corresponding feature vectors, producing a refined feature map that is richer in multi-scale feature information. A depth-guided dynamic local convolution network is proposed that applies the depth maps, which contain spatial structure signals, as specific kernels on the monocular image feature maps. This mitigates the error accumulation caused by direct fusion and addresses the scale sensitivity that arises because objects look larger or smaller depending on their distance. The model's performance is assessed and compared with other methods using several evaluation metrics. Experimental results demonstrate that the proposed method improves 3D detection accuracy for cars, pedestrians, and cyclists on autonomous driving datasets compared with other algorithms.

Key words: monocular 3D object detection, depth guidance, multi-scale channel-wise attention mechanism, autonomous driving

CLC Number: TP391
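
As a rough illustration of the two mechanisms named in the abstract, the pure-Python sketch below shows (a) a pyramid split followed by channel-wise recalibration and (b) a depth-guided local convolution in which the depth features act as per-pixel kernels. It is not the authors' implementation. The function names `box_filter`, `pyramid_channel_attention`, and `depth_guided_local_conv` are hypothetical; the box filter stands in for learned convolutions of different kernel sizes; the softmax over globally pooled responses stands in for the learned attention layers; and the channel count is assumed to be divisible by the number of kernel sizes.

```python
import math


def box_filter(channel, k):
    """Average over a k x k window with zero padding.

    Assumption: this stands in for a learned k x k convolution.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += channel[yy][xx]
            out[y][x] = total / (k * k)
    return out


def pyramid_channel_attention(channels, kernel_sizes=(1, 3)):
    """Split channels into groups, filter each group at its own scale,
    then rescale every channel by a softmax weight of its pooled response.

    Assumption: len(channels) is divisible by len(kernel_sizes).
    """
    group = len(channels) // len(kernel_sizes)
    filtered = []
    for i, k in enumerate(kernel_sizes):
        for ch in channels[i * group:(i + 1) * group]:
            filtered.append(box_filter(ch, k))
    # Global average pooling gives one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in filtered]
    peak = max(pooled)
    exps = [math.exp(p - peak) for p in pooled]
    total = sum(exps)
    weights = [e / total for e in exps]
    refined = [[[wgt * v for v in row] for row in ch]
               for ch, wgt in zip(filtered, weights)]
    return refined, weights


def depth_guided_local_conv(image_feat, depth_feat, dilation=1):
    """Apply depth features as location-dependent 3 x 3 kernels.

    Each output pixel is the sum, over its (dilated) 3 x 3 neighbourhood,
    of depth value times image value at the same neighbour, divided by 9.
    Out-of-bounds neighbours contribute zero.
    """
    h, w = len(image_feat), len(image_feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-dilation, 0, dilation):
                for dx in (-dilation, 0, dilation):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += depth_feat[yy][xx] * image_feat[yy][xx]
            out[y][x] = total / 9.0
    return out


if __name__ == "__main__":
    image = [[1.0] * 3 for _ in range(3)]
    depth = [[2.0] * 3 for _ in range(3)]
    print(depth_guided_local_conv(image, depth))
```

Because the kernel at each pixel is read from the depth features around that pixel, the filter varies across the image instead of being shared globally, which is the sense in which the convolution is "dynamic" and "local" here. Varying `dilation` is one simple way to change the receptive field for nearer or farther objects.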