Journal of Shandong University (Natural Science), 2022, Vol. 57, Issue 7: 65-72. doi: 10.6040/j.issn.1671-9352.1.2021.032
柳利芳1,马园园2*
LIU Li-fang1, MA Yuan-yuan2*
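Abstract: To address the strategies and core problems of cross-modal information retrieval, and with the aim of improving retrieval performance, this paper analyzes the advantages of multi-view symmetric nonnegative matrix factorization for cross-modal retrieval and proposes a new cross-modal retrieval framework based on symmetric nonnegative matrix factorization. First, a consistent subspace representation is learned on the public Wikipedia and Pascal datasets; then, based on this subspace, a method is designed for projecting real-time samples into the subspace. Compared with canonical correlation analysis, semantic matching, and partial least squares regression, the proposed method achieves the best performance on both metrics, MAP and PR curves, which indicates its potential for cross-modal information retrieval tasks.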
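The idea of learning one shared subspace from several views with symmetric nonnegative matrix factorization can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework proposed in the paper: the function name `joint_symnmf`, its parameters, and the choice of a damped multiplicative update are all hypothetical, and the abstract does not specify the objective or the solver actually used. The sketch assumes each view (for example, image features and text features) has already been turned into a symmetric nonnegative similarity matrix over the same samples.

```python
import numpy as np


def joint_symnmf(similarities, rank, n_iter=200, beta=0.5, seed=0, eps=1e-10):
    """Approximate several symmetric nonnegative matrices A_v by H @ H.T
    with a single shared nonnegative factor H (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Minimising sum_v ||A_v - H H^T||_F^2 over a shared H is equivalent,
    # up to a constant, to factoring the mean similarity matrix.
    A = np.mean(similarities, axis=0)
    H = rng.random((A.shape[0], rank))
    history = []
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        # Damped multiplicative update; it keeps every entry of H nonnegative.
        H *= (1.0 - beta) + beta * numer / denom
        history.append(np.linalg.norm(A - H @ H.T) ** 2)
    return H, history
```

Each row of the returned `H` is the subspace representation of one sample, shared by all views; retrieval across modalities could then rank samples by similarity between these rows. How unseen (real-time) samples are projected into the subspace is specific to the paper and is not reproduced here.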