Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization

doi:10.6040/j.issn.1671-9352.1.2021.032

Abstract: Addressing the strategies and core problems of cross-modal information retrieval, this article analyzes, from the perspective of improving retrieval performance, the advantages of multi-view symmetric nonnegative matrix factorization for cross-modal retrieval, and proposes a new cross-modal retrieval framework based on symmetric nonnegative matrix factorization. First, a consistent subspace representation is learned on the public Wikipedia and Pascal datasets. Then, based on this subspace, a method is designed for projecting real-time samples (samples not used to learn the subspace) into it. Compared with canonical correlation analysis, semantic matching, and partial least squares regression, the proposed method performs best on both evaluation measures, mean average precision (MAP) and precision-recall (PR) curves, which demonstrates its potential for cross-modal information retrieval tasks.

Key words: multi-view clustering, symmetric nonnegative matrix factorization, cross-modal retrieval, subspace learning
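The abstract names the two stages of the framework (learning a consistent subspace from several views, then projecting real-time samples into it) without giving formulas. The sketch below is only an illustration under stated assumptions, not the authors' exact model: it assumes each modality (view) v is summarized by a symmetric, nonnegative n×n similarity matrix A_v over the same n training items, and that the shared subspace is one nonnegative factor H (n×k) minimizing Σ_v ‖A_v − HH^T‖_F², fitted with a damped multiplicative update (factor 1/2) of the kind used for symmetric NMF. The projection step is also an assumption: a new sample's nonnegative similarity vector a to the training items is mapped to h ≥ 0 by solving min ‖a − Hh‖² with Lee-Seung multiplicative updates, mirroring how each row of A ≈ HH^T is H times a row of H. The names `joint_symnmf` and `project_sample` are hypothetical.

```python
import numpy as np


def joint_symnmf(views, k, n_iter=1000, seed=0, eps=1e-10):
    """Fit one nonnegative factor H (n x k) shared by all views.

    Assumed objective: sum_v ||A_v - H H^T||_F^2 subject to H >= 0, where each
    A_v is a symmetric, nonnegative n x n similarity matrix for view v.
    """
    n = views[0].shape[0]
    rng = np.random.default_rng(seed)
    H = rng.random((n, k))              # random nonnegative initialization
    A_sum = sum(views)                  # gradient's negative part: 4 * sum_v A_v H
    V = len(views)
    for _ in range(n_iter):
        num = A_sum @ H
        den = V * (H @ (H.T @ H)) + eps     # gradient's positive part: 4 * V * H H^T H
        H *= 0.5 + 0.5 * num / den          # damped multiplicative update keeps H >= 0
    return H


def project_sample(a, H, n_iter=1000, eps=1e-10):
    """Hypothetical out-of-sample step: map a new sample into the subspace.

    a holds nonnegative similarities between the new sample and the n training
    items (in the new sample's own view). Solves min_{h >= 0} ||a - H h||^2 with
    Lee-Seung multiplicative updates, mirroring a row of A ~ H H^T.
    """
    HtH = H.T @ H
    Hta = H.T @ a
    h = np.ones(H.shape[1])
    for _ in range(n_iter):
        h *= Hta / (HtH @ h + eps)
    return h
```

In use, `H` gives one k-dimensional representation per training item that is shared by all views, and `project_sample` gives a comparable representation for an incoming query, so items from different modalities can be compared directly in the subspace. With equal view weights, Σ_v ‖A_v − HH^T‖² equals V‖Ā − HH^T‖² plus a constant, where Ā is the average of the A_v, so this sketch is equivalent to plain symmetric NMF on Ā; per-view weights or extra regularization terms would break that equivalence.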
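The abstract reports results as MAP and PR curves but does not spell out the evaluation protocol. The sketch below assumes the usual cross-modal setting: queries from one modality and database items from the other are both represented in the learned subspace, the database is ranked by cosine similarity (an assumed choice), and a retrieved item counts as relevant when it shares the query's category label. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def _ranked_relevance(query_vec, query_label, db_vecs, db_labels):
    """Rank the database by descending similarity and flag relevant items.

    Relevance is assumed to mean 'same category label as the query'.
    """
    order = sorted(range(len(db_vecs)),
                   key=lambda i: cosine(query_vec, db_vecs[i]),
                   reverse=True)
    return [db_labels[i] == query_label for i in order]


def average_precision(query_vec, query_label, db_vecs, db_labels):
    """Mean of precision@r over the ranks r at which a relevant item appears."""
    hits, precisions = 0, []
    flags = _ranked_relevance(query_vec, query_label, db_vecs, db_labels)
    for rank, rel in enumerate(flags, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(q_vecs, q_labels, db_vecs, db_labels):
    """MAP: average of the per-query AP values."""
    aps = [average_precision(q, l, db_vecs, db_labels)
           for q, l in zip(q_vecs, q_labels)]
    return sum(aps) / len(aps)


def pr_curve(query_vec, query_label, db_vecs, db_labels):
    """(recall, precision) after each cut-off rank for a single query."""
    flags = _ranked_relevance(query_vec, query_label, db_vecs, db_labels)
    total = sum(flags)
    points, hits = [], 0
    for rank, rel in enumerate(flags, start=1):
        hits += rel
        points.append((hits / total if total else 0.0, hits / rank))
    return points


# Usage: image queries vs. a text database, both already mapped into the subspace.
queries = [[1.0, 0.0], [0.0, 1.0]]
query_labels = ["a", "b"]
texts = [[0.9, 0.1], [0.8, 0.3], [0.5, 0.5], [0.1, 0.9]]
text_labels = ["a", "b", "a", "b"]
print(mean_average_precision(queries, query_labels, texts, text_labels))  # ≈ 0.8333 (5/6)
```

Each query here ranks a relevant item first and third, so its AP is (1/1 + 2/3) / 2; averaging the PR points of many queries at fixed recall levels gives the kind of PR curve the abstract refers to.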

CLC number: TP391

LIU Li-fang, MA Yuan-yuan. Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization[J]. Journal of Shandong University (Natural Science), 2022, 57(7): 65-72.
