JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2026, Vol. 61, Issue (3): 86-95. doi: 10.6040/j.issn.1671-9352.0.2024.230


Center moment discrepancy multimodal sentiment analysis based on self-attention mechanism

CHEN Zhongyuan, LU Chong*   

  College of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, Xinjiang, China
  Published: 2026-03-18

Abstract: A center moment discrepancy multimodal sentiment analysis model based on a self-attention mechanism (SA-CMD) is proposed to address issues in existing models related to modality correlation mining, feature fusion strategies, and label updating mechanisms. First, an encoder encodes the extracted feature sequences, and the weights of each modality's features are dynamically adjusted through a self-attention mechanism to capture the complex dependencies between modalities. Next, the center moment discrepancy method is introduced to dynamically optimize feature representations and label distributions, enhancing the model's robustness. During feature fusion, the model computes the distance discrepancy between each modality's features and their respective positive and negative centers to generate more accurate feature labels, further improving the quality of the fused features. Finally, a linear layer projects the fused features onto a lower-dimensional space for prediction. Experimental results show that SA-CMD outperforms existing baseline models on the public CMU-MOSI and CMU-MOSEI datasets across various evaluation metrics, especially the Pearson correlation coefficient, binary classification accuracy, and seven-class classification accuracy. Ablation experiments further verify the key roles of the self-attention mechanism and the center moment discrepancy method in improving model performance, demonstrating the effectiveness and robustness of the SA-CMD model in multimodal sentiment analysis tasks.
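To make the center moment discrepancy term concrete, the following is a minimal PyTorch sketch of a CMD regularizer between two modality feature distributions, in the spirit of Zellinger et al.'s central moment discrepancy. The moment order K = 5, the assumed [0, 1] feature range, and the tensor names are illustrative assumptions only; the paper's exact formulation, including the positive/negative center construction, is not reproduced here.

import torch

def central_moment_discrepancy(x: torch.Tensor, y: torch.Tensor,
                               k: int = 5, a: float = 0.0, b: float = 1.0) -> torch.Tensor:
    """CMD between two sample sets x and y of shape (batch, dim), assumed bounded in [a, b]."""
    span = abs(b - a)
    mx, my = x.mean(dim=0), y.mean(dim=0)                # first raw moments
    cmd = torch.norm(mx - my, p=2) / span                # mean-matching term
    cx, cy = x - mx, y - my                              # centered samples
    for order in range(2, k + 1):                        # higher-order central moments
        moment_x = cx.pow(order).mean(dim=0)
        moment_y = cy.pow(order).mean(dim=0)
        cmd = cmd + torch.norm(moment_x - moment_y, p=2) / span ** order
    return cmd

# Example (hypothetical encoder outputs): penalize the distribution gap
# between text and audio modality features.
text_feat = torch.sigmoid(torch.randn(32, 128))
audio_feat = torch.sigmoid(torch.randn(32, 128))
loss_cmd = central_moment_discrepancy(text_feat, audio_feat)

In practice such a term would typically be added to the task loss with a small weighting coefficient so that modality distributions are aligned without dominating the prediction objective.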

Key words: multimodal sentiment analysis, self-attention mechanism, center moment discrepancy, multimodal feature fusion

CLC Number: TP391