《山东大学学报(理学版)》(Journal of Shandong University, Natural Science), 2026, Vol. 61, Issue (3): 86-95. DOI: 10.6040/j.issn.1671-9352.0.2024.230
CHEN Zhongyuan, LU Chong*
Abstract: To address the limitations of existing models in mining inter-modal correlations, fusing features, and updating labels, a center moment discrepancy multimodal sentiment analysis method based on a self-attention mechanism (SA-CMD) is proposed. First, an encoder encodes the extracted feature sequences, and a self-attention mechanism dynamically adjusts the weight of each modality's features to capture complex inter-modal dependencies. Second, a central moment discrepancy method is introduced to dynamically optimize the feature representations and label distributions, strengthening the model's robustness. During feature fusion, the distance differences between modality features and their positive and negative centers are computed to generate more accurate feature labels, further improving the quality of the fused features. Finally, a linear layer projects the fused features into a low-dimensional space for prediction. Experimental results show that SA-CMD outperforms existing baseline models on all evaluation metrics on the public CMU-MOSI and CMU-MOSEI datasets, with particularly strong results on the correlation coefficient, binary classification accuracy, and seven-class classification accuracy. These results further verify the key role of the self-attention mechanism and the central moment discrepancy method in improving model performance, and demonstrate the effectiveness and robustness of the SA-CMD model for multimodal sentiment analysis.
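The self-attention step described above, which re-weights each modality's features by their relevance to the others, can be sketched minimally as plain scaled dot-product attention. This is an illustrative sketch, not the paper's implementation; all names are hypothetical:

```python
import math

def self_attention(features):
    """Scaled dot-product self-attention over a small set of modality
    feature vectors (e.g. text, audio, video): each output vector is a
    softmax-weighted mixture of all modality vectors, so modalities
    that correlate strongly receive higher weight."""
    d = len(features[0])
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    out = []
    for q in features:
        # similarity of this modality's query to every modality's key
        scores = [dot(q, k) / math.sqrt(d) for k in features]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        w = [e / sum(exps) for e in exps]          # attention weights, sum to 1
        out.append([sum(wi * v[j] for wi, v in zip(w, features))
                    for j in range(d)])
    return out
```

If all modality vectors agree, attention leaves them unchanged; disagreement pulls each representation toward the modalities it is most similar to.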
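Central moment discrepancy (Zellinger et al., 2017, reference [20]) measures how far apart two feature distributions are: the distance between their means plus distances between their higher-order central moments. A minimal sketch, omitting the per-order normalization over the feature range that the original formulation includes:

```python
import math

def central_moment_discrepancy(x, y, k_max=5):
    """CMD between two samples x, y, each a list of equal-length feature
    vectors; lower values indicate more similar distributions."""
    dim = len(x[0])
    mean = lambda s, j: sum(v[j] for v in s) / len(s)
    mx = [mean(x, j) for j in range(dim)]
    my = [mean(y, j) for j in range(dim)]
    # first-order term: Euclidean distance between the sample means
    cmd = math.dist(mx, my)
    # higher-order terms: distances between central moments of order 2..k_max
    for k in range(2, k_max + 1):
        cx = [sum((v[j] - mx[j]) ** k for v in x) / len(x) for j in range(dim)]
        cy = [sum((v[j] - my[j]) ** k for v in y) / len(y) for j in range(dim)]
        cmd += math.dist(cx, cy)
    return cmd
```

Minimizing this quantity between modality representations encourages them to match in distribution, which is how CMD-style terms regularize the fused features.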
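The positive/negative center distance used for label generation can be illustrated with a signed relative-distance score. This is a hypothetical form written for illustration; the paper's exact label-update rule may differ:

```python
import math

def center_distance_label(feat, pos_center, neg_center, eps=1e-8):
    """Signed relative distance of a modality feature to its positive
    and negative sentiment centers: values near +1 mean the feature
    lies close to the positive center, near -1 close to the negative
    center (hypothetical sketch, not the paper's exact rule)."""
    d_pos = math.dist(feat, pos_center)
    d_neg = math.dist(feat, neg_center)
    return (d_neg - d_pos) / (d_neg + d_pos + eps)  # in (-1, 1)
```

A score of this kind can then be used to shift a modality's pseudo-label toward the sentiment polarity its features actually express.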