
Journal of Shandong University (Natural Science) ›› 2014, Vol. 49 ›› Issue (12): 1-6. doi: 10.6040/j.issn.1671-9352.3.2014.159

• Articles •

Topic sentiment analysis of Chinese news based on emotional dependency tuple

ZHOU Wen, ZHANG Shu-qing, OUYANG Chun-ping, LIU Zhi-ming, YANG Xiao-hua

  1. School of Computer Science and Technology, University of South China, Hengyang 421001, Hunan, China
  • Received: 2014-08-28 Revised: 2014-10-17 Online: 2014-12-20 Published: 2014-12-20
  • About the author: ZHOU Wen (1988- ), male, master's student; research interests: natural language processing, information retrieval, and knowledge discovery. E-mail: mrwentian@foxmail.com
  • Supported by:
    Natural Science Foundation of Hunan Province (11JJ6047, 13JJ4076); Excellent Youth Project of the Education Department of Hunan Province (13B101); Key Discipline and Innovation Team Construction Fund of University of South China; Science and Technology Plan Project of the Hengyang Science and Technology Bureau (2013KG66, 2013KG67)

Abstract: Taking the emotional dependency tuple (EDT) as the basic structure of Chinese emotional expression, the task of determining the topic sentiment orientation of news text is divided into three progressive sub-tasks: topic identification, sentiment orientation analysis, and subjective/objective classification. For topic identification, the TF-IDF method is first improved and then combined with a cross-entropy-based method to extract topic feature words. Because a news title also represents the topic, the title words are added to the topic feature set. The similarity between each sentence and the topic feature vector is then calculated with the vector space model. On this basis, sentence position, sentence length, and the sentence's similarity to the title are also taken into account to compute a topic relevance score for each sentence, which is used to extract topic sentences. Finally, an emotional dependency tuple discriminant model is established to compute the sentiment of the topic sentences, and subjective/objective classification rules are used to filter out the tendency key sentences of the news. In the COAE 2014 evaluation, the method came close to the best results on every metric, indicating that the EDT-based classification method achieves high classification performance.

Key words: emotional dependency tuple, tendency key sentence, sentiment analysis, topic sentiment

CLC Number:

  • TP391
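
The topic-sentence extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses plain TF-IDF computed over the sentences of one article, in place of the improved TF-IDF and cross-entropy feature selection, whose details the abstract does not give. The function names (`cosine`, `topic_vector`, `topic_relevance`), the weights, the `top_k` cutoff, and the position and length formulas are all hypothetical choices made for illustration. Sentences are assumed to be already segmented into tokens, and the article is assumed to contain at least one sentence.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_vector(title, sentences, top_k=5):
    """Topic features: the top_k terms by plain TF-IDF plus all title words.

    Assumption: each sentence is treated as one "document" for the IDF part.
    """
    n = len(sentences)
    tf = Counter(t for s in sentences for t in s)
    df = Counter(t for s in sentences for t in set(s))
    tfidf = {t: tf[t] * math.log((n + 1) / df[t]) for t in tf}
    top = sorted(tfidf, key=lambda t: (-tfidf[t], t))[:top_k]
    vec = {t: tfidf[t] for t in top}
    for t in title:  # title words always join the topic feature set
        vec[t] = max(vec.get(t, 0.0), 1.0)
    return vec


def topic_relevance(title, sentences, weights=(0.5, 0.2, 0.1, 0.2)):
    """Score each sentence by a weighted sum of four signals.

    The weights are illustrative only; they sum to 1 so scores stay in [0, 1].
    """
    w_topic, w_pos, w_len, w_title = weights
    topic = topic_vector(title, sentences)
    title_vec = Counter(title)
    max_len = max(len(s) for s in sentences) or 1
    scores = []
    for i, s in enumerate(sentences):
        vec = Counter(s)
        position = 1.0 - i / len(sentences)  # earlier sentences score higher
        length = len(s) / max_len            # longer sentences score higher
        scores.append(
            w_topic * cosine(vec, topic)
            + w_pos * position
            + w_len * length
            + w_title * cosine(vec, title_vec)
        )
    return scores


# Toy usage with pre-tokenized English text standing in for segmented Chinese.
title = ["bank", "raises", "rates"]
sentences = [
    ["the", "central", "bank", "raises", "interest", "rates"],
    ["markets", "fell", "after", "the", "announcement"],
    ["weather", "was", "sunny"],
]
scores = topic_relevance(title, sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
```

In this toy input the first sentence ranks highest, since it shares words with the title, comes first, and is the longest. A real system would tune the weights on labeled data and would feed the selected topic sentences to the sentiment stage, which is not sketched here because the abstract does not specify the EDT model.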