Journal of Shandong University (Natural Science), 2014, Vol. 49, Issue (11): 37-42. doi: 10.6040/j.issn.1671-9352.3.2014.136

Sentiment Analysis of Chinese Micro-blogs Based on Semi-supervised Learning

ZHU Xi, DONG Xi-shuang, GUAN Yi, LIU Zhi-guang

  School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • Received: 2014-08-28  Revised: 2014-10-21  Online: 2014-11-20  Published: 2014-11-25
  • About the author: ZHU Xi (1991- ), male, master's student; research interest: natural language processing. E-mail: zhuxi910511@163.com

Abstract: Sentiment polarity analysis of micro-blogs usually refers to automatically classifying each sentence of a Chinese micro-blog as positive, negative, or neutral. To address two characteristics of micro-blogs, fragmentation and imbalance among sentiment classes, we build on the semi-supervised reserved self-training method that we proposed previously: we extract text features suited to micro-blog sentiment classification, and we propose a training degree threshold that optimizes the iteration termination condition of reserved self-training. The resulting method keeps reserved self-training's ability to handle the imbalanced sentiment distribution of micro-blog corpora while preventing overtraining. Results in the COAE 2014 micro-blog sentiment polarity evaluation show that the method is effective.

Key words: sentiment analysis, training degree threshold, reserved self-training
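The abstract does not spell out how reserved self-training selects pseudo-labeled examples or how the training degree is measured, so the following is only a generic self-training sketch under explicit assumptions, not the authors' algorithm. Assumptions: a toy nearest-centroid classifier over 2-D points stands in for the micro-blog text features and the real classifier; each round pseudo-labels up to `per_class` of the most confident pool examples per predicted class, a simple class-balanced selection used here to illustrate handling class imbalance; and the hypothetical parameter `max_training_degree` is read as the fraction of the unlabeled pool that may be pseudo-labeled before iteration stops.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # toy 2-D feature vector (assumed stand-in for micro-blog text features)


class NearestCentroid:
    """Toy classifier (assumption): predicts the nearest class centroid with a normalized score."""

    def fit(self, xs: Sequence[Point], ys: Sequence[str]) -> "NearestCentroid":
        counts = Counter(ys)
        sums: Dict[str, List[float]] = {}
        for (a, b), y in zip(xs, ys):
            s = sums.setdefault(y, [0.0, 0.0])
            s[0] += a
            s[1] += b
        self.centroids = {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}
        return self

    def predict(self, x: Point) -> Tuple[str, float]:
        # Inverse distance to each centroid, normalized to sum to 1; closer means more confident.
        scores = {
            y: 1.0 / (1e-9 + ((x[0] - cx) ** 2 + (x[1] - cy) ** 2) ** 0.5)
            for y, (cx, cy) in self.centroids.items()
        }
        best = max(scores, key=scores.get)
        return best, scores[best] / sum(scores.values())


def self_train(
    labeled_x: Sequence[Point],
    labeled_y: Sequence[str],
    unlabeled: Sequence[Point],
    per_class: int = 1,
    max_training_degree: float = 0.5,
) -> Tuple[NearestCentroid, int]:
    """Generic class-balanced self-training (not the paper's reserved self-training).

    max_training_degree is a hypothetical stand-in for a training degree threshold:
    iteration stops once that fraction of the unlabeled pool has been pseudo-labeled.
    Returns the final classifier and the number of pseudo-labeled examples added.
    """
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    budget = int(max_training_degree * len(pool))  # cap on pseudo-labeled examples
    added = 0
    clf = NearestCentroid().fit(xs, ys)
    while pool and added < budget:
        preds = [(i, *clf.predict(x)) for i, x in enumerate(pool)]
        chosen = []
        for label in sorted(set(ys)):
            # Class-balanced selection: take up to per_class of the most confident
            # pool examples predicted as this class.
            same = sorted((p for p in preds if p[1] == label), key=lambda p: -p[2])
            chosen.extend(same[:per_class])
        # Never exceed the budget: keep the most confident picks that still fit.
        chosen = sorted(chosen, key=lambda p: -p[2])[: budget - added]
        picked = {i for i, _, _ in chosen}
        for i, label, _ in chosen:
            xs.append(pool[i])
            ys.append(label)
        pool = [x for i, x in enumerate(pool) if i not in picked]
        added += len(picked)
        clf = NearestCentroid().fit(xs, ys)  # retrain on labeled + pseudo-labeled data
    return clf, added


if __name__ == "__main__":
    unlabeled = [(1, 1), (9, 9), (0.5, 0), (9.5, 10), (2, 1), (8, 9), (1, 0), (10, 9), (0, 1), (9, 10)]
    clf, added = self_train([(0, 0), (10, 10)], ["neg", "pos"], unlabeled, max_training_degree=0.5)
    print(added, clf.predict((0.2, 0.3)))
```

Lowering `max_training_degree` stops the loop after fewer pseudo-labeled examples have been added, trading coverage of the unlabeled pool against the risk of overtraining that the abstract mentions; the threshold the authors actually use, and how they compute it, are not given here.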

CLC number: TP391