
Journal of Shandong University (Natural Science), 2014, Vol. 49, Issue 11: 22-30. doi: 10.6040/j.issn.1671-9352.3.2014.074


Micro-blog opinion analysis based on syntactic dependency and feature combination

XIA Meng-nan, DU Yong-ping, ZUO Ben-xin

  1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
  • Received: 2014-08-28; Revised: 2014-10-17; Online: 2014-11-20; Published: 2014-11-25
  • About the author: XIA Meng-nan (1990- ), female, master's student; research interest: sentiment analysis. E-mail: sxlfguoliang@126.com
  • Funding:
    Sub-project of the National Key Technology R&D Program (2013BAH21B02-01); Beijing Natural Science Foundation (4123091); "Young and Middle-aged Backbone Talent Training Program" of the Beijing Municipal Program for Deepening Talent-Strengthened Education in Higher Education Institutions (PHR20110815)

Abstract: Micro-blog posts are short texts with social-network characteristics such as colloquial and concise wording, which makes opinion mining on them difficult. To address this, syntactic dependency relations and conditional random fields (CRFs) are combined to extract candidate opinion targets. Then, building on a machine-learning approach to micro-blog sentiment classification, sentiment analysis lexicons are incorporated and features such as sentiment value, micro-blog tags, and topic are introduced to improve classification performance. The proposed method was evaluated on the COAE (Chinese Opinion Analysis Evaluation) micro-blog dataset, using precision, recall, and F1 as evaluation metrics. The experimental results confirm the effectiveness of the opinion-target extraction method that combines syntactic dependency analysis with CRFs, and show how each type of feature affects sentiment classification performance. On the COAE micro-blog opinion-sentence identification task, both the macro- and micro-averaged precision reach 91.4%.

Key words: opinion mining (sentiment analysis), emotion factor extraction, feature selection
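The abstract names the extra classification features (a lexicon-based sentiment value, micro-blog tags, and topic) only at a high level. The following is a minimal Python sketch, under stated assumptions, of how such features might be combined with bag-of-words features into one sparse feature dictionary for a classifier. The toy lexicon, the negation list, the assumption that "micro-blog tags" means Weibo-style `#topic#` hashtags, the topic-overlap definition, and the function names `sentiment_value` and `extract_features` are all hypothetical illustrations, not the authors' implementation. Word segmentation, the dependency/CRF opinion-target extraction, and the classifier itself are not shown.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score); the authors' lexicons are not reproduced here.
SENTIMENT_LEXICON = {"喜欢": 1.0, "好": 1.0, "精彩": 1.0, "失望": -1.0, "差": -1.0}
# Hypothetical negation words: a negation directly before a sentiment word flips its sign.
NEGATIONS = {"不", "没"}
# Assumption: micro-blog tags are Weibo-style hashtags written as #topic#.
HASHTAG_RE = re.compile(r"#([^#]+)#")


def sentiment_value(tokens):
    """Sum of lexicon scores over the tokens, with simple one-word negation handling."""
    total = 0.0
    for i, tok in enumerate(tokens):
        score = SENTIMENT_LEXICON.get(tok, 0.0)
        if score and i > 0 and tokens[i - 1] in NEGATIONS:
            score = -score
        total += score
    return total


def extract_features(raw_text, tokens, topic_keywords):
    """Combine bag-of-words, lexicon, hashtag and topic features into one sparse dict."""
    # Bag-of-words counts, keyed as "w=<token>".
    features = dict(Counter("w=" + tok for tok in tokens))
    # Lexicon features: overall sentiment value plus raw counts of
    # positive/negative lexicon words (the counts ignore negation).
    features["sent_value"] = sentiment_value(tokens)
    features["n_pos"] = sum(1 for t in tokens if SENTIMENT_LEXICON.get(t, 0.0) > 0)
    features["n_neg"] = sum(1 for t in tokens if SENTIMENT_LEXICON.get(t, 0.0) < 0)
    # Hashtag feature: 1.0 if the raw post contains at least one #topic# tag.
    features["has_hashtag"] = 1.0 if HASHTAG_RE.search(raw_text) else 0.0
    # Topic feature: fraction of the given topic keywords that occur in the post.
    if topic_keywords:
        hits = sum(1 for kw in topic_keywords if kw in tokens)
        features["topic_overlap"] = hits / len(topic_keywords)
    else:
        features["topic_overlap"] = 0.0
    return features


if __name__ == "__main__":
    # Pre-segmented tokens; Chinese word segmentation is assumed to happen elsewhere.
    text = "#电影# 这部电影不好,很失望"
    tokens = ["这部", "电影", "不", "好", ",", "很", "失望"]
    feats = extract_features(text, tokens, {"电影", "导演"})
    print(feats["sent_value"], feats["has_hashtag"], feats["topic_overlap"])  # prints: -2.0 1.0 0.5
```

Keeping every feature in one flat dictionary makes it easy to add or drop feature groups when measuring their individual effect on classification. Such dictionaries could, for example, be vectorized with scikit-learn's `DictVectorizer` and passed to a linear classifier; the abstract does not state which learner the authors used.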

CLC number: TP391