Journal of Shandong University (Natural Science) ›› 2014, Vol. 49 ›› Issue (11): 43-50. doi: 10.6040/j.issn.1671-9352.3.2014.016

• Articles •

Comparative study of methods for Micro-blog sentiment evaluation tasks

SUN Song-tao, HE Yan-xiang, CAI Rui, LI Fei, HE Fei-yan

  1. Computer School, Wuhan University, Wuhan 430072, Hubei, China
  • Received:2014-08-28 Revised:2014-10-17 Online:2014-11-20 Published:2014-11-25
  • About the author: SUN Song-tao (1986- ), male, Ph.D. candidate; research interests: natural language processing, sentiment analysis, and information retrieval. E-mail: stsun@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61303115); Wuhan Science and Technology Key Project (201210421135)

Abstract: This paper describes the algorithms used in the COAE 2014 evaluation and compares them in light of the evaluation results. The evaluation comprised five tasks; this paper focuses on the three related to micro-blogs. For the task of discovering new sentiment words in micro-blogs and judging their polarity, the core of the method is to obtain candidate new words from the alignment results of the Google translation service and then filter high-frequency words by their average pointwise mutual information (PMI). For the micro-blog sentiment classification task, two methods were used: a traditional polarity classifier based on a sentiment lexicon, and a classifier based on conditional random fields (CRFs) that incorporates sentiment-word annotations. For the task of extracting opinion elements from micro-blog opinion sentences, candidate product names and aspect names were first extracted, in two separate steps, according to the betweenness and closeness of nouns in a complex network built from all the nouns. Three methods were then used to extract the exact product aspects and the sentiment expressed toward them. The first is a sliding-window extraction strategy based on simple rules; the other two are supervised extraction strategies based on CRFs.

Key words: sentiment analysis, CRFs, Micro-blog, aspect sentiment extraction, complex network

CLC Number: 

  • TP391
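
The abstract states that candidate new words are filtered by average pointwise mutual information (PMI) but does not give the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the average PMI of a candidate is the mean of log(p(w) / (p(left) p(right))) over every two-part split of the candidate, with probabilities estimated as substring counts per character position. The function names, the toy corpus, and the threshold are all hypothetical.

```python
import math


def count_occurrences(text, sub):
    """Count possibly overlapping occurrences of `sub` in `text`."""
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))


def probability(sub, corpus, total_positions):
    """Relative frequency of `sub` per character position in the corpus."""
    return sum(count_occurrences(text, sub) for text in corpus) / total_positions


def average_pmi(candidate, corpus):
    """Mean PMI over all two-part splits of `candidate` (an assumed definition)."""
    if len(candidate) < 2:
        raise ValueError("candidate needs at least two characters")
    total_positions = sum(len(text) for text in corpus)
    if total_positions == 0:
        return float("-inf")  # empty corpus: nothing can be estimated
    p_whole = probability(candidate, corpus, total_positions)
    if p_whole == 0:
        return float("-inf")  # candidate never occurs in the corpus
    scores = []
    for i in range(1, len(candidate)):
        p_left = probability(candidate[:i], corpus, total_positions)
        p_right = probability(candidate[i:], corpus, total_positions)
        scores.append(math.log(p_whole / (p_left * p_right)))
    return sum(scores) / len(scores)


def filter_candidates(candidates, corpus, threshold):
    """Keep candidates whose average PMI reaches `threshold`, best first."""
    scored = [(average_pmi(c, corpus), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    return [c for s, c in sorted(kept, reverse=True)]


# Toy corpus; Latin letters stand in for the characters of micro-blog posts.
toy_corpus = ["abab", "abcd"]
ranked = filter_candidates(["ab", "cd", "ad"], toy_corpus, threshold=0.0)
```

On this toy corpus, "cd" ranks above "ab" because its two characters only ever occur together, while "ad" is dropped because it never occurs. A real system would estimate the counts from a large micro-blog corpus and tune the threshold on held-out data.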