
Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (07): 38-44. doi: 10.6040/j.issn.1671-9352.3.2014.106

• Article •

Opinion target extraction with active-learning and automatic annotation

ZHU Zhu, LI Shou-shan, DAI Min, ZHOU Guo-dong

  1. Natural Language Processing Lab, Soochow University, Suzhou 215006, Jiangsu, China
  • Received: 2015-03-03 Online: 2015-07-20 Published: 2015-07-31
  • Corresponding author: LI Shou-shan (1980-), male, Ph.D., professor; research interest: natural language processing. E-mail: shoushan.li@gmail.com
  • About the author: ZHU Zhu (1991-), female, master's student; research interest: natural language processing. E-mail: zhuzhu0020@gmail.com
  • Supported by:
    National Natural Science Foundation of China (61375073)

Abstract: An opinion target extraction method that combines active learning with automatic annotation is proposed. First, a classifier is trained on a small set of labeled samples and applied to the unlabeled samples, yielding automatic annotations together with their confidence scores. Second, an overall confidence is computed for each sample from these scores, and the low-confidence (that is, highly uncertain) samples are selected for annotation. Finally, within each selected sample, only the low-confidence words are annotated manually, while the high-confidence words keep their automatic annotations. Experimental results show that the method effectively reduces the cost of manual annotation while maintaining extraction performance.

Key words: opinion target extraction, active-learning, automatic annotation, sentiment analysis
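
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a classifier has already produced a label and a confidence in [0, 1] for every word, that the overall confidence of a sample is the mean of its word confidences (the abstract does not state the formula), and that words below a fixed threshold of 0.8 are sent to a human. The names `sample_confidence`, `select_uncertain`, `annotate`, and `ask_human` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One word's automatic annotation: (word, predicted_label, confidence in [0, 1]).
Token = Tuple[str, str, float]
Sample = List[Token]


def sample_confidence(sample: Sample) -> float:
    """Overall confidence of a non-empty sample.

    Assumption: the mean of the word confidences; the abstract does not
    say how the per-sample score is actually computed.
    """
    return sum(conf for _, _, conf in sample) / len(sample)


def select_uncertain(samples: Sequence[Sample], k: int) -> List[int]:
    """Return the indices of the k samples with the lowest overall confidence."""
    ranked = sorted(range(len(samples)), key=lambda i: sample_confidence(samples[i]))
    return ranked[:k]


def annotate(
    sample: Sample,
    ask_human: Callable[[Sample, int], str],
    threshold: float = 0.8,  # hypothetical cut-off between "low" and "high" confidence
) -> Tuple[List[Tuple[str, str]], int]:
    """Label one selected sample.

    Words whose confidence is below the threshold are labeled by the human
    oracle; all other words keep the automatic label. Returns the labeled
    words and the number of words that needed manual annotation.
    """
    labeled = []
    manual = 0
    for position, (word, predicted, conf) in enumerate(sample):
        if conf < threshold:
            labeled.append((word, ask_human(sample, position)))
            manual += 1
        else:
            labeled.append((word, predicted))
    return labeled, manual


if __name__ == "__main__":
    # Toy classifier output; "T" marks an opinion target word, "O" any other word.
    pool = [
        [("the", "O", 0.99), ("screen", "T", 0.95), ("is", "O", 0.98), ("great", "O", 0.97)],
        [("battery", "T", 0.55), ("life", "O", 0.60), ("is", "O", 0.97), ("poor", "O", 0.92)],
    ]
    gold = ["T", "T", "O", "O"]  # stands in for the human annotator
    for index in select_uncertain(pool, k=1):
        labeled, manual = annotate(pool[index], lambda s, pos: gold[pos])
        print(index, labeled, manual)
```

In this toy pool the second sentence is selected, and only its two low-confidence words are shown to the annotator. The saving in annotation cost comes from that second filter: the annotator labels individual words rather than whole sentences.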

CLC number: TP391