JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2014, Vol. 49 ›› Issue (11): 74-81. doi: 10.6040/j.issn.1671-9352.3.2014.328

Feature selection algorithm based on sentiment topic model

ZHENG Yan1, PANG Lin2, BI Hui2, LIU Wei2, CHENG Gong2   

  1. Beijing Founder Electronics Co., Ltd., Beijing 100085, China;
    2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100083, China
  • Received: 2014-08-28 Revised: 2014-10-17 Online: 2014-11-20 Published: 2014-11-25

Abstract: To exploit the potential commercial and social value of subjective text in applications such as enterprise business intelligence and public opinion surveys, a novel feature selection algorithm based on a sentiment topic model is proposed. The algorithm takes both opinion terms and the terms that co-occur with them into account to guide topic modeling, and then estimates the conditional distributions of opinion terms in the positive topic and the negative topic. These distributions are used to measure the importance of each opinion feature for sentiment orientation. An SVM is used for classification in the experiments. The experimental results show that the algorithm achieves a higher recognition rate and is practical for cross-domain use.

Key words: text classification, feature selection, opinion mining, topic model

CLC Number: TP391
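
The scoring idea in the abstract (comparing how likely an opinion term is under the positive versus the negative topic) can be illustrated with a minimal Python sketch. This is not the paper's sentiment topic model: as a simplifying assumption, the two topics are replaced by labeled positive and negative documents, the conditional distributions are estimated from smoothed term counts, and the importance score is the absolute log-ratio of the two probabilities. The function names, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def score_opinion_terms(pos_docs, neg_docs, opinion_terms, alpha=1.0):
    """Score each opinion term by |log P(w|pos) - log P(w|neg)|.

    Assumption: P(w|pos) and P(w|neg) are Laplace-smoothed relative
    frequencies over labeled documents, a stand-in for the conditional
    distributions that a sentiment topic model would estimate.
    """
    pos_counts = Counter(w for doc in pos_docs for w in doc if w in opinion_terms)
    neg_counts = Counter(w for doc in neg_docs for w in doc if w in opinion_terms)
    vocab_size = len(opinion_terms)
    pos_total = sum(pos_counts.values()) + alpha * vocab_size
    neg_total = sum(neg_counts.values()) + alpha * vocab_size

    scores = {}
    for w in opinion_terms:
        p_pos = (pos_counts[w] + alpha) / pos_total
        p_neg = (neg_counts[w] + alpha) / neg_total
        scores[w] = abs(math.log(p_pos) - math.log(p_neg))
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy, hand-made data (hypothetical): each document is a list of tokens.
pos_docs = [["great", "screen", "good"], ["good", "battery", "fine"]]
neg_docs = [["bad", "screen", "fine"], ["terrible", "battery", "bad"]]
opinion_terms = {"great", "good", "fine", "bad", "terrible"}

scores = score_opinion_terms(pos_docs, neg_docs, opinion_terms)
selected = select_features(scores, 2)
```

In this sketch a term such as "fine", which appears equally often on both sides, receives a score of zero and is never preferred over polarized terms. The selected terms could then serve as the feature set for an SVM classifier, as the abstract describes.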