JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2014, Vol. 49, Issue (11): 74-81. doi: 10.6040/j.issn.1671-9352.3.2014.328


Feature selection algorithm based on sentiment topic model

ZHENG Yan1, PANG Lin2, BI Hui2, LIU Wei2, CHENG Gong2   

  1. Beijing Founder Electronics Co., Ltd., Beijing 100085, China;
    2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100083, China
  • Received:2014-08-28 Revised:2014-10-17 Online:2014-11-20 Published:2014-11-25

Abstract: To exploit the potential commercial and social value of subjective text in applications such as enterprise business intelligence and public opinion surveys, a novel feature selection algorithm based on a sentiment topic model is proposed. The model takes both opinion terms and the terms that co-occur with them into account during topic modeling, and then estimates the conditional distributions of opinion terms under the positive topic and the negative topic. These distributions are used to measure the importance of each opinion feature for sentiment orientation. An SVM was used for classification in the experiments. The experimental results show that the algorithm achieves a higher recognition rate and is practical for cross-domain use.
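
The scoring idea in the abstract (compare how likely an opinion term is under the positive versus the negative side, then keep the most discriminative terms) can be illustrated with a minimal sketch. This is not the authors' sentiment topic model: as a stated simplification, it replaces topic-model inference with add-alpha smoothed term counts over labeled documents, and it ignores co-occurrence terms. The function names `polarity_scores` and `select_features`, the `alpha` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def polarity_scores(docs, opinion_terms, alpha=1.0):
    """Score each opinion term by |log p(w|pos) - log p(w|neg)|.

    docs: list of (tokens, label) pairs, label is "pos" or "neg".
    Assumption: smoothed counts stand in for the topic-conditional
    distributions that the paper estimates with a topic model.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in docs:
        counts[label].update(t for t in tokens if t in opinion_terms)

    vocab = sorted(opinion_terms)
    # Add-alpha smoothing keeps every probability strictly positive.
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}

    scores = {}
    for w in vocab:
        p_pos = (counts["pos"][w] + alpha) / totals["pos"]
        p_neg = (counts["neg"][w] + alpha) / totals["neg"]
        scores[w] = abs(math.log(p_pos / p_neg))
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy labeled corpus and opinion lexicon (illustrative only).
docs = [
    (["good", "great", "plot"], "pos"),
    (["good", "fine"], "pos"),
    (["bad", "awful", "plot"], "neg"),
    (["bad", "fine"], "neg"),
]
opinion_terms = {"good", "great", "bad", "awful", "fine"}

scores = polarity_scores(docs, opinion_terms)
selected = select_features(scores, 2)
```

In this toy run, "fine" appears equally often on both sides and gets a score of zero, while "good" and "bad" rank highest. The selected terms would then serve as the feature set for a downstream classifier such as the SVM mentioned in the abstract.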

Key words: text classification, feature selection, opinion mining, topic model

CLC Number: 

  • TP391