JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2014, Vol. 49, Issue (09): 62-68. doi: 10.6040/j.issn.1671-9352.2.2014.389


Comparative research on text knowledge discovery for network public opinion

JIAO Lu-lin, PENG Yan, LIN Yun   

  1. College of Management, Capital Normal University, Beijing 100048, China
  • Received: 2014-06-24 Revised: 2014-08-27 Online: 2014-09-20 Published: 2014-09-30

Abstract: For the field of network public opinion analysis, five algorithms were studied: system (hierarchical) clustering, string kernels, the K-nearest-neighbor algorithm, the support vector machine algorithm, and topic models. The five algorithms were compared comprehensively, using network public opinion data as the data set and the R language environment as the experimental tool. Simulation experiments were also carried out to compare their strengths and weaknesses. The experimental results show that the topic model is better suited to text clustering than the other algorithms. Further experiments show that, among topic models, the CTM (Correlated Topic Model) method is more suitable for exploring and discovering relations between classes, while the Gibbs sampling method performs better than the CTM method at text clustering.

Key words: topic model, network public opinion, text knowledge discovery, text clustering

CLC Number: TP309