Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (7): 44-51. doi: 10.6040/j.issn.1671-9352.1.2016.PC6
张鹏1,王素格1,2*,李德玉1,2,王杰1
ZHANG Peng1, WANG Su-ge1,2*, LI De-yu1,2, WANG Jie1
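Abstract: The Internet has become part of everyday life, and e-commerce platforms such as group-buying sites, online stores, and online consumption services are now among the most popular ways to shop. Almost all e-commerce platforms allow and encourage users to review a product or service after purchase, and these reviews are highly valuable to both potential consumers and merchants. As a result, spam reviews such as advertisements and fake reviews are deliberately mixed into user reviews in order to advertise falsely, promote products, or damage the reputation of other merchants. Spam review detection and analysis studies how to filter out this interference effectively so that the value of genuine reviews can be realized. For the spam review identification task defined by COAE2015, and using the corpus it provides, a semi-supervised spam review classification method based on heuristic rules is designed. Experimental results show that the proposed method identifies spam reviews effectively while maintaining its accuracy on valid reviews.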
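The abstract names the approach (heuristic rules combined with semi-supervised classification) but does not give its details. The sketch below is only one plausible reading, not the authors' method: hypothetical rules (`SPAM_PATTERNS`, invented for illustration) label obvious spam in the unlabeled pool, and a small Naive Bayes classifier is then self-trained on the remaining reviews. The function names, the `margin` threshold, and the English toy data are all assumptions; the COAE2015 corpus is Chinese and would need word segmentation in place of the regex tokenizer.

```python
import math
import re
from collections import Counter

# Hypothetical heuristic rules (NOT taken from the paper):
# URLs, long digit runs (phone-like), and a few advertising phrases.
SPAM_PATTERNS = [
    re.compile(r"https?://|www\."),
    re.compile(r"\d{7,}"),
    re.compile(r"discount code|click here", re.I),
]

def rule_label(text):
    """Return 1 (spam) if any rule fires, else None (undecided)."""
    return 1 if any(p.search(text) for p in SPAM_PATTERNS) else None

def tokenize(text):
    # Toy tokenizer for English; Chinese text would need a segmenter.
    return re.findall(r"[a-z]+", text.lower())

def train_nb(docs, labels):
    """Collect per-class token counts for a multinomial Naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, Counter(labels), vocab

def spam_log_odds(model, text):
    """Log-odds of spam vs. valid with add-one smoothing; > 0 means spam."""
    counts, priors, vocab = model
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    score = math.log(priors[1] / priors[0])
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # ignore tokens never seen in training
        p1 = (counts[1][tok] + 1) / (totals[1] + len(vocab))
        p0 = (counts[0][tok] + 1) / (totals[0] + len(vocab))
        score += math.log(p1 / p0)
    return score

def self_train(labeled, unlabeled, rounds=3, margin=2.0):
    """labeled: (text, label) pairs covering both classes; unlabeled: texts."""
    docs = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = []
    for text in unlabeled:  # step 1: rules label the obvious spam
        y = rule_label(text)
        if y is None:
            pool.append(text)
        else:
            docs.append(text)
            labels.append(y)
    for _ in range(rounds):  # step 2: self-training on confident predictions
        if not pool:
            break
        model = train_nb(docs, labels)
        remaining = []
        for text in pool:
            s = spam_log_odds(model, text)
            if abs(s) >= margin:
                docs.append(text)
                labels.append(1 if s > 0 else 0)
            else:
                remaining.append(text)
        if len(remaining) == len(pool):
            break  # nothing new was labeled; stop early
        pool = remaining
    return train_nb(docs, labels)

labeled = [
    ("great food friendly service", 0),
    ("tasty food nice staff", 0),
    ("cheap loans click here", 1),
]
unlabeled = [
    "cheap loans visit www.spam.example",
    "food was tasty and service friendly",
]
model = self_train(labeled, unlabeled)
```

Calling `spam_log_odds(model, review)` then gives a positive score for reviews that look like spam and a negative score for reviews that look valid. Only confident predictions are added to the training set in each round, which is a common way to limit error propagation in self-training.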