Journal of Shandong University (Natural Science), 2012, Vol. 47, Issue (3): 43-46.

• Electronic Technology and Information •

Automobile domain oriented spam detection

TANG Du-yu1, WANG Da-liang2, ZHAO Kai2, QIN Bing1, LIU Ting1

  1. Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China;
  2. NEC Labs China, Beijing 100084, China
  • Received: 2011-11-30  Online: 2012-03-20  Published: 2012-04-01
  • Author biography: TANG Du-yu (1988- ), male, master's student; research interest: sentiment orientation analysis of text. Email: dytang@ir.hit.edu.cn
  • Funding: National Natural Science Foundation of China, General Program (60975055); National Natural Science Foundation of China, Key Program (61133012)

Abstract:

 This work addresses spam detection in the automobile domain. The task is divided into four sub-tasks: supporting-review (thread-bumping) detection, irrelevant-review detection, advertisement detection and fake-review detection. Both rule-based and machine learning methods are applied to identify the four types of spam. The rule-based method combines several factors, including automobile domain knowledge, polarity words and the author's forum level. The machine learning method combines review-content features with author-information features and trains a maximum entropy (maxent) classifier. Experimental results show that machine learning is well suited to spam detection in domains with distinctive domain features, numerical feedback information and clearly labeled training data.

Key words: spam detection; advertisement detection; rule-based method; machine learning
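The abstract names the ingredients of both approaches but not the concrete rules, lexicons or features. The Python sketch below is therefore a hypothetical illustration, not the authors' implementation. The lexicons (`AUTO_TERMS`, `POLARITY_WORDS`, `AD_CUES`), the thresholds, the integer `author_level` scale, the feature names and the toy posts are all assumptions. Whitespace tokenization also stands in for the Chinese word segmentation that real forum posts would need. The sketch shows (1) rules that combine domain terms, polarity words and author level, and (2) a binary maximum entropy classifier over combined content and author features. For two classes this classifier is equivalent to logistic regression, and here it is trained by stochastic gradient ascent with L2 regularization.

```python
import math
from dataclasses import dataclass

# Hypothetical toy lexicons; the paper's actual lexicons are not given in the abstract.
AUTO_TERMS = {"engine", "fuel", "torque", "gearbox", "mileage"}
POLARITY_WORDS = {"great", "excellent", "terrible", "awful", "best", "support"}
AD_CUES = {"discount", "dealer", "call", "promotion", "price"}


@dataclass
class Post:
    text: str
    author_level: int  # assumed integer forum rank of the author (higher = more senior)


def tokens(text):
    # Whitespace tokenization stands in for Chinese word segmentation (assumption).
    return text.lower().split()


def rule_label(post, max_level=2):
    """Hypothetical rules combining domain terms, polarity words and author level."""
    words = tokens(post.text)
    has_auto = any(w in AUTO_TERMS for w in words)
    has_polar = any(w in POLARITY_WORDS for w in words)
    ad_cues = sum(w in AD_CUES for w in words)
    if ad_cues >= 2 and post.author_level <= max_level:
        return "advertisement"
    if len(words) < 5 and has_polar and not has_auto:
        return "supporting"  # short cheering post with no automobile content
    return None  # no rule fired


def features(post):
    """Content features combined with author-information features (illustrative names)."""
    words = tokens(post.text)
    n = max(len(words), 1)
    return {
        "bias": 1.0,
        "auto_ratio": sum(w in AUTO_TERMS for w in words) / n,
        "polar_ratio": sum(w in POLARITY_WORDS for w in words) / n,
        "ad_ratio": sum(w in AD_CUES for w in words) / n,
        "short_post": 1.0 if n < 5 else 0.0,
        "low_level_author": 1.0 if post.author_level <= 2 else 0.0,
    }


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, feats):
    return sigmoid(sum(weights.get(k, 0.0) * v for k, v in feats.items()))


def train_maxent(data, epochs=200, lr=0.5, l2=0.01):
    """Binary maxent model (equivalent to logistic regression), fitted by
    stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for post, label in data:  # label: 1 = spam, 0 = normal post
            feats = features(post)
            error = label - predict_proba(weights, feats)
            for k, v in feats.items():
                w = weights.get(k, 0.0)
                weights[k] = w + lr * (error * v - l2 * w)
    return weights


def classify(weights, post):
    return 1 if predict_proba(weights, features(post)) >= 0.5 else 0


if __name__ == "__main__":
    train = [
        (Post("big discount at our dealer call now for promotion price", 1), 1),
        (Post("dealer promotion discount call today", 0), 1),
        (Post("the engine torque is fine but fuel mileage is poor on the highway", 5), 0),
        (Post("gearbox shifts smoothly and the engine is quiet", 4), 0),
    ]
    model = train_maxent(train)
    print(classify(model, Post("call the dealer for a discount price", 1)))
    print(rule_label(Post("great support", 1)))
```

In this sketch the rules give high-precision decisions where a pattern is explicit. The maxent model, by contrast, learns how to weight the same kinds of evidence from labeled data. That division matches the abstract's conclusion that machine learning pays off when labeled training data are available.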
