Journal of Shandong University (Natural Science), 2010, Vol. 45, Issue (7): 39-44.

A multi-level clustering approach based on noun phrases for search results

PANG Guan-song, ZHANG Li-sha, JIANG Sheng-yi*, KUANG Li-min, WU Mei-ling

  School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510420, Guangdong, China
  • Received: 2010-04-02; Online: 2010-07-16; Published: 2010-09-06
  • Corresponding author: JIANG Sheng-yi (1963-), male, professor, PhD; research interests: data mining and natural language processing.
  • About the author: PANG Guan-song (1988-), male, master's student; research interest: data mining applications. Email: pangguansong@163.com
  • Funding:

    National Natural Science Foundation of China (60673191); Key Natural Science Research Project of Guangdong Provincial Higher Education Institutions (06Z012); Natural Science Foundation of Guangdong Province (9151026005000002)

Abstract:

To obtain high-quality clustering of search results, the method extracts noun phrases as candidate cluster labels and generates base clusters according to how these candidate labels are distributed across the results. It then performs multi-level clustering on the base clusters using a one-pass clustering algorithm with linear time complexity. Comparative experiments against the NEC, STC and Lingo algorithms show that the method produces more readable and informative cluster labels and achieves better clustering performance than all three.

Key words: information retrieval; search results clustering; text clustering; multi-level clustering
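The abstract only outlines the pipeline, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Noun-phrase extraction is assumed to have happened upstream: the input is a hypothetical mapping from result ID to that result's noun phrases. A base cluster is taken to be the set of results containing a given phrase. Similarity between clusters is assumed to be the Jaccard overlap of their document sets, and the per-level thresholds are illustrative values. The helper names `base_clusters`, `one_pass` and `multi_level` are hypothetical. Each one-pass scan compares an item only with the clusters formed so far, so a level costs O(n·k) for n items and k clusters. That is linear in n while k stays small.

```python
from collections import defaultdict


def base_clusters(doc_phrases, min_docs=2):
    """Hypothetical helper: one base cluster per candidate label.

    doc_phrases maps a result ID to the noun phrases extracted from it
    (extraction is assumed to have been done upstream). A label becomes a
    base cluster if it occurs in at least `min_docs` results.
    """
    index = defaultdict(set)
    for doc_id, phrases in doc_phrases.items():
        for phrase in set(phrases):
            index[phrase].add(doc_id)
    clusters = [((label,), frozenset(docs))
                for label, docs in index.items() if len(docs) >= min_docs]
    # Assumed ordering: larger base clusters first, ties broken by label.
    clusters.sort(key=lambda c: (-len(c[1]), c[0]))
    return clusters


def jaccard(a, b):
    """Overlap of two non-empty document sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def one_pass(items, threshold):
    """One-pass clustering: scan the items once, attaching each one to its
    most similar existing cluster when similarity >= threshold, otherwise
    opening a new cluster. A cluster is summarized by the union of its
    members' documents."""
    clusters = []  # each entry: [labels list, document set]
    for labels, docs in items:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = jaccard(docs, cluster[1])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].extend(labels)
            best[1] |= docs
        else:
            clusters.append([list(labels), set(docs)])
    return [(tuple(labels), frozenset(docs)) for labels, docs in clusters]


def multi_level(doc_phrases, thresholds=(0.5, 0.2), min_docs=2):
    """Re-run one-pass clustering on the previous level's output, using an
    assumed, progressively looser threshold at each level."""
    level = base_clusters(doc_phrases, min_docs)
    levels = [level]
    for threshold in thresholds:
        level = one_pass(level, threshold)
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Toy search results for the ambiguous query "jaguar" (invented data).
    results = {
        1: ["jaguar car", "engine"],
        2: ["jaguar car", "engine", "price"],
        3: ["jaguar cat", "rainforest"],
        4: ["jaguar cat", "rainforest", "price"],
    }
    for depth, level in enumerate(multi_level(results)):
        print(depth, [labels for labels, _ in level])
```

On the toy data, the five base clusters collapse into three groups at the first level. They become two groups at the second level. Because one-pass clustering makes a single scan, its output depends on input order. Here "price" joins the car group at the top level only because that group was scanned first. The largest-first ordering of base clusters is an assumption of this sketch, not a detail taken from the paper.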
