Journal of Shandong University (Natural Science)



Focused crawler based on ontology semantics

ZHENG Jian-zhen1, LIN Kun-hui1, ZHOU Chang-le2, KANG Kai1

  1. Software School, Xiamen Univ., Xiamen 361005, Fujian, China
  • Received: 2006-03-29 Revised: 1900-01-01 Online: 2006-10-24 Published: 2006-10-24
  • Contact: ZHENG Jian-zhen

Abstract: A focused crawler can quickly fetch large quantities of information on a specific topic from the Web, which is of great value to both specialized search engines and data mining applications. To overcome the shortcomings of the keyword-based topic filtering strategy in common use today, this paper proposes a topic filtering strategy based on ontology semantics, inspired by the idea of concept aggregation. Because information at different positions in a Web page differs in importance, the paper also proposes an improved weighted formula for computing feature-term weights, which enables real-time, semantics-based filtering of Web pages. To further improve the crawler's efficiency, a link relevance prediction algorithm is proposed. Comparative experiments show that the proposed strategy is feasible.

Key words: focused crawler, topic filtering, link analysis, ontology semantics
