Deep directional collection of Web data

J4 ›› 2011, Vol. 46 ›› Issue (5): 34-38.

• SEWM 2011 Conference •

XIA Tian^1,2

1. Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing 100872, China;
2. School of Information Resource Management, Renmin University of China, Beijing 100872, China

Received: 2010-12-06 Published: 2011-05-25
About the author: XIA Tian (1978- ), male, Ph.D., lecturer; his main research interest is Web data mining. Email: iamxiatian@gmail.com
Funding:
Supported by the National Social Science Fund of China (09CTQ027)

Abstract:

By simulating the way people browse Web pages, a set of directional crawl subpages is extracted to restrict the direction in which the crawler moves. A page inheritance relationship is introduced, and property inheritance between crawl items is used to associate the data of compound objects that span several pages. On this basis, a general crawl process that supports deep directional collection is designed and implemented. A public opinion collection experiment on hot posts from the Tianya site shows that the method can collect complex objects without changing the overall processing flow, and that it has high collection efficiency.

Key words: deep collection; directional web crawler; public web opinion
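
The two ideas named in the abstract, restricting the crawl direction to an extracted set of subpages and letting crawl items inherit properties from their parent pages, might be sketched roughly as follows. This is an illustration only, not the paper's implementation: the in-memory `SITE`, the per-depth `RULES`, and the names `CrawlItem` and `crawl` are all assumptions made for the example, and no network access is performed.

```python
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CrawlItem:
    url: str
    depth: int
    # Properties inherited from ancestor pages (hypothetical structure).
    props: dict = field(default_factory=dict)


# Hypothetical in-memory site: URL -> (properties found on the page, outgoing links).
SITE = {
    "/hot": ({}, ["/post/1", "/post/2", "/ads/9"]),
    "/post/1": ({"title": "Post one"}, ["/post/1?page=2", "/user/7"]),
    "/post/1?page=2": ({}, []),
    "/post/2": ({"title": "Post two"}, []),
    "/ads/9": ({}, []),
    "/user/7": ({}, []),
}

# Hypothetical direction rules: at depth d, only links matching RULES[d] are followed.
RULES = [
    re.compile(r"^/post/\d+$"),            # list page -> first page of a post
    re.compile(r"^/post/\d+\?page=\d+$"),  # post -> its continuation pages
]


def crawl(seed, fetch=SITE.get):
    """Breadth-first crawl that follows only rule-matching links."""
    queue = deque([CrawlItem(seed, 0)])
    seen = {seed}
    records = []
    while queue:
        item = queue.popleft()
        page = fetch(item.url)
        if page is None:
            continue
        own_props, links = page
        # Property inheritance: the page's own values extend the inherited ones.
        props = {**item.props, **own_props}
        records.append((item.url, props))
        if item.depth >= len(RULES):
            continue  # no rule for this depth, so stop descending
        rule = RULES[item.depth]
        for link in links:
            if link not in seen and rule.match(link):
                seen.add(link)
                queue.append(CrawlItem(link, item.depth + 1, props))
    return records


if __name__ == "__main__":
    for url, props in crawl("/hot"):
        print(url, props)
```

In this sketch the continuation page `/post/1?page=2` carries no title of its own but receives one from `/post/1`, which is one way the pages of a multi-page post could be tied back to a single compound object. Links outside the rules, such as `/ads/9` and `/user/7`, are never visited.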

XIA Tian. Deep directional collection of Web data[J]. J4, 2011, 46(5): 34-38.
