Journal of Shandong University (Natural Science) (J4), 2013, Vol. 48, Issue 11: 87-92.


Multi-source data fusion based on an extended feature vector space model

CHEN Ke-rui, PAN Jun

  1. College of Computer and Information Engineering, Henan University of Economics and Law,
    Zhengzhou 450002, Henan, China
  • Received: 2013-09-02 Online: 2013-11-20 Published: 2013-11-25
  • About the author: CHEN Ke-rui (born 1983), female, Ph.D., lecturer. Research interests: natural language processing, machine learning, and social search. Email: chenke0616@163.com
  • Funding: National Natural Science Foundation of China (61202285); National Spark Program (2012GA750007); Key Science and Technology Research Project of the Education Department of Henan Province (12A120002); Basic and Frontier Technology Research Project of the Science and Technology Department of Henan Province

Abstract:

Expanding ontology resources is one of the key problems in natural language processing. Information gathered in the traditional way from a single data source has low coverage. An integrated data management platform is therefore needed to store and organize data resources by category. To this end, the AVP data platform is proposed. The main problem in building the AVP platform is multi-source data fusion. This involves three steps:
- performing semantic role labeling on website data from different sources;
- identifying and resolving ambiguous entries;
- merging the results into a data warehouse whose basic unit is the word sense.

To address the semantic role labeling problem in multi-source data fusion, an automatic semantic disambiguation method is proposed. Its basic idea is to use the attribute-value pairs of entries as feature templates. Together with the co-occurrence probabilities of attribute values, an extended vector space model is then applied to identify ambiguous entries. Extensive comparative experiments show that the system performs well in all evaluated respects. The proposed algorithm effectively solves the semantic disambiguation problem in multi-source data fusion.

Key words: natural language processing; ontology; multi-source data fusion; semantic disambiguation

CLC number: TP391
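The abstract gives the disambiguation idea only in outline. Attribute-value pairs serve as feature templates, co-occurrence probabilities of attribute values support them, and an extended vector space model decides whether entries from different sources denote the same sense. The Python sketch below is one possible reading under explicit assumptions; it is not the authors' model:
- each attribute-value pair is a binary feature;
- the "extension" adds features that co-occur in the corpus, weighted by `alpha` times an estimated conditional co-occurrence probability;
- an entry joins an existing sense when the cosine similarity of the extended vectors reaches `threshold`.

All function names (`features`, `cooccurrence`, `extend`, `cosine`, `fuse`), the parameters `alpha` and `threshold`, and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set

Entry = Dict[str, str]              # one web entry: attribute -> value
Cond = Dict[str, Dict[str, float]]  # cond[f][g] = estimated P(g | f)
Vector = Dict[str, float]           # sparse feature vector


def features(entry: Entry) -> Set[str]:
    """Feature template (assumed): one feature per attribute-value pair."""
    return {f"{attr}={value}" for attr, value in entry.items()}


def cooccurrence(corpus: List[Entry]) -> Cond:
    """Estimate P(g | f) as the share of entries containing f that also contain g."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for entry in corpus:
        fs = features(entry)
        single.update(fs)
        for f, g in combinations(sorted(fs), 2):
            pair[(f, g)] += 1
            pair[(g, f)] += 1
    cond: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (f, g), n in pair.items():
        cond[f][g] = n / single[f]
    return dict(cond)


def extend(fs: Set[str], cond: Cond, alpha: float = 0.5) -> Vector:
    """Hypothetical extension: observed features get weight 1; unseen features
    that co-occur with them get alpha * P(g | f), keeping the largest weight."""
    vec: Vector = {f: 1.0 for f in fs}
    for f in fs:
        for g, p in cond.get(f, {}).items():
            if g not in fs:
                vec[g] = max(vec.get(g, 0.0), alpha * p)
    return vec


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(entries: List[Entry], cond: Cond, threshold: float = 0.3) -> List[List[Entry]]:
    """Greedily assign each entry to the most similar existing sense, or open a
    new sense when no similarity reaches the (hypothetical) threshold."""
    senses: List[List[Entry]] = []
    vectors: List[List[Vector]] = []
    for entry in entries:
        v = extend(features(entry), cond)
        best, best_sim = -1, threshold
        for i, members in enumerate(vectors):
            sim = max(cosine(v, m) for m in members)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            senses.append([entry])
            vectors.append([v])
        else:
            senses[best].append(entry)
            vectors[best].append(v)
    return senses


if __name__ == "__main__":
    # Four hypothetical "Apple" entries crawled from different sites.
    corpus = [
        {"category": "fruit", "family": "Rosaceae", "color": "red"},
        {"category": "fruit", "family": "Rosaceae", "origin": "Central Asia"},
        {"category": "company", "founder": "Steve Jobs", "headquarters": "Cupertino"},
        {"category": "company", "founder": "Steve Jobs", "industry": "electronics"},
    ]
    cond = cooccurrence(corpus)
    for sense in fuse(corpus, cond):
        print([e["category"] for e in sense])  # prints ['fruit', 'fruit'] then ['company', 'company']
```

With these four entries, extension raises the similarity of the two fruit entries from 2/3 (plain binary vectors) to about 0.82. Each fruit entry picks up the other's distinctive feature through the shared `category=fruit` and `family=Rosaceae` features. The fruit and company entries share no feature, so they stay in separate senses.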