
《山东大学学报(理学版)》 ›› 2024, Vol. 59 ›› Issue (7): 76-84. doi: 10.6040/j.issn.1671-9352.1.2023.097

• 综述 •

基于动态邻居选择的知识图谱事实错误检测方法

桂梁1,2,徐遥1,2,何世柱1,2,*,张元哲1,2,*,刘康1,2,赵军1,2

  1. 中国科学院自动化研究所复杂系统认知与决策实验室,北京 100190
    2. 中国科学院大学人工智能学院,北京 100049
  • 收稿日期:2023-10-18 出版日期:2024-07-20 发布日期:2024-07-15
  • 通讯作者: 何世柱,张元哲 E-mail:guiliang21@mails.ucas.ac.cn;shizhu.he@nlpr.ia.ac.cn;yzzhang@nlpr.ia.ac.cn
  • About the first author: GUI Liang (1997—), male, master's student; research interest: knowledge graphs. E-mail: guiliang21@mails.ucas.ac.cn
  • Supported by the National Key Research and Development Program of China (2022YFF0711900) and the National Natural Science Foundation of China (62376270, 62276264)

Factual error detection in knowledge graphs based on dynamic neighbor selection

Liang GUI1,2, Yao XU1,2, Shizhu HE1,2,*, Yuanzhe ZHANG1,2,*, Kang LIU1,2, Jun ZHAO1,2

  1. The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2023-10-18 Online:2024-07-20 Published:2024-07-15
  • Contact: Shizhu HE, Yuanzhe ZHANG E-mail: guiliang21@mails.ucas.ac.cn; shizhu.he@nlpr.ia.ac.cn; yzzhang@nlpr.ia.ac.cn

摘要:

由于知识图谱(knowledge graph, KG)的构建和更新通常依赖大量网络数据和自动化方法,因此其中建模和获取的知识内容难免存在各种事实错误。为了解决这个问题,提出一种新的知识图谱事实错误检测方法。该方法动态选择待检测事实的邻居节点,通过捕捉头尾实体之间的复杂关系来判断事实是否存在错误。首先利用图结构信息确定每个实体的潜在邻居;然后根据实体的上下文信息动态地选择相关邻居,进而使用高效的图注意力网络编码节点特征;最终通过计算头尾实体表示的一致性,判断待检测事实是否存在错误,并在多个公开的知识图谱数据集上进行实验。结果表明,该方法在错误检测方面的表现优于现有方法。

关键词: 知识图谱, 事实错误检测, 知识图谱嵌入, 质量控制, 动态邻居选择

Abstract:

The construction and updating of knowledge graphs (KGs) usually depend on a wide range of web data and automated methods, so the modeled and acquired knowledge inevitably contains factual errors. To tackle this problem, a novel approach for detecting factual errors in knowledge graphs is proposed. The method dynamically selects the neighbor nodes of the fact to be checked and judges whether the fact is erroneous by capturing the complex associations between its head and tail entities. More specifically, it first utilizes graph structure information to identify potential neighbors for each entity. Then, based on each entity's contextual information, it dynamically selects relevant neighbors and uses an efficient graph attention network to encode node features. Finally, by calculating the consistency of the head and tail entity representations, it determines whether the fact under consideration is erroneous. Experimental results on multiple public KG datasets demonstrate that this method outperforms existing approaches in error detection.
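The three stages described above (potential-neighbor lookup, dynamic neighbor selection, attention encoding, consistency scoring) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' released code: the function names, the dot-product selection criterion, the 0.5/0.5 mixing weights, and the TransE-style consistency score are all assumptions.

```python
import numpy as np

def select_neighbors(context_vec, neighbor_vecs, k=2):
    """Dynamically keep the k potential neighbors whose embeddings
    score highest (dot product) against the entity's context vector."""
    scores = neighbor_vecs @ context_vec
    return np.argsort(-scores)[:k]

def attention_encode(entity_vec, neighbor_vecs):
    """Aggregate the selected neighbors with softmax attention weights,
    then mix the aggregate into the entity representation."""
    scores = neighbor_vecs @ entity_vec
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return 0.5 * entity_vec + 0.5 * (weights @ neighbor_vecs)

def error_score(head_vec, rel_vec, tail_vec):
    """TransE-style inconsistency: a large ||h + r - t|| suggests
    the triple (h, r, t) is a factual error."""
    return float(np.linalg.norm(head_vec + rel_vec - tail_vec))
```

Ranking all triples by `error_score` in descending order and flagging the top K% as erroneous is exactly the setting that Precision@K and Recall@K in Table 2 evaluate.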

Key words: knowledge graph, factual error detection, knowledge graph embedding, quality control, dynamic neighbor selection

CLC number: TP391.1

Figure 1  The DyNED framework

Table 1  Statistics of the datasets

Dataset      Triples    Entities   Relations
WN18RR        93 003     40 943         11
NELL-995     154 213     75 492        200
FB15k-237    310 116     14 541        237

Table 2  Precision@K and Recall@K results for factual error detection on the three datasets with a 5% anomaly ratio

Precision@K
Method      WN18RR                            NELL-995                          FB15k-237
            K=1%   2%    3%    4%    5%       K=1%   2%    3%    4%    5%       K=1%   2%    3%    4%    5%
TransE      0.581  0.488 0.371 0.345 0.331    0.659  0.550 0.476 0.423 0.383    0.756  0.674 0.605 0.546 0.488
ComplEx     0.518  0.444 0.382 0.341 0.307    0.627  0.538 0.472 0.427 0.378    0.718  0.651 0.590 0.534 0.485
DistMult    0.574  0.451 0.390 0.349 0.322    0.630  0.553 0.493 0.446 0.408    0.709  0.646 0.582 0.529 0.483
KGTtm       0.770  0.628 0.516 0.444 0.396    0.808  0.691 0.602 0.535 0.481    0.815  0.767 0.713 0.612 0.579
CAGED       0.826  0.726 0.632 0.541 0.469    0.850  0.736 0.644 0.573 0.516    0.852  0.796 0.735 0.665 0.595
DyNED       0.924  0.808 0.697 0.608 0.539    0.893  0.806 0.696 0.636 0.598    0.918  0.856 0.786 0.716 0.648

Recall@K
TransE      0.116  0.195 0.233 0.276 0.331    0.132  0.220 0.285 0.338 0.383    0.151  0.270 0.363 0.437 0.488
ComplEx     0.103  0.177 0.229 0.273 0.307    0.125  0.215 0.283 0.341 0.378    0.143  0.260 0.354 0.427 0.485
DistMult    0.114  0.180 0.234 0.279 0.322    0.126  0.221 0.295 0.357 0.408    0.141  0.258 0.349 0.423 0.483
KGTtm       0.154  0.251 0.309 0.355 0.396    0.161  0.276 0.361 0.428 0.481    0.163  0.307 0.428 0.490 0.579
CAGED       0.165  0.290 0.379 0.433 0.469    0.170  0.294 0.386 0.459 0.516    0.171  0.318 0.441 0.532 0.595
DyNED       0.185  0.323 0.418 0.486 0.539    0.178  0.342 0.472 0.483 0.598    0.184  0.342 0.472 0.573 0.648
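For reference, the Precision@K and Recall@K metrics used in Tables 2 and 3 can be computed as below, where `scores` are per-triple anomaly scores (higher = more suspicious) and `labels` mark the injected errors. This is a minimal sketch; the function name and signature are ours, not from the paper.

```python
import math

def precision_recall_at_k(scores, labels, k_ratio):
    """Flag the top k_ratio fraction of triples by anomaly score and
    compare the flagged set against the ground-truth error labels."""
    n = len(scores)
    k = max(1, math.ceil(k_ratio * n))
    # Indices of the k most suspicious triples.
    top = sorted(range(n), key=lambda i: -scores[i])[:k]
    hits = sum(labels[i] for i in top)
    return hits / k, hits / max(1, sum(labels))
```

Note that when K equals the injected anomaly ratio, the number of flagged triples matches the number of true errors, so Precision@K and Recall@K coincide; this is visible in Table 2, where every method's K=5% precision equals its K=5% recall under the 5% anomaly setting.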

Table 3  Error detection results of each DyNED variant on WN18RR and NELL-995 with a 5% anomaly ratio

Precision@K
Method         WN18RR                            NELL-995
               K=1%   2%    3%    4%    5%       K=1%   2%    3%    4%    5%
DyNED          0.924  0.808 0.697 0.608 0.539    0.893  0.806 0.696 0.636 0.598
DyNED_Local    0.653  0.571 0.497 0.446 0.406    0.702  0.638 0.564 0.483 0.439
DyNED_Global   0.738  0.623 0.538 0.477 0.435    0.762  0.679 0.612 0.532 0.491

Recall@K
DyNED          0.185  0.323 0.418 0.486 0.539    0.178  0.342 0.472 0.483 0.598
DyNED_Local    0.135  0.228 0.291 0.357 0.406    0.145  0.257 0.323 0.402 0.439
DyNED_Global   0.143  0.247 0.315 0.382 0.435    0.156  0.286 0.398 0.447 0.491
1 ZHANG Yongfeng, AI Qingyao, CHEN Xu, et al. Learning over knowledge-base embeddings for recommendation[EB/OL]. (2018-03-17)[2023-10-18]. http://arxiv.org/abs/1803.06540.
2 WANG Hongwei, ZHANG Fuzheng, ZHANG Mengdi, et al. Knowledge-aware graph neural networks with label smoothness regularization for recommender systems[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Anchorage: ACM, 2019: 968-977.
3 JUNG J, SON B, LYU S. AttnIO: knowledge graph exploration with in-and-out attention flow for knowledge-grounded dialogue[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: ACL, 2020: 3484-3497.
4 GU Yuxian, WEN Jiaxin, SUN Hao, et al. EVA2.0: investigating open-domain Chinese dialogue systems with large-scale pre-training[J]. Machine Intelligence Research, 2023, 20(2): 207-219. doi: 10.1007/s11633-022-1387-3
5 SUCHANEK F M, KASNECI G, WEIKUM G. YAGO: a core of semantic knowledge[C]//Proceedings of the 16th International Conference on World Wide Web. Banff: ACM, 2007: 697-706.
6 BOLLACKER K, EVANS C, PARITOSH P, et al. Freebase: a collaboratively created graph database for structuring human knowledge[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. Vancouver: ACM, 2008: 1247-1250.
7 LEHMANN J, ISELE R, JAKOB M, et al. DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia[J]. Semantic Web, 2015, 6(2): 167-195. doi: 10.3233/SW-140134
8 CARLSON A, BETTERIDGE J, KISIEL B, et al. Toward an architecture for never-ending language learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2010, 24(1): 1306-1313. doi: 10.1609/aaai.v24i1.7519
9 CHEN Chaochao, ZHENG Fei, CUI Jiamie, et al. Survey and open problems in privacy-preserving knowledge graph: merging, query, representation, completion, and applications[J/OL]. International Journal of Machine Learning and Cybernetics, 2024: 1-20. https://link.springer.com/article/10.1007/s13042-024-02106-6.
10 YANG Yuji, XU Bin, HU Jiawei, et al. Accurate and efficient method for constructing domain knowledge graph[J]. Journal of Software, 2018, 29(10): 2931-2947.
11 LAO N, COHEN W W. Relational retrieval using a combination of path-constrained random walks[J]. Machine Learning, 2010, 81: 53-67. doi: 10.1007/s10994-010-5205-8
12 WANG Zhen, ZHANG Jiawen, FENG Jianlin, et al. Knowledge graph embedding by translating on hyperplanes[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2014, 28(1): 1112-1119.
13 ZHANG Qinggang, DONG Junnan, DUAN Keyu, et al. Contrastive knowledge graph error detection[C]//Proceedings of the 31st ACM International Conference on Information & Knowledge Management. Atlanta: ACM, 2022: 2590-2599.
14 BRODY S, ALON U, YAHAV E. How attentive are graph attention networks?[EB/OL]. (2021-10-11)[2023-10-18]. http://arxiv.org/abs/2105.14491.
15 RUI Yong, CARMONA V I S, POURVALI M, et al. Knowledge mining: a cross-disciplinary survey[J]. Machine Intelligence Research, 2022, 19(2): 89-114. doi: 10.1007/s11633-022-1323-6
16 REDDY H, RAJ N, GALA M, et al. Text-mining-based fake news detection using ensemble methods[J]. International Journal of Automation and Computing, 2020, 17(2): 210-221. doi: 10.1007/s11633-019-1216-5
17 GALÁRRAGA L A, TEFLIOUDI C, HOSE K, et al. AMIE: association rule mining under incomplete evidence in ontological knowledge bases[C]//Proceedings of the 22nd International Conference on World Wide Web. Rio de Janeiro: ACM, 2013: 413-422.
18 GALÁRRAGA L, TEFLIOUDI C, HOSE K, et al. Fast rule mining in ontological knowledge bases with AMIE+[J]. The VLDB Journal, 2015, 24(6): 707-730. doi: 10.1007/s00778-015-0394-1
19 CHENG Yurong, CHEN Lei, YUAN Ye, et al. Rule-based graph repairing: semantic and efficient repairing methods[C]//2018 IEEE 34th International Conference on Data Engineering (ICDE). Paris: IEEE, 2018: 773-784.
20 GUO Shu, WANG Quan, WANG Lihong, et al. Knowledge graph embedding with iterative guidance from soft rules[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 4816-4823.
21 GUO Shu, WANG Quan, WANG Lihong, et al. Jointly embedding knowledge graphs and logical rules[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2016: 192-202.
22 TROUILLON T, WELBL J, RIEDEL S, et al. Complex embeddings for simple link prediction[C]//International Conference on Machine Learning. New York: ACM, 2016: 2071-2080.
23 LIN Yankai, LIU Zhiyuan, SUN Maosong, et al. Learning entity and relation embeddings for knowledge graph completion[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Austin: AAAI Press, 2015, 29(1): 2181-2187.
24 FAN Miao, ZHOU Qiang, CHANG E, et al. Transition-based knowledge graph embedding with relational mapping properties[C]//Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing. Phuket: Association for Computational Linguistics, 2014: 328-337.
25 JIA Shengbin, XIANG Yang, CHEN Xiaojun, et al. TTMF: a triple trustworthiness measurement frame for knowledge graphs[EB/OL]. (2018-11-06)[2023-10-18]. http://arxiv.org/abs/1809.09414.
26 YANG B S, YIH W, HE X D, et al. Embedding entities and relations for learning and inference in knowledge bases[EB/OL]. (2014-12-27)[2023-10-18]. http://arxiv.org/abs/1412.6575.