您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

《山东大学学报(理学版)》 ›› 2024, Vol. 59 ›› Issue (7): 27-43.doi: 10.6040/j.issn.1671-9352.1.2023.062

• 综述 • 上一篇    下一篇

高维数据的降维与检索算法

邵伟1,2(),朱高宇1,2,于雷1,*(),郭嘉丰1   

  1. 1. 中国科学院计算技术研究所网络数据科学与技术重点实验室, 北京 100190
    2. 中国科学院大学计算机科学与技术学院, 北京 100049
  • 收稿日期:2023-10-13 出版日期:2024-07-20 发布日期:2024-07-15
  • 通讯作者: 于雷 E-mail:shaowei23@mails.ucas.ac.cn;yulei2008@ict.ac.cn
  • 作者简介:邵伟(2001—),男,硕士研究生,研究方向为数据分析、机器学习. E-mail:shaowei23@mails.ucas.ac.cn
  • 基金资助:
    国家重点研发计划资助项目(2022YFB2404200)

Dimensionality reduction and retrieval algorithms for high dimensional data

Wei SHAO1,2(),Gaoyu ZHU1,2,Lei YU1,*(),Jiafeng GUO1   

  1. 1. Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2023-10-13 Online:2024-07-20 Published:2024-07-15
  • Contact: Lei YU E-mail:shaowei23@mails.ucas.ac.cn;yulei2008@ict.ac.cn

摘要:

目前大多研究通过一些降维方法将高维向量转化为低维向量表示,再应用相关向量检索优化技术实现快速相似性检索,从而提高大模型应用表现。当前针对高维数据的降维方法繁多分散,在不同的研究背景下所采用的降维方法不尽相同,同样地,在向量检索技术上也存在大量不同的检索思路与优化方法。通过综述近期的降维和检索算法的主要思路及其优化方法,有助于产生二者之间的启发性联系,支撑后续高维向量检索优化算法研究的展开和深入。

关键词: 高维数据, 数据降维, 检索算法, 近似最近邻检索

Abstract:

At present, most studies use some dimensionality reduction methods to convert high-dimensional vectors into low-dimensional vector representations, and then apply related vector retrieval optimization technology to achieve fast similarity retrieval, thereby improving the application performance of large models. Currently, there are many and scattered dimensionality reduction methods for high-dimensional data, and the dimensionality reduction methods used in different research backgrounds are different. Similarly, there are also many different retrieval ideas and optimization methods in vector retrieval technology. By reviewing the main ideas and optimization methods of recent dimensionality reduction and retrieval algorithms, this paper helps to generate inspiring connections between the two and support the development and in-depth research of subsequent high-dimensional vector retrieval optimization algorithms.

Key words: high-dimensional data, dimensionality reduction, retrieval algorithm, approximate nearest neighbor search

中图分类号: 

  • TP391

表1

降维算法的分析与比较"

类别 算法名称 优点 缺点
线性降维方法 PCA 简单;可解释性好 不能区分不同类别的数据;保持的主分量个数难以确定
LDA 保留不同类别数据的差异性;分类效果强 数据服从高斯分布
MDS 计算简单;保留高维数据之间的相对关系 各个维度的贡献相同
MDP 保持高维数据的流形和判别结构;避免小样本问题
LLTSA 保持数据的几何结构;对噪声鲁棒;较好局部表达
RLGS 自适应发现高维数据集中的流形结构;不涉及到近邻参数的选择问题;识别性能对于正则化参数的选择是鲁棒的
PER 适合处理分层响应面;适合控制与多共线性、高维度和预测器稀疏性相关的问题 响应和预测变量之间的链接函数是单调的
非线性降维方法 ISOMAP 数据平移旋转时,降维的结果不会发生变化 不提供显式的映射
LLE 凸函数;计算量少 近邻数的选择对结果影响较大, 不提供显式映射
LTSA 探索高维数据集的非凸流形结构 不提供高维数据到低维数据的一个映射
LE 保持好数据的局部结构 不提供显性映射
MVU 有效地处理非凸函数
LPP 能进行分类;对局部信息的保存很好
LLA 解决了LLE的源数据稀疏时算法失效的问题
ONPP 具有LLE优势的前提下,能给出高维数据到低维数据的显式映射
GPP 具有判别能力;能学习模型内部的几何结构
DONPP 有较高的识别率和判别能力
Reg-SS-ISOMAP 能够提供显示映射;可利用大量无标签数据
MMC 更强的分类能力
DOEPP 分类性能强;稳定
IRobust KPCA 鲁棒;对离群样本不敏感
SDMA 具有良好判别能力;保留局部几何结构 假设每一类数据独立构成一个子流行
MLapRLS 兼具LDA与LPP的优点,更好地刻画样本空间
CRBM 提供双向映射
SLPC 分类,保留数据的局部结构
t-SNE 学习数据局部结构;可视化
E-t-SNE 分类效果强;良好的可视化
SVD 增强数据能力;减少冗余数据
DM 能够进行时间数据可视化
DFT 将数据的有效信息保留;无效数据丢弃
DWT 可以提供数据时间特征与频率特征
RMTSLA-SPDDR 保留原始数据的形式与属性
EW-PCA 速度较快;效果较好
OTDLDD-TDR 保留数据的局部结构与标签信息
FBRFE 保护数据间的距离与角度
FEM 适用面广;效果较好;速度较快

表2

检索算法的分析与比较"

类别 算法名称 优点 缺点
基于空间划分 VP Tree 较为精确地检索近邻; 结构简单 高维空间运行效率较低
Ball Tree 一定程度提高检索效率,降低重复检索概率
Annoy 多次划分子空间提高了查全率
Randomized KD-trees 从多个维度上划分子空间,检索效率较高
基于希哈散列 稳态分布的投影 通过稳态分布提高了相似数据的碰撞概率 散列结果需要二次映射
随机超平面的投影 优化稳态分布的方法,进一步提高检索效率
Multi-Probe LSH 检索时查找多个哈希桶,提高查全率与精度 近似检索点的选取易造成重复检索
QALSH 检索点与数据点一同构造索引,实现动态索引
PQ 子空间内向量量化,构造码本实现高效聚类与检索 有损压缩的方式使召回率较低
OPQ 优化原始数据的分布,提高PQ算法的效率
基于图 NSW 图结构实现近邻检索,提高了检索精度 图的度数较高,降低了检索效率
HNSW 由粗粒度到细粒度的检索方式,检索精度与效率均较高 多层图的构建易导致内存占用较高
1 LI Hailin . Asynchronism-based principal component analysis for time series data mining[J]. Expert Systems with Applications, 2014, 41 (6): 2842- 2850.
doi: 10.1016/j.eswa.2013.10.019
2 FISHER R A . The use of multiple measurements in taxonomic problems[J]. Annals of Eugenics, 1936, 7 (2): 179- 188.
doi: 10.1111/j.1469-1809.1936.tb02137.x
3 COX T F , COX M A A . Multidimensional scaling[M]. Boca Raton: CRC Press, 2000.
4 何进荣, 丁立新, 李照奎, 等. 基于边界判别投影的数据降维[J]. 软件学报, 2014, 25 (4): 826- 838.
HE Jinrong , DING Lixin , LI Zhaokui , et al. Margin discriminant projection for dimensionality reduction[J]. Journal of Software, 2014, 25 (4): 826- 838.
5 张田昊. 数据降维算法研究及其应用[D]. 上海: 上海交通大学, 2008.
ZHANG Tianhao. Research on dimensionality reduction algorithms and its applications[D]. Shanghai: Shanghai Jiao Tong University, 2008.
6 ZHANG Zhenyue , ZHA Hongyuan . Principal manifolds and nonlinear dimensionality reduction via tangent space alignment[J]. SIAM Journal on Scientific Computing, 2004, 26 (1): 313- 338.
doi: 10.1137/S1064827502419154
7 龚铁梁. 数据降维算法研究及其应用[D]. 武汉: 湖北大学, 2012.
GONG Tieliang. Research on dimensionality reduction algorithms and applications[D]. Wuhan: Hubei University, 2012.
8 SOALE A N . Projection expectile regression for sufficient dimension reduction[J]. Computational Statistics & Data Analysis, 2023, 180, 107666.
9 TENENBAUM J. Mapping a manifold of perceptual observations[C]//Proceedings of the 10th International Conference on Neural Information Processing Systems. Denver: ACM, 1997: 682-688.
10 LEE J A, ARCHAMBEAU C, VERLEYSEN M. Locally linear embedding versus isotop[C]//ESANN'2003 proceedings-European Symposium on Artificial Neural Networks. Bruges: D-Side, 2003: 527-534
11 BELKIN M, NIYOGI P. Laplacian eigenmaps and spectral techniques for embedding and clustering[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Columbia: ACM, 2001: 585-591.
12 WEINBERGER K Q , SAUL L K . Unsupervised learning of image manifolds by semidefinite programming[J]. InternationalJournal of Computer Vision, 2006, 70 (1): 77- 90.
doi: 10.1007/s11263-005-4939-z
13 HE X, NIYOGI P. Locality preserving projections[C]//Proceedings of the 16th International Conference on Neural Information Processing Systems. Columbia: ACM, 2003: 153-160.
14 吴晓婷, 闫德勤. 改进的非线性数据降维方法及其应用[J]. 计算机工程与应用, 2011, 47 (2): 156- 159.
WU Xiaoting , YAN Deqin . Improved non-linear data dimensionality reduction method and its application[J]. Computer Engineering and Applications, 2011, 47 (2): 156- 159.
15 宋欣, 叶世伟. 基于直接估计梯度思想的数据降维算法[J]. 计算机工程, 2008, 34 (8): 205- 207.
doi: 10.3969/j.issn.1000-3428.2008.08.073
SONG Xin , YE Shiwei . Data dimensionality reduction algorithm based on direct estimate grads[J]. Computer Engineering, 2008, 34 (8): 205- 207.
doi: 10.3969/j.issn.1000-3428.2008.08.073
16 马思远, 贺萍. 改进的局部线性嵌入及其混成数据降维算法[J]. 计算机与数字工程, 2022, 50 (12): 2616- 2621.
doi: 10.3969/j.issn.1672-9722.2022.12.002
MA Siyuan , HE Ping . Improved locally linear embedding and hybrid algorithm for data dimensionality reduction[J]. Computer & Digital Engineering, 2022, 50 (12): 2616- 2621.
doi: 10.3969/j.issn.1672-9722.2022.12.002
17 DONOHO D L , GRIMES C . Hessian eigenmaps: locally linear embedding techniques for high-dimensional data[J]. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100 (10): 5591- 5596.
18 邹艳. 高维数据降维方法的研究[D]. 成都: 西南交通大学, 2012.
ZOU Yan. Research on dimension reduction methods of high dimensional data[D]. Chengdu: Southwest Jiaotong University, 2012.
19 宋欣, 叶世伟. 一种在源数据稀疏情况下的数据降维算法[J]. 计算机工程与应用, 2007, 43 (28): 181- 183.
SONG Xin , YE Shiwei . Data dimensionality reduction algorithm when source data is spare[J]. Computer Engineering and Applications, 2007, 43 (28): 181- 183.
20 KOKIOPOULOU E , SAAD Y . Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29 (12): 2143- 2156.
doi: 10.1109/TPAMI.2007.1131
21 王宪保, 陈诗文, 姚明海. 基于正则化的半监督等距映射数据降维方法[J]. 电子与信息学报, 2016, 38 (1): 241- 245.
WANG Xianbao , CHEN Shiwen , YAO Minghai . Data dimensionality reduction method of semi-supervised isometric mapping based on regularization[J]. Journal of Electronics & Information Technology, 2016, 38 (1): 241- 245.
22 LI Haifeng, JIANG Tao, ZHANG Keshu. Efficient and robust feature extraction by maximum margin criterion[C]//Proceedings of the 16th International Conference on Neural Information Processing Systems. Columbia: ACM, 2003: 97-104.
23 罗廷金. 基于流形学习的数据降维算法研究[D]. 长沙: 国防科学技术大学, 2013.
LUO Tingjin. Research on dimension reduction algorithms based on manifold learning[D]. Changsha: National University of Defense Technology, 2013.
24 王雷. 基于全局统计与局部几何性质的数据降维算法研究[D]. 合肥: 中国科学技术大学, 2009.
WANG Lei. Research on global statistic and local geometry based dimensionality reduction algorithm[D]. Hefei: The University of Science and Technology of China, 2009.
25 胡昭华, 宋耀良. 基于Autoencoder网络的数据降维和重构[J]. 电子与信息学报, 2009, 31 (5): 1189- 1192.
HU Zhaohua , SONG Yaoliang . Dimensionality reduction and reconstruction of data based on autoencoder network[J]. Journal of Electronics & Information Technology, 2009, 31 (5): 1189- 1192.
26 申中华. 数据降维技术的建模研究与应用: 特征降维及其应用[D]. 无锡: 江南大学, 2008.
SHEN Zhonghua. Research and application of dimensionality reduction techniques: feature reduction and its applications[D]. Wuxi: Jiangnan University, 2008.
27 HINTON G E, ROWEIS S. Stochastic neighbor embeddingg[C]//Proceedings of the 15th International Conference on Neural Information Processing Systems. Cambridge: ACM, 2002: 857-864.
28 魏世超, 李歆, 张宜弛, 等. 基于E-t-SNE的混合属性数据降维可视化方法[J]. 计算机工程与应用, 2020, 56 (6): 66- 72.
WEI Shichao , LI Xin , ZHANG Yichi , et al. Dimension reduction and visualization of mixed-type data based on E-t-SNE[J]. Computer Engineering and Applications, 2020, 56 (6): 66- 72.
29 KHOSHROU A, DORSMAN A B, PAUWELS E J. SVD-based visualisation and approximation for time series data in smart energy systems[C]//2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe). Turin: IEEE, 2017: 1-6.
30 BOWALA I , FERNANDO M . A novel model for time-series data clustering based on piecewise SVD and BIRCH for stock data analysis on Hadoop platform[J]. Advances in Science, Technology and Engineering Systems Journal, 2017, 2 (3): 855- 864.
31 COIFMAN R R , LAFON S . Diffusion maps[J]. Applied and Computational Harmonic Analysis, 2006, 21 (1): 5- 30.
32 JUVONEN A , SIPOLA T , HÄMÄLÄINEN T . Online anomaly detection using dimensionality reduction techniques for HTTP log analysis[J]. Computer Networks, 2015, 91 (C): 46- 56.
33 SHATKAY H . The Fourier transform: a primer[M]. Providence: Brown University, 1995: 1- 20.
34 SIFUZZAMAN M , ISLAM M R , ALI M Z . Application of wavelet transform and its advantages compared to Fourier transform[J]. Journal of Physical Sciences, 2009, 13, 121- 134.
35 GAO Wenxu , MA Zhengming , XIONG Chenkui , et al. Dimensionality reduction of SPD data based on Riemannian manifold tangent spaces and local affinity[J]. Applied Intelligence, 2023, 53 (2): 1887- 1911.
36 FRANCIS D P , RAIMOND K , KATHRINE G J W . Dimensionality reduction of large datasets with explicit feature maps[J]. Intelligent Decision Technologies, 2023, 17 (2): 457- 470.
37 BERA D , PRATAP R , VERMA B D . Dimensionality reduction for categorical data[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35 (4): 3658- 3671.
38 HALLAJI E , FARAJZADEH-ZANJANI M , RAZAVI-FAR R , et al. Constrained generative adversarial learning for dimensionality reduction[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35 (3): 2394- 2405.
39 WEI Wei , YUE Qin , FENG Kai , et al. Unsupervised dimensionality reduction based on fusing multiple clustering results[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35 (3): 3211- 3223.
40 ISMKHAN H , IZADI M . Proposing a dimensionality reduction technique with an inequality for unsupervised learning from high-dimensional big data[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53 (6): 3880- 3889.
41 BADER J, NELSON D, CHAI-ZHANG T, et al. Neural network for nonlinear dimension reduction through manifold recovery[C]//2019 IEEE MIT Undergraduate Research Technology Conference (URTC). Cambridge: IEEE, 2019: 1-4.
42 CUI Yuming, FANG Yinglan. Research on PCA data dimension reduction algorithm based on entropy weight method[C]//2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). Taiyuan: IEEE, 2020: 392-396.
43 SHI Yang, ZHANG Aidong. A shrinking-based dimension reduction approach for multi-dimensional analysis[C]//Proceedings of 16th International Conference on Scientific and Statistical Database Management. Santorini Island: IEEE, 2004: 427-428.
44 GAO Weixu , MA Zhengming , YUAN Xuejing . Dimensionality reduction algorithm of tensor data based on orthogonal tucker decomposition and local discrimination difference[J]. Applied Intelligence, 2022, 52 (12): 14518- 14540.
45 BEHERA A P , SINGH J , VERMA S , et al. Data visualization through non linear dimensionality reduction using feature based Ricci flow embedding[J]. Multimedia Tools and Applications, 2022, 81 (11): 14831- 14850.
46 ISLAM M T , XING L . A data-driven dimensionality-reduction algorithm for the exploration of patterns in biomedical data[J]. Nature Biomedical Engineering, 2021, 5 (6): 624- 635.
47 VOGELSTEIN J T , BRIDGEFORD E W , TANG M , et al. Supervised dimensionality reduction for big data[J]. Nature Communications, 2021, 12 (1): 2872.
48 INDYK P, MOTWANI R. Approximate nearest neighbors: towards removing the curse of dimensionality[C]//Proceedings of the thirtieth annual ACM symposium on Theory of computing-STOC'98. Dallas: ACM, 1998: 604-613.
49 YIANILOS P N. Data structures and algorithms for nearest neighbor search in general metric spaces[C]//Proceedings of the Fourth Annual ACM-SIAM symposium on Discrete algorithms. Austin: ACM, 1993: 311-321.
50 CAYTON L. Fast nearest neighbor retrieval for bregman divergences[C]//Proceedings of the 25th international conference on Machine learning. Helsinki: ACM, 2008: 112-119.
51 BENTLEY J L . Multidimensional binary search trees used for associative searching[J]. Communications of the ACM, 1975, 18 (9): 509- 517.
52 SPOTIFY. Github-spotify/annoy: about approximate nearest neighbors in C++/Python optimized for memory usage and loading/saving to disk[EB/OL]. [2023-07-25]. https://github.com/spotify/annoy.
53 DASGUPTA S, FREUND Y. Random projection trees and low dimensional manifolds[C]//Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. Columbia: ACM, 2008: 537-546.
54 SILPA-ANAN C, HARTLEY R. Optimised KD-trees for fast image descriptor matching[C]//2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage: IEEE, 2008: 1-8.
55 MUJA M , LOWE D G . Scalable nearest neighbor algorithms for high dimensional data[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36 (11): 2227- 2240.
56 FUKUNAGA K , NARENDRA P M . A branch and bound algorithm for computing k-nearest neighbors[J]. IEEE Transactions on Computers, 1975, 100 (7): 750- 753.
57 NAVARRO G . Searching in metric spaces by spatial approximation[J]. The VLDB Journal, 2002, 11 (1): 28- 46.
58 BEYGELZIMER A, KAKADE S, LANGFORD J. Cover trees for nearest neighbor[C]//Proceedings of the 23rd Internatio-nal Conference on Machine Learning. Pittsburgh: ACM, 2006: 97-104.
59 WANG H Y, CAO J, SHU L C, et al. Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Managementt-CIKM'13. San Francisco: ACM, 2013: 1969-1978.
60 DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the Twentieth Annual Symposium on Computational Geometry. New York: ACM, 2004: 253-262.
61 LYU Q, JOSEPHSON W, WANG Z, et al. Multi-probe LSH: efficient indexing for high-dimensional similarity search[C]//Proceedings of the 33rd International Conference on Very Large Data Bases. Vienna: ACM, 2007: 950-961.
62 LYU Q , JOSEPHSON W , WANG Z , et al. Intelligent probing for locality sensitive hashing: multi-probe LSH and beyond[J]. Proceedings of the VLDB Endowment, 2017, 10 (12): 2021- 2024.
63 ZOLOTAREV V M . One-dimensional stable distributions[M]. Providence: American Mathematical Society, 1986.
64 CHARIKAR M S. Similarity estimation techniques from rounding algorithms[C]//Proceedings of the thiry-fourth annual ACM Symposium on Theory of Computing. Montreal: ACM, 2002: 380-388.
65 PANIGRAHY R. Entropy based nearest neighbor search in high dimensions[C]//Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm-SODA'06. Miami: ACM, 2006: 1186-1195.
66 LIU W, WANG J, KUMAR S, et al. Hashing with graphs[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue: International Machine Learning Society, 2011: 1-8.
67 WANG Jianfeng, WANG Jingdong, YU Nenghai, et al. Order preserving hashing for approximate nearest neighbor search[C]//Proceedings of the 21st ACM International Conference on Multimedia. Barcelona: ACM, 2013: 133-142.
68 WANG J, LIU W, SUN A X, et al. Learning hash codes with listwise supervision[C]///2013 IEEE International Conference on Computer Vision. Sydney: IEEE, 2013: 3032-3039.
69 JIN Zhongming, HU Yao, LIN Yue, et al. Complementary projection hashing[C]//Proceedings of the IEEE International Conference on Computer Vision. Sydney: IEEE, 2013: 257-264.
70 JOLY A, BUISSON O. Random maximum margin hashing[C]//CVPR. Colorado Springs: IEEE, 2011: 873-880.
71 JEGOU H , DOUZE M , SCHMID C . Product quantization for nearest neighbor search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33 (1): 117- 128.
72 GONG Y C , LAZEBNIK S , GORDO A , et al. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35 (12): 2916- 2929.
73 KONG Weihao, LI Wujun. Isotropic hashing[C]//Proceedings of the Advances in Neural Information Processing Systems, [S. l. ]: Curran Associates, 2012: 1-9.
74 BABENKO A, LEMPITSKY V. Additive quantization for extreme vector compression[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 931-938.
75 ZHANG T, DU C, WANG J D. Composite quantization for approximate nearest neighbor search[C]//Proceedings of the 31st International Conference on Machine Learning-Volume 32. Beijing: ACM, 2014: 838-846.
76 SALAKHUTDINOV R , HINTON G . Semantic hashing[J]. International Journal of Approximate Reasoning, 2009, 50 (7): 969- 978.
77 GRAY R M , NEUHOFF D L . Quantization[J]. IEEE Transactions on Information Theory, 1998, 44 (6): 2325- 2383.
78 GE Tiezheng , HE Kaiming , KE Qifa , et al. Optimized product quantization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36 (4): 744- 755.
79 JEGOU H, DOUZE M, SCHMID C, et al. Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 3304-3311.
80 WANG Mengzhao, XU Xiangliang, YUE Qiang, et al. A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search[EB/OL]. (2021-01-29)[2023-10-18]. http://arxiv.org/abs/2101.12631.
81 VIRMAJOKI O, FRANTI P. Divide-and-conquer algorithm for creating neighborhood graph for clustering[C]//Proceedings of the 17th International Conference on Pattern Recognition. Cambridge: IEEE, 2004: 264-267.
82 DONG W, MOSES C, LI K. Efficient k-nearest neighbor graph construction for generic similarity measures[C]//Proceedings of the 20th International Conference on World Wide Web. Hyderabad: ACM, 2011: 577-586.
83 HAJEBI K, ABBASI-YADKORI Y, SHAHBAZI H, et al. Fast approximate nearest-neighbor search with k-nearest neighbor graph[C]//Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence-Volume Two. Barcelona: ACM, 2011: 1312-1317.
84 UNO T, SUGIYAMA M, TSUDA K. Efficient construction of neighborhood graphs by the multiple sorting method[EB/OL]. (2009-04-21)[203-10-18]. http://arxiv.org/abs/0904.3151.
85 FU Cong, CAI Deng. EFANNA: an extremely fast approximate nearest neighbor search algorithm based on kNN graph[EB/OL]. (2016-09-23)[2023-10-18]. http://arxiv.org/abs/1609.07228.
86 MALKOV Y , PONOMARENKO A , LOGVINOV A , et al. Approximate nearest neighbor algorithm based on navigable small world graphs[J]. Information Systems, 2014, 45, 61- 68.
87 MALKOV Y A , YASHUNIN D A . Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (4): 824- 836.
88 YAHOOJAPAN. Github-yahoojapan/NGT: neighborhood graph and tree for indexing high-dimensional data[EB/OL]. [2023-08-22]. https://github.com/yahoojapan/NGT.
89 FU Cong, XIANG Chao, WANG Changxu, et al. Fast approximate nearest neighbor search with the navigating spreading-out graph[EB/OL]. (2017-07-17)[2023-10-13]. https://arxiv.org/abs/1707.00143.
90 JAYARAM Subramanya S, DEVVRIT F, SIMHADRI H V, et al. DiskANN: fast accurate billion-point nearest neighbor search on a single node[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 13766-13776.
91 HARWOOD B, DRUMMOND T. FANNG: fast approximate nearest neighbour graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5713-5722.
92 BARANCHUK D, PERSIYANOV D, SINITSIN A, et al. Learning to route in similarity graphs[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach: PMLR, 2019: 475-484.
93 LI C L, ZHANG M J, ANDERSEN D G, et al. Improving approximate nearest neighbor search through learned adaptive early termination[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Portland: ACM, 2020: 2539-2554.
94 MUÑOZ J A V , DIAS Z , DA SILVA TORRES R . A genetic programming approach for searching on nearest neighbors graphs[J]. Multimedia Tools and Applications, 2022, 81 (16): 23449- 23472.
[1] 张凌,任雪芳. 数据智能分类与分类智能检索-识别[J]. 《山东大学学报(理学版)》, 2020, 55(10): 7-14.
[2] 曾雪强,叶震麟,左家莉,万中英,吴水秀. 一种块增量偏最小二乘算法[J]. 《山东大学学报(理学版)》, 2019, 54(3): 93-101.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 程智1,2,孙翠芳2,王宁1,杜先能1. 关于Zn的拉回及其性质[J]. J4, 2013, 48(2): 15 -19 .
[2] 张爱平,李刚 . LR拟正规Ehresmann半群[J]. J4, 2006, 41(5): 44 -47 .
[3] 王康 李华. 化学计量学方法用于蛤青注射色谱数据重叠峰的分辨[J]. J4, 2009, 44(11): 16 -20 .
[4] 陈 莉, . 非方广义系统带干扰抑制的奇异LQ次优控制问题[J]. J4, 2006, 41(2): 74 -77 .
[5] 赵同欣1,刘林德1*,张莉1,潘成臣2,贾兴军1. 紫藤传粉昆虫与花粉多型性研究[J]. 山东大学学报(理学版), 2014, 49(03): 1 -5 .
[6] 马建玲 . 菱体型消色差相位延迟器的光谱特性分析[J]. J4, 2007, 42(7): 27 -29 .
[7] 霍玉洪,季全宝. 一类生物细胞系统钙离子振荡行为的同步研究[J]. J4, 2010, 45(6): 105 -110 .
[8] 石长光 . Faddeev模型中的多孤立子解[J]. J4, 2007, 42(7): 38 -40 .
[9] 马继雄,江莉,祁驭矜,向凤宁,夏光敏 . 祁连龙胆愈伤组织和再生植株的生长及其两种药效成分分析[J]. J4, 2006, 41(6): 157 -160 .
[10] 边斐,何海伦,陈秀兰*,张玉忠 . 离子对深海适冷菌Pseudoalteromonas sp. SM9913胞外蛋白酶分泌的影响[J]. J4, 2006, 41(5): 166 -172 .