Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2025, Vol. 60, Issue 1: 14-28. doi: 10.6040/j.issn.1671-9352.4.2023.0212
温柳英,吴俊,闵帆
WEN Liuying, WU Jun, MIN Fan
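Abstract: Microbial data suffer from within-class imbalance, between-class imbalance, and high sparsity. To address these problems, a data augmentation algorithm that combines matrix factorization with space partitioning is proposed. Matrix factorization decomposes the original data space into an object subspace and a feature subspace, from which a latent representation is extracted. The object subspace is then partitioned into multiple data subspaces, which alleviates the within-class imbalance. To address the between-class imbalance, synthetic samples are generated in each data subspace and filtered by Euclidean distance so that only high-quality samples are kept. The algorithm is evaluated on 9 microbial datasets and compared with 9 sampling algorithms. The results show that the samples it generates have a considerable advantage in diversity and that, across multiple classifiers, more positive samples are identified.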
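The pipeline described in the abstract (factorize, partition, generate, filter) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which factorization, partitioning rule, generator, or filter threshold is used, so the truncated SVD, the median split on the first latent coordinate, the SMOTE-style interpolation, the nearest-neighbor filter, and every name and parameter below (`augment_minority`, `rank`, `n_new`) are assumptions made for illustration.

```python
import numpy as np

def augment_minority(X, y, minority=1, rank=2, n_new=20, seed=0):
    """Hypothetical sketch: factorize, partition, interpolate, filter."""
    rng = np.random.default_rng(seed)

    # Step 1 (assumption: truncated SVD as the matrix factorization).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :rank] * s[:rank]   # object subspace: one latent row per sample
    V = Vt[:rank]                # feature subspace: maps latent rows back to features

    # Step 2 (assumption: a single median split on the first latent
    # coordinate stands in for the paper's space partitioning).
    left = Z[:, 0] <= np.median(Z[:, 0])

    synthetic = []
    for mask in (left, ~left):
        minority_idx = np.flatnonzero(mask & (y == minority))
        if len(minority_idx) < 2:
            continue  # not enough minority samples to interpolate here
        for _ in range(n_new):
            # Step 3 (assumption: SMOTE-style interpolation between two
            # minority samples of the same data subspace).
            i, j = rng.choice(minority_idx, size=2, replace=False)
            z_new = Z[i] + rng.random() * (Z[j] - Z[i])

            # Step 4 (assumption: keep a candidate only if its nearest
            # original sample, by Euclidean distance, is a minority sample).
            dist = np.linalg.norm(Z - z_new, axis=1)
            if y[np.argmin(dist)] == minority:
                synthetic.append(z_new @ V)  # project back to feature space

    return np.array(synthetic).reshape(-1, X.shape[1])


# Toy usage on random data (not a microbial dataset).
X_demo = np.random.default_rng(1).random((40, 6))
y_demo = np.zeros(40, dtype=int)
y_demo[:8] = 1
S_demo = augment_minority(X_demo, y_demo)
```

Generating samples separately inside each partition is what ties the sketch to the within-class imbalance described above: sparse minority regions get their own synthetic samples instead of being dominated by the densest cluster.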