JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2025, Vol. 60 ›› Issue (1): 14-28. doi: 10.6040/j.issn.1671-9352.4.2023.0212

Fusing matrix factorization and space partition microbial data augmentation algorithm

WEN Liuying, WU Jun, MIN Fan   

  1. School of Computer and Software, Southwest Petroleum University, Chengdu 610500, Sichuan, China
  • Published: 2025-01-10

Abstract: Microbial data suffer from intra-class imbalance, inter-class imbalance, and high sparsity. To address these problems, a data augmentation method that fuses matrix factorization and space partition is proposed. Matrix factorization decomposes the original data space into an object subspace and a feature subspace to extract a latent representation. The object subspace is then divided into multiple data subspaces to alleviate intra-class imbalance, and synthetic samples are generated in each data subspace to address inter-class imbalance. The synthetic samples are filtered by Euclidean distance to retain high-quality samples. Experiments are conducted on 9 microbial data sets, and performance is compared with 9 sampling algorithms. The results show that the samples generated by the proposed method have a clear advantage in diversity, and that more positive samples are identified under multiple classifiers.
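
The four steps named in the abstract (factorize, partition, generate, filter) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the factorization, the partition rule, the generation rule, or the filter criterion, so every concrete choice here is an assumption. Truncated SVD stands in for the matrix factorization, a quantile split along the first latent coordinate stands in for the space partition, SMOTE-style interpolation stands in for sample generation, and the filter keeps a candidate only if its nearest minority sample is closer than its nearest majority sample. The function name `augment_minority` and its parameters are hypothetical.

```python
import numpy as np

def augment_minority(X_min, X_maj, n_new, rank=2, n_parts=2, seed=0):
    """Illustrative sketch only; all concrete choices are assumptions."""
    rng = np.random.default_rng(seed)

    # Step 1 (assumed: truncated SVD): X_min ~= obj @ feat, where obj is the
    # object subspace (one row per sample) and feat the feature subspace.
    U, s, Vt = np.linalg.svd(X_min, full_matrices=False)
    obj = U[:, :rank] * s[:rank]
    feat = Vt[:rank]

    # Step 2 (assumed: quantile split on the first latent coordinate):
    # divide the object subspace into roughly equal-sized data subspaces.
    order = np.argsort(obj[:, 0])
    parts = [p for p in np.array_split(order, n_parts) if len(p) >= 2]
    if not parts:
        raise ValueError("need at least two minority samples per partition")

    # Step 3 (assumed: linear interpolation between two random members):
    # give every subspace the same quota, then map back to feature space.
    per_part = int(np.ceil(n_new / len(parts)))
    candidates = []
    for p in parts:
        a = obj[rng.choice(p, per_part)]
        b = obj[rng.choice(p, per_part)]
        t = rng.random((per_part, 1))
        candidates.append((a + t * (b - a)) @ feat)
    cand = np.vstack(candidates)

    # Step 4 (assumed criterion): Euclidean-distance filter that keeps a
    # candidate only if it lies nearer to the minority than the majority class.
    d_min = np.linalg.norm(cand[:, None, :] - X_min[None, :, :], axis=2).min(axis=1)
    d_maj = np.linalg.norm(cand[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    return cand[d_min < d_maj][:n_new]
```

Giving each data subspace the same quota is what targets intra-class imbalance in this sketch: sparse regions of the minority class receive as many synthetic samples as dense ones. The function may return fewer than `n_new` rows when the filter rejects candidates.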

Key words: matrix factorization, space partition, intra-class imbalance, inter-class imbalance, object subspace, feature subspace

CLC Number: TP391