Journal of Shandong University (Natural Science), 2021, Vol. 56, Issue 12: 84-93. doi: 10.6040/j.issn.1671-9352.0.2021.065

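A Bayesian approach for variable selection using spike-and-slab prior distribution

ZHANG Xian-you, LI Dong-xi*

  1. College of Mathematics, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Published: 2021-11-25
  • About the authors: ZHANG Xian-you (born 1995), male, master's student; research interest: high-dimensional data analysis. E-mail: jiyoudog@sina.com. *Corresponding author: LI Dong-xi (born 1982), male, PhD, associate professor; research interest: data mining. E-mail: dxli0426@126.com
  • Funding: National Natural Science Foundation of China (11571009); Applied Basic Research Program of Shanxi Province (201901D111086)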

Abstract: For ultra-high-dimensional data, we present a Bayesian variable selection approach for linear regression models based on a novel spike-and-slab prior. The method combines the advantages of the elastic net and the EM algorithm, obtaining sparse prediction models with relatively fast convergence. In particular, the spike-and-slab prior placed on the coefficients lets them borrow strength across coordinates. It also adapts automatically to the sparsity of the observed data and provides a multiplicity adjustment. Comparisons with commonly used methods demonstrate the accuracy and effectiveness of the proposed method.
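The abstract does not spell out the authors' prior or update equations. The following is therefore only a hypothetical sketch of how a spike-and-slab prior, elastic-net shrinkage and EM can fit together, in the spirit of Ročková and George's spike-and-slab Lasso. It is not the method of this paper, and it rests on the following assumptions:

(i) Each coefficient has a two-component Laplace mixture prior, (1 - theta) * psi(beta_j; lam0) + theta * psi(beta_j; lam1), with psi(beta; lam) = (lam/2) * exp(-lam*|beta|). The spike rate lam0 is large and the slab rate lam1 is small.

(ii) An extra ridge term eta * beta_j^2 / 2 is added as a separate factor that does not depend on the spike/slab indicator. It therefore cancels in the E-step and turns each M-step into a weighted elastic net.

(iii) The noise variance is fixed at 1.

(iv) A Beta(a, b) prior is placed on theta, with b = p by default. This is one common way such priors adapt to sparsity and adjust for multiplicity.

The function names ss_elastic_net_em and soft_threshold, and all default values, are illustrative choices.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ss_elastic_net_em(X, y, lam0=20.0, lam1=0.5, eta=0.1, a=1.0, b=None,
                      n_iter=100, tol=1e-6):
    """Hypothetical EM sketch (not the paper's algorithm).

    Laplace spike (rate lam0) / slab (rate lam1) mixture on each coefficient,
    plus a shared ridge term eta/2 * beta_j^2, noise variance fixed at 1,
    and a Beta(a, b) prior on the mixing weight theta.
    X is a list of n rows with p entries each; y is a list of n responses.
    Returns (beta, p_incl, theta).
    """
    n, p = len(X), len(X[0])
    if b is None:
        b = float(p)  # b proportional to p pushes theta toward sparsity as p grows
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta initialised at zero
    theta = 0.5
    p_incl = [theta] * p
    for _ in range(n_iter):
        # E-step: probability that beta_j belongs to the slab, written via the
        # spike/slab density ratio (exponent <= 0 since lam0 > lam1, so no overflow).
        for j in range(p):
            ratio = (((1.0 - theta) / theta) * (lam0 / lam1)
                     * math.exp(-(lam0 - lam1) * abs(beta[j])))
            p_incl[j] = 1.0 / (1.0 + ratio)
        # Each coefficient gets its own L1 weight, mixing the two Laplace rates.
        lam_star = [q * lam1 + (1.0 - q) * lam0 for q in p_incl]
        # M-step for beta: one coordinate-descent sweep on the weighted elastic net
        # 0.5 * ||y - X beta||^2 + sum_j lam_star[j] * |beta_j| + eta/2 * ||beta||^2
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            z = sum(cols[j][i] * resid[i] for i in range(n)) + col_sq[j] * old
            new = soft_threshold(z, lam_star[j]) / (col_sq[j] + eta)
            delta = new - old
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= cols[j][i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        # M-step for theta: posterior mode under the Beta(a, b) prior, clipped to (0, 1)
        theta = (sum(p_incl) + a - 1.0) / (a + b + p - 2.0)
        theta = min(max(theta, 1e-6), 1.0 - 1e-6)
        if max_change < tol:
            break
    return beta, p_incl, theta


if __name__ == "__main__":
    # Toy data: only the first two of eight predictors matter.
    rng = random.Random(0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    beta, p_incl, theta = ss_elastic_net_em(X, y)
    print([round(v, 2) for v in beta])
    print([round(q, 3) for q in p_incl], round(theta, 3))
```

Coordinates that the E-step assigns to the slab face only the small penalty lam1. The rest face roughly lam0 and are typically set exactly to zero by soft-thresholding. The shared theta update is where information is pooled across coordinates in this sketch.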

Key words: variable selection, ultra-high dimensional, spike-and-slab prior distribution, elastic net, sparse model

CLC number: O212.1