Journal of Shandong University (Natural Science) ›› 2019, Vol. 54 ›› Issue (3): 85-92, 101. doi: 10.6040/j.issn.1671-9352.0.2018.261
Wei-na XU*, Guang-le ZHANG, Shi-hong LI, Yuan-yuan CHEN, Qiang LI, Tao YANG, Ming-min XU, Ning QIAO, Liang-yun ZHANG
Abstract:
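To better understand and explore the regulatory mechanisms of lincRNAs, an efficient lincRNA identification model was built, which also helps provide a data source for follow-up studies. Features including the minimum free energy (MFE) and the signal-to-noise ratio (SNR) were extracted, redundant features were removed according to their contribution to the classifier, and a random forest (RF) classification model was constructed to identify lincRNAs effectively. In testing, the model reached a sensitivity of 94.1%, a specificity of 93.2%, and an accuracy of 93.7%, each higher than the corresponding metric of the existing PhyloCSF, LncRNA-ID, and CPC methods. The model showed good robustness during identification and can identify lincRNAs accurately.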
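Two quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so the SNR definition used here is an assumption: the power of the period-3 peak in the nucleotide power spectrum divided by the mean power over all non-zero frequencies, a common form in the coding-potential literature. The function names `period3_snr` and `classification_metrics` are hypothetical, and the confusion-matrix counts in the usage example are toy numbers, not the paper's data.

```python
import cmath
from typing import Dict


def period3_snr(seq: str) -> float:
    """SNR of the period-3 spectral peak (assumed definition, see lead-in).

    Each nucleotide is mapped to a binary indicator sequence; the power
    spectrum is the sum of the squared DFT magnitudes of the four indicators.
    Direct DFT, O(N^2): fine for a sketch, slow for long transcripts.
    """
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA alphabets
    n = len(seq) - len(seq) % 3           # truncate so N/3 is an integer index
    if n < 3:
        raise ValueError("sequence must contain at least 3 nucleotides")
    seq = seq[:n]

    def power(k: int) -> float:
        total = 0.0
        for base in "ACGT":
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * i / n)
                for i, ch in enumerate(seq)
                if ch == base
            )
            total += abs(coeff) ** 2
        return total

    # Mean power over k = 1..N-1 (the zero-frequency term is excluded).
    mean_power = sum(power(k) for k in range(1, n)) / (n - 1)
    if mean_power < 1e-12:                # flat spectrum, e.g. a homopolymer
        return 0.0
    return power(n // 3) / mean_power


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # A perfectly 3-periodic toy sequence gives a strong period-3 peak.
    print(period3_snr("ATG" * 10))
    # Toy counts for illustration only.
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 93.7% lying between 94.1% and 93.2%.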