JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2016, Vol. 51 ›› Issue (3): 104-110. doi: 10.6040/j.issn.1671-9352.1.2015.025

Semantic output-based disease-protein knowledge extraction

LI Zhi-heng, YANG Zhi-hao, LIN Hong-fei*   

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2015-11-10 Online: 2016-03-20 Published: 2016-04-07

Abstract: With the rapid development of genomics and high-throughput technologies, a large body of biomedical literature on genes and proteins has appeared. At the same time, text mining has made it possible to discover new, valuable protein knowledge in this mass of medical text. This paper presents a system that extracts relations between proteins and specific diseases from biomedical literature, based on the semantic output generated by SemRep, and then derives novel, valuable protein knowledge from those relations. The system summarizes the salient relations with a salient extraction algorithm applied to a subject-specific MEDLINE corpus. The extracted results are then compared with data from the KEGG database. Such a system is significant for understanding the major causes of many diseases, for protein function prediction, and for assisted drug design.
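The pipeline outlined in the abstract (filter semantic predications down to disease-protein relations, keep the salient ones, compare against a reference database) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Predication` record, the function names, and the toy input are hypothetical, the semantic type codes are UMLS abbreviations chosen for the example, and a plain frequency threshold stands in for the paper's salient extraction algorithm, which the abstract does not specify. The reference set plays the role of protein entries taken from KEGG.

```python
from collections import Counter
from typing import NamedTuple


class Predication(NamedTuple):
    """Hypothetical record for one subject-predicate-object predication."""
    subject: str
    subject_type: str  # UMLS semantic type abbreviation of the subject
    predicate: str
    obj: str
    obj_type: str      # UMLS semantic type abbreviation of the object


# Assumed type filters: proteins/genes and diseases (illustrative choice).
PROTEIN_TYPES = {"aapp", "gngm"}
DISEASE_TYPES = {"dsyn", "neop"}


def disease_protein_relations(preds):
    """Return (protein, predicate, disease) triples; argument direction is dropped."""
    kept = []
    for p in preds:
        if p.subject_type in PROTEIN_TYPES and p.obj_type in DISEASE_TYPES:
            kept.append((p.subject, p.predicate, p.obj))
        elif p.subject_type in DISEASE_TYPES and p.obj_type in PROTEIN_TYPES:
            kept.append((p.obj, p.predicate, p.subject))
    return kept


def salient_relations(relations, min_count=2):
    """Stand-in salience measure: keep relations seen at least min_count times."""
    counts = Counter(relations)
    return [(rel, n) for rel, n in counts.most_common() if n >= min_count]


def compare_with_reference(salient, reference_proteins):
    """Split extracted proteins into those in the reference set and the rest."""
    extracted = {rel[0] for rel, _ in salient}
    return extracted & reference_proteins, extracted - reference_proteins


# Toy input with placeholder names; not real extraction output.
preds = [
    Predication("protein A", "aapp", "ASSOCIATED_WITH", "disease X", "dsyn"),
    Predication("protein A", "aapp", "ASSOCIATED_WITH", "disease X", "dsyn"),
    Predication("disease X", "dsyn", "ASSOCIATED_WITH", "protein B", "gngm"),
    Predication("protein B", "gngm", "ASSOCIATED_WITH", "disease X", "dsyn"),
    Predication("drug D", "phsu", "TREATS", "disease X", "dsyn"),
    Predication("protein C", "gngm", "PREDISPOSES", "disease X", "dsyn"),
]

relations = disease_protein_relations(preds)
salient = salient_relations(relations, min_count=2)
confirmed, novel = compare_with_reference(salient, {"protein A"})
```

In this sketch, proteins that appear in salient relations but not in the reference set are the candidates for novel knowledge, while the overlap gives a rough check of the extraction against curated data.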

Key words: KEGG, semantic relation, SemRep, information extraction

CLC Number: 

  • TP391
[1] SU Feng-long, XIE Qing-hua, HUANG Qing-quan, QIU Ji-yuan, YUE Zhen-jun. Semi-supervised method for attribute extraction based on transductive learning [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(3): 111-115.
[2] ZHU Li-ping, LI Hong-qi, YANG Zhong-guo, LIU Qiang. An information extraction method for scientific literature introduction [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, 50(07): 23-30.
[3] WANG Hui, CHEN Guang. Feature extraction method based on Bootstrapping in English product comment [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(12): 23-29.
[4] GUAN Mian, MA Jun. Automatic structured data extraction from Web forums [J]. J4, 2010, 45(5): 42-47.
[5] WANG Jing, YAO Yong, LIU Zhi-jing. Web information extraction based on a generalized hidden Markov model [J]. J4, 2007, 42(11): 49-52.
[6] WANG Lei, CHEN Zhi-ping, LI Zhi-cheng. Using text blocks based on multiple templates hidden markov model for text information extraction [J]. J4, 2006, 41(3): 19-24.