JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, Vol. 50, Issue (03): 11-19. doi: 10.6040/j.issn.1671-9352.3.2014.122


Scientific literature information extraction based on semantic pattern and reference distribution

YANG Zhong-guo1,2, LI Hong-qi1,2, ZHU Li-ping1,2, LIU Qiang1,2   

  1. Beijing Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China;
  2. College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249, China
  • Received:2014-09-19 Revised:2014-12-31 Online:2015-03-20 Published:2015-03-13

Abstract: In scientific and technical literature, the passages that review previous research results, analyze existing problems, and propose solutions carry part of a paper's innovation information. The logical thinking pattern behind the problem analysis information in papers, and the discourse relations within it, were analyzed. Reference distribution, discourse relation features, and negative sentiment features were then used to construct a universal semantic pattern for information extraction. Problem analysis information was extracted from the original text by matching the defined semantic pattern. In addition, guide word features and semantic similarity were used to extract the main work information from papers. The proposed method was evaluated on scientific and technical literature from the data mining field by comparing its output with manual extraction results. The results show that the method can accurately extract the corresponding information and provide a basic data source for clustering scientific papers and for paper recommendation.

Key words: reference distribution, textual relations, negative emotion, guide words, semantic pattern

CLC Number: TP391
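
The pattern matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue lists, the single pattern (citation context + contrast connective + negative cue), and every function name below are assumptions made for illustration, since the abstract does not give the actual lexicons or pattern definitions.

```python
import re

# All cue lists below are hypothetical; the paper's lexicons are not given in the abstract.
# Citation markers such as [3], [1,2] or [4-6] stand in for "reference distribution".
CITATION = re.compile(r"\[\d+(?:\s*[-,]\s*\d+)*\]")
# Contrast connectives stand in for the discourse relation feature.
CONTRAST_CUES = {"however", "but", "although", "nevertheless"}
# Word stems that stand in for the negative sentiment feature.
NEGATIVE = re.compile(r"\b(?:cannot|fail|lack|limit|difficult|ignor)\w*")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_problem_statement(sentence, previous=""):
    """Match one illustrative pattern: citation context + contrast + negative cue."""
    lowered = sentence.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    # The citation may sit in the sentence itself or in the one just before it.
    has_citation_context = bool(CITATION.search(sentence) or CITATION.search(previous))
    has_contrast = bool(words & CONTRAST_CUES)
    has_negative = bool(NEGATIVE.search(lowered))
    return has_citation_context and has_contrast and has_negative


def extract_problem_statements(text):
    """Return the sentences of `text` that match the illustrative pattern."""
    sentences = split_sentences(text)
    results = []
    for i, sentence in enumerate(sentences):
        previous = sentences[i - 1] if i > 0 else ""
        if is_problem_statement(sentence, previous):
            results.append(sentence)
    return results


if __name__ == "__main__":
    sample = (
        "Prior work clusters papers by keywords [3]. "
        "However, it cannot handle noisy data. "
        "We propose a new model."
    )
    for hit in extract_problem_statements(sample):
        print(hit)
```

The sketch requires all three features to co-occur, which favors precision over recall. A real system along the lines of the abstract would presumably use richer patterns and language-specific lexicons.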