JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, Vol. 50, Issue (03): 11-19. doi: 10.6040/j.issn.1671-9352.3.2014.122


Scientific literature information extraction based on semantic pattern and reference distribution

YANG Zhong-guo1,2, LI Hong-qi1,2, ZHU Li-ping1,2, LIU Qiang1,2   

  1. Beijing Key Lab of Petroleum Data Mining, China University of Petroleum(Beijing), Beijing 102249, China;
    2. College of Geophysics and Information Engineering, China University of Petroleum(Beijing), Beijing 102249, China
  • Received:2014-09-19 Revised:2014-12-31 Online:2015-03-20 Published:2015-03-13

Abstract: In scientific and technical literature, the passages that review previous research results, analyze existing problems, and propose solutions carry part of a paper's innovation information. The logical pattern in which papers present problem-analysis information, and the discourse relations involved, were analyzed. Reference distribution, discourse relation features, and negative emotion features were then used to construct a general semantic pattern for information extraction. Problem-analysis information was extracted from the original text by matching the defined semantic pattern. In addition, guide word features and semantic similarity were used to extract the main-work information from papers. The proposed method was evaluated on scientific and technical literature from the data mining field by comparing its output with manual extraction results. The results show that the method can accurately extract the corresponding information and provide a basic data source for clustering and recommending scientific papers.

Key words: reference distribution, textual relations, negative emotion, guide words, semantic pattern
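
The extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue lists, the citation-marker regex, the rule that a candidate sentence must cite a reference or directly follow a sentence that does, and the function names (extract_problem_analysis, extract_main_work) are all assumptions made for illustration, and the semantic similarity step is omitted.

```python
import re

# Hypothetical cue lists; the paper's actual lexicons are not given in the abstract.
CONTRAST_CUES = ("however", "but", "although", "nevertheless")
NEGATIVE_CUES = ("cannot", "fail", "lack", "limited", "difficult", "ignore")
GUIDE_WORDS = ("this paper", "we propose", "in this work")

# Assumed citation style: [3], [3,4] or [3-5].
CITATION = re.compile(r"\[\d+(?:\s*[-,]\s*\d+)*\]")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def has_cue(sentence, cues):
    """True if any cue starts at a word boundary (so 'fail' also matches 'fails')."""
    lowered = sentence.lower()
    return any(re.search(r"\b" + re.escape(cue), lowered) for cue in cues)


def extract_problem_analysis(text):
    """Keep sentences that combine a contrast cue and a negative cue and that
    either cite a reference or directly follow a sentence that does."""
    sentences = split_sentences(text)
    hits = []
    for i, sent in enumerate(sentences):
        cited_here = bool(CITATION.search(sent))
        cited_before = i > 0 and bool(CITATION.search(sentences[i - 1]))
        near_citation = cited_here or cited_before
        if near_citation and has_cue(sent, CONTRAST_CUES) and has_cue(sent, NEGATIVE_CUES):
            hits.append(sent)
    return hits


def extract_main_work(text):
    """Keep sentences that contain a guide word such as 'this paper'."""
    return [s for s in split_sentences(text) if has_cue(s, GUIDE_WORDS)]


sample = (
    "Keyword methods were used to label papers [1]. "
    "However, these methods cannot capture the problem a paper addresses. "
    "This paper proposes a pattern-based extraction method. "
    "Experiments were run on data mining papers."
)
print(extract_problem_analysis(sample))
print(extract_main_work(sample))
```

On the sample text, the second sentence is selected as problem-analysis information and the third as main-work information. A real system would need lexicons and patterns tuned to the target language and field.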

CLC Number: TP391