Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (07): 23-30. doi: 10.6040/j.issn.1671-9352.3.2014.307


An information extraction method for scientific literature introduction

ZHU Li-ping(1,2), LI Hong-qi(1,2), YANG Zhong-guo(1,2), LIU Qiang(1,2)

1. Beijing Key Lab of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China;
2. College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249, China

Received: 2014-09-19; Online: 2015-07-20; Published: 2015-07-31

Abstract: Based on an analysis of its writing model, the introduction of a scientific paper can be divided into three categories: background knowledge, problem analysis, and work description. Each category can be characterized by guide words, sentence structure, clue words, and sentence position. These sentence features are used to construct rules that distinguish sentence types, and a rule bank is generated from features extracted from a large number of sentences in scientific articles. The information in the three categories can then be extracted simply by matching sentences against the three types of rules. A text information extraction experiment was conducted in the fields of petroleum exploration and data mining, in which the automatically extracted results were compared with manually extracted results. The results show that all three types of information can be extracted effectively.
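The rule-matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. Everything specific in it is a hypothetical assumption: the `Rule` fields, the English clue words and regex guide patterns in `RULE_BANK`, the position ranges, and the first-match-wins policy. The paper builds its rule bank from its own corpus, and those rules are not reproduced here. The `precision_recall` helper also assumes that the comparison with manual results uses per-category precision and recall.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels for the three parts of an introduction.
BACKGROUND, PROBLEM, WORK = "background", "problem", "work"


@dataclass
class Rule:
    category: str
    guide_pattern: Optional[str]    # regex for guide words / sentence structure at sentence start
    clue_words: tuple[str, ...]     # at least one clue word must occur in the sentence
    position: tuple[float, float]   # allowed relative position (0 = first, 1 = last sentence)

    def matches(self, sentence: str, rel_pos: float) -> bool:
        lo, hi = self.position
        if not lo <= rel_pos <= hi:
            return False
        if self.guide_pattern and not re.match(self.guide_pattern, sentence, re.IGNORECASE):
            return False
        text = sentence.lower()
        return any(word in text for word in self.clue_words)


# Hypothetical rule bank, ordered from most to least specific; the first matching rule wins.
RULE_BANK = [
    Rule(WORK, r"(in this paper|this paper|we)\b", ("propose", "present", "develop"), (0.5, 1.0)),
    Rule(PROBLEM, None, ("however", "limitation", "fail to", "remains"), (0.2, 1.0)),
    Rule(BACKGROUND, None, ("widely", "important", "has been", "have been"), (0.0, 0.7)),
]


def classify(sentences: list[str]) -> list[str]:
    """Label each introduction sentence with the category of the first matching rule."""
    n = len(sentences)
    labels = []
    for i, sentence in enumerate(sentences):
        rel_pos = i / (n - 1) if n > 1 else 0.0
        label = next((r.category for r in RULE_BANK if r.matches(sentence, rel_pos)), "unclassified")
        labels.append(label)
    return labels


def precision_recall(pred: list[str], gold: list[str], category: str) -> tuple[float, float]:
    """Per-category precision and recall of automatic labels against manual labels."""
    tp = sum(p == category and g == category for p, g in zip(pred, gold))
    n_pred = sum(p == category for p in pred)
    n_gold = sum(g == category for g in gold)
    return (tp / n_pred if n_pred else 0.0, tp / n_gold if n_gold else 0.0)


intro = [
    "Information extraction has been widely applied to scientific literature.",
    "However, existing methods fail to separate background from contributions.",
    "In this paper, we propose a rule-based method for introduction sentences.",
]
print(classify(intro))  # ['background', 'problem', 'work']
```

This sketch orders `RULE_BANK` from most to least specific, with work description first, so that a sentence matching several rules gets a single label. The paper may resolve such conflicts differently.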

Key words: scientific literature, clue words, information extraction, background knowledge

CLC Number: TP391