Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (07): 23-30. doi: 10.6040/j.issn.1671-9352.3.2014.307
朱丽萍1,2, 李洪奇1,2, 杨中国1,2, 刘蔷1,2
ZHU Li-ping1,2, LI Hong-qi1,2, YANG Zhong-guo1,2, LIU Qiang1,2
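Abstract: We analyze the writing model of the introduction section of scientific papers and classify its text, at the sentence level, into three categories: background knowledge, problem analysis, and work description. For each category we compile statistics on four sentence features (guide words, sentence patterns, cue words, and position in the text) and build a corresponding rule base. After word segmentation and part-of-speech tagging, the rules are matched against each sentence to determine its category, and the information belonging to the three parts is extracted accordingly. Using scientific literature on petroleum exploration and development and on data mining as examples, we ran an experiment comparing manual judgment with extraction by the proposed method; the results show that the proposed method can accurately obtain the corresponding information.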
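The rule-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue-word patterns in `CUE_RULES`, the position bonus, and all function names are hypothetical and chosen only for illustration, and word segmentation and part-of-speech tagging are omitted, so the rules match raw sentence text.

```python
import re

# Hypothetical cue-word rules, one regex per category. They are illustrative
# only and are not the rule base built in the paper.
CUE_RULES = {
    "background": re.compile(r"近年来|随着|广泛应用"),
    "problem": re.compile(r"然而|但是|存在.*问题|不足|难以"),
    "work": re.compile(r"本文|提出了|实验结果表明"),
}

# Assumed position prior: an introduction tends to move from background
# to problem analysis to the description of the work.
POSITION_BONUS = 0.5


def split_sentences(text):
    """Split Chinese text on sentence-final punctuation."""
    parts = re.split(r"[。！？]", text)
    return [p.strip() for p in parts if p.strip()]


def position_label(index, total):
    """Map a sentence's relative position to the category it favours."""
    rel = (index + 0.5) / total
    if rel < 1 / 3:
        return "background"
    if rel < 2 / 3:
        return "problem"
    return "work"


def classify(sentence, index, total):
    """Score each category by cue-word match plus a position bonus."""
    scores = {label: 0.0 for label in CUE_RULES}
    for label, pattern in CUE_RULES.items():
        if pattern.search(sentence):
            scores[label] += 1.0
    scores[position_label(index, total)] += POSITION_BONUS
    # On ties, max() keeps the first best label in dictionary order.
    return max(scores, key=scores.get)


def classify_introduction(text):
    """Return (sentence, category) pairs for an introduction text."""
    sentences = split_sentences(text)
    return [(s, classify(s, i, len(sentences))) for i, s in enumerate(sentences)]


if __name__ == "__main__":
    intro = ("近年来，数据挖掘技术得到广泛应用。"
             "然而，现有方法难以处理大规模数据。"
             "本文提出了一种基于规则的抽取方法。")
    for sentence, label in classify_introduction(intro):
        print(label, sentence)
```

In this sketch a cue-word match outweighs the position bonus, so position only decides the category when no rule fires or when several do. A fuller system along the lines of the abstract would match rules against segmented, POS-tagged tokens rather than raw strings.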