Journal of Shandong University (Natural Science)

J4 ›› 2012, Vol. 47 ›› Issue (5): 68-72.

• Electronic Technology and Information •

Text segmentation of patent summary based on a classification algorithm

DING Chang-lin, CAI Dong-feng, WANG Pei-yan

  1. Knowledge Engineering Research Center of Shenyang Aerospace University, Shenyang 110136, Liaoning, China
  • Received: 2011-11-10 Online: 2012-05-20 Published: 2012-06-01
  • About the author: DING Chang-lin (1987- ), female, master's student; research interests: information extraction and information retrieval. Email: dcl19871208@126.com

Abstract:

A patent summary is a condensed description of a patent; once a summary has been segmented by content, the corresponding patent can be located more accurately. Because patent summaries are short and carry no explicit markers between their different content parts, traditional text segmentation methods cannot be applied to them. This paper recasts the segmentation of patent summaries as a sentence classification problem and attempts to solve it with classification algorithms. By analyzing how different classification algorithms and different features perform on this problem, the paper verifies that segmenting patent summaries through sentence classification is feasible.

Key words: patent summary; text segmentation; sentence unit; classification algorithm; part of speech
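The idea of treating segmentation as sentence classification can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the label names (`topic`, `structure`, `effect`), the toy training sentences, and the choice of a bag-of-words multinomial Naive Bayes classifier. The abstract names neither the classifiers nor the label set, and it lists part of speech as a feature, which this sketch does not use. Each sentence is classified on its own, and consecutive sentences with the same label are then merged into one segment.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby


class NaiveBayesSentenceClassifier:
    """Tiny multinomial Naive Bayes over bag-of-words features (illustrative stand-in)."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(sentence.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in sentence.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def segment(sentences, classifier):
    """Merge consecutive sentences that receive the same label into one segment."""
    labelled = [(classifier.predict(s), s) for s in sentences]
    return [(label, [s for _, s in group])
            for label, group in groupby(labelled, key=lambda pair: pair[0])]


# Hypothetical training data; labels and sentences are invented for this sketch.
train_sentences = [
    "the invention relates to a heating device",
    "the invention discloses a filter",
    "the device comprises a shell and a heating pipe",
    "the filter comprises a mesh and a frame",
    "the device has the advantages of low cost",
    "the filter has the advantages of simple structure",
]
train_labels = ["topic", "topic", "structure", "structure", "effect", "effect"]
clf = NaiveBayesSentenceClassifier().fit(train_sentences, train_labels)

# A hypothetical patent summary, already split into sentence units.
summary = [
    "the invention relates to a pump",
    "the pump comprises a shell and a rotor",
    "the rotor comprises a frame",
    "the pump has the advantages of low noise",
]
segments = segment(summary, clf)
for label, group in segments:
    print(label, group)
```

Because segment boundaries fall only where the predicted label changes, no explicit boundary markers in the text are needed, which is the property the abstract relies on for short summaries.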
