J4


Text information extraction using text blocks and a multiple-template hidden Markov model

WANG Lei,CHEN Zhi-ping,LI Zhi-cheng   

  1. Department of Computer & Information Science, Fujian University of Technology, Fuzhou 350014, Fujian, China
  • Received: 2006-04-01 Revised: 1900-01-01 Online: 2006-10-24 Published: 2006-10-24
  • Contact: WANG Lei

Abstract: Because heterogeneous training data sources hinder the learning of optimal model parameters, a novel text information extraction algorithm based on a hidden Markov model with multiple templates is proposed. The algorithm uses format information and list separators to segment the text into blocks. It then extracts text information by combining emission ("releasing") probability parameters trained on all of the data with initial and transition probability parameters trained separately for each form template. Experimental results show better precision and recall than a simple hidden Markov model.
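The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the separator set, the coarse block symbols, the smoothing floor `SMOOTH`, the rule of picking the template with the highest Viterbi score, and all function names (`segment_blocks`, `block_symbol`, `train`, `viterbi`, `extract`) are assumptions made for illustration, since the abstract does not specify them. What it does follow from the abstract is the split of parameters: emission counts are pooled over all training data, while initial and transition counts are kept per template.

```python
import math
import re
from collections import Counter, defaultdict

SMOOTH = 1e-6  # assumed probability floor for unseen events (not from the paper)


def segment_blocks(text):
    # Split on line breaks and list separators; this separator set is an assumption.
    return [b.strip() for b in re.split(r"[\n,;]+", text) if b.strip()]


def block_symbol(block):
    # Hypothetical coarse observation symbol for one text block.
    if re.fullmatch(r"\d{4}", block):
        return "YEAR"
    if any(ch.isdigit() for ch in block):
        return "HAS_DIGIT"
    return "WORDS"


def train(samples):
    """samples: list of (template_id, [(symbol, state), ...])."""
    emit = defaultdict(Counter)                        # pooled over all templates
    init = defaultdict(Counter)                        # one table per template
    trans = defaultdict(lambda: defaultdict(Counter))  # one table per template
    for tid, seq in samples:
        init[tid][seq[0][1]] += 1
        for sym, state in seq:
            emit[state][sym] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[tid][a][b] += 1
    return emit, init, trans


def logp(counter, key):
    # Log relative frequency, floored at SMOOTH for unseen events.
    total = sum(counter.values())
    if total and counter[key]:
        return math.log(counter[key] / total)
    return math.log(SMOOTH)


def viterbi(symbols, states, emit, init_t, trans_t):
    # Standard Viterbi decoding in log space for one template's parameters.
    score = {s: logp(init_t, s) + logp(emit[s], symbols[0]) for s in states}
    back = []
    for sym in symbols[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + logp(trans_t[p], s))
            score[s] = prev[best] + logp(trans_t[best], s) + logp(emit[s], sym)
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=score.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return score[last], path[::-1]


def extract(symbols, emit, init, trans):
    # Decode under every template and keep the best-scoring one (assumed rule).
    states = sorted(emit)
    results = [(tid,) + viterbi(symbols, states, emit, init[tid], trans[tid])
               for tid in sorted(init)]
    return max(results, key=lambda r: r[1])
```

A toy usage, with two hand-made templates that order the same fields differently: train on `[("A", [("WORDS", "author"), ("WORDS", "title"), ("YEAR", "year")]), ("B", [("YEAR", "year"), ("WORDS", "title"), ("WORDS", "author")])]`, segment an input string with `segment_blocks`, map each block through `block_symbol`, and pass the symbols to `extract`, which returns the chosen template id, its log score, and the state label for each block.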

Key words: text block, multiple templates, hidden Markov model, text information extraction
