J4 ›› 2010, Vol. 45 ›› Issue (5): 42-47.

Automatic structured data extraction from Web forums

GUAN Mian, MA Jun   

  1. School of Computer Science and Technology, Shandong University, Jinan 250101, Shandong, China
  • Received: 2009-09-26   Online: 2010-05-16   Published: 2010-05-24

Abstract:

Complex page layouts and unrestricted user-created posts make extracting structured data from Web forum pages a challenging task. This paper proposes a general approach to automatically extracting structured data from any forum site. By analyzing page structure, the approach identifies groups of data records on both list pages and post pages, and then applies a set of production rules to extract structured data from those records. Experimental results show that the proposed approach significantly outperforms some existing methods at extracting data records and achieves high accuracy in extracting forum metadata such as title, author, time, and content.
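
The two stages described in the abstract (locating data records through page structure, then applying rules to pull out fields) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: grouping sibling subtrees by an identical tag signature stands in for their page-structure analysis, and the date pattern and link heuristic stand in for their production rules. The sample page, the names `signature`, `find_records`, and `extract`, and the assumption that the input is well-formed markup are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed list page; real forum HTML would need a tolerant parser.
SAMPLE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<table>
<tr><td><a href="/t/1">How to parse HTML</a></td><td>alice</td><td>2009-09-26</td></tr>
<tr><td><a href="/t/2">Forum crawler tips</a></td><td>bob</td><td>2009-10-02</td></tr>
<tr><td><a href="/t/3">Regex question</a></td><td>carol</td><td>2009-10-05</td></tr>
</table>
</body></html>"""

# Assumed rule: a text node that is exactly an ISO-style date is the post time.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def signature(node):
    """Tag-structure signature of a subtree, ignoring text and attributes."""
    return node.tag + "(" + ",".join(signature(c) for c in node) + ")"


def find_records(root, min_repeat=2):
    """Return the largest group of non-leaf siblings that share one signature."""
    best = []
    for parent in root.iter():
        children = list(parent)
        counts = Counter(signature(c) for c in children)
        for sig, n in counts.items():
            # sig.count("(") > 1 means the subtree has at least one child element.
            if n >= min_repeat and n > len(best) and sig.count("(") > 1:
                best = [c for c in children if signature(c) == sig]
    return best


def extract(record):
    """Apply the stand-in rules: link text is the title, a date is the time,
    and the first remaining text is taken as the author."""
    fields = {}
    link = record.find(".//a")
    if link is not None:
        fields["title"] = "".join(link.itertext()).strip()
    for text in (t.strip() for t in record.itertext()):
        if not text or text == fields.get("title"):
            continue
        if DATE.fullmatch(text):
            fields.setdefault("time", text)
        else:
            fields.setdefault("author", text)
    return fields


records = find_records(ET.fromstring(SAMPLE))
rows = [extract(r) for r in records]
for row in rows:
    print(row)
```

On the sample page the navigation block is ignored because it does not repeat, while the three table rows share one signature and are treated as data records. Exact signature matching is a deliberate simplification; real forum pages vary between records, so a practical system would need a more tolerant similarity measure.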

Key words: Web forums; structured data; information extraction; Web mining
