J4 ›› 2010, Vol. 45 ›› Issue (5): 42-47.
• Articles •
GUAN Mian, MA Jun
Abstract:
Complex page layouts and unrestricted user-created posts make extracting structured data from Web forum pages a challenging task. A general solution for automatically extracting structured data from any forum site was proposed. By analyzing page structure, groups of data records were located on both list pages and post pages, and a set of production rules was then used to extract structured data from these records. Experimental results showed that the proposed approach significantly outperformed some existing methods in extracting data records and achieved high accuracy in extracting forum metadata such as title, author, time and content.
Key words: Web forums; structured data; information extraction; Web mining
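As a rough illustration of the two stages the abstract describes (locating data records by page structure, then applying rules to pull out fields), a minimal sketch follows. It is not the authors' algorithm: the similarity key (tag name plus class attribute), the "largest group of repeated siblings" heuristic, the three hand-written rules, and the sample page are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.items = []  # child Nodes and text strings, in document order

    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def signature(self):
        # Assumed similarity key: tag name plus class attribute.
        return (self.tag, self.attrs.get("class") or "")

    def text(self):
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p.strip() for p in parts if p.strip())


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.items.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.items.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_records(root):
    """Return the largest group of same-signature siblings (assumed heuristic)."""
    best, stack = [], [root]
    while stack:
        node = stack.pop()
        groups = {}
        for child in node.children():
            groups.setdefault(child.signature(), []).append(child)
            stack.append(child)
        for group in groups.values():
            if len(group) >= 2 and len(group) > len(best):
                best = group
    return best


def descendants(node):
    for child in node.children():
        yield child
        yield from descendants(child)


def extract(record):
    """Apply three illustrative rules to one data record."""
    nodes = list(descendants(record))
    # Rule 1 (assumed): the title is the text of the first link.
    title = next((n.text() for n in nodes if n.tag == "a"), None)
    # Rule 2 (assumed): the author sits in an element whose class mentions author/user.
    author = next((n.text() for n in nodes
                   if re.search(r"author|user", n.attrs.get("class") or "")), None)
    # Rule 3 (assumed): the time looks like YYYY-MM-DD, optionally followed by HH:MM.
    m = re.search(r"\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?", record.text())
    return {"title": title, "author": author, "time": m.group(0) if m else None}


# Hypothetical list page used only for this example.
SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<div class="post"><a href="/t/1">Install problem</a><span class="author">alice</span><span class="time">2010-03-01 09:15</span></div>
<div class="post"><a href="/t/2">Driver crash</a><span class="author">bob</span><span class="time">2010-03-02 18:40</span></div>
<div class="post"><a href="/t/3">Feature request</a><span class="author">carol</span><span class="time">2010-03-05</span></div>
</body></html>
"""

for record in find_records(parse(SAMPLE_PAGE)):
    print(extract(record))
```

On real forum pages, navigation bars and other repeated elements can form larger sibling groups than the posts themselves, so a practical system would need a stronger structural similarity measure than the one assumed here.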
GUAN Mian, MA Jun. Automatic structured data extraction from Web forums[J]. J4, 2010, 45(5): 42-47.
URL: http://lxbwk.njournal.sdu.edu.cn/EN/
http://lxbwk.njournal.sdu.edu.cn/EN/Y2010/V45/I5/42