J4


A study on automatic classification of digital documents of scientific papers

LI Sen, MA Jun, ZHAO Yan, LEI Jing-sheng

  School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong, China
  • Received: 2006-03-29   Online: 2006-10-24   Published: 2006-10-24
  • Contact: LI Sen

Abstract: Since scientific papers are usually semi-structured documents, a hierarchical classification model based on the metadata of scientific papers is proposed, where the metadata include the title, keyword set, abstract and so on. Experiments show that the precision of classification based on paper metadata is close to that of classification based on the full text. Furthermore, the classification precision exceeds that of the best-known classification algorithm when papers are classified against a taxonomy of application domains in two stages: first, the metadata are used to classify a paper roughly at the higher levels of the taxonomy; then the full text is used to classify it at the lower levels. Because metadata are much smaller than full texts, and the number of papers assigned to a subclass is smaller than the total number of papers, the new model improves classification efficiency when the number of classes is large and the documents are distributed evenly over the given taxonomy.
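The abstract does not name the base classifier, so the following is only a minimal Python sketch of the two-stage idea under stated assumptions: a multinomial naive Bayes classifier with Laplace smoothing stands in for the unspecified base classifier, and `Paper`, `TwoStageClassifier`, `scoring_cost`, the two-level taxonomy and the training data are all hypothetical. Stage 1 classifies the short metadata (title, keyword set, abstract) against the top-level classes; stage 2 classifies the full text against the subclasses of the chosen branch only, using a model trained only on papers from that branch.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical paper record: metadata fields plus the full body text."""
    title: str
    keywords: list
    abstract: str
    full_text: str

    def metadata_text(self) -> str:
        # Metadata = title + keyword set + abstract (the short representation).
        return " ".join([self.title, " ".join(self.keywords), self.abstract])


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (an assumed stand-in)."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen terms

        def log_score(label):
            denom = self.totals[label] + len(self.vocab)
            return math.log(self.doc_counts[label] / n_docs) + sum(
                math.log((self.word_counts[label][w] + 1) / denom) for w in words)

        return max(self.doc_counts, key=log_score)


class TwoStageClassifier:
    """Stage 1: metadata -> top-level class (scored against all top classes).
    Stage 2: full text -> subclass (scored only within the chosen branch)."""

    def fit(self, papers, labels):  # labels are (top, sub) pairs
        self.top = NaiveBayes().fit([p.metadata_text() for p in papers],
                                    [top for top, _ in labels])
        branches = defaultdict(lambda: ([], []))
        for p, (top, sub) in zip(papers, labels):
            branches[top][0].append(p.full_text)
            branches[top][1].append(sub)
        self.sub = {top: NaiveBayes().fit(docs, subs)
                    for top, (docs, subs) in branches.items()}
        return self

    def predict(self, paper):
        top = self.top.predict(paper.metadata_text())
        return top, self.sub[top].predict(paper.full_text)


def scoring_cost(meta_tokens, full_tokens, n_top, n_sub_per_top):
    """Rough cost model (assumed): one unit per token scored against one class,
    with every top-level class having the same number of subclasses."""
    flat = full_tokens * n_top * n_sub_per_top  # full text vs. every leaf class
    two_stage = meta_tokens * n_top + full_tokens * n_sub_per_top
    return flat, two_stage


# Toy, invented training data over a hypothetical two-level taxonomy.
train = [
    (Paper("Keyword extraction for text categorization", ["text", "categorization"],
           "We classify documents using words.",
           "term frequency features documents corpus classifier"), ("computing", "text-mining")),
    (Paper("Routing in wireless mesh networks", ["wireless", "network"],
           "We study packet routing.",
           "packet routing nodes wireless throughput latency"), ("computing", "networks")),
    (Paper("Convergence of set-valued mappings", ["mapping", "convergence"],
           "We prove convergence results.",
           "subdifferential mapping convex theorem proof"), ("mathematics", "analysis")),
    (Paper("Estimating sparse signal segments", ["estimation", "statistics"],
           "We propose an estimator.",
           "estimator variance sample hypothesis test"), ("mathematics", "statistics")),
]
clf = TwoStageClassifier().fit([p for p, _ in train], [y for _, y in train])
query = Paper("Text categorization of scientific papers", ["text", "categorization"],
              "We classify papers by their words.",
              "classifier uses term frequency over the corpus")
print(clf.predict(query))               # ('computing', 'text-mining')
print(scoring_cost(250, 5000, 10, 10))  # (500000, 52500)
```

The `scoring_cost` sizes (250 metadata tokens, 5,000 full-text tokens, 10 top-level classes with 10 subclasses each) are invented purely to illustrate the abstract's efficiency argument: the long full text is scored against 10 subclasses instead of 100 leaf classes, while only the short metadata is scored against the top level. The cost model is a deliberate simplification that ignores training time and assumes every top-level class has the same number of subclasses. One design consequence of any top-down scheme like this is that a stage-1 error cannot be corrected in stage 2, because the full text is never compared with subclasses outside the chosen branch.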

Key words: efficiency, accuracy, hierarchy, text categorization, technical literature
