J4


Entry page search algorithm based on URL-type prior probabilities

HU Jungang, DONG Shou-bin, CHEN Xiao-zhi, ZHANG Yuan-feng

  1. Guangdong Key Laboratory of Computer Network, South China University of Technology,
  • Received:2006-03-29 Revised:1900-01-01 Online:2006-10-24 Published:2006-10-24
  • Contact: HU Jungang

Abstract: Entry page (home page) retrieval aims to return exactly one correct document, and the queries are usually short Web page names. As a result, finding the entry page with high initial precision is quite difficult. Using a unigram language model, the authors extract the content fields of Web pages that are useful for finding Chinese entry pages and use them for baseline retrieval. They then build a new model that combines content-field features with non-content features of Web pages (e.g., the URL-type prior, which proved to have the strongest predictive power). From the URL-type prior probabilities, the relationship between an entry page and its subpages is derived. Based on this relationship, the authors propose a new algorithm in which the entry page is extracted from its relevant subpages (PERS). Finally, the results are reranked, and PERS yields a substantial improvement in entry page retrieval performance.

Key words: information retrieval, URL-type prior, entry page retrieval
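
The combination of a unigram language model with a URL-type prior described in the abstract can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' implementation: the four-way URL-type scheme (root, subroot, path, file), the prior values, the Jelinek-Mercer smoothing weight `lam`, and all function names are hypothetical, since the abstract specifies none of them. Documents are assumed to be already tokenized, and the PERS step is not modeled.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical prior P(entry page | URL type); values are illustrative only
# and are NOT taken from the paper.
URL_TYPE_PRIOR = {"root": 0.60, "subroot": 0.25, "path": 0.10, "file": 0.05}
INDEX_NAMES = {"index.html", "index.htm"}


def url_type(url):
    """Classify a URL by the shape of its path (assumed four-way scheme)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    ends_in_file = bool(segments) and not path.endswith("/")
    # Treat a trailing index page as the directory that contains it.
    if ends_in_file and segments[-1].lower() in INDEX_NAMES:
        segments.pop()
        ends_in_file = False
    if ends_in_file:
        return "file"
    if len(segments) == 0:
        return "root"
    if len(segments) == 1:
        return "subroot"
    return "path"


def query_log_likelihood(query, doc_terms, coll_counts, coll_len, lam=0.5):
    """Unigram log P(Q|D) with Jelinek-Mercer smoothing (an assumed choice)."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank (url, terms) pairs by log P(Q|D) + log P(URL type of D)."""
    coll_counts = Counter(t for _, terms in docs for t in terms)
    coll_len = sum(coll_counts.values())
    scored = []
    for url, terms in docs:
        s = query_log_likelihood(query, terms, coll_counts, coll_len, lam)
        s += math.log(URL_TYPE_PRIOR[url_type(url)])
        scored.append((url, s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With two documents whose content is identical, the prior alone decides the order, so under these assumed values a root URL such as `http://example.com/` would rank above a file URL such as `http://example.com/news/item1.html`.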
