JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2016, Vol. 51 ›› Issue (7): 23-29.doi: 10.6040/j.issn.1671-9352.1.2015.069


An ontology-based readability model for vertical search

ZHANG Wen-ya, SONG Da-wei*, ZHANG Peng   

  1. School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
  • Received: 2015-11-14 Online: 2016-07-20 Published: 2016-07-27

Abstract: As an emerging evaluation criterion in information retrieval (IR), readability plays an important role in assessing document relevance, utility, and quality. Providing different users with documents that are both relevant and readable is a pressing problem in vertical search. To address it, we propose a new ontology-based readability model. Following the user's reading process, we measure document readability at the surface and conceptual levels. The model introduces three readability indicators: Concept Topography, Concept Scope, and Document Coherence. The readability of a document, computed from individual or combined indicators, can be used to re-rank the initial list of documents returned by a conventional search engine. In the medical domain, user-oriented evaluations show that our model correlates well with human judgments in readability prediction. In system-oriented evaluation, our model is also competitive with one of the state-of-the-art readability models.

Key words: readability, document re-ranking, vertical search

CLC Number: 

  • TP393
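
The re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas for the three indicators or the rule for combining them, so everything below is an assumption made for illustration: the names `ScoredDoc`, `combine_indicators`, and `rerank`, the weighted average, the linear interpolation with parameter `alpha`, and the [0, 1] score range are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ScoredDoc:
    doc_id: str
    relevance: float    # score from the initial search engine (assumed in [0, 1])
    readability: float  # readability score (assumed in [0, 1])


def combine_indicators(values: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of indicator values (hypothetical combination rule)."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def rerank(docs: List[ScoredDoc], alpha: float = 0.5) -> List[ScoredDoc]:
    """Re-order an initial result list by an interpolated score.

    alpha = 1.0 keeps the pure relevance order; alpha = 0.0 ranks by
    readability only. The interpolation itself is an assumption.
    """
    def score(d: ScoredDoc) -> float:
        return alpha * d.relevance + (1.0 - alpha) * d.readability

    return sorted(docs, key=score, reverse=True)


if __name__ == "__main__":
    initial = [
        ScoredDoc("a", relevance=0.9, readability=0.2),
        ScoredDoc("b", relevance=0.8, readability=0.9),
        ScoredDoc("c", relevance=0.5, readability=0.5),
    ]
    print([d.doc_id for d in rerank(initial, alpha=0.5)])
```

A single indicator can be used on its own by passing a one-element list to `combine_indicators`, which mirrors the "individual or combined indicators" wording of the abstract.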