您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

山东大学学报(理学版) ›› 2018, Vol. 53 ›› Issue (3): 54-62.doi: 10.6040/j.issn.1671-9352.1.2017.040

• • 上一篇    下一篇

基于单指令级并行的快速求交算法

宋省身1,杨岳湘1,江宇2   

  1. 1. 国防科学技术大学计算机学院, 湖南 长沙 410000;2. 西北核技术研究所, 陕西 西安 710024
  • 收稿日期:2017-07-04 出版日期:2018-03-20 发布日期:2018-03-13
  • 作者简介:宋省身(1990— ),男,博士研究生,研究方向为搜索引擎数据结构设计与查询效率优化. E-mail:songxingshen@nudt.edu.cn
  • 基金资助:
    湖南省自然科学基金资助项目(2016JJ2007)

Efficient multiple sets intersection using SIMD instructions

SONG Xing-shen1, YANG Yue-xiang1, JIANG Yu2   

  1. 1. College of Computer, National University of Defense Technology, Changsha 410000, Hunan, China;
    2. Northwest Institute of Nuclear Technology, Xian 710024, Shaanxi, China
  • Received:2017-07-04 Online:2018-03-20 Published:2018-03-13

摘要: 布尔查询中的求交操作被广泛应用于各种信息系统中,是进行文档检索的基本操作之一。其基本形式可以视作多个有序整数序列的交集问题,而提高求交运算的效率是当前研究的重点。在传统求交算法的基础上,利用单指令多数据流(single instruction multiple data, SIMD)并行指令集,针对其核心的搜索步骤,提出了两种基于SIMD的跳跃式搜索算法。该算法在提高性能的同时,能有效适配在传统多倒排链求交算法中。实验证明,优化后的算法相比未使用SIMD的情况下有了很大的提升,甚至优于SIMD优化后的两两相交算法,性能最高提升37.3%。

关键词: 求交算法, 并行处理, 倒排索引, 性能评价

Abstract: Conjunctive Boolean query is one fundamental operation for document retrieval and widely used in many information systems and databases. In its most basic and popular form, a conjunctive query can be seen as the intersection problem of multiple sets of sorted integers, and how to improve its efficiency is becoming one important research highlight. Based on the traditional intersection algorithms, this paper proposes two optimizations on the essential searching algorithms using SIMD instructions. The optimized search algorithms are able to be adopted into various multiple sets intersection methods while improving intersection efficiency. Experiments show that the optimized algorithms performs much better than the traditional ones, even outperform the recent SIMD intersection algorithms,and the improvement is up to 37.3% at most.

Key words: inverted index, vectorized processing, performance evaluation, set intersection

中图分类号: 

  • TP301
[1] CULPEPPER J S, MOFFAT A. Efficient set intersection for inverted indexing[J]. ACM Transactions on Information Systems, 2010, 29(1):1-25.
[2] ZOBEL J, MOFFAT A. Inverted files for text search engines[J]. ACM Computing Surveys, 2006, 38(2):6.
[3] BOŽA V. Experimental comparison of set intersection algorithms for inverted indexing[C/OL] // ITAT Proceedings, CEUR Workshop Proceedings. 2013, 1003: 58-64.[2017-02-05].http://www.ceur-ws.org/Vol-1003/58.pdf
[4] SANDERS P, TRANSIER F. Intersection ininteger inverted indices[C] //Proceedings of the 9th Workshop on Algorithm Engineering and Experiments/4th Workshop on Analytic Algorithmics and Combinatorics.Philadelphia:SIAM, 2007, Article 71.
[5] BAEZA-YATES R. Afast set intersection algorithm for sorted sequences[C] // Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching. Berlin: Springer-Verlag, 2004:400-408.
[6] JÉRÉMY BARBAY, LU T, SALINGER A. An experimental investigation of set intersection algorithms for text searching [J]. Journal of Experimental Algorithmics, 2009, 14:37. DOI:10.1145/1498698.1564507.
[7] BLELLOCHG E, REID-MILLER M. Fast set operations using treaps[C] // Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures. New York: ACM, 1998:16-26.
[8] NAVARROG, PUGLISI S J. Dual-sorted inverted lists[C] // Proceedings of the 17th International Conference on String Processing and Information Retrieval. Berlin: Springer-Verlag, 2010:309-321.
[9] CULPEPPER J S, MOFFAT A. Compact set representation for information retrieval[C] //Proceedings of the 14th International Symposium on String Processing and Information Retrieval. Berlin: Springer-Verlag, 2007:137-148.
[10] DING B. Fast set intersection in memory[J]. Proceedings of the VLDB Endowment, 2011, 4(4):255-266.
[11] AO N, ZHANG F, WU D, et al. Efficient parallel lists intersection and index compression algorithms using graphics processing units[J]. Proceedings of the Vldb Endowment, 2011, 4(8):470-481.
[12] TATIKONDAS, JUNQUEIRA F, CAMBAZOGLU B B, et al. On efficient posting list intersection with multicore processors[C] // Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2009:738-739.
[13] TAKUMA D, YANAGISAWA H. Faster upper bounding of intersection sizes[C] //International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2013:703-712.
[14] 闫宏飞, 张旭东, 单栋栋,等. 基于指令级并行的倒排索引压缩算法[J]. 计算机研究与发展, 2015, 52(5):995-1004. YAN Hongfei, ZHANG Xudong, SHAN Dongdong.SIMD-based inverted index compression algorithms[J]. Journal of Computer Research and Development, 2015, 52(5):995-1004.
[15] DEMAINE E D, LÓPEZ-ORTIZ A, MUNRO J I. Adaptive set intersections, unions, and differences[C] // Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms. New York: ACM, 2000:743-752.
[16] DEMAINE E D, LÓPEZ-ORTIZ A, MUNRO J I. Experiments on adaptive set intersections for text retrieval systems[C] //Proceedings of Revised Papers from the 3rd International Workshop on Algorithm Engineering and Experimentation. London: Springer-Verlag, 2001:91-104.
[17] BARBAY J, PEZ-ORTIZ A, LU T. Faster adaptive set intersections for text searching[C] //Proceedings of the 5th International Workshop on Experimental Algorithms(WEA 2006). Berlin: Springer-Verlag, 2006: 146-157.
[18] SCHLEGEL B, GEMULLA R, LEHNER W. Fast integer compression using SIMD instructions[C] // Proceedings of the 6th International Workshop on Data Management on New Hardware(DaMoN'10).New York: ACM, 2010:34-40.
[19] LEMIRE D, BOYTSOV L, KURZ N. SIMD compression and the intersection of sorted integers[J]. Software Practice & Experience, 2014, 46(6):723-749.
[20] INOUE H, OHARA M, TAURA K. Faster set intersection with SIMD instructions by reducing branch mispredictions[J]. Proceedings of the Vldb Endowment, 2014, 8(3):293-304.
[1] 毛福林, 瞿有利. 一种变长编码压缩倒排索引算法[J]. 山东大学学报(理学版), 2014, 49(12): 30-35.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!