
Journal of Shandong University (Natural Science) ›› 2014, Vol. 49 ›› Issue (09): 50-55. doi: 10.6040/j.issn.1671-9352.2.2014.436

• Article •

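Research on an Optimized Hot Standby Mechanism Based on Remus

ZOU De-qing1, XIANG Jun1, ZHANG Xiao-xu2, YUAN Bo-yang2, FENG Ming-lu2

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China;
  2. CECT-China COMM Communications Co., Ltd, Beijing 100022, China
  • Received: 2014-06-24 Revised: 2014-08-27 Online: 2014-09-20 Published: 2014-09-30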
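  • About the author: ZOU De-qing (1975-), male, professor, Ph.D.; research interests: system security and fault-tolerant computing. E-mail: deqingzou@hust.edu.cn
  • Supported by:
    National High Technology Research and Development Program of China (2012AA012600)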

Optimization research of hot standby with Remus

ZOU De-qing1, XIANG Jun1, ZHANG Xiao-xu2, YUAN Bo-yang2, FENG Ming-lu2   

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China;
  2. CECT-China COMM Communications Co., Ltd, Beijing 100022, China
  • Received:2014-06-24 Revised:2014-08-27 Online:2014-09-20 Published:2014-09-30

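Abstract: Hot standby is a reliable way for virtualized systems to improve their availability. Hot standby systems typically generate checkpoints continuously, transferring the primary node's real-time state updates to the backup node so that the two stay synchronized. When such a system runs memory-intensive applications during hot standby, conventional checkpointing causes bandwidth delay, which undermines the high availability of the virtual machine pair. A failure of the heartbeat link during hot standby can also cause the system to misjudge the state of the primary and backup virtual machines, so that the system can no longer run normally. Building on the hot standby approach of the Remus system, two optimizations are proposed: the first is an incremental checkpoint compression mechanism, and the second is a mechanism in which the client assists in judging the state of the primary virtual machine. Experiments show that the XOR-RLE-based incremental checkpoint compression algorithm effectively reduces the bandwidth delay caused by memory-intensive applications, and they also verify that the client-oriented hot standby mechanism substantially reduces system misjudgments during hot standby.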
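
The abstract names an XOR-RLE incremental checkpoint compression scheme but does not spell it out on this page. The following is a minimal Python sketch of the general idea, under the assumption that each dirty memory page is XORed against the copy sent in the previous checkpoint and the resulting mostly-zero delta is run-length encoded. The function names, the (run length, byte value) pair format, and the 255-byte run cap are illustrative choices, not details taken from the paper.

```python
def xor_delta(old: bytes, new: bytes) -> bytes:
    """XOR two equal-sized pages; unchanged bytes become 0x00."""
    if len(old) != len(new):
        raise ValueError("pages must have the same length")
    return bytes(a ^ b for a, b in zip(old, new))


def rle_encode(data: bytes) -> bytes:
    """Encode data as (run_length, byte_value) pairs, run_length in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte matches and the cap is not hit.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        run, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * run
    return bytes(out)


def compress_page(old: bytes, new: bytes) -> bytes:
    """Primary side (assumed): send only the encoded XOR delta."""
    return rle_encode(xor_delta(old, new))


def restore_page(old: bytes, encoded: bytes) -> bytes:
    """Backup side (assumed): XOR the decoded delta onto the stored page."""
    return xor_delta(old, rle_decode(encoded))
```

Because XOR is its own inverse, the backup can rebuild the new page from its stored copy and the decoded delta. The scheme pays off only when consecutive versions of a page differ in few bytes; a page that changes completely could encode to more than its raw size under this pair format.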

Key words: memory-intensive, hot standby, high availability, virtual machine, incremental checkpoint compression

Abstract: Hot standby is a reliable way for a virtualized system to increase its availability. To keep the state and data of the primary node and the backup node synchronized, traditional hot standby technology continuously generates checkpoints and sends real-time state updates of the primary virtual machine to the backup virtual machine. However, traditional checkpointing causes bandwidth delay when it encounters memory-intensive applications during backup, which degrades the high availability that hot standby is meant to provide. In addition, the heartbeat may fail during backup, which leads to false positives about the virtual machines' state and disrupts normal system operation. Therefore, two optimizations based on the Remus hot standby mode are proposed. One compresses the memory checkpoint; the other is a client-oriented hot standby mechanism. A comparison of bandwidth delay before and after the improvement shows that checkpoint compression based on the XOR-RLE algorithm effectively reduces the bandwidth delay caused by memory-intensive applications, and that the client-oriented hot standby mechanism substantially reduces false positives during hot standby.
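
The abstract says only that the client helps judge the primary virtual machine's state when the heartbeat is unreliable. As a hedged illustration of that idea, the Python sketch below assumes the backup collects reachability reports from clients and fails over only when the heartbeat is lost and clients also cannot reach the primary. The `decide` function, the `Decision` enum, and the `quorum` threshold are hypothetical names introduced here, not the paper's design.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    KEEP_PRIMARY = "keep_primary"
    FAIL_OVER = "fail_over"


def decide(heartbeat_alive: bool,
           client_reports: Iterable[bool],
           quorum: float = 0.5) -> Decision:
    """Choose whether the backup should take over.

    client_reports: one bool per client, True if that client can still
    reach the primary VM (assumed to be gathered by the backup).
    quorum: fraction of clients that must reach the primary for it to be
    considered alive despite a lost heartbeat (hypothetical parameter).
    """
    if heartbeat_alive:
        return Decision.KEEP_PRIMARY

    reports = list(client_reports)
    if not reports:
        # No client evidence: fall back to heartbeat-only behaviour.
        return Decision.FAIL_OVER

    reachable = sum(reports) / len(reports)
    # Heartbeat lost but clients still reach the primary: treat it as a
    # heartbeat-link failure rather than a primary crash.
    if reachable > quorum:
        return Decision.KEEP_PRIMARY
    return Decision.FAIL_OVER
```

With a heartbeat-only detector, a broken heartbeat link alone would trigger failover while the primary is still serving clients; the extra client evidence is what lets this sketch avoid that kind of false positive.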

Key words: incremental checkpoint compression, memory intensive, high availability, virtual machine, hot standby

CLC number: 

  • TP309