Research on Identity-Preserving Portrait Hairstyle Removal

doi:10.6040/j.issn.1671-9352.0.2026.045


Abstract: Portrait hairstyle removal aims to eliminate existing hairstyles from portrait images and generate high-fidelity bald images. It not only gives users a convenient tool for virtual hairstyle editing, but also supplies unoccluded facial texture for 3D face reconstruction, which improves the realism and detail of reconstructed face models. However, high-quality hairstyle removal remains challenging because hairstyle geometry is complex and highly variable, hats and hair accessories introduce occlusions, and paired training data are scarce. Existing methods often struggle to remove occlusions effectively while faithfully preserving identity. To address these issues, this paper proposes an identity-preserving portrait hairstyle removal framework that removes hairstyles, hats and similar occlusions from portrait images and generates natural, realistic bald portraits. First, the SegFace face semantic segmentation model is used to extract the mask regions corresponding to hair and hats. A bald generator is then trained to focus on synthesizing content within the masked regions, so that the generated content stays consistent with the original face and background in skin tone, shading, and semantics. An identity loss is also introduced to preserve facial identity during hairstyle removal. To handle hair-accessory occlusions, whose length and style vary widely, facial landmarks are combined with Bézier curve fitting over the region below the eyebrows, which reduces interference with identity-related facial areas. Experimental results show that the proposed method efficiently removes a wide range of hairstyles and hat occlusions while maintaining natural visual quality and identity consistency, and that it improves hairstyle transfer results.
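The mask-construction step described in the abstract (hair and hat regions from a parsing map, with the area below the eyebrows protected by a Bézier curve fitted to facial landmarks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label indices `HAIR_ID` and `HAT_ID` are hypothetical (SegFace's real class IDs depend on its label set), and the least-squares cubic fit with chord-length parameterization is one common way to fit a Bézier curve, chosen here for illustration only.

```python
import numpy as np

# Hypothetical label indices; SegFace's actual class IDs for "hair" and "hat"
# are NOT given in the abstract and must be looked up for a real model.
HAIR_ID = 13
HAT_ID = 14


def occluder_mask(labels: np.ndarray, ids=(HAIR_ID, HAT_ID)) -> np.ndarray:
    """Boolean (H, W) mask that is True wherever the parsing map predicts hair or hat."""
    return np.isin(labels, ids)


def fit_cubic_bezier(points: np.ndarray) -> np.ndarray:
    """Least-squares cubic Bezier fit to >= 4 ordered 2-D points (x, y).

    Assumes the eyebrow landmarks are ordered left to right.
    Returns a (4, 2) array of control points.
    """
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()  # chord-length params
    # Bernstein basis: one row per landmark, one column per control point.
    basis = np.stack([(1 - t) ** 3, 3 * (1 - t) ** 2 * t,
                      3 * (1 - t) * t ** 2, t ** 3], axis=1)
    ctrl, *_ = np.linalg.lstsq(basis, points, rcond=None)
    return ctrl


def eval_bezier(ctrl: np.ndarray, n: int = 200) -> np.ndarray:
    """Sample n points on the cubic Bezier curve defined by ctrl."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])


def protect_below_brows(mask: np.ndarray, brow_points: np.ndarray) -> np.ndarray:
    """Clear mask pixels below the fitted eyebrow curve (image y grows downward).

    Columns outside the curve's x-range are left unchanged.
    """
    curve = eval_bezier(fit_cubic_bezier(brow_points))
    curve = curve[np.argsort(curve[:, 0])]  # np.interp needs increasing x
    h, w = mask.shape
    xs = np.arange(w)
    inside = (xs >= curve[0, 0]) & (xs <= curve[-1, 0])
    y_curve = np.interp(xs, curve[:, 0], curve[:, 1])
    rows = np.arange(h)[:, None]
    below = (rows >= y_curve[None, :]) & inside[None, :]
    return mask & ~below
```

In use, one would call `protect_below_brows(occluder_mask(labels), brow_landmarks)` and pass the result to the generator as the region to regenerate. Protecting only the columns spanned by the eyebrow curve keeps side hair (for example sideburns) inside the mask, while bangs or accessories hanging over the eyes are prevented from pulling identity-bearing regions into the inpainting area.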

Key words: hairstyle removal, diffusion models, occlusion, semantic segmentation, Bézier curve
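The generation stage in the abstract keeps the original face and background outside the mask and adds an identity constraint. A minimal sketch of those two ingredients follows, assuming that the generator output is pasted back into the source image through the mask and that the identity loss is one minus the cosine similarity of face-recognition embeddings (a common choice with ArcFace-style features). Both the compositing rule and the loss form are assumptions for illustration; the abstract does not state the exact formulation or any loss weights.

```python
import numpy as np


def composite(source: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep source pixels outside the mask and generator pixels inside it.

    source, generated: (H, W, 3) float arrays; mask: (H, W) bool, True = regenerate.
    """
    m = mask[..., None].astype(source.dtype)
    return m * generated + (1.0 - m) * source


def identity_loss(emb_src: np.ndarray, emb_out: np.ndarray, eps: float = 1e-8) -> float:
    """1 - cosine similarity between two 1-D face-recognition embeddings.

    Ranges from about 0 (same direction, same identity) to about 2 (opposite).
    """
    cos = float(np.dot(emb_src, emb_out)
                / (np.linalg.norm(emb_src) * np.linalg.norm(emb_out) + eps))
    return 1.0 - cos
```

In a training loop, `emb_src` and `emb_out` would come from a frozen face-recognition network applied to the input portrait and to `composite(source, generated, mask)`; that network is not specified here and is treated as an external assumption. Computing the identity term on the composited image, rather than on the raw generator output, scores the bald portrait the user would actually see.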

CLC number: TP391

YAO Xunxiang, XU Hua, XU Yingcheng, ZHANG Peng, ZHAO Jianmin. Research on identity-preserving portrait hairstyle removal[J]. Journal of Shandong University (Natural Science), 2026, 61(6): 95-106.

