Journal of Shandong University (Natural Science), 2026, Vol. 61, Issue 6: 95-106. doi: 10.6040/j.issn.1671-9352.0.2026.045

Research on identity-preserving portrait hairstyle removal

YAO Xunxiang¹, XU Hua², XU Yingcheng³, ZHANG Peng², ZHAO Jianmin⁴

1. School of Computer Science and Artificial Intelligence, Shandong University of Finance and Economics, Jinan 250014, Shandong, China;
2. School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China;
3. School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan 250014, Shandong, China;
4. Weifang Financial Media Center, Weifang 261000, Shandong, China

Published: 2026-06-04

Abstract: Portrait hairstyle removal aims to eliminate existing hairstyles from portrait images and generate high-fidelity bald images. It not only provides users with a flexible tool for virtual hairstyle editing, but also supplies unobstructed facial texture information for 3D face reconstruction, thereby improving the realism and geometric detail of reconstructed face models. However, achieving high-quality hairstyle removal remains challenging due to the complex and highly variable geometry of hairstyles, interference from occlusions such as hats and hair accessories, and the scarcity of paired training data. Existing methods often struggle to balance effective occlusion removal with faithful identity preservation. To address these issues, this paper proposes an identity-preserving portrait hairstyle removal framework for removing hairstyles and hat-related occlusions while generating natural and realistic bald portraits. First, the SegFace semantic segmentation model is employed to extract mask regions corresponding to hair and hats. A bald generator is then trained to focus on content synthesis within the masked regions, so that the generated content remains consistent with the original face and background in terms of skin tone, illumination, and semantic continuity. In addition, an identity loss is introduced to preserve facial identity during hairstyle removal. To further handle hair accessory occlusions with diverse shapes and spatial extents, facial landmarks are combined with Bézier curve fitting to refine the region below the eyebrows, thereby reducing interference with identity-related facial areas. Experimental results demonstrate that the proposed method effectively removes a wide range of hairstyles and hat-related occlusions while maintaining natural visual quality and identity consistency.
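The mask refinement and mask-restricted synthesis described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function names, the choice of a cubic Bézier with chord-length parameterization, the rule "clear every mask pixel below the fitted eyebrow curve", and the alpha-style compositing are all assumptions made for illustration. Segmentation (SegFace), the bald generator, and the identity loss are outside its scope, so the mask, landmarks, and generated image are taken as given inputs.

```python
import numpy as np

def bezier_points(control, n=50):
    """Evaluate a cubic Bezier curve (4 control points, shape (4, 2)) at n samples."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = control
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier fit to ordered 2D points (assumed: >= 4 points)."""
    pts = np.asarray(points, dtype=float)
    # Chord-length parameterization: t grows with distance travelled along the points.
    d = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()
    # Bernstein basis matrix, one row per landmark.
    B = np.stack([(1 - t) ** 3, 3 * (1 - t) ** 2 * t,
                  3 * (1 - t) * t ** 2, t ** 3], axis=1)
    control, *_ = np.linalg.lstsq(B, pts, rcond=None)
    return control

def clear_below_curve(mask, curve):
    """Return a copy of a boolean H x W mask with pixels below the curve cleared.

    `curve` holds (x, y) samples in image coordinates (y grows downward).
    Only the columns spanned by the curve are modified.
    """
    out = mask.copy()
    h, w = mask.shape
    order = np.argsort(curve[:, 0])
    xs, ys = curve[order, 0], curve[order, 1]
    cols = np.arange(max(0, int(np.ceil(xs[0]))), min(w, int(np.floor(xs[-1])) + 1))
    boundary = np.interp(cols, xs, ys)      # curve height at each column
    rows = np.arange(h)[:, None]
    out[:, cols] &= rows <= boundary[None, :]
    return out

def composite(original, generated, mask):
    """Use generated pixels inside the mask and original pixels elsewhere."""
    m = mask[..., None].astype(original.dtype)
    return m * generated + (1 - m) * original

# Toy example: hypothetical eyebrow landmarks on a horizontal line in a 100 x 100 image.
landmarks = [(x, 40.5) for x in np.linspace(20, 80, 10)]
curve = bezier_points(fit_cubic_bezier(landmarks))
raw_mask = np.ones((100, 100), dtype=bool)   # stand-in for a hair/hat segmentation mask
refined = clear_below_curve(raw_mask, curve)
original = np.zeros((100, 100, 3))
generated = np.ones((100, 100, 3))           # stand-in for the generator output
result = composite(original, generated, refined)
```

In this toy setup the refined mask keeps rows at or above the fitted curve and drops rows below it, so the composite leaves the eye and lower-face pixels of the original untouched, which is the intent behind restricting synthesis to the masked region.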

Key words: hairstyle removal, diffusion models, occlusion, semantic segmentation, Bézier curve

CLC Number: TP391