Self-Supervised Learning for Industrial Visual Anomaly Detection: A Review of Recent Advances, Applications, and Open Challenges

Keywords

Self-Supervised Learning
Diffusion Models
Digital Twins

Abstract

Industrial visual anomaly detection (VAD), the automated identification and localization of defects, irregularities, and deviations in manufactured products, plays a critical role in ensuring product quality, operational safety, and process reliability in modern manufacturing. Labeled anomaly data are scarce, defect types are diverse, and inspection often has to run in real time; together, these conditions pose fundamental challenges that traditional supervised learning struggles to address. In response, self-supervised learning (SSL) has emerged as a promising paradigm: by defining pretext tasks whose training targets are derived from the data themselves, it enables models to learn rich representations from abundant unlabeled normal data without manual annotation. This review provides a comprehensive and critical synthesis of recent advances in SSL for industrial VAD. We examine the methodological landscape across five major SSL categories (contrastive learning, masked reconstruction, generative modeling, rotation prediction, and cross-modal pretext tasks) and map their application to key industrial domains, including surface inspection, 3D component quality control, semiconductor fabrication, and predictive maintenance. A structured analysis of eight representative works, including the Iterative Mask Reconstruction Network (IMRNet), graph attention-based multivariate anomaly detection, and diffusion-enabled defect synthesis, grounds the discussion in empirical evidence. We further explore the integration of SSL with digital twin platforms, the role of foundation models, and the challenges specific to real-world deployment. Finally, we identify open research problems and set out a forward-looking agenda for the field.
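To make the idea of an annotation-free pretext task concrete, the following is a minimal sketch of rotation prediction, one of the five SSL categories named above. It is an illustration only, not code from any of the reviewed works. It assumes NumPy, square single-channel or multi-channel image patches, and a separately trained rotation classifier whose softmax outputs are passed in; the function names `make_rotation_pretext_batch` and `rotation_anomaly_score` are hypothetical. The scoring rule (mean negative log-likelihood of the true rotation, where higher means more anomalous) is one common choice for turning a rotation classifier trained only on normal data into an anomaly detector, not the only one.

```python
import numpy as np


def make_rotation_pretext_batch(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Build a self-labeled batch for the rotation-prediction pretext task.

    Each unlabeled image of shape (H, W) or (H, W, C) is rotated by
    0, 90, 180 and 270 degrees; the rotation index k in {0, 1, 2, 3}
    becomes the classification target, so no manual labels are needed.
    Assumes square images (H == W) so all rotated copies can be stacked.
    """
    rotated, labels = [], []
    for img in images:
        if img.shape[0] != img.shape[1]:
            raise ValueError("this sketch assumes square image patches")
        for k in range(4):
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))  # k * 90 degrees
            labels.append(k)
    return np.stack(rotated), np.array(labels, dtype=np.int64)


def rotation_anomaly_score(probs: np.ndarray, labels: np.ndarray, n_rot: int = 4) -> np.ndarray:
    """Per-image anomaly score from a rotation classifier's softmax outputs.

    probs has shape (N * n_rot, n_rot), rows ordered as produced by
    make_rotation_pretext_batch. A classifier trained only on normal data
    should recognize rotations of normal images easily, so a high mean
    negative log-likelihood of the true rotation suggests an anomaly.
    """
    p_true = probs[np.arange(len(labels)), labels]
    nll = -np.log(np.clip(p_true, 1e-12, 1.0))  # clip avoids log(0)
    return nll.reshape(-1, n_rot).mean(axis=1)
```

In use, the classifier would be trained on `make_rotation_pretext_batch` output built from defect-free images; at inspection time each test patch is expanded into its four rotations, scored, and flagged when its score exceeds a threshold chosen on held-out normal data.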

References

Deng, T., Li, Y., Liu, X., & Wang, L. (2023). Federated learning-based collaborative manufacturing for complex parts. *Journal of Intelligent Manufacturing*, 34(7), 3025–3038. https://doi.org/10.1007/s10845-022-01968-3

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In *Advances in Neural Information Processing Systems* (NeurIPS) (pp. 2672–2680). Curran Associates, Inc.

He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition* (CVPR) (pp. 9729–9738). IEEE. https://doi.org/10.1109/CVPR42600.2020.00980

Huang, H., Tang, J., Liu, T., & Huang, M.-L. (2026). Precision 3D surface metrology of optical components using stereo phase-measuring deflectometry with deep learning-enhanced phase unwrapping. *Proceedings of SPIE*, 0898. https://doi.org/10.1117/12.3093993

Huang, H., Yang, Y., & Zhu, Y. (2023). Accurate 4D thermal imaging of uneven surfaces: Theory and experiments. *International Journal of Heat and Mass Transfer*, 211, 124580. https://doi.org/10.1016/j.ijheatmasstransfer.2023.124580

Khan, T., Urfi Khan, T., Khan, A., Mollan, C., & Vilkonciene, I. M. (2025). Data-driven digital twin framework for predictive maintenance of smart manufacturing systems. *Machines*, 13(6), 481. https://doi.org/10.3390/machines13060481

Khan, Y., et al. (2025). A few-shot steel surface defect generation method based on diffusion models. *BMC Medical Informatics and Decision Making*. https://doi.org/10.1186/s12911-025-02912-9

Li, S., et al. (2024). Towards scalable 3D anomaly detection and localization: A benchmark via 3D anomaly synthesis and a self-supervised learning network (IMRNet). In *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition* (CVPR) (pp. 12456–12466). IEEE. https://doi.org/10.1109/CVPR52733.2024.01190

Li, Y., Lou, J., Cai, Z., Zheng, P., Wu, H., & Wang, X. (2024). An interactive gesture control system for collaborative manipulator based on Leap Motion Controller. *Advances in Mechanical Engineering*, 16(5), 16878132241253101. https://doi.org/10.1177/16878132241253101

Liu, J., Xie, G., Chen, R., Li, X., Wang, J., Liu, Y., Wang, C., & Zheng, F. (2025). A survey of deep learning for industrial visual anomaly detection. *Artificial Intelligence Review*, 58, 178. https://doi.org/10.1007/s10462-025-11287-7

Liu, J., et al. (2024). Deep industrial image anomaly detection: A survey. *arXiv preprint arXiv:2401.01432*. https://doi.org/10.48550/arXiv.2401.01432

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In *Proceedings of the International Conference on Machine Learning* (ICML) (pp. 8748–8763). PMLR.

Shen, Y., et al. (2025). AI-enhanced digital twins in maintenance: Systematic review, industrial challenges, and bridging research–practice gaps. *ScienceDirect*. https://doi.org/10.1016/j.promfg.2025.107634

Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In *Advances in Neural Information Processing Systems* (NeurIPS) (pp. 5998–6008). Curran Associates, Inc.

Wang, S., Yu, Y., Feldt, R., & Parthasarathy, D. (2025). Automating a complete software test process using LLMs: An automotive case study. In *2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)* (pp. 373–384). https://doi.org/10.1109/ICSE55347.2025.00211

Wang, X., et al. (2025). Generative and predictive AI for digital twin systems in manufacturing. *Frontiers in Artificial Intelligence*, 8, 1655470. https://doi.org/10.3389/frai.2025.1655470

Zhang, A., et al. (2025). Industrial multivariate time-series data anomaly detection incorporating attention mechanisms and adversarial training. *International Journal of Computer Integrated Manufacturing*, 38(12). https://doi.org/10.1080/0951192X.2025.2452985

Zhang, M., et al. (2025). AI-enabled defect detection in industrial products: A comprehensive survey, key insights and future research challenges. *ScienceDirect*. https://doi.org/10.1016/j.ijmachtools.2025.104960

Zhang, Y., et al. (2024). Latent diffusion models to enhance the performance of visual defect segmentation networks in steel surface inspection. *Sensors*, 24(18), 6016. https://doi.org/10.3390/s24186016

Zhu, Y., & Liu, Q. (2025). Toward transparent groundwater contamination risk forecasting: Integrating causal discovery and Bayesian graph neural networks. *Science of the Total Environment*, 998, 180233. https://doi.org/10.1016/j.scitotenv.2025.180233

Zhu, Y., & Liu, Q. (2026). Hybrid graph attention network-LSTM models for causal-aware supply chain forecasting. *Journal of Intelligent Manufacturing*. https://doi.org/10.1007/s10845-025-02782-3

S. Wang, Y. Yu, R. Feldt and D. Parthasarathy, "Automating a Complete Software Test Process Using LLMs: An Automotive Case Study," 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), Ottawa, ON, Canada, 2025, pp. 373-384, doi: 10.1109/ICSE55347.2025.00211.