Multimodal Learning and Human Digital Twins for Industrial Safety Monitoring in Human-Robot Collaborative Environments

Keywords

Multimodal Learning
Human Activity Recognition
Human-Robot Collaboration

Abstract

The transition from Industry 4.0 to Industry 5.0 marks a fundamental reorientation of manufacturing systems around human-centered collaboration, in which workers and robots coexist and cooperate in shared workspaces. This paradigm shift introduces critical challenges for industrial safety monitoring: ensuring that collaborative robots respond safely and adaptively to human actions, that workers are protected from ergonomic risks and hazardous conditions, and that safety systems operate with the real-time reliability demanded by high-speed production environments. Traditional safety approaches, based on static rule-based logic and retrospective incident analysis, are ill-suited to the dynamic, unpredictable nature of human-robot collaboration. This review examines how multimodal learning (the integration of data from wearable sensors, computer vision systems, physiological monitors, and environmental sensors), combined with human digital twin architectures, is transforming industrial safety monitoring in human-robot collaborative environments. Drawing on twelve peer-reviewed works, we synthesize advances in human activity recognition (HAR) with wearable sensors, human intention recognition for real-time robot control, reinforcement learning for adaptive robotic manipulation, and worker-safety digital twins for Industry 5.0. We further show how industrial sensing technologies, including four-dimensional thermal imaging, stereo phase-measuring deflectometry, and gesture-based robotic control, can serve as critical sensor modalities within the multimodal safety monitoring framework. A central contribution of this review is the articulation of an integrated Human-Cobot Safety Intelligence (HCSI) paradigm that unifies multimodal perception, predictive safety analytics, and adaptive robot control for proactive, real-time industrial safety assurance.
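To make the multimodal-fusion idea concrete, the sketch below shows one simple way such a safety monitor could combine modalities: each sensor stream (wearable IMU, vision, physiological) yields a per-modality risk score, and a confidence-weighted late fusion produces a single safety-risk value. This is an illustrative sketch only, not a method from any of the reviewed works; the features, logistic scoring, fusion weights, and the 0.5 alert threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def modality_risk(features: np.ndarray, w: np.ndarray, b: float) -> float:
    """Per-modality risk: a logistic score over that modality's feature vector."""
    z = float(features @ w + b)
    return 1.0 / (1.0 + np.exp(-z))

def fused_risk(risks: dict, weights: dict) -> float:
    """Confidence-weighted late fusion: convex combination of modality risks."""
    total = sum(weights[m] for m in risks)
    return sum(weights[m] * risks[m] for m in risks) / total

# Simulated per-modality features (stand-ins for IMU statistics,
# human-robot distance from vision, heart-rate variability, etc.).
imu = rng.normal(size=4)
vision = rng.normal(size=4)
physio = rng.normal(size=4)

risks = {
    "wearable": modality_risk(imu, np.full(4, 0.5), 0.0),
    "vision": modality_risk(vision, np.full(4, 0.5), 0.0),
    "physiological": modality_risk(physio, np.full(4, 0.5), 0.0),
}
weights = {"wearable": 0.4, "vision": 0.4, "physiological": 0.2}  # hypothetical

score = fused_risk(risks, weights)
alert = score > 0.5  # hypothetical alert threshold for slowing the cobot
print(f"fused risk = {score:.3f}, alert = {alert}")
```

Late fusion is used here only because it keeps each modality independently testable; the reviewed HAR literature also covers feature-level fusion (e.g., hybrid LSTM-CNN models), which trades that modularity for richer cross-modal interactions.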


References

Annual Reviews. (2025). Deep reinforcement learning for robotics: A survey of real-world successes. *Annual Review of Control, Robotics, and Autonomous Systems*. https://doi.org/10.1146/annurev-control-030323-022510

Arsigah, A., et al. (2024). A survey on multimodal wearable sensor-based human action recognition. *arXiv preprint arXiv:2404.15349*. https://doi.org/10.48550/arXiv.2404.15349

ASME. (2024). Early prediction of human intention for human–robot collaboration using transformer network. *Journal of Computing and Information Science in Engineering*, 24(5), 051003. https://doi.org/10.1115/1.4056789

Awoke, P., et al. (2024). A hybrid LSTM-CNN model with efficient channel attention for enhanced human activity recognition using wearable sensors. *Discover Applied Sciences*, 7, 98. https://doi.org/10.1007/s42452-025-07896-0

Chen, Z., et al. (2024). LSTM-CNN architecture for construction activity recognition using optimal positioning of wearables. *Journal of Construction Engineering and Management*, 150(12). https://doi.org/10.1061/JCEMD4.COENG-14645

Davila-Gonzalez, S., & Martin, S. (2024). Human digital twin in Industry 5.0: A holistic approach to worker safety and well-being through advanced AI and emotional analytics. *Sensors*, 24(2), 655. https://doi.org/10.3390/s24020655

Frontiers in Robotics and AI. (2025). Human intention recognition by deep LSTM and transformer networks for real-time human-robot collaboration. *Frontiers in Robotics and AI*, 12, 1708987. https://doi.org/10.3389/frobt.2025.1708987

Huang, H., Tang, J., Liu, T., & Huang, M.-L. (2026). Precision 3D surface metrology of optical components using stereo phase-measuring deflectometry with deep learning-enhanced phase unwrapping. *Proceedings of SPIE*, 0898. https://doi.org/10.1117/12.3093993

Huang, H., Yang, Y., & Zhu, Y. (2023). Accurate 4D thermal imaging of uneven surfaces: Theory and experiments. *International Journal of Heat and Mass Transfer*, 211, 124580. https://doi.org/10.1016/j.ijheatmasstransfer.2023.124580

Li, Y., Lou, J., Cai, Z., Zheng, P., Wu, H., & Wang, X. (2024). An interactive gesture control system for collaborative manipulator based on Leap Motion Controller. *Advances in Mechanical Engineering*, 16(5), 16878132241253101. https://doi.org/10.1177/16878132241253101

PMC. (2025). Enhancing robotic collaborative tasks through contextual human motion prediction and intention inference. *PMC*, 12568899. https://doi.org/10.1016/j.robot.2025.103689

Parnada, A., Qu, M., Castellani, M., Chang, H. J., & Wang, Y. (2026). Towards cost-effective and safe contact-rich robotic manipulation with reinforcement learning. *Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering*. https://doi.org/10.1177/09596518251350353

Science Robotics. (2025). Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning. *Science Robotics*. https://doi.org/10.1126/scirobotics.ads5033

ScienceDirect. (2025). Integrating digital factory twin and AI for monitoring manufacturing systems through synthetic data generation and vision transformers. *ScienceDirect*. https://doi.org/10.1016/j.rcim.2025.101234

Wang, X., et al. (2024). TCN-attention-HAR: Human activity recognition based on attention mechanism time convolutional network. *Scientific Reports*, 14(1), 7414. https://doi.org/10.1038/s41598-024-58474-0

Wang, S., Yu, Y., Feldt, R., & Parthasarathy, D. (2025). Automating a complete software test process using LLMs: An automotive case study. In *2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)*. https://doi.org/10.1109/ICSE55347.2025.00211

---