MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Anonymous Authors

Abstract

Existing neural head avatar methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatar method that reduces learning complexity by integrating external knowledge into both motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and supports both video- and audio-driven inputs.
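To make the two key designs concrete, the sketch below shows one plausible way the pipeline could be wired together in PyTorch: keypoint heatmaps (explicit landmarks plus implicit keypoints) drive a small motion U-Net that predicts a dense flow, the flow warps visual features precomputed once from the source image, and a second small U-Net synthesizes the output frame. All module names, keypoint counts, and channel sizes (MobilePortraitSketch, TinyUNet, num_explicit, etc.) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Minimal encoder-decoder stand-in for the lightweight U-Net backbones."""

    def __init__(self, in_ch, out_ch, width=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))


class MobilePortraitSketch(nn.Module):
    """Hypothetical forward pass: mixed keypoints -> dense flow ->
    warped precomputed features -> U-Net synthesis."""

    def __init__(self, num_explicit=68, num_implicit=10, feat_ch=32):
        super().__init__()
        n_kp = num_explicit + num_implicit
        # Motion network: predicts a dense flow field from concatenated
        # source/driving keypoint heatmaps (explicit + implicit keypoints).
        self.motion_net = TinyUNet(in_ch=2 * n_kp, out_ch=2)
        # Synthesis network: refines warped precomputed features into an RGB frame.
        self.synthesis_net = TinyUNet(in_ch=feat_ch, out_ch=3)

    def forward(self, src_heatmaps, drv_heatmaps, precomputed_feats):
        # src_heatmaps / drv_heatmaps: (B, n_kp, H, W) Gaussian keypoint heatmaps.
        # precomputed_feats: (B, feat_ch, H, W) visual features extracted once
        # from the source image (foreground and background), reused every frame.
        flow = self.motion_net(torch.cat([src_heatmaps, drv_heatmaps], dim=1))
        b, _, h, w = flow.shape
        # Sampling grid = identity grid + predicted offsets, then warp the features.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
        )
        identity = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = identity + flow.permute(0, 2, 3, 1)
        warped = torch.nn.functional.grid_sample(
            precomputed_feats, grid, align_corners=True
        )
        return self.synthesis_net(warped)  # (B, 3, H, W) animated frame


if __name__ == "__main__":
    model = MobilePortraitSketch()
    src = torch.rand(1, 78, 64, 64)
    drv = torch.rand(1, 78, 64, 64)
    feats = torch.rand(1, 32, 64, 64)
    print(model(src, drv, feats).shape)  # torch.Size([1, 3, 64, 64])
```

In this sketch the per-frame cost is dominated by the two small U-Nets, since the source-image features are computed once and only warped thereafter, which is consistent with the reported mobile real-time budget.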

Video-Driven Results

Video-Driven Comparisons

Audio-Driven Comparisons

Analysis of Failure Cases

Although MobilePortrait, as demonstrated in the main text, remains robust for most images and motions, the figures above show that it still struggles with extreme head poses or styles that differ significantly from the training data. We speculate that this is because the image synthesis network must inpaint a large amount of content in these scenarios, often involving patterns that are difficult to learn from the training data, such as large areas of profile faces or cartoon styles. One solution is to increase the diversity of the training dataset; another is to rely on more robust intermediate representations, such as 3D facial structures. We leave this as future work.