1. Google’s researchers have developed a new AI model called VLOGGER that can create a controllable avatar from a still image and audio input.
2. VLOGGER is still at the research stage and has limitations: its movements can be inaccurate, and it struggles with large motions and diverse environments.
3. Potential use cases for VLOGGER include video translation, virtual assistants, chatbots, and low-bandwidth video communication in VR environments.
Google’s researchers have been actively developing new models, and their latest project, VLOGGER, turns a still image into a controllable avatar. The AI model animates an avatar from a single photo and synchronizes its movements with an audio recording of the person speaking. Though currently just a research project, VLOGGER could eventually revolutionize communication on platforms like Teams or Slack.
The model behind VLOGGER is built on a diffusion architecture, which lets it predict face and body motion, including pose, gaze, and expression, over time. Generating the avatar happens in stages: a 3D motion generation step first derives movement from the audio, and a “temporal diffusion” model then determines timing and renders that movement as video frames. The technology still has limitations, however: it struggles with large motions and diverse environments, and it can only handle relatively short videos.
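To make that two-stage flow concrete, here is a minimal sketch of how such a pipeline could be wired together. Everything in it is an assumption made for illustration: the function names `audio_to_motion` and `render_video`, the 128-dimensional motion parameters, and the placeholder bodies are hypothetical stand-ins, not VLOGGER’s actual interfaces, which Google has not released.

```python
import numpy as np

# Hypothetical stand-ins for VLOGGER's two stages. In the real system both
# stages are diffusion models; here the bodies are trivial placeholders that
# only show the shape of the data flowing between the stages.

def audio_to_motion(audio: np.ndarray, num_frames: int) -> np.ndarray:
    """Stage 1 (hypothetical): map an audio waveform to per-frame 3D motion
    parameters (pose, gaze, expression). Returns random parameters of an
    assumed shape as a placeholder."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((num_frames, 128))  # 128 = assumed param dim

def render_video(reference_image: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Stage 2 (hypothetical): a temporal diffusion model renders each frame
    conditioned on the reference photo and that frame's motion parameters.
    The placeholder just repeats the reference image once per frame."""
    return np.stack([reference_image for _ in motion])

# --- Wiring the pipeline together ---
fps, seconds = 25, 4
audio = np.zeros(16_000 * seconds)               # 4 s of (silent) 16 kHz audio
photo = np.zeros((256, 256, 3), dtype=np.uint8)  # the single input image

motion = audio_to_motion(audio, num_frames=fps * seconds)  # stage 1
video = render_video(photo, motion)                        # stage 2
print(video.shape)  # (100, 256, 256, 3): one frame per predicted motion step
```

The point of the split, as described above, is that the first stage samples plausible motion from the audio alone, while the second stage only has to turn that motion plus one reference photo into photorealistic frames.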
Despite those limitations, Google’s researchers see various use cases for VLOGGER, including video translation, animated avatars for virtual assistants and chatbots, and low-bandwidth video communication. The technology could make creating virtual avatars much easier, and it could be particularly useful in VR environments on headsets like the Meta Quest or the Apple Vision Pro. While similar tools exist, VLOGGER offers a simpler and more efficient way to create animated avatars.