
Upload a Photo, Get a Video

Source: Science and Technology Daily | 2025-06-11 11:29:11 | Author: LI Linxu

Rapid developments in AI have unlocked new possibilities for digital representation. With the help of AI models, you can now bring characters to life with just an image and an audio clip.

Jointly developed by Tencent Hunyuan and Tencent Music, the newly released HunyuanVideo-Avatar is a multimodal diffusion transformer-based model that can generate dialogue videos that are dynamic, emotion-controllable, and multi-character at the same time. It supports head-and-shoulder, half-body, and full-body views, and it handles multiple styles, species, and even dual-character scenes.

To put it simply, you just upload a photo and a voice clip, and the model figures out the context, emotion and lip movements to create a realistic animated video.

For instance, if you upload an image of a woman sitting on a beach with a guitar, along with a piece of lyrical music, the model understands the scene as "a woman playing the guitar and singing a lyrical song by the sea," and then generates a video of the woman performing the song.

The model provides video creators with highly consistent and dynamic video generation capabilities. Its versatility can unlock a myriad of applications in fields like entertainment, media, e-commerce, advertising and education.

It has already been applied in multiple scenarios within Tencent Music, such as AI companions for music listening, long-form audio podcasts, and music videos (MVs).

For example, on the QQ Music app, when users listen to songs by "AI Leehom" (a fully AI-driven singer created by Tencent Music and Team Leehom), a lively and adorable AI Leehom image sings along in real time on the player.

On WeSing, a popular karaoke app, users can upload their images to generate personalized MVs of themselves singing.

In subject consistency and audio-video synchronization, HunyuanVideo-Avatar delivers top-tier industry performance. In video dynamics and natural body movements, it exceeds open-source solutions and rivals closed-source ones.

Currently, the model supports audio uploads of up to 14 seconds for video generation, with more capabilities to be released and open-sourced in the future.

Editor: LI Linxu (李林旭)
