Journal of Electronics & Information Technology, Vol. 46, No. 5, May 2024

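Text-to-Video Generation: Research Status, Progress and Challenges

DENG Zijun, HE Xiangteng, PENG Yuxin*

(Wangxuan Institute of Computer Technology, Peking University, Beijing 100080)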

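Abstract: Text-to-video generation aims to produce videos that are semantically consistent with a user-provided text description, realistic in content, temporally coherent, and logically sound. This paper first reviews the state of research in text-to-video generation and describes in detail three mainstream families of methods: methods based on recurrent networks and Generative Adversarial Networks (GAN), methods based on Transformers, and methods based on diffusion models. Each family has its own strengths and weaknesses for video generation. Methods based on recurrent networks and GANs can generate videos of relatively high resolution and long duration, but struggle with complex open-domain videos. Transformer-based methods can generate complex open-domain videos, but problems such as the Transformer's unidirectional bias and accumulated error make it hard for them to produce high-fidelity videos. Diffusion models generalize well, but their slow inference and high memory consumption make it hard to generate long, high-definition videos. The paper then introduces the evaluation benchmarks and metrics used in text-to-video generation and analyzes and compares the performance of existing mainstream methods. Finally, it discusses possible directions for future research.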

Key words: Text-to-video generation; Diffusion model; Generative Adversarial Network (GAN)

CLC number: TN911.6; TP18    Document code: A    Article ID: 1009-5896(2024)05-1632-13

DOI: 10.11999/JEIT240074

Text-to-video Generation: Research Status, Progress and Challenges

DENG Zijun, HE Xiangteng, PENG Yuxin

(Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China)

Abstract: The generation of video from text aims to produce semantically consistent, photo-realistic, temporally consistent, and logically coherent videos based on provided textual descriptions. Firstly, the current state of research in the field of text-to-video generation is elucidated in this paper, providing a detailed overview of three mainstream approaches: methods based on recurrent networks and Generative Adversarial Networks (GAN), methods based on Transformers, and methods based on diffusion models. Each of these models has its strengths and weaknesses in video generation. The recurrent networks and GAN-based methods can generate
