Agent AI: Exploring the Frontiers of Multimodal Interaction.pptx

  • About 277,700 characters
  • About 10 pages
  • Published 2026-01-30 in Hunan


Figure 1: Overview of an Agent AI system that can perceive and act in different domains and applications. Agent AI is emerging as a promising avenue toward Artificial General Intelligence (AGI). Agent AI training has demonstrated the capacity for multi-modal understanding in the physical world. It provides a framework for reality-agnostic training by leveraging generative AI alongside multiple independent data sources. Large foundation models trained for agent and action-related tasks can be applied to physical and virtual worlds when trained on cross-reality data. We present the general overview of an Agent AI system that can perceive and act in many different domains and applications, possibly serving as a route towards AGI using an agent paradigm.

AGENT AI: SURVEYING THE HORIZONS OF MULTIMODAL INTERACTION

[Figure 1 (graphic not reproduced): The Emerging Agent AI Paradigm for Multi-modal and Cross-Reality AGI. Core modules: Learning (pretraining, zero-shot, few-shot from LLMs and VLMs, etc.); Memory (knowledge, logic, reasoning, and inference); Cognition (thinking, consciousness, sensing, empathy, and the overall cognitive system); Perception; Action; Environment. Surrounding labels cover applications (mixed, augmented, and virtual reality; virtual worlds and avatars; assistants; GUI apps; generalist agents; embodied systems; service robots; autonomous vehicles; manufacturing; ambient intelligence), inputs and infrastructure (user/agent input, task-specific info, multi-modal and smart sensors, microphones, IoT, brain-computer interfaces, simulators, cloud servers, robotics controllers), and theory (ML theory, informatics, data compression, system efficiency optimization).]

* Equal contribution. ‡ Project lead. † Work done while interning at Microsoft Research, Redmond.
