AI paper (English version): Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning.pdf

  • About 175,500 characters
  • About 20 pages
  • Published 2025-06-13, from Hunan


Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Motoki Omura 1, Kazuki Ota 1, Takayuki Osa 2, Yusuke Mukuta 1 2, Tatsuya Harada 1 2

Abstract

For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality operator, RL algorithms for continuous actions typically model Q-values for the current policy using the

continuous action spaces, computing $\max_{a'} Q(s', a')$ for an infinite number of actions is challenging. Actor-critic-based algorithms address this by estimating the Q-value for the current policy using the Bellman operator. In these cases, policy improvement is achieved solely through policy updates, leading to slower performance improvement and reduced sample efficiency (Ji et al., 2024). In tasks with continuous action spaces, such as robotic control, sample
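To make the contrast between the two backup targets concrete, here is a minimal sketch in plain Python. It is an illustration only and not the paper's method: it assumes a small finite set of next actions so that the maximum can be enumerated, which is exactly what becomes infeasible in continuous action spaces. The function names, the discount factor, and all numbers are hypothetical.

```python
def bellman_optimality_target(reward, gamma, next_q_values):
    """Target under the Bellman optimality operator: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


def bellman_target(reward, gamma, next_q_values, policy_probs):
    """Target under the Bellman operator for a policy pi:
    r + gamma * E_{a' ~ pi}[Q(s', a')]."""
    expected_q = sum(p * q for p, q in zip(policy_probs, next_q_values))
    return reward + gamma * expected_q


# Hypothetical Q(s', a') for three candidate next actions.
next_q = [1.0, 3.0, 2.0]
# Hypothetical current policy pi(a' | s') over the same three actions.
pi = [0.5, 0.25, 0.25]

optimality = bellman_optimality_target(reward=1.0, gamma=0.5, next_q_values=next_q)
on_policy = bellman_target(reward=1.0, gamma=0.5, next_q_values=next_q, policy_probs=pi)

print(optimality)  # 2.5
print(on_policy)   # 1.875
```

The on-policy target can never exceed the optimality target for the same Q-values, because an expectation over actions is bounded by the maximum. In this toy setting the gap between the two is the improvement that, under the Bellman operator, has to come from policy updates instead.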
