文稿毅机器学习rl v5.pptx

  • About 8,120 characters
  • About 39 pages
  • Published 2021-06-15 in Beijing
Deep Reinforcement Learning

Example: Playing Video Game
  • Start with an observation; each action is followed by a reward and a new observation.
  • Action: "right". Obtain a reward.
  • Action: "fire" (kill an alien). Obtain a reward.
  • Usually there is some randomness in the environment.

Example: Playing Video Game
  • Start with an observation, take an action, obtain a reward, then receive the next observation.
  • After many turns: Game Over (spaceship destroyed).
  • This is an episode.
  • Learn to maximize the expected cumulative reward per episode.

Approaches
  • Model-free Approach
      • Policy-based: Learning an Actor
      • Value-based: Learning a Critic
      • Actor + Critic
  • Model-based Approach
  • On-policy v.s
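
The episode loop described in the video-game example (observe, act, obtain a reward, repeat until Game Over) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyShooterEnv`, its `reset`/`step` methods, the reward of 5 for a hit, the 50% hit chance and the 5% chance of the ship being destroyed each turn are all hypothetical and do not come from the slides. The sketch shows how the cumulative reward of one episode is accumulated, and how the expected cumulative reward per episode can be estimated by averaging over many episodes.

```python
import random


class ToyShooterEnv:
    """Hypothetical stand-in for the video game; not a real library API."""

    ACTIONS = ("left", "right", "fire")

    def __init__(self, max_turns=20, seed=None):
        self.max_turns = max_turns
        self.rng = random.Random(seed)
        self.turn = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.turn = 0
        return self.turn

    def step(self, action):
        """Apply an action; return (observation, reward, done)."""
        self.turn += 1
        # Randomness in the environment: "fire" hits an alien only sometimes
        # (assumed 50% chance, assumed reward of 5 per hit).
        hit = action == "fire" and self.rng.random() < 0.5
        reward = 5 if hit else 0
        # Game over: the spaceship is destroyed (assumed 5% chance per turn)
        # or the turn limit is reached.
        done = self.turn >= self.max_turns or self.rng.random() < 0.05
        return self.turn, reward, done


def run_episode(env, policy):
    """Play one episode and return its cumulative reward."""
    observation = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def estimate_expected_return(env, policy, episodes=1000):
    """Average the cumulative reward over many episodes (Monte Carlo estimate)."""
    return sum(run_episode(env, policy) for _ in range(episodes)) / episodes


def always_fire(observation):
    return "fire"


def always_right(observation):
    return "right"


if __name__ == "__main__":
    env = ToyShooterEnv(seed=0)
    print("always_fire :", estimate_expected_return(env, always_fire))
    print("always_right:", estimate_expected_return(env, always_right))
```

In this toy setting a policy that always fires earns a higher average return than one that only moves, which is the quantity a learning algorithm would try to maximize.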
