Deep Learning Courseware: Deep Reinforcement Learning.ppt

  • About 31 pages
  • Published 2020-07-27 from Zhejiang
Introduction to Deep Reinforcement Learning
Yen-Chen Wu
2015/12/11

Outline
  • Reinforcement Learning
  • Markov Decision Process
  • How to Solve MDPs
  • DP, MC, TD
  • Q-learning (DQN)
  • Paper Review

Reinforcement Learning

Branches of Machine Learning

What makes it different?
  • There is no supervisor, only a reward signal
  • Feedback is delayed, not instantaneous
  • Time really matters (sequential, non-i.i.d. data)
  • Agent’s actions affect the subsequent data it receives

Goal: Maximize Cumulative Reward
  • Actions may have long-term consequences
  • Reward may be delayed
  • It may be better to sacrifice immediate reward to gain more long-term reward

Agent, Environment
  • → ← ↑ ↓, Defense, Attack, Jump

Full observability vs Partial observability

Learning and Planning

Exploration and Exploitation

Prediction and Control

Markov Decision Process
  • Markov Processes
  • Markov Reward Processes
  • Markov Decision Processes

Markov Process

Markov Reward Processes

Markov Decision Process

Markov Decision Process (MDP)
  • S : finite set of states (observations)
  • A : finite set of actions
  • P : transition probability
  • R : immediate reward
  • γ : discount factor
  • Goal : choose a policy π that maximizes the expected return

How to Solve MDP
  • Dynamic Programming
  • Monte-Carlo
  • Temporal-Difference
  • Q-Learning

Model-based: Dynamic Programming
  • Evaluate policy
  • Update policy

Model Free
  • Unknown: transition probability, reward

MC vs TD

Model Free: Q-learning
  • Instead of tabular
  • Optimal action-value function (Q-learning) = Bellman equation
  • Basic idea : iterative update (lack of generalization)
  • In practice : function approximator
  • Linear? Using DNN!

Deep Q-network (DQN)
  • Video: /watch?v=LJ4oCb6u7kk

Deep Q-Network
  • Computes Q-values for all actions
  • Input : 84x84x4
  • Convolves 32 filters of 8x8 with stride 4
  • Convolves 64 filters of 4x4 with stride 2
  • Convolves 64 filters of 3x3 with stride 1
  • Fully connected, 512 nodes
  • Output : a node for each action

Update DQN
  • Loss function
  • Gradient

Two Techniques
  • Experience Replay
  • Experience Pooled Memory
  • Data efficiency (bootstrap
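
The formulas for the expected return, the Bellman equation, and the DQN loss did not survive in the slide text. As a reference sketch, the standard textbook forms are given below; the symbols \alpha (learning rate), \theta (network weights), and \theta^{-} (weights of a separate target network) are assumptions that do not appear in the slides.

```latex
% Discounted return that the policy \pi should maximize in expectation
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}

% Bellman optimality equation for the action-value function
Q^{*}(s,a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s, a \right]

% Tabular Q-learning: iterative update toward the Bellman target
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

% DQN: squared error between the Bellman target and the network output
L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta) \right)^{2} \right]
```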
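
The "iterative update" idea behind tabular Q-learning can be illustrated with a minimal sketch. The five-state chain environment, the constants, and every function name below are hypothetical and chosen only for illustration; they are not taken from the slides. Actions are sampled uniformly at random, which is acceptable here because Q-learning is off-policy.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate (assumed value)


def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal state is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, max_steps=200, seed=0):
    """Learn a Q-table from randomly chosen actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # uniform random behaviour policy
            next_state, reward, done = step(state, action)
            # Bellman target: no bootstrapping from a terminal state
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

A table like `q` needs one entry per state-action pair, which is why the slides move on to a function approximator for large inputs such as game frames.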
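
The layer sizes listed for the Deep Q-Network can be checked with a little arithmetic. The sketch below assumes convolutions without padding, which the slides do not state; the function names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width or height of a convolution with no padding (assumed)."""
    return (size - kernel) // stride + 1


def dqn_feature_shapes(height=84, width=84):
    """Trace the feature-map shape through the three conv layers on the slide."""
    layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel, stride)
    shapes = []
    h, w = height, width
    for filters, kernel, stride in layers:
        h = conv_out(h, kernel, stride)
        w = conv_out(w, kernel, stride)
        shapes.append((h, w, filters))
    flat = h * w * layers[-1][0]  # inputs to the 512-node fully connected layer
    return shapes, flat


if __name__ == "__main__":
    shapes, flat = dqn_feature_shapes()
    print(shapes, flat)
```

Under that assumption the 84x84x4 input shrinks to 20x20x32, then 9x9x64, then 7x7x64, so the fully connected layer receives 3136 values.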
