Introduction to Deep Reinforcement Learning
Yen-Chen Wu
2015/12/11
Outline
Reinforcement Learning
Markov Decision Process
How to Solve MDPs
DP
MC
TD
Q-learning (DQN)
Paper Review
Reinforcement Learning
Branches of Machine Learning
What makes it different?
There is no supervisor, only a reward signal
Feedback is delayed, not instantaneous
Time really matters (sequential, non-i.i.d. data)
Agent’s actions affect the subsequent data it receives
Goal: Maximize Cumulative Reward
Actions may have long term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward
Agent and Environment
Example actions: →, ←, ↑, ↓, Defense, Attack, Jump
Full observability vs Partial observability
Learning and Planning
Exploration and Exploitation
Prediction and Control
Markov Decision Process
Markov Processes
Markov Reward Processes
Markov Decision Processes
Markov Process
Markov Reward Processes
Markov Decision Process
Markov Decision Process (MDP)
S : finite set of states (observations)
A : finite set of actions
P : transition probability
R : immediate reward
γ : discount factor
Goal :
Choose policy π
Maximize expected return: $\mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$
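To make the goal concrete, here is a minimal sketch of the discounted return of one reward sequence, that is, the sum of gamma**t times the reward at step t. The function name `discounted_return` and the reward lists are illustrative assumptions, not part of the original slides.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences: a small reward now versus a larger one later.
greedy = discounted_return([1.0, 0.0, 0.0], gamma=0.9)
patient = discounted_return([0.0, 0.0, 10.0], gamma=0.9)
print(greedy, patient)
```

With a discount factor close to 1, the delayed reward dominates, which is the sense in which sacrificing immediate reward can pay off. The expected return is the average of this quantity over trajectories generated by the policy π.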
How to Solve MDP
Dynamic Programming
Monte-Carlo
Temporal-Difference
Q-Learning
Model-based
Dynamic Programming
Evaluate policy
Update policy
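The two steps above, evaluate policy and update policy, can be alternated until the policy stops changing. The following is a rough sketch on a made-up two-state MDP; the transition table `P`, the discount `GAMMA`, and all function names are assumptions chosen for illustration.

```python
# Hypothetical model: P[state][action] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9


def q_value(P, V, s, a, gamma):
    """Expected one-step reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-8):
    """Iterative policy evaluation: sweep the Bellman expectation backup to convergence."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Greedy policy update with respect to the current value estimate."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma):
    policy = {s: 0 for s in P}  # arbitrary initial policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P, GAMMA)
print(policy, V)
```

Note that this approach needs the full model (transition probabilities and rewards), which is why the slides file it under model-based methods.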
Model Free
Unknown transition probability and reward
MC vs TD
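One way to see the difference: Monte-Carlo waits for the end of an episode and moves each value toward the full observed return, while TD(0) updates after every step using the current estimate of the next state. The sketch below assumes a value table stored in a dict and an episode given as (state, reward) pairs; these conventions and the function names are illustrative, not taken from the slides.

```python
def mc_update(V, episode, gamma, alpha):
    """Every-visit Monte-Carlo: move V(s) toward the full return observed after s.

    `episode` is a list of (state, reward) pairs, where reward is received on leaving state.
    """
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g
        V[state] += alpha * (g - V[state])


def td0_update(V, state, reward, next_state, gamma, alpha, done=False):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward + (0.0 if done else gamma * V[next_state])
    V[state] += alpha * (target - V[state])


# Hypothetical two-step episode: A -> B -> terminal, with a reward of 1 at the end.
V_mc = {"A": 0.0, "B": 0.0}
mc_update(V_mc, [("A", 0.0), ("B", 1.0)], gamma=1.0, alpha=0.5)

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B", gamma=1.0, alpha=0.5)
td0_update(V_td, "B", 1.0, None, gamma=1.0, alpha=0.5, done=True)
print(V_mc, V_td)
```

After one episode, Monte-Carlo has already credited state A with the final reward, whereas TD(0) has only updated B; A catches up on later episodes once V(B) is non-zero.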
Model Free: Q-learning
Instead of a tabular representation
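Optimal action-value function (Q-learning), expressed through the Bellman equation:
$Q^*(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]$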
Basic idea: iterative update (lacks generalization)
In practice: function approximator
Linear ?
Using DNN !
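For reference, the iterative update in its tabular form (the version a function approximator replaces) might look like the sketch below. The epsilon-greedy helper, the `defaultdict` table, and the state and action labels are assumptions for illustration only.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, gamma, alpha, done=False):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


# Hypothetical transitions on made-up states "s0" and "s1".
Q = defaultdict(float)
actions = [0, 1]
q_learning_update(Q, "s0", 1, 1.0, "s1", actions, gamma=0.9, alpha=0.5)
Q[("s1", 0)] = 2.0
q_learning_update(Q, "s0", 1, 1.0, "s1", actions, gamma=0.9, alpha=0.5)
print(Q[("s0", 1)], epsilon_greedy(Q, "s0", actions, epsilon=0.0))
```

Each (state, action) pair has its own table entry, so nothing learned about one state transfers to a similar one. That is the lack of generalization that motivates replacing the table with a function approximator.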
Deep Q-network (DQN)
Video
/watch?v=LJ4oCb6u7kk
Deep Q-Network
compute Q-values for all actions
Input : 84x84x4
Convolves 32 filters of 8x8 with stride 4
Convolves 64 filters of 4x4 with stride 2
Convolves 64 filters of 3x3 with stride 1
Fully connected layer with 512 nodes
Output a node for each action
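As a sanity check on the layer sizes listed above, the sketch below traces the tensor shape through the network. It assumes convolutions without padding, which the slides do not state explicitly; the helper names and the choice of 4 actions are also illustrative.

```python
def conv_out(size, kernel, stride):
    """Output width/height of a convolution with no padding (an assumption here)."""
    return (size - kernel) // stride + 1


def dqn_shapes(n_actions, size=84, channels=4):
    """Trace the activation shape through the layers listed on the slide."""
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel, stride)
    shapes = [(size, size, channels)]
    for filters, kernel, stride in conv_layers:
        size = conv_out(size, kernel, stride)
        shapes.append((size, size, filters))
    flat = size * size * shapes[-1][2]  # flatten before the fully connected layer
    shapes += [(flat,), (512,), (n_actions,)]
    return shapes


for shape in dqn_shapes(n_actions=4):
    print(shape)
```

Because the output layer has one node per action, a single forward pass yields the Q-values of all actions for the given input.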
Update DQN
Loss function
Gradient
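A common form of the DQN loss is the mean squared difference between a target y = r + γ max_a' Q(s', a') and the current estimate Q(s, a), with the target treated as a constant when taking the gradient. The sketch below computes that loss for a small batch; `q_online` and `q_target` are hypothetical callables returning a list of Q-values per action, standing in for the networks.

```python
def td_targets(batch, q_target, gamma):
    """y = r for terminal transitions, otherwise r + gamma * max_a' Q_target(s', a')."""
    return [
        r if done else r + gamma * max(q_target(s_next))
        for (_, _, r, s_next, done) in batch
    ]


def dqn_loss(batch, q_online, q_target, gamma):
    """Mean squared TD error over a batch of (s, a, r, s_next, done) transitions."""
    targets = td_targets(batch, q_target, gamma)
    errors = [y - q_online(s)[a] for y, (s, a, _, _, _) in zip(targets, batch)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical stand-in for a network: the same Q-values for every state.
def q_fixed(state):
    return [0.0, 1.0]


batch = [(0, 1, 1.0, 1, False), (1, 0, 2.0, None, True)]
loss = dqn_loss(batch, q_fixed, q_fixed, gamma=0.5)
print(loss)
```

In a real implementation the gradient of this loss with respect to the online network's weights would be obtained by automatic differentiation; only the online estimate Q(s, a) contributes to it.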
Two Techniques
Experience Replay
Experience
Pooled Memory
Data efficiency (bootstrap
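A replay memory of the kind described above can be sketched as a fixed-size pool of experience tuples from which training batches are drawn at random. The class name `ReplayMemory`, its methods, and the capacity used here are assumptions for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity pool of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest experience is dropped when full
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a batch without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: store five transitions in a memory that holds only three.
memory = ReplayMemory(capacity=3, seed=0)
for i in range(5):
    memory.push(i, 0, 1.0, i + 1, False)
batch = memory.sample(2)
print(len(memory), batch)
```

Sampling at random lets each stored experience be reused in many updates and breaks up the correlation between consecutive steps of an episode.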