Q-learning with infinite state space
Convergence of Q-learning with
linear function approximation
Francisco S. Melo and M. Isabel Ribeiro
Institute for Systems and Robotics
[fmelo,mir]@isr.ist.utl.pt
European Control Conference,
Kos, Greece, July 2007
July 4th, 2007 Slide 1
Outline of the presentation
• Motivation and problem formulation
• Background
• Related work
• Q-learning with LFA
• Some results
• Concluding remarks
Motivation
• Markov decision processes provide useful models for
discrete-time stochastic control problems;
• Many powerful methods are available (e.g., TD(λ), Q-learning,
SARSA).
However...
• Many such methods rely on explicit representations of the
state-space;
• Many interesting problems have a state-space unsuited for explicit
representation (e.g., an infinite one);
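To make the limitation concrete, the following is a minimal sketch of *tabular* Q-learning on a toy MDP (the MDP itself is invented for illustration and not from the paper): the method stores one value per (state, action) pair, which is exactly the explicit representation that breaks down when the state-space is infinite.

```python
# Hypothetical toy MDP: states 0..4, actions 0..1 (sizes are illustrative).
# Tabular Q-learning keeps an explicit table with one entry per
# (state, action) pair -- impossible when the state-space is infinite.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA = 0.9, 0.1  # discount factor and step size (assumed values)

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def q_update(x, a, r, y):
    """One tabular Q-learning step for the observed transition (x, a, r, y)."""
    target = r + GAMMA * max(Q[y])
    Q[x][a] += ALPHA * (target - Q[x][a])

# Example transition: from state 2, action 1, reward 1.0, next state 3.
q_update(2, 1, 1.0, 3)
```

Each update touches a single table cell, so the memory cost grows with the number of states; the slides below address this with a compact, parameterized representation instead.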
Problem formulation
• In this paper we consider Markov decision processes with infinite
state-spaces;
• We propose a modified version of Q-learning that accommodates
MDPs with infinite state-spaces;
• To this end, we make use of linear function approximation to
achieve a compact representation.
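Under linear function approximation, the Q-function is represented as Q(x, a) ≈ φ(x, a)ᵀθ, so a finite parameter vector θ replaces the (infinite) table. The following is a minimal sketch of one such update; the feature map, step size, and discount factor here are illustrative assumptions, not the specific choices analyzed in the paper.

```python
import numpy as np

ACTIONS = [0, 1]  # finite action set A (size is an assumption)

def phi(x, a):
    """Hypothetical feature vector for state x in R and action a."""
    return np.array([1.0, x, x * x, float(a)])

def lfa_q_update(theta, x, a, r, y, alpha=0.1, gamma=0.9):
    """One Q-learning step on the parameter vector theta.

    The temporal-difference error uses the approximation
    Q(x, a) = phi(x, a) . theta in place of a table lookup.
    """
    q_next = max(phi(y, b) @ theta for b in ACTIONS)
    td_error = r + gamma * q_next - phi(x, a) @ theta
    return theta + alpha * td_error * phi(x, a)

theta = np.zeros(4)
theta = lfa_q_update(theta, x=0.5, a=1, r=1.0, y=0.2)
```

Note that only the fixed-size vector θ is updated, regardless of how many states the chain can visit; whether such updates converge is precisely the question the paper studies.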
Outline of the presentation
• Motivation and problem formulation
• Background
• Related work
• Q-learning with LFA
• Some results
• Concluding remarks
Background
• We consider a controlled Markov chain {Xt}, where each r.v. Xt
takes values in a set X;
• X ⊂ Rp is the state-space, assumed compact;
• The transitions of the chain are governed by the transition
probabilities
Pa(x, U) = P[Xt+1 ∈ U | Xt = x, At = a],
where Pa is a probability kernel;
• The sequence {At} represents the control process;
• The control At takes values in a finite set A;
Background (cont.)
• For every transition from x to y under the action a, a reward
r(x, a, y) is issued;
• r is a bounded, real function, known as the reinforcement;
[Figure: transition diagram — the chain moves from Xt to Xt+1 under control At, producing reward rt]
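The interface described above can be sketched as a short simulation; the specific kernel and reward below are invented for illustration (a drifted random walk on R with a bounded reward), not taken from the paper.

```python
import random

def sample_next_state(x, a):
    """Draw X_{t+1} ~ P_a(x, .): a toy Gaussian random walk whose
    drift depends on the chosen action a (illustrative kernel)."""
    drift = 0.1 if a == 1 else -0.1
    return x + drift + random.gauss(0.0, 0.05)

def reward(x, a, y):
    """Bounded reinforcement r(x, a, y); clipped so that |r| <= 1."""
    return max(-1.0, min(1.0, -abs(y)))

# Simulate a few steps of the controlled chain.
x = 0.0
for t in range(3):
    a = random.choice([0, 1])       # control A_t from the finite set A
    y = sample_next_state(x, a)     # transition governed by P_a(x, .)
    r = reward(x, a, y)             # reward issued for the transition
    x = y
```

The key structural points from the slides appear directly in the code: the state lives in a continuous set, the action set is finite, and the reward is a bounded real function of (x, a, y).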
Background (cont.)
• The control se