Q-learning with infinite state space
Convergence of Q-learning with
linear function approximation
Francisco S. Melo and M. Isabel Ribeiro
Institute for Systems and Robotics
[fmelo,mir]@isr.ist.utl.pt
European Control Conference,
Kos, Greece, July 2007
July 4th, 2007 Slide 1
Outline of the presentation
• Motivation and problem formulation
• Background
• Related work
• Q-learning with LFA
• Some results
• Concluding remarks
Motivation
• Markov decision processes provide useful models to address
discrete-time stochastic control problems;
• Many powerful methods are available (e.g., TD(λ), Q-learning,
SARSA).
However...
• Many such methods rely on explicit representations of the
state-space;
• Many interesting problems have a state-space unsuited for explicit
representation (e.g., infinite).
Problem formulation
• In this paper we consider Markov decision processes (MDPs) with
infinite state-spaces;
• We propose a modified version of Q-learning that accommodates
MDPs with infinite state-space;
• To this end, we make use of linear function approximation (LFA) to
achieve a compact representation.
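
As a rough illustration of the idea, the sketch below runs the standard textbook Q-learning update with a linear parameterization, Q(x, a) ≈ φ(x, a)ᵀθ, on a toy problem with state-space [0, 1]. It is not the modified algorithm proposed in the talk, whose details are not given on these slides. The dynamics, the reward, the Gaussian features, the discount factor `GAMMA`, the step size and the ε-greedy exploration are all assumptions made for the example.

```python
import math
import random

ACTIONS = (0, 1)            # assumed finite action set: 0 = move left, 1 = move right
GAMMA = 0.9                 # assumed discount factor
CENTERS = (0.0, 0.5, 1.0)   # assumed centers of the Gaussian basis functions


def features(x, a):
    """Feature vector phi(x, a): one block of Gaussian bumps per action."""
    bumps = [math.exp(-((x - c) ** 2) / 0.1) for c in CENTERS]
    phi = [0.0] * (len(CENTERS) * len(ACTIONS))
    start = a * len(CENTERS)
    phi[start:start + len(CENTERS)] = bumps
    return phi


def q_value(theta, x, a):
    """Linear approximation Q(x, a) = phi(x, a)^T theta."""
    return sum(t * f for t, f in zip(theta, features(x, a)))


def step(x, a, rng):
    """Hypothetical dynamics on [0, 1]: a noisy move left or right."""
    drift = 0.1 if a == 1 else -0.1
    y = min(1.0, max(0.0, x + drift + rng.gauss(0.0, 0.02)))
    reward = 1.0 if y > 0.9 else 0.0   # bounded reward r(x, a, y)
    return y, reward


def q_learning_lfa(num_steps=5000, alpha=0.05, epsilon=0.2, seed=0):
    """Standard Q-learning update applied to the parameter vector theta."""
    rng = random.Random(seed)
    theta = [0.0] * (len(CENTERS) * len(ACTIONS))
    x = rng.random()
    for _ in range(num_steps):
        # epsilon-greedy action selection (an assumption of this sketch)
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: q_value(theta, x, b))
        y, r = step(x, a, rng)
        # temporal-difference error for the sampled transition (x, a, r, y)
        target = r + GAMMA * max(q_value(theta, y, b) for b in ACTIONS)
        td_error = target - q_value(theta, x, a)
        # gradient of Q with respect to theta is simply phi(x, a)
        phi = features(x, a)
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        x = y
    return theta
```

Calling `q_learning_lfa()` returns a vector of six parameters, however many distinct states are visited. That is the sense in which the representation is compact.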
Background
• We consider a controlled Markov chain $\{X_t\}$, where each random variable $X_t$
takes values in a set $\mathcal{X}$;
• $\mathcal{X} \subset \mathbb{R}^p$ is the state-space, assumed compact;
• The transitions of the chain are governed by the transition
probabilities
$$P_a(x, U) = \mathbb{P}\left[X_{t+1} \in U \mid X_t = x, A_t = a\right],$$
where $P_a$ is a probability kernel;
• The sequence $\{A_t\}$ represents the control process;
• The control $A_t$ takes values in a finite set $\mathcal{A}$.
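
The definitions above can be made concrete with a small simulation. The sketch below assumes a one-dimensional compact state-space [0, 1], a three-element action set, and a hypothetical kernel (a drift plus Gaussian noise, clipped to the interval). None of these choices come from the slides. It samples a trajectory and estimates the kernel's probability of an interval U by Monte Carlo.

```python
import random

ACTIONS = (-1, 0, 1)   # assumed finite action set


def sample_next_state(x, a, rng):
    """Draw the next state from a hypothetical kernel P_a(x, .) on [0, 1]."""
    y = x + 0.1 * a + rng.gauss(0.0, 0.05)
    return min(1.0, max(0.0, y))   # clip so the chain stays in the state-space


def kernel_probability(x, a, lower, upper, rng, num_samples=10000):
    """Monte Carlo estimate of P_a(x, U) for the interval U = [lower, upper]."""
    hits = sum(
        lower <= sample_next_state(x, a, rng) <= upper
        for _ in range(num_samples)
    )
    return hits / num_samples


def simulate(x0, policy, horizon, rng):
    """Generate states X_0..X_horizon and controls A_0..A_{horizon-1}."""
    states, actions = [x0], []
    for _ in range(horizon):
        a = policy(states[-1], rng)   # the control may depend on the current state
        actions.append(a)
        states.append(sample_next_state(states[-1], a, rng))
    return states, actions
```

For instance, `simulate(0.5, lambda x, rng: rng.choice(ACTIONS), 50, random.Random(0))` produces a 51-state trajectory under a uniformly random control process.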
Background (cont.)
• For every transition from $x$ to $y$ under the action $a$, a reward
$r(x, a, y)$ is issued;
• $r$ is a bounded, real function, known as the reinforcement.
[Figure: a transition from $X_t$ to $X_{t+1}$ under the action $A_t$, issuing the reward $r_t$.]
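
A bounded reinforcement of this kind might look as follows. The particular form (progress to the right minus a small control cost, squashed by `tanh`) and the bound `R_MAX` are assumptions chosen only to illustrate a function of the triple (x, a, y) whose absolute value never exceeds a fixed constant.

```python
import math

R_MAX = 1.0   # assumed bound on the absolute value of the reward


def reward(x, a, y):
    """Hypothetical bounded reward r(x, a, y) for a transition x -> y under a."""
    raw = (y - x) - 0.1 * abs(a)   # progress minus a small control cost
    return R_MAX * math.tanh(raw)  # tanh keeps the value within [-R_MAX, R_MAX]
```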
Background (cont.)
• The control se