- 1、原创力文档(book118)网站文档一经付费(服务费),不意味着购买了该文档的版权,仅供个人/单位学习、研究之用,不得用于商业用途,未经授权,严禁复制、发行、汇编、翻译或者网络传播等,侵权必究。。
- 2、本站所有内容均由合作方或网友上传,本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺!文档内容仅供研究参考,付费前请自行鉴别。如您付费,意味着您自己接受本站规则且自行承担风险,本站不退款、不进行额外附加服务;查看《如何避免下载的几个坑》。如果您已付费下载过本站文档,您可以点击 这里二次下载。
- 3、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等,请点击“版权申诉”(推荐),也可以打举报电话:400-050-0827(电话支持时间:9:00-18:30)。
- 4、该文档为VIP文档,如果想要下载,成为VIP会员后,下载免费。
- 5、成为VIP后,下载本文档将扣除1次下载权益。下载后,不支持退款、换文档。如有疑问请联系我们。
- 6、成为VIP后,您将拥有八大权益,权益包括:VIP文档下载权益、阅读免打扰、文档格式转换、高级专利检索、专属身份标志、高级客服、多端互通、版权登记。
- 7、VIP文档为合作方或网友上传,每下载1次, 网站将根据用户上传文档的质量评分、类型等,对文档贡献者给予高额补贴、流量扶持。如果你也想贡献VIP文档。上传文档
查看更多
Q-Learning By Examples
In this tutorial, you will discover step by step how an agent learns through training without teacher (unsupervised) in unknown environment. You will find out part of reinforcement learning algorithm called Q-learning. Reinforcement learning algorithm has been widely used for many applications such as robotics, multi agent system, game, and etc.
Instead of learning the theory of reinforcement that you can read it from many books and other web sites (see Resources for more references), in this tutorial will introduce the concept through simple but comprehensive numerical example. You may also download the Matlab code or MS Excel Spreadsheet for free.
Suppose we have 5 rooms in a building connected by certain doors as shown in the figure below. We give name to each room A to E. We can consider outside of the building as one big room to cover the building, and name it as F. Notice that there are two doors lead to the building from F, that is through room B and room E.
We can represent the rooms by graph, each room as a vertex (or node) and each door as an edge (or link). Refer to my other tutorial on Graph if you are not sure about what is Graph.
We want to set the target room. If we put an agent in any room, we want the agent to go outside the building. In other word, the goal room is the node F. To set this kind of goal, we introduce give a kind of reward value to each door (i.e. edge of the graph). The doors that lead immediately to the goal have instant reward of 100 (see diagram below, they have red arrows). Other doors that do not have direct connection to the target room have zero reward. Because the door is two way (from A can go to E and from E can go back to A), we assign two arrows to each room of the previous graph. Each arrow contains an instant reward value. The graph becomes state diagram as shown below
Additional loop with highest reward (100) is given to the goal room (F back to F) so that if the agent arrives at the goal,
您可能关注的文档
- 继电盒盖课程设计说明书.doc
- 简易波形发生器课程设计报告.doc
- 简易电子琴单片机课程设计.doc
- 简易示波器课程设计报告.doc
- 简易自动电阻测试仪设计周记.doc
- 简易自动油烟机控制系统毕业设计.doc
- 胶带输送机减速器课程设计.doc
- 节流变压降流量计课程设计.doc
- 结构设计原理课程设计---钢筋混凝土单向肋梁楼盖设计书.doc
- 结合支持向量机的特征选择方法在信用评估中的应用外文翻译.doc
- 2025年山东淄博市淄川区事业单位招聘初级综合类岗位75人笔试备考题库及参考答案详解一套.docx
- 游泳池安全培训PPT课件.pptx
- 2025年农村物流信息化数据治理方案报告.docx
- 2025年小型健身器材租赁市场前景报告.docx
- 2025年山东淄博文昌湖省级旅游度假区事业单位招聘4人笔试备考题库及参考答案详解1套.docx
- 高校人力资源绩效考核制度改进方案.docx
- 2025年宜宾港信资产管理有限公司公开招聘的备考题库附答案详解.docx
- 2025年宜宾港信资产管理有限公司公开招聘的备考题库附答案详解.docx
- 2025年储能运维成本报告挑战.docx
- 2025年乳制品行业奶酪市场五年分析.docx
最近下载
- 北京CBD核心区钢结构供应及安装分包工程述标.pptx VIP
- 常州大学怀德学院《嵌入式系统及应用》2022-2023学年第一学期期末试卷.doc VIP
- 安全经验分享比赛优秀安全经验分享汇编.doc VIP
- 锦州银行哈尔滨分行个人金融业务营销策略研究.pdf VIP
- GB-T 16260-1996 信息技术 软件产品评价 质量特性及其使用指南.pdf VIP
- 高考英语必备688个高频词汇.pdf VIP
- 老旧小区改造施工方案及技术措施.doc VIP
- 比赛经验分享发言稿.docx VIP
- 【修缮维修】施工方案及主要技术措施.docx VIP
- 关于历年高考英语必备高频词汇汇编(全国卷真题版).pdf
原创力文档


文档评论(0)