Tracking Learning Based on Gaussian Regression for Multi-agent Systems in Continuous Spaces

Wei Haijun, Chen Xin

(School of Information Science and Engineering, Central South University, Changsha 410083)

Abstract: In practical applications of multi-agent systems (MAS), generalization is one of the key problems that must be solved before MAS policy-learning algorithms can be applied to continuous state spaces. This paper proposes a tracking-learning architecture based on Gaussian regression for multi-agent systems in continuous spaces. A dimension-reduced Q-function is defined to emphasize the learning agent's adaptation to (or tracking of) the other agents' policies. Gaussian regression is used to build probabilistic models of the environment's state transitions and of the teammates' joint policy, so that immediate rewards and sample value functions can be computed in real time. Based on the sample value functions, a separated Q-function modeling method over the joint-state/individual-action space and a Gaussian V-function modeling method are proposed to generalize over continuous states and actions. Combined with a dynamic sample-set adjustment scheme, these components yield a tracking-learning algorithm for agents in a MAS environment. Simulations on Multi-Cart-Pole, a typical coordinated-control problem in continuous spaces, show that even when the dynamics model and the partners' policies are unknown a priori, an agent executing the algorithm can learn a cooperative policy within a short time and generalize over the state space, exhibiting high learning efficiency and strong generalization ability.

Key words: continuous state space; multi-agent systems; model-based reinforcement learning; Gaussian regression

CLC number: TP181
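The abstract names Gaussian regression as the tool for modeling both the environment's state transitions and the teammates' policy, but gives no implementation detail. As a minimal sketch of what such a model could look like, the following Python code implements standard Gaussian-process regression with a squared-exponential kernel; the class name, kernel choice, hyperparameters, and toy data are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: k(a, b) = s^2 * exp(-||a - b||^2 / (2 l^2))."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

class GPRegressor:
    """Plain Gaussian-process regression for a single output dimension."""

    def __init__(self, length_scale=1.0, noise_var=1e-4):
        self.length_scale = length_scale
        self.noise_var = noise_var

    def fit(self, X, y):
        """X: (n, d) inputs, y: (n,) targets."""
        self.X = X
        K = rbf_kernel(X, X, self.length_scale) + self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)                    # K = L @ L.T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, X_star):
        """Posterior mean and variance at query points X_star: (m, d)."""
        K_s = rbf_kernel(X_star, self.X, self.length_scale)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        var = (rbf_kernel(X_star, X_star, self.length_scale).diagonal()
               - np.sum(v**2, axis=0))
        return mean, np.maximum(var, 0.0)

# Illustrative use: learn one output dimension of the transition model
# from logged (joint state, joint action) -> next-state samples.
rng = np.random.default_rng(0)
SA = rng.uniform(-1, 1, size=(50, 3))             # hypothetical (s, a_i, a_j) samples
s_next_dim0 = np.sin(SA[:, 0]) + 0.1 * SA[:, 1]   # stand-in dynamics, not Multi-Cart-Pole
model = GPRegressor(length_scale=0.5).fit(SA, s_next_dim0)
mean, var = model.predict(SA[:5])
```

In this setting, one regressor per output dimension would serve as the transition model s' ≈ f(s, a) and another as the partner-policy model a_j ≈ π_j(s); the predictive variance is also a natural candidate signal for the dynamic sample-set adjustment the abstract mentions.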
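The dimension-reduced Q-function and the sample value functions are likewise only named in the abstract. One plausible reading, sketched below under stated assumptions, is a one-step model-based backup in which the learning agent keeps only its own action as an argument and obtains the partner's action and the next state from the learned regression models; all function names and the discount factor are hypothetical.

```python
GAMMA = 0.95  # discount factor; an assumed value, not taken from the paper

def sample_q(s, a_i, partner_policy, transition, reward, value):
    """One-step backup for a dimension-reduced Q-function:
        Q_i(s, a_i) = r(s, a_i, a_j) + GAMMA * V(s'),
    where the teammate action a_j and next state s' come from the learned
    Gaussian-regression models instead of a known simulator. The four
    callables are hypothetical stand-ins for those models."""
    a_j = partner_policy(s)            # predicted teammate action
    s_next = transition(s, a_i, a_j)   # predicted next joint state
    return reward(s, a_i, a_j) + GAMMA * value(s_next)

def greedy_action(s, candidate_actions, **models):
    """Tracking step: choose the agent's own action while assuming the
    partner follows its currently estimated policy."""
    return max(candidate_actions, key=lambda a_i: sample_q(s, a_i, **models))
```

Because the partner's action is predicted rather than observed, the greedy choice automatically "tracks" changes in the teammate's behavior as the policy model is re-fit on newly collected samples.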
