Implementation of the UCT Algorithm


On the web it is really hard to find a reasonably complete paper introducing MC and UCT. Even the English-language material is so specialized that it is hard to follow (only for me). I happened across a short piece on the crazystone website and had to force my way through it. My English is really poor, so interested readers will just have to work through the English text themselves.

UCT for Monte Carlo computer Go

A Monte Carlo (MC) Go program plays random games and easily evaluates the terminal position after two passes using Chinese rules. An MC program searches for moves that have high win rates, calculated from playing out at least a few hundred random games.

A basic MC program would simply collect statistics for all children of the root node, but MC evaluation of these moves will not converge to the best move. It is therefore necessary to do a deeper search.

The UCT method (which stands for Upper Confidence bounds applied to Trees) is a very natural extension of MC search. For each game played, the first moves are selected by searching a tree that is grown in memory. As soon as a terminal node of that tree is found, a new move/child is added to the tree and the rest of the game is played randomly. The evaluation of the finished random game is then used to update the statistics of all moves in the tree that were part of that game.

The UCT method solves the problem of selecting moves in the tree such that important moves are searched more often than moves that appear to be bad. One way of doing this would be to select the node with the highest win rate 50% of the time and select a random move otherwise.

UCT also selects the best move most of the time, but it explores other moves in a more sophisticated way. It does this by adding to the win rate of each candidate move a number that goes down every time the node has been visited. But this number also goes up a little every time the parent was visited an
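The loop described above (descend the tree, add one new child, finish the game randomly, update the statistics of the moves that were played) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the program from the quoted site: a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins) stands in for Go, and the bonus added to the win rate is assumed to be the standard UCB1 term, since the text above is cut off before it gives a formula. Node, ucb_score, random_playout and uct_search are hypothetical names chosen for illustration.

```python
import math
import random


class Node:
    """One position in the search tree (toy Nim game, hypothetical names)."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left; the side to move takes 1-3
        self.parent = parent
        self.move = move          # move that led from parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who made self.move


def ucb_score(wins, visits, parent_visits, c=1.4):
    """Win rate plus a bonus (UCB1, an assumption): the bonus shrinks as the
    node is visited and grows slowly as the parent is visited."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def random_playout(stones, rng):
    """Play random moves to the end; True if the side to move at the start wins."""
    start_side_to_move = True
    start_side_wins = False       # with 0 stones the side to move has already lost
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            start_side_wins = start_side_to_move
        start_side_to_move = not start_side_to_move
    return start_side_wins


def uct_search(stones, iterations=5000, seed=0):
    """Return the most visited root move; assumes stones >= 1."""
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the highest-scoring child while fully expanded.
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda ch: ucb_score(ch.wins, ch.visits, node.visits))
        # 2. Expansion: add one new move/child to the tree.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Playout: the rest of the game is played randomly.
        side_to_move_wins = random_playout(node.stones, rng)
        # 4. Update: credit every move in the tree that was part of this game.
        mover_won = not side_to_move_wins   # player who moved into `node`
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won       # players alternate going up the tree
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(uct_search(5))   # with 5 stones, taking 1 leaves the opponent with 4
```

The final move is chosen by visit count rather than raw win rate, which is a common choice because heavily visited moves have the most reliable statistics. The constant c=1.4 is likewise only an illustrative value and would need tuning for a real Go program.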
