- About 5.03k characters
- About 32 pages
- Published 2018-06-30 in Sichuan
Chinese Chess (Xiangqi)
How to Win a Chinese Chess Game: Reinforcement Learning
Cheng, Wen Ju

Set Up
- Pieces: General, Guard, Minister, Rook, Knight, Cannon, Pawn

Training
- How long does it take for a human?
- How long does it take for a computer?
- The chess program "KnightCap" used TD learning to learn its evaluation function while playing on the Free Internet Chess Server (FICS, ). It improved from a 1650 rating to a 2100 rating (US Master level; world champions are rated around 2900) in just 308 games and 3 days of play.

Training
- Play a series of games in self-play learning mode using temporal difference (TD) learning.
- The goal is to learn some simple strategies: piece values, or weights.

Why Temporal Difference Learning
- The average branching factor of the game tree is around 30.
- The average game lasts around 100 ply.
- The game tree therefore has roughly $30^{100}$ (about $5 \times 10^{147}$) positions, far too many to search exhaustively.

Searching
- Alpha-beta search
- 3-ply search vs. 4-ply search
- Horizon effect
- Quiescence cutoff search

Horizon Effect

Evaluation Function
- Feature: a property of the game.
- Feature evaluators: Rook, Knight, Cannon, Minister, Guard, and Pawn.
- Weight: the value of a specific piece type.
- Feature function $f$: returns the current player's piece advantage on a scale from -1 to 1.
- Evaluation function $Y$:

$$Y = \sum_{k=1}^{7} w_k f_k$$

TD(λ) and Updating the Weights

$$w_{i,t+1} = w_{i,t} + \alpha (Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_{w_i} Y_k = w_{i,t} + \alpha (Y_{t+1} - Y_t)\left(f_{i,t} + \lambda f_{i,t-1} + \lambda^{2} f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1}\right)$$

Because $Y$ is linear in the weights, $\nabla_{w_i} Y_k = f_{i,k}$, which gives the second form.

- $\alpha = 0.01$: learning rate, i.e. how quickly the weights can change.
- $\lambda = 0.01$: feedback coefficient, i.e. how much to discount past values.

Features Table Example

Final Reward
- Loser:
  - If it is a draw, the final reward is 0.
  - If the board evaluation is negative, the final reward is twice the board evaluation.
  - If the board evaluation is positive, the final reward is -2 times the board evaluation.
- Winner:
  - If it is a draw, the final reward is 0.
  - If the board evaluation is negative, the final reward is -2 times the board evaluation.
  - If the board evaluation is positive, the final reward is twice the board evaluation.

Final Rewa
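The evaluation function, the final-reward rule, and the TD(λ) weight update described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: `piece_advantage`, `evaluate`, `final_reward`, and `td_lambda_update` are hypothetical names; the exact scaling of the feature function is assumed; the values $Y_t$ are assumed to be recorded during a game, with the final reward assumed to stand in for the last value. The eligibility trace accumulates $\sum_k \lambda^{t-k} f_{i,k}$ incrementally rather than recomputing the sum at every step.

```python
from typing import List, Sequence

ALPHA = 0.01   # learning rate: how quickly the weights can change
LAMBDA = 0.01  # feedback coefficient: how much to discount past values


def piece_advantage(mine: int, theirs: int) -> float:
    """Hypothetical feature function f: piece advantage scaled to [-1, 1]."""
    total = mine + theirs
    return 0.0 if total == 0 else (mine - theirs) / total


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation function Y = sum_k w_k * f_k."""
    return sum(w * f for w, f in zip(weights, features))


def final_reward(board_eval: float, outcome: str) -> float:
    """Final reward rule from the slides; outcome is 'win', 'loss' or 'draw'."""
    if outcome == "draw":
        return 0.0
    # Winner: 2*eval if eval > 0, -2*eval if eval < 0, i.e. +2*|eval|.
    # Loser:  2*eval if eval < 0, -2*eval if eval > 0, i.e. -2*|eval|.
    magnitude = 2.0 * abs(board_eval)
    return magnitude if outcome == "win" else -magnitude


def td_lambda_update(
    weights: Sequence[float],
    feature_history: Sequence[Sequence[float]],
    values: Sequence[float],
    alpha: float = ALPHA,
    lam: float = LAMBDA,
) -> List[float]:
    """Apply w_{i,t+1} = w_{i,t} + alpha*(Y_{t+1} - Y_t)*sum_{k<=t} lam^(t-k)*f_{i,k}.

    feature_history[t] is the feature vector at step t (0-indexed) and
    values[t] is Y_t, so len(values) == len(feature_history) + 1.
    Since Y is linear, the gradient of Y_k with respect to w_i is f_{i,k}.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, features in enumerate(feature_history):
        # Eligibility trace: e_t = lam * e_{t-1} + f_t = sum_k lam^(t-k) * f_k.
        trace = [lam * e + f for e, f in zip(trace, features)]
        delta = values[t + 1] - values[t]  # temporal difference Y_{t+1} - Y_t
        w = [wi + alpha * delta * ei for wi, ei in zip(w, trace)]
    return w


if __name__ == "__main__":
    weights = [0.5] * 7  # hypothetical starting weights, one per feature
    # Hypothetical feature vectors for three positions of one game.
    history = [[piece_advantage(2, 2)] * 7,
               [piece_advantage(2, 1)] * 7,
               [piece_advantage(2, 0)] * 7]
    values = [evaluate(weights, f) for f in history]
    values.append(final_reward(values[-1], "win"))  # assumed: reward closes the sequence
    print(td_lambda_update(weights, history, values))
```

Because the positions and weights in the demo are invented, the printed weights are only illustrative; in self-play, `values` would come from the evaluations recorded during each game.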
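The Searching slide lists alpha-beta search, fixed-depth (3-ply vs. 4-ply) search, and the horizon effect. A minimal sketch of depth-limited alpha-beta (minimax form) on a hypothetical toy game tree, not real Chinese chess move generation, shows how a shallower search can be misled: at 1 ply, move `a` looks like a gain of 5, but one ply deeper the opponent's reply drops it to -9, so the 2-ply search settles on `b`. `TOY_CHILDREN`, `TOY_STATIC`, `alphabeta`, and `search` are illustrative names, and quiescence search is not shown.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy game tree: move "a" appears to win material, but the
# opponent's reply "a1" recaptures; move "b" is quietly better.
TOY_CHILDREN: Dict[str, List[str]] = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"]}
# Static evaluations from the root (maximizing) player's point of view.
TOY_STATIC: Dict[str, float] = {"root": 0, "a": 5, "a1": -9, "b": 1, "b1": 1}


def alphabeta(
    state: str,
    depth: int,
    alpha: float,
    beta: float,
    maximizing: bool,
    children: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
) -> float:
    """Depth-limited minimax with alpha-beta pruning."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)  # search horizon or terminal position
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: the maximizer already has a better option
    return value


def search(depth: int) -> float:
    """Run alpha-beta on the toy tree from the root to the given depth."""
    return alphabeta(
        "root", depth, -math.inf, math.inf, True,
        lambda s: TOY_CHILDREN.get(s, []),
        lambda s: TOY_STATIC[s],
    )


if __name__ == "__main__":
    print("1-ply value:", search(1))
    print("2-ply value:", search(2))
```

Here `search(1)` returns 5 and `search(2)` returns 1: the 1-ply search stops right before the recapture, which is the horizon effect in miniature. The same thing can happen between 3-ply and 4-ply search in a real game.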