Chinese Chess (chinese—chess.ppt)

  • About 5.03 thousand words
  • About 32 pages
  • Published 2018-06-30 in Sichuan

How to Win a Chinese Chess Game: Reinforcement Learning
Cheng, Wen Ju

Set Up
  • General
  • Guard
  • Minister
  • Rook
  • Knight
  • Cannon
  • Pawn

Training
  • How long does training take for a human?
  • How long does training take for a computer?
  • The chess program “KnightCap” used TD learning to learn its evaluation function while playing on the Free Internet Chess Server (FICS). It improved from a 1650 rating to a 2100 rating (the level of a US Master; world champions are rated around 2900) in just 308 games and 3 days of play.

Training
  • Play a series of games in self-play learning mode, using temporal difference (TD) learning.
  • The goal is to learn some simple strategies: the piece values, or weights.

Why Temporal Difference Learning
  • The average branching factor of the game tree is usually around 30.
  • The average game lasts around 100 ply.
  • The size of the game tree is therefore about $30^{100}$, far too large to search exhaustively.

Searching
  • Alpha-beta search
  • 3-ply search vs. 4-ply search
  • Horizon effect
  • Quiescence cutoff search

Horizon Effect

Evaluation Function
  • Feature: a property of the game.
  • Feature evaluators: Rook, Knight, Cannon, Minister, Guard, and Pawn.
  • Weight: the value of a specific piece type.
  • Feature function $f$: returns the current player's piece advantage on a scale from -1 to 1.
  • Evaluation function $Y$:

$$Y = \sum_{k=1}^{7} w_k f_k$$

TD(λ) and Updating the Weights

$$w_{i,t+1} = w_{i,t} + \alpha\,(Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k}\, \nabla_{w_i} Y_k = w_{i,t} + \alpha\,(Y_{t+1} - Y_t)\left(f_{i,t} + \lambda f_{i,t-1} + \lambda^{2} f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1}\right)$$

Here $f_{i,k}$ is feature $i$ evaluated at move $k$. Because $Y$ is linear in the weights, $\nabla_{w_i} Y_k = f_{i,k}$, which gives the expanded second form.
  • $\alpha = 0.01$: the learning rate, i.e. how quickly the weights can change.
  • $\lambda = 0.01$: the feedback coefficient, i.e. how much to discount past values.

Features Table

Example

Final Reward
Loser:
  • If the game is a draw, the final reward is 0.
  • If the board evaluation is negative, the final reward is twice the board evaluation.
  • If the board evaluation is positive, the final reward is -2 times the board evaluation.
Winner:
  • If the game is a draw, the final reward is 0.
  • If the board evaluation is negative, the final reward is -2 times the board evaluation.
  • If the board evaluation is positive, the final reward is twice the board evaluation.

Final Rewa
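As a quick check of the game-tree figure on the "Why Temporal Difference Learning" slide, using that slide's branching factor $b \approx 30$ and game length $d \approx 100$ ply, the tree size in powers of ten is:

$$b^{d} = 30^{100} = 10^{100\log_{10} 30} \approx 10^{100 \times 1.4771} = 10^{147.7} \approx 5 \times 10^{147}$$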
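The "Searching" slide names alpha-beta search with a fixed ply limit (3-ply vs. 4-ply). The Python below is a minimal sketch of depth-limited alpha-beta over a hypothetical toy tree in which every node carries a static evaluation; the `alphabeta` function, the `leaf` helper and the `(score, children)` node format are illustrative assumptions, not the author's program, and the sketch does not implement quiescence search.

```python
import math
from typing import List, Tuple

# Hypothetical toy tree: each node is (static_evaluation, children).
# Static evaluations are from the maximizing player's point of view.
Node = Tuple[float, List["Node"]]


def alphabeta(node: Node, depth: int, alpha: float, beta: float,
              maximizing: bool) -> float:
    """Depth-limited minimax with alpha-beta pruning."""
    score, children = node
    # At the ply limit or at a terminal node, fall back to the static evaluation.
    if depth == 0 or not children:
        return score
    if maximizing:
        best = -math.inf
        for child in children:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the minimizing parent already has a better option
        return best
    best = math.inf
    for child in children:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff: the maximizing parent already has a better option
    return best


def leaf(score: float) -> Node:
    return (score, [])


# The third move looks best statically (6.0) but allows a reply worth 0.0.
tree: Node = (0.0, [
    (4.0, [leaf(3.0), leaf(5.0)]),
    (1.0, [leaf(2.0), leaf(9.0)]),
    (6.0, [leaf(0.0), leaf(1.0)]),
])

print(alphabeta(tree, 2, -math.inf, math.inf, True))  # 3.0
print(alphabeta(tree, 1, -math.inf, math.inf, True))  # 6.0
```

In this toy tree the 1-ply search trusts the static evaluation of the third move, while the 2-ply search sees the reply and settles on 3.0. This kind of misjudgment at the search limit is what quiescence search tries to reduce, by searching further only from positions that are not yet stable.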
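The "Evaluation Function" slide defines $Y = \sum w_k f_k$ but does not say exactly how each feature is scaled to the range -1 to 1. A minimal Python sketch, assuming each feature is the count difference for one piece type divided by how many of that piece each side starts with (2 for most pieces, 5 for pawns); `START_COUNT`, `feature`, `evaluate` and the example weights are hypothetical. The slide's sum runs to 7 while six evaluators are listed, so this sketch simply sums over whichever features it is given.

```python
from typing import Dict

# Assumption: each side starts with these counts, so a count difference
# divided by the starting count lies in [-1, 1].
START_COUNT: Dict[str, int] = {
    "rook": 2, "knight": 2, "cannon": 2,
    "minister": 2, "guard": 2, "pawn": 5,
}


def feature(piece: str, own: Dict[str, int], opp: Dict[str, int]) -> float:
    """Current player's advantage in one piece type, scaled to -1..1."""
    return (own[piece] - opp[piece]) / START_COUNT[piece]


def evaluate(weights: Dict[str, float],
             own: Dict[str, int], opp: Dict[str, int]) -> float:
    """Y = sum over features k of w_k * f_k."""
    return sum(w * feature(piece, own, opp) for piece, w in weights.items())


# Hypothetical weights and position: the opponent has lost a rook and two pawns.
weights = {"rook": 0.9, "knight": 0.4, "cannon": 0.45,
           "minister": 0.2, "guard": 0.2, "pawn": 0.1}
own = dict(START_COUNT)
opp = dict(START_COUNT, rook=1, pawn=3)
print(round(evaluate(weights, own, opp), 3))  # 0.49
```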
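The expanded form on the "TD(λ) and Updating the Weights" slide can be computed incrementally: keeping a trace $e_{i,t} = f_{i,t} + \lambda e_{i,t-1}$ reproduces $f_{i,t} + \lambda f_{i,t-1} + \dots + \lambda^{t-1} f_{i,1}$ without storing past positions. The Python below is a sketch under stated assumptions: the weights are updated online after every move, and an optional `final_reward` stands in for $Y_{t+1}$ after the last position (the slides define a final reward but do not show where it enters the update). `td_lambda` and `dot` are hypothetical names; the defaults α = 0.01 and λ = 0.01 are the slides' values.

```python
from typing import List, Optional

Vector = List[float]


def dot(w: Vector, f: Vector) -> float:
    """Linear evaluation Y = sum_i w_i * f_i."""
    return sum(wi * fi for wi, fi in zip(w, f))


def td_lambda(weights: Vector, positions: List[Vector], alpha: float = 0.01,
              lam: float = 0.01, final_reward: Optional[float] = None) -> Vector:
    """Apply the TD(lambda) weight update along one game.

    positions[t] is the feature vector of the t-th position. If final_reward
    is given, it is used as the target after the last position (assumption).
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, f in enumerate(positions):
        # Y is linear, so dY/dw_i = f_i and the trace is a decayed sum of features.
        trace = [lam * e + fi for e, fi in zip(trace, f)]
        y_t = dot(w, f)
        if t + 1 < len(positions):
            y_next = dot(w, positions[t + 1])
        elif final_reward is not None:
            y_next = final_reward
        else:
            break  # no target after the last position
        delta = y_next - y_t  # temporal difference Y_{t+1} - Y_t
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


print(td_lambda([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                alpha=0.5, lam=0.5, final_reward=1.0))  # [0.25, 0.5]
```

In the usage line the first temporal difference is zero, so only the final step changes the weights, and the earlier feature receives half the credit because λ = 0.5.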
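The six cases on the "Final Reward" slide reduce to one rule: 0 for a draw, $+2|e|$ for the winner and $-2|e|$ for the loser, where $e$ is the board evaluation. A minimal Python sketch of that rule; the function name and the `outcome` strings are hypothetical, and since the slides do not cover an evaluation of exactly 0, this sketch simply returns 0 in that case.

```python
def final_reward(board_eval: float, outcome: str) -> float:
    """Final reward from the slides, for the player whose outcome is given.

    outcome is "win", "loss" or "draw".
    """
    if outcome == "draw":
        return 0.0
    if outcome == "win":
        # eval < 0 -> -2 * eval, eval > 0 -> 2 * eval; both equal 2 * |eval|
        return 2.0 * abs(board_eval)
    if outcome == "loss":
        # eval < 0 -> 2 * eval, eval > 0 -> -2 * eval; both equal -2 * |eval|
        return -2.0 * abs(board_eval)
    raise ValueError(f"unknown outcome: {outcome!r}")


print(final_reward(-0.25, "win"), final_reward(-0.25, "loss"))  # 0.5 -0.5
```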
