Reinforcement Learning Systems and Their Learning Algorithm Based on Optimal Reliability (PDF)

  • About 19,900 characters
  • About 8 pages
  • Published 2019-03-17 in Tianjin

Information and Control, Vol. 26, No. 5, Oct. 1997

俞星星 (Yu Xingxing), 阎平凡 (Yan Pingfan) (postal code 100084)

Footnote date: 1996-09-18

The Chinese body text of the preview was lost in extraction. The English terms and citation markers that survive, in order of appearance, are:

  • Reinforcement Learning
  • Immediate Reinforcement
  • Associative Reinforcement Learning
  • Temporal Credit Assignment
  • Structural Credit Assignment
  • TD (Temporal Difference Method) [4, 5], attributed to Sutton
  • Q-Learning
  • Actor-Critic Learning
  • DYNA-Q [11]
  • BP [12]

Other citation markers: [2], [3], [6], [7], [8-10], [13], [14], [15].
