ReinforcementLearning-TexasA&MUniversity.ppt

  • About 2,080 words
  • About 14 pages
  • Published 2016-09-07 in Tianjin

Reinforcement Learning
Mitchell, Ch. 13 (see also the Sutton and Barto book, available online)

Rationale
  • Learning from experience
  • Adaptive control
  • Examples are not explicitly labeled; feedback is delayed
  • Problem of credit assignment: which action(s) led to the payoff?
  • Trade off short-term thinking (immediate reward) against long-term consequences

Agent Model
  • Transition function T: S × A → S (environment)
  • Reward function R: S × A → ℝ (payoff)
  • Stochastic but Markov
  • Policy = decision function, π: S → A
  • "Rationality": maximize long-term expected reward
  • Discounted long-term reward (convergent series)
  • Alternatives: finite time horizon, un
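
The agent model above (a transition function T, a reward function R, a policy, and a discounted long-term reward) can be made concrete with a small sketch. This is an illustration only, not taken from the slides: the three-state chain, the state and action names, the reward of 100, and the helper `discounted_return` are all hypothetical. The transition function is deterministic here for simplicity, whereas the slides allow a stochastic (but Markov) environment. The discounted reward is assumed to take the usual form r_0 + γ·r_1 + γ²·r_2 + ..., with 0 ≤ γ < 1 so that the series converges for bounded rewards.

```python
from typing import Callable, Dict, Tuple

State = str
Action = str

# Hypothetical deterministic transition function T: S x A -> S.
T: Dict[Tuple[State, Action], State] = {
    ("s0", "right"): "s1",
    ("s0", "stay"): "s0",
    ("s1", "right"): "goal",
    ("s1", "stay"): "s1",
    ("goal", "right"): "goal",
    ("goal", "stay"): "goal",
}

# Hypothetical reward function R: S x A -> real.
# Only the move into the goal pays off; every other pair yields 0.
R: Dict[Tuple[State, Action], float] = {("s1", "right"): 100.0}


def reward(state: State, action: Action) -> float:
    """Look up R(s, a), defaulting to 0 for unlisted pairs."""
    return R.get((state, action), 0.0)


def discounted_return(
    policy: Callable[[State], Action],
    start: State,
    gamma: float,
    horizon: int,
) -> float:
    """Sum gamma**t * r_t while following `policy` for `horizon` steps."""
    total = 0.0
    discount = 1.0  # equals gamma**t at step t
    state = start
    for _ in range(horizon):
        action = policy(state)
        total += discount * reward(state, action)
        discount *= gamma
        state = T[(state, action)]
    return total


def go_right(state: State) -> Action:
    """A policy that always moves right."""
    return "right"


def stay_put(state: State) -> Action:
    """A policy that never moves."""
    return "stay"


if __name__ == "__main__":
    print(discounted_return(go_right, "s0", gamma=0.9, horizon=10))
    print(discounted_return(stay_put, "s0", gamma=0.9, horizon=10))
```

Under these assumptions, the always-right policy collects its only reward one step late, so the payoff of 100 is discounted once by γ. This is a small instance of the delayed feedback and credit assignment problem: the first "right" action earns nothing immediately, yet it is what makes the later payoff reachable.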
