Nondeterministic Models (非确定模型.ppt)

  • About 1,680 characters
  • About 11 pages
  • Published 2016-12-29 in Beijing
Initialize matrix Q as a zero matrix.
For each episode:
    Select a random initial state.
    Do while the goal state has not been reached:
        Select one among all possible actions for the current state.
        Using this action, consider going to the next state.
        Get the maximum Q value of this next state based on all possible actions.
        Compute the updated Q value for the current state and action.
        Set the next state as the current state.
    End Do
End For
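The loop above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code. The slide text does not show the formula for the "Compute" step, so the sketch assumes the common deterministic update Q(s, a) = R(s, a) + gamma * max Q(s', a'). The reward matrix `R`, the discount `gamma`, the 4-state chain, and the function names `q_learning` and `greedy_path` are all hypothetical, chosen only for this example.

```python
import random

# Hypothetical 4-state chain 0 - 1 - 2 - 3 with goal state 3.
# R[s][a] is the reward for moving from state s to state a;
# None marks a move that is not allowed. (Assumed example data.)
R = [
    [None, 0,    None, None],
    [0,    None, 0,    None],
    [None, 0,    None, 100],
    [None, None, 0,    100],
]


def q_learning(R, goal, gamma=0.8, episodes=500, seed=0):
    """Run the episode loop with the assumed update rule and return Q."""
    rng = random.Random(seed)
    n = len(R)
    # Initialize matrix Q as a zero matrix.
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        # Select a random initial state.
        state = rng.randrange(n)
        while state != goal:
            # Select one among all possible actions for the current state.
            actions = [a for a in range(n) if R[state][a] is not None]
            action = rng.choice(actions)
            # In this toy model, taking action a means moving to state a.
            next_state = action
            # Maximum Q value of the next state over its possible actions.
            next_actions = [a for a in range(n) if R[next_state][a] is not None]
            max_next = max(Q[next_state][a] for a in next_actions)
            # Assumed update: Q(s, a) = R(s, a) + gamma * max Q(s', a').
            Q[state][action] = R[state][action] + gamma * max_next
            # Set the next state as the current state.
            state = next_state
    return Q


def greedy_path(Q, R, start, goal, max_steps=20):
    """Follow the highest-valued allowed action from start until the goal."""
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        actions = [a for a in range(len(R)) if R[state][a] is not None]
        state = max(actions, key=lambda a: Q[state][a])
        path.append(state)
    return path
```

Calling `q_learning(R, goal=3)` and then `greedy_path(Q, R, start=0, goal=3)` traces the route the learned table prefers. Selecting actions uniformly at random keeps the sketch close to the pseudocode; an epsilon-greedy choice would be a natural variation, but the source does not mention one.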
