(14) Reinforcement Learning

  • Published 2017-06-26 (Hebei)

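Reinforcement Learning and Control
JerryLead csxulijie@

In the earlier discussions we were always given a sample $x$, with or without a label $y$, and then fitted, classified, clustered, or reduced the dimensionality of those samples. Many sequential decision and control problems, however, do not come with such well-structured samples. Take the control of a four-legged robot: at the start we do not know which leg it should move, and while it is moving we do not know how to make it find a suitable direction of travel on its own.

Similarly, when designing a chess-playing AI, every move is itself a decision. Simple games can be handled with heuristic methods such as A*, but in a complex position the machine still has to look several moves ahead before it can decide which move is better, so a better decision method is needed.

One way to approach this kind of control and decision problem is to design a reward function. If the learning agent (the four-legged robot or the chess AI above) obtains a good result after deciding on a step, we give the agent some reward (the reward function returns a positive value); if the result is poor, the reward function returns a negative value. For the four-legged robot, a step forward (closer to the goal) yields a positive reward and a step backward yields a negative one. If we can evaluate every step and obtain the corresponding reward, the problem becomes tractable: we only need to find the path with the largest total reward (the largest sum of per-step rewards) and treat it as the best path.

Reinforcement learning has already been applied successfully in many fields, including autonomous helicopter flight, robot control, cell-phone network routing, market decision-making, industrial control, and efficient web-page indexing.

We begin by introducing Markov decision processes (MDPs).

1. Markov Decision Processes

A Markov decision process is a five-tuple $(S, A, \{P_{sa}\}, \gamma, R)$, where:

• $S$ is the set of states. (For example, in an autonomous helicopter system, the helicopter's current position coordinates make up the state set.)
• $A$ is the set of actions. (For example, the flight directions commanded through the helicopter's control stick, such as forward or backward.)
• $P_{sa}$ are the state transition probabilities. Moving from one state in $S$ to another requires an action from $A$. For the current state $s \in S$ and action $a \in A$, $P_{sa}$ gives the probability distribution over the states the system will transition to (executing $a$ in the current state may lead to many different states).
• $\gamma \in [0, 1)$ is the discount factor.
• $R: S \times A \mapsto \mathbb{R}$ is the reward function. It is often written as a function of the state alone, in which case it becomes $R: S \mapsto \mathbb{R}$.

The dynamics of an MDP proceed as follows. An agent starts in some state $s_0$ and picks an action $a_0$ from $A$ to execute. After the action, the agent moves at random to a next state $s_1$ according to the transition probabilities, $s_1 \sim P_{s_0 a_0}$. The agent then executes another action $a$…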
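The five-tuple and the dynamics described above (start in $s_0$, pick an action, sample the next state from $P_{sa}$, repeat) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the original notes: the three-cell toy world, the action names `forward` and `back`, the transition probabilities, the reward values, and the helper names `MDP`, `step`, and `rollout`.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    states: list       # S: the set of states
    actions: list      # A: the set of actions
    transitions: dict  # P_sa: maps (s, a) to {next_state: probability}
    gamma: float       # discount factor in [0, 1); stored but not used in this rollout
    reward: dict       # R: maps a state to a real-valued reward


def step(mdp, s, a, rng):
    """Sample the next state s' from the distribution P_sa."""
    dist = mdp.transitions[(s, a)]
    next_states = list(dist.keys())
    probs = list(dist.values())
    return rng.choices(next_states, weights=probs, k=1)[0]


def rollout(mdp, s0, policy, n_steps, rng):
    """Follow `policy` for n_steps starting from s0; return the visited states."""
    s = s0
    trajectory = [s0]
    for _ in range(n_steps):
        a = policy(s)            # choose an action for the current state
        s = step(mdp, s, a, rng) # transition randomly according to P_sa
        trajectory.append(s)
    return trajectory


# Hypothetical toy world: a robot on a line of three cells, goal at cell 2.
toy = MDP(
    states=[0, 1, 2],
    actions=["forward", "back"],
    transitions={
        (0, "forward"): {1: 0.8, 0: 0.2},
        (0, "back"): {0: 1.0},
        (1, "forward"): {2: 0.8, 1: 0.2},
        (1, "back"): {0: 0.8, 1: 0.2},
        (2, "forward"): {2: 1.0},
        (2, "back"): {1: 0.8, 2: 0.2},
    },
    gamma=0.9,
    reward={0: -1.0, 1: 0.0, 2: 1.0},
)

rng = random.Random(0)  # fixed seed so repeated runs give the same trajectory
traj = rollout(toy, 0, lambda s: "forward", 5, rng)
rewards = [toy.reward[s] for s in traj]
print("states visited:", traj)
print("sum of per-step rewards:", sum(rewards))
```

The transition table is stored as a dictionary keyed by `(state, action)` so that each entry is exactly one distribution $P_{sa}$; a dense array indexed by state and action would work equally well for small state spaces.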
