Reinforcement Learning by David Silver, Notes 2: Markov Decision Processes

Source: Internet. Editor: 程序博客网. Time: 2024/05/22 05:23
  1. Markov Process
  2. Markov Reward Process




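    Solving the Bellman equation of an MRP directly has time complexity O(N^3) for N states. For small MRPs, direct computation is practical; for large MRPs, use one of the following iterative methods: dynamic programming, Monte Carlo evaluation, or temporal-difference learning.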
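
    The contrast between the direct and the iterative approach can be sketched as follows. This is a minimal illustration, not code from the lecture: it assumes the standard matrix form of the MRP Bellman equation, v = R + gamma * P v, and the function names `solve_direct` and `solve_iterative`, the three-state transition matrix, and the rewards are all made up for the example. The iterative version is a plain fixed-point iteration in the spirit of dynamic programming; Monte Carlo and temporal-difference methods are not shown.

```python
def solve_direct(P, R, gamma):
    """Solve (I - gamma * P) v = R by Gauss-Jordan elimination, O(N^3)."""
    n = len(R)
    # Build the augmented matrix [I - gamma * P | R].
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [R[i]]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def solve_iterative(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat v <- R + gamma * P v until the largest change drops below tol."""
    n = len(R)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [R[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical 3-state MRP; the last state is absorbing with zero reward.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
R = [-1.0, -2.0, 0.0]
gamma = 0.9

v_direct = solve_direct(P, R, gamma)
v_iter = solve_iterative(P, R, gamma)
print("direct:   ", v_direct)
print("iterative:", v_iter)
```

    Each sweep of the iterative version costs O(N^2) for a dense transition matrix, which is why it scales better than the direct solve when N is large and only moderate accuracy is needed.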

  3. Markov Decision Process (Markov reward process with decisions)

  4. A policy is a distribution over actions given states. Given an MDP and a policy, the state sequence is a Markov process, and the state and reward sequence is a Markov reward process.
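
    To make the last statement concrete, here is a small sketch of how an MDP plus a fixed policy collapses into an MRP by averaging over actions. The helper name `induced_mrp`, the nested-list layout (`P[s][a][s2]`, `R[s][a]`, `policy[s][a]`), and the two-state, two-action numbers are assumptions chosen for illustration only.

```python
def induced_mrp(P, R, policy):
    """Average an MDP's dynamics and rewards over a policy.

    P[s][a][s2]  : probability of moving from s to s2 under action a
    R[s][a]      : expected immediate reward for taking a in s
    policy[s][a] : probability that the policy picks a in s
    Returns (P_pi, R_pi), the transition matrix and reward vector of the MRP.
    """
    n_states = len(P)
    P_pi = [[sum(policy[s][a] * P[s][a][s2] for a in range(len(policy[s])))
             for s2 in range(n_states)]
            for s in range(n_states)]
    R_pi = [sum(policy[s][a] * R[s][a] for a in range(len(policy[s])))
            for s in range(n_states)]
    return P_pi, R_pi


# Hypothetical MDP with 2 states and 2 actions.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.9, 0.1]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
policy = [[0.5, 0.5],
          [0.25, 0.75]]

P_pi, R_pi = induced_mrp(P, R, policy)
print("P_pi:", P_pi)
print("R_pi:", R_pi)
```

    Every row of `P_pi` still sums to 1, so the result is a valid Markov reward process that can be evaluated with any MRP method.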

  5. The state-value function of an MDP is the expected return starting from a state and then following the policy.
  6. The action-value function is the expected return starting from a state, taking an action, and then following the policy.
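
    The two value functions are linked: the value of a state is the policy-weighted average of its action values, and an action value is the immediate reward plus the discounted value of the next state. The sketch below evaluates both by repeated backups under these relations. The function names `action_values` and `evaluate_policy`, the data layout, and the toy MDP are assumptions made for this example, not material from the lecture.

```python
def action_values(P, R, v, gamma):
    """q[s][a] = R[s][a] + gamma * sum over s2 of P[s][a][s2] * v[s2]."""
    return [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(P[s]))]
            for s in range(len(P))]


def evaluate_policy(P, R, policy, gamma, tol=1e-10, max_iters=10_000):
    """Iterate v[s] <- sum over a of policy[s][a] * q[s][a] until it settles."""
    v = [0.0] * len(P)
    for _ in range(max_iters):
        q = action_values(P, R, v, gamma)
        new_v = [sum(pi_a * q_a for pi_a, q_a in zip(policy[s], q[s]))
                 for s in range(len(P))]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            v = new_v
            break
        v = new_v
    return v, action_values(P, R, v, gamma)


# Hypothetical MDP with 2 states and 2 actions, and a fixed stochastic policy.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.9, 0.1]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
policy = [[0.5, 0.5],
          [0.25, 0.75]]
gamma = 0.9

v_pi, q_pi = evaluate_policy(P, R, policy, gamma)
print("v_pi:", v_pi)
print("q_pi:", q_pi)
```

    In this toy MDP the policy earns an expected reward of 0.5 per step in both states, so both state values come out close to 0.5 / (1 - 0.9) = 5.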