Reinforcement Learning
https://github.com/lyuwenyu/RL
1.
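MDP (Markov Decision Process):
(S, A, P, R, γ), with a policy π
S (state space)
A (action space)
P (state-transition probability)
R (reward function)
γ (discount factor, between 0 and 1)
π (policy)
G (return: the discounted sum of future rewards)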
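As a small illustration of the return G listed above, the sketch below computes the discounted sum of a finite reward sequence. The function name `discounted_return` and the sample rewards are made up for this example and are not taken from the linked repository.

```python
def discounted_return(rewards, gamma):
    """Return G = r_1 + gamma * r_2 + gamma^2 * r_3 + ... for a finite episode."""
    g = 0.0
    # Walk backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


if __name__ == "__main__":
    # Three rewards of 1 with discount 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

The backward loop is the same recursion that the Bellman equation applies to expected values instead of a single sampled episode.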
Bellman equation
State-value function v(s)
Action-value function q(s,a)
Optimal state-value function
Optimal action-value function
Optimal policy
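For reference, the quantities named above can be written out as follows. This is a sketch in one common notation (the one used in the David Silver lectures listed in the references), not necessarily the notation of the linked repository: γ is the discount factor, π the policy, R_s^a the expected immediate reward for taking action a in state s, and P_{ss'}^a the probability of moving from s to s' under a.

```latex
% Return
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% State-value and action-value functions (Bellman expectation equations)
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
         = \sum_{a} \pi(a \mid s) \Big( R_s^a + \gamma \sum_{s'} P_{ss'}^a \, v_\pi(s') \Big)

q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]
            = R_s^a + \gamma \sum_{s'} P_{ss'}^a \sum_{a'} \pi(a' \mid s') \, q_\pi(s', a')

% Optimal value functions (Bellman optimality equations)
v_*(s) = \max_{a} \Big( R_s^a + \gamma \sum_{s'} P_{ss'}^a \, v_*(s') \Big)

q_*(s, a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a \max_{a'} q_*(s', a')

% An optimal policy acts greedily with respect to q_*
\pi_*(s) = \arg\max_{a} q_*(s, a)
```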
2.
Model-based solution
Dynamic Programming
Value Iteration
Policy Iteration:
Policy evaluation
Policy improvement (greedy)
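The two dynamic-programming methods above can be sketched on a tiny MDP whose model is fully known. Everything in this example is an assumption made for illustration: the 3-state chain, the table `P`, the constant `GAMMA`, and all function names are hypothetical and do not come from the linked repository.

```python
# Hypothetical 3-state chain; state 2 is terminal (absorbing, zero reward).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def q_value(P, v, s, a, gamma):
    """One-step lookahead: expected reward plus discounted value of the successor."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def greedy_policy(P, v, gamma):
    """Policy improvement: pick the action with the highest lookahead value."""
    return {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}


def value_iteration(P, gamma, tol=1e-10):
    """Repeatedly apply the Bellman optimality backup until values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = max(q_value(P, v, s, a, gamma) for a in P[s])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v, greedy_policy(P, v, gamma)


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively apply the Bellman expectation backup for a fixed policy."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = q_value(P, v, s, policy[s], gamma)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def policy_iteration(P, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in P}
    while True:
        v = policy_evaluation(P, policy, gamma)
        improved = greedy_policy(P, v, gamma)
        if improved == policy:
            return v, policy
        policy = improved


if __name__ == "__main__":
    print(value_iteration(P, GAMMA))
    print(policy_iteration(P, GAMMA))
```

On this chain both methods should agree: action 1 is chosen in states 0 and 1, and the values are approximately 0.9, 1.0 and 0.0. Value iteration folds the greedy maximum into every backup, while policy iteration evaluates a fixed policy before each improvement step.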
3.
Model-free solution
Policy Evaluation
MC (Monte Carlo)
TD (Temporal Difference)
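When no model is available, a policy can be evaluated from sampled episodes instead. The sketch below contrasts first-visit Monte Carlo with TD(0) on a 5-state random walk, where the policy moves left or right with equal probability and only reaching the right end pays a reward of 1. The environment, the constants, and the function names are assumptions chosen for illustration, not code from the linked repository.

```python
import random

LEFT_TERMINAL, RIGHT_TERMINAL, START = 0, 6, 3  # non-terminal states are 1..5


def step(state, rng):
    """Random-walk policy: move left or right with equal probability."""
    next_state = state + rng.choice((-1, 1))
    reward = 1.0 if next_state == RIGHT_TERMINAL else 0.0
    done = next_state in (LEFT_TERMINAL, RIGHT_TERMINAL)
    return next_state, reward, done


def generate_episode(rng):
    """Return a list of (state, reward received after leaving that state)."""
    state, done, episode = START, False, []
    while not done:
        next_state, reward, done = step(state, rng)
        episode.append((state, reward))
        state = next_state
    return episode


def mc_evaluation(num_episodes, gamma=1.0, seed=0):
    """First-visit Monte Carlo: average the full return observed after each state."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in range(LEFT_TERMINAL + 1, RIGHT_TERMINAL)}
    counts = {s: 0 for s in v}
    for _ in range(num_episodes):
        g = 0.0
        first_visit_return = {}
        # Iterating backwards, the last value written for a state belongs
        # to its earliest visit in the episode.
        for state, reward in reversed(generate_episode(rng)):
            g = reward + gamma * g
            first_visit_return[state] = g
        for state, ret in first_visit_return.items():
            counts[state] += 1
            v[state] += (ret - v[state]) / counts[state]  # incremental mean
    return v


def td0_evaluation(num_episodes, alpha=0.02, gamma=1.0, seed=0):
    """TD(0): update after every step, bootstrapping from the next state's estimate."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in range(LEFT_TERMINAL, RIGHT_TERMINAL + 1)}  # terminals stay 0
    for _ in range(num_episodes):
        state, done = START, False
        while not done:
            next_state, reward, done = step(state, rng)
            v[state] += alpha * (reward + gamma * v[next_state] - v[state])
            state = next_state
    return v


if __name__ == "__main__":
    print(mc_evaluation(5000))
    print(td0_evaluation(5000))
```

For this walk the true value of state s is s/6, so both estimates should approach roughly 0.17, 0.33, 0.5, 0.67 and 0.83. MC has to wait for the end of an episode and uses the actual return; TD updates online from a one-step target that relies on its own current estimate.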
4.
on-policy (SARSA): the value function is updated strictly on the basis of experience gained from executing some (possibly non-stationary) policy, and that executed policy is the one being learned.
off-policy (Q-learning): so called because the policy being learned can differ from the policy being executed.
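The difference between the two control methods shows up in a single line of the update rule. The sketch below trains both on a hypothetical 5-state corridor in which every step costs -1 and the episode ends at the right-most state. The environment, the hyperparameters, and the function names are assumptions for illustration only and are not taken from the linked repository.

```python
import random

N_STATES = 5      # corridor states 0..4; state 4 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic move; every step costs -1 until the right end is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0, next_state == N_STATES - 1


def epsilon_greedy(q, state, epsilon, rng):
    """Behaviour policy: mostly greedy, with occasional random exploration."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def sarsa(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """On-policy: the target uses the action the behaviour policy actually takes next."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(q, state, epsilon, rng)
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            target = reward if done else reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
    return q


def q_learning(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Off-policy: the target uses the greedy action, whatever is executed next."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_actions(q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_actions(sarsa()))
    print(greedy_actions(q_learning()))
```

Both agents act epsilon-greedily here. SARSA learns the value of that exploring policy, whereas Q-learning learns the value of the greedy policy while still exploring, which is what makes it off-policy.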
-----------------------reference-----------------------------
1. https://www.youtube.com/watch?v=0g4j2k_Ggc4
2. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
3. http://www.algorithmdog.com/reinforcement-learning-value-function-approximation