CS231n Study Notes: 14. Reinforcement Learning
Source: Internet | Editor: 程序博客网 | Date: 2024/06/02 02:03
1. What is Reinforcement Learning
Overview:
An example:
Another example:
2. Markov Decision Process
- Mathematical formulation of the RL problem
- Markov property: Current state completely characterises the state of the world
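For reference, the formulation can be written compactly as follows. This is the standard textbook definition of an MDP; the symbol choices are an assumption of these notes rather than something fixed by the text above.

```latex
% An MDP is specified by a tuple:
(\mathcal{S}, \mathcal{A}, \mathcal{R}, \mathbb{P}, \gamma)
% \mathcal{S}: set of possible states
% \mathcal{A}: set of possible actions
% \mathcal{R}: distribution of the reward given a (state, action) pair
% \mathbb{P}:  distribution of the next state given a (state, action) pair
% \gamma:      discount factor

% Markov property: the next state depends only on the current state and action
p(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)
```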
**Process:**
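The agent-environment loop can be sketched roughly as below. `ChainEnv`, `run_episode`, and `always_right` are hypothetical names invented for this illustration, and the toy environment is deterministic, whereas a general MDP samples rewards and next states from distributions.

```python
class ChainEnv:
    """Hypothetical toy MDP: states 0..4 on a line; reaching state 4 ends the episode."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # The environment provides the initial state s_0.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left (clipped at state 0).
        move = 1 if action == 1 else -1
        self.state = max(0, self.state + move)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted sum of rewards."""
    state = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # agent selects a_t from s_t
        state, reward, done = env.step(action)  # environment returns s_{t+1}, r_t
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


episode_return = run_episode(ChainEnv(), always_right)
```

With the always-right policy the only reward arrives on the fourth step, so the return is gamma cubed, about 0.729 up to floating-point rounding.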
The optimal policy π*
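Written out, the optimal policy is the one that maximizes the expected discounted sum of rewards. The notation below is the usual one and is assumed here, not quoted from the notes.

```latex
\pi^* = \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, \pi \right]
% with trajectories generated by
% s_0 \sim p(s_0), \quad a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim p(\cdot \mid s_t, a_t)
```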
3. Q-learning
Definitions: Value function and Q-value function:
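In the usual notation (assumed here), the two definitions read as follows: the value function measures how good a state is, and the Q-value function measures how good a state-action pair is, both under a policy \pi.

```latex
\begin{aligned}
V^{\pi}(s)    &= \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s, \pi \right] \\
Q^{\pi}(s, a) &= \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s, a_0 = a, \pi \right]
\end{aligned}
```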
Bellman equation:
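A sketch of the equation in standard notation (assumed here): the optimal Q-value function Q^* is the best expected return achievable from a state-action pair, and it satisfies a one-step recursion.

```latex
\begin{aligned}
Q^*(s, a) &= \max_{\pi} \; \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s, a_0 = a, \pi \right] \\
Q^*(s, a) &= \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]
\end{aligned}
% The optimal policy then takes, in each state, the action with the largest Q^*(s, a).
```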
Optimization strategy:
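One common strategy is value iteration: apply the Bellman equation repeatedly as an update rule until the Q-values stop changing. The sketch below assumes a tiny tabular MDP; the function name `q_value_iteration` and the `transitions` layout are hypothetical choices for this illustration. It also shows why the approach does not scale: every (state, action) pair must be visited on every sweep.

```python
def q_value_iteration(transitions, gamma=0.9, n_iters=200):
    """Tabular Q-value iteration.

    transitions[s][a] is a list of (prob, next_state, reward, done) tuples.
    """
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(n_iters):
        new_q = {}
        for s, acts in transitions.items():
            new_q[s] = {}
            for a, outcomes in acts.items():
                total = 0.0
                for prob, s2, r, done in outcomes:
                    # Terminal transitions contribute no future value.
                    future = 0.0 if done else max(q[s2].values())
                    total += prob * (r + gamma * future)
                new_q[s][a] = total
        q = new_q
    return q


# Hypothetical chain: states 0 and 1 are non-terminal, state 2 is terminal.
transitions = {
    0: {"left": [(1.0, 0, 0.0, False)], "right": [(1.0, 1, 0.0, False)]},
    1: {"left": [(1.0, 0, 0.0, False)], "right": [(1.0, 2, 1.0, True)]},
}
q = q_value_iteration(transitions)
best_action = {s: max(acts, key=acts.get) for s, acts in q.items()}
```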
**Solving for the optimal policy: Q-learning**
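When the Q-function is represented by a function approximator Q(s, a; \theta), such as a neural network, training is usually set up as regression toward a Bellman target. A sketch in assumed notation, with \theta_{i-1} denoting the parameters from the previous iteration:

```latex
\begin{aligned}
L_i(\theta_i) &= \mathbb{E}_{s, a}\left[ \big( y_i - Q(s, a; \theta_i) \big)^2 \right] \\
y_i &= \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right]
\end{aligned}
```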
Example: Playing Atari Games
**Q-network Architecture**
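The notes do not list the layer sizes here. Assuming the commonly cited DQN configuration for Atari (input of 84x84x4 stacked grayscale frames, a 16-filter 8x8 convolution with stride 4, a 32-filter 4x4 convolution with stride 2, a 256-unit fully connected layer, and one output per action), the feature-map sizes can be checked with a few lines. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride):
    """Output side length of a convolution with no padding."""
    return (size - kernel) // stride + 1


side = 84                                      # assumed input: 84 x 84 x 4 frames
side = conv_out(side, kernel=8, stride=4)      # after the 16-filter 8x8 conv
after_conv1 = side
side = conv_out(side, kernel=4, stride=2)      # after the 32-filter 4x4 conv
after_conv2 = side
flat_features = after_conv2 * after_conv2 * 32  # input size of the FC-256 layer
```

The last layer outputs one Q-value per action, so a single forward pass scores every action for the current state.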
**Training the Q-network: Experience Replay**
Deep Q-Learning with Experience Replay
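The replay memory at the heart of this algorithm can be sketched as a fixed-size buffer of transitions that is sampled uniformly at random. `ReplayBuffer` and its method names are hypothetical; this is a minimal sketch, not the implementation used in the lecture.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatches break the correlation between consecutive frames.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push(i, 0, 0.0, i + 1, False)
batch = buffer.sample(2)
```

Sampling from the buffer instead of learning from consecutive frames also lets each transition contribute to many weight updates.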
4. Policy Gradients
Intuition:
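The intuition follows from the form of the gradient estimator. In assumed notation, with \tau a sampled trajectory and r(\tau) its total reward:

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\tau \sim p(\tau; \theta)}\left[ r(\tau) \right] \\
\nabla_{\theta} J(\theta) &\approx \sum_{t \ge 0} r(\tau) \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)
\end{aligned}
% If r(\tau) is high, the probabilities of the actions taken are pushed up;
% if r(\tau) is low, they are pushed down.
```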
Variance reduction:
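Two common tricks are to weight each action only by the rewards that come after it, and to discount delayed rewards by a factor gamma. A minimal sketch of that computation follows; the function name `discounted_rewards_to_go` is hypothetical.

```python
def discounted_rewards_to_go(rewards, gamma=0.99):
    """For each step t, return sum over t' >= t of gamma^(t' - t) * rewards[t']."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry reuses the already accumulated future sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


weights = discounted_rewards_to_go([1.0, 1.0, 1.0], gamma=0.5)
```

Each entry of `weights` would replace r(\tau) as the multiplier on the log-probability gradient of the action taken at that step.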
Variance reduction: Baseline
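With a baseline b(s_t) that depends only on the state, the estimator takes roughly this form (notation assumed):

```latex
\nabla_{\theta} J(\theta) \approx \sum_{t \ge 0}
  \left( \sum_{t' \ge t} \gamma^{t' - t} r_{t'} - b(s_t) \right)
  \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)
% Subtracting b(s_t) does not change the expected gradient, but it can reduce its variance.
```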
How to choose the baseline?
A better baseline: Want to push up the probability of an action from a state, if this action was better than the **expected value of what we should get from that state**
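In terms of the Q-value and value functions, this choice of baseline gives the following estimator (a sketch in assumed notation):

```latex
\nabla_{\theta} J(\theta) \approx \sum_{t \ge 0}
  \left( Q^{\pi_{\theta}}(s_t, a_t) - V^{\pi_{\theta}}(s_t) \right)
  \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)
% An action is reinforced when Q(s_t, a_t) exceeds the expected value V(s_t) of the state.
```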
**Actor-Critic Algorithm**
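Since Q and V are not known in advance, the actor (the policy, with parameters \theta) and the critic (the value estimate, with parameters \phi) are trained together. A rough sketch of the per-step accumulations, in assumed notation:

```latex
\begin{aligned}
A^{\pi}(s, a) &= Q^{\pi}(s, a) - V^{\pi}(s)
  && \text{(advantage function)} \\
\Delta\theta &\leftarrow \Delta\theta + A_t \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)
  && \text{(actor)} \\
\Delta\phi &\leftarrow \Delta\phi + \nabla_{\phi} \lVert A_t \rVert^2
  && \text{(critic)}
\end{aligned}
% A_t is the advantage estimated by the critic at step t of a sampled trajectory.
```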
5. Applications of REINFORCE
5.1 Recurrent Attention Model (RAM)
Illustration of results:
**5.2 AlphaGo**
6. Summary
- Policy gradients: very general, but suffer from high variance and therefore require a lot of samples. Challenge: sample efficiency.
- Q-learning: does not always work, but when it works it is usually more sample-efficient. Challenge: exploration.
- Guarantees:
  - Policy gradients: converge to a local optimum of J(θ), which is often good enough.
  - Q-learning: no guarantees, since you are approximating the Bellman equation with a complicated function approximator.