Importance Sampling in Reinforcement Learning
Source: Internet | Editor: 程序博客网 | Date: 2024/05/11 03:29
Thanks to Sutton and Barto for their great work, Reinforcement Learning: An Introduction.
Almost all off-policy reinforcement learning methods utilize importance sampling, a general technique for estimating expected values under one distribution given samples from another. We apply it by weighting returns according to the relative probability of their trajectories occurring under the target and behavior policies, called the importance-sampling ratio.
Given a set of trajectories obtained by following the behavior policy b, consider a trajectory starting at state S_t. Its probability of occurring under any policy \pi is

\Pr\{A_t, S_{t+1}, A_{t+1}, \ldots, S_T \mid S_t, A_{t:T-1} \sim \pi\} = \prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k).

Define the importance-sampling ratio as the relative probability of the trajectory under the target and behavior policies:

\rho_{t:T-1} = \frac{\prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}{\prod_{k=t}^{T-1} b(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}.

The transition probabilities cancel, so the ratio depends only on the two policies and the observed trajectory, not on the (generally unknown) dynamics of the MDP.

To estimate v_\pi(s), we scale the returns G_t by the ratios and average over \mathcal{T}(s), the set of time steps at which state s is visited:

V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{|\mathcal{T}(s)|}.
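As an illustrative sketch, the ratio can be computed directly from the two policies; the tabular policies and the episode below are made-up examples, not from the text:

```python
def is_ratio(episode, target, behavior):
    """Product of pi(a|s) / b(a|s) over the (state, action) pairs of an episode."""
    rho = 1.0
    for s, a in episode:
        rho *= target[s][a] / behavior[s][a]
    return rho

# Hypothetical example: two states, two actions; policies as nested
# dicts mapping state -> action -> probability.
target   = {"s0": {"left": 0.9, "right": 0.1},
            "s1": {"left": 0.5, "right": 0.5}}
behavior = {"s0": {"left": 0.5, "right": 0.5},
            "s1": {"left": 0.5, "right": 0.5}}

episode = [("s0", "left"), ("s1", "right")]  # (S_k, A_k) pairs, k = t .. T-1
rho = is_ratio(episode, target, behavior)    # (0.9/0.5) * (0.5/0.5) = 1.8
```

Note that the transition probabilities never enter the computation; only the two policies are needed.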
When importance sampling is done as a simple average in this way it is called ordinary importance sampling. An important alternative is weighted importance sampling, which uses a weighted average, defined as

V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}},

or zero if the denominator is zero.
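A minimal sketch of the two estimators, assuming we have already collected the per-episode returns G_t and ratios rho for a given state (the numbers below are made up for illustration):

```python
def ordinary_is(returns, rhos):
    """Ordinary importance sampling: simple average of rho * G."""
    return sum(r * g for r, g in zip(rhos, returns)) / len(returns)

def weighted_is(returns, rhos):
    """Weighted importance sampling: weighted average, 0 if all ratios are 0."""
    denom = sum(rhos)
    return sum(r * g for r, g in zip(rhos, returns)) / denom if denom else 0.0

returns = [1.0, 0.0, 2.0]          # returns G_t observed from state s
rhos    = [1.8, 0.2, 0.6]          # corresponding importance-sampling ratios

v_ord = ordinary_is(returns, rhos)   # (1.8*1.0 + 0.2*0.0 + 0.6*2.0) / 3 = 1.0
v_wgt = weighted_is(returns, rhos)   # 3.0 / 2.6 ≈ 1.1538
```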
The difference between the two kinds of importance sampling is expressed in their biases and variances. The ordinary importance-sampling estimator is unbiased whereas the weighted importance-sampling estimator is biased (the bias converges asymptotically to zero). On the other hand, the variance of the ordinary importance-sampling estimator is in general unbounded because the variance of the ratios can be unbounded, whereas in the weighted estimator the largest weight on any single return is one.
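The bounded-weight property can be seen in a small made-up example: when one ratio is large, the ordinary estimate can lie far outside the range of the observed returns, while the weighted estimate cannot, since each normalized weight rho_i / sum(rhos) is at most one:

```python
# Hypothetical data: all observed returns equal 1, but one episode has a
# large importance-sampling ratio.
returns = [1.0, 1.0, 1.0]
rhos    = [10.0, 0.1, 0.1]

ordinary = sum(r * g for r, g in zip(rhos, returns)) / len(returns)  # 10.2/3 = 3.4
weighted = sum(r * g for r, g in zip(rhos, returns)) / sum(rhos)     # 10.2/10.2 = 1.0

# Every return is 1, so any reasonable value estimate is 1; the ordinary
# estimator still reports 3.4 here, while the weighted estimator stays
# inside the range of the observed returns.
```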