Typical Exploration Strategies in Model-free Policy Search
Thanks to J. Peters et al. for their great work, *A Survey on Policy Search for Robotics*.
The exploration strategy is used to generate new trajectory samples. Many model-free policy search approaches update the exploration distribution and, hence, the covariance of the Gaussian policy. Typically, a large exploration rate is used at the beginning of learning and is then gradually decreased to fine-tune the policy parameters.
Action Space vs Parameter Space
In action space, we can simply add exploration noise to the actions produced by the policy. Exploration in parameter space instead perturbs the parameter vector of the policy. Many approaches can be formulated with the concept of an upper-level policy: the parameter vector of the lower-level control policy is treated as the "action" selected by the upper-level policy.
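The contrast between the two exploration types can be sketched as follows; the linear lower-level policy and the noise scale are illustrative assumptions, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def lower_level_policy(theta, state):
    """Hypothetical linear lower-level policy: u = theta . s."""
    return theta @ state

theta = np.array([0.5, -0.2])   # parameter vector of the lower-level policy
state = np.array([1.0, 0.3])

# Action-space exploration: compute the action deterministically,
# then add Gaussian noise directly to the action.
action_noisy = lower_level_policy(theta, state) + rng.normal(0.0, 0.1)

# Parameter-space exploration: perturb the parameter vector instead
# (as if sampled from an upper-level policy), then act deterministically.
theta_perturbed = theta + rng.normal(0.0, 0.1, size=theta.size)
action_param = lower_level_policy(theta_perturbed, state)
```

In the upper-level view, `theta_perturbed` plays the role of the "action" chosen by the upper-level policy, while the lower-level control law itself stays noise-free.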
Episode-based vs Step-based
Step-based exploration uses different exploration noise at each time step and can act either in action space or in parameter space. It can be problematic, as it may produce action sequences that are not reproducible by a noise-free control law. Episode-based exploration uses exploration noise only at the beginning of the episode, which leads to exploration in parameter space. Episode-based exploration may therefore produce more reliable policy updates.
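The distinction can be made concrete with a small sketch: step-based exploration draws fresh parameter noise at every time step, while episode-based exploration draws it once per episode. The time-dependent feature vector and noise scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, dim_theta = 20, 3
theta = np.array([0.2, -0.1, 0.4])  # nominal policy parameters

def features(t):
    """Hypothetical time-dependent features for a linear-in-parameters policy."""
    return np.array([1.0, t / T, (t / T) ** 2])

# Step-based exploration: a new noise sample at every time step,
# so the resulting action sequence jitters from step to step.
step_actions = [(theta + rng.normal(0.0, 0.1, dim_theta)) @ features(t)
                for t in range(T)]

# Episode-based exploration: one noise sample for the whole episode,
# so the trajectory is consistent with a single noise-free control law.
eps = rng.normal(0.0, 0.1, dim_theta)
episode_actions = [(theta + eps) @ features(t) for t in range(T)]
```

The episode-based trajectory is exactly what the deterministic policy with parameters `theta + eps` would produce, which is why such rollouts tend to give more reliable policy updates.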
Uncorrelated vs Correlated
As most policies are represented as Gaussian distributions, uncorrelated exploration noise is obtained by using a diagonal covariance matrix. Correlated exploration can also be achieved by maintaining a full covariance matrix. Exploration in action space typically uses a diagonal covariance matrix. In parameter space, many approaches can update the full covariance matrix of the Gaussian policy; using the full covariance matrix often results in considerably faster learning.
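The two cases differ only in the shape of the covariance matrix passed to the Gaussian; a minimal NumPy sketch (the dimensions and variances are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 4
mu = np.zeros(dim)  # mean of the Gaussian exploration policy

# Uncorrelated exploration: diagonal covariance,
# one independent variance per parameter dimension.
sigma_diag = np.diag([0.1, 0.2, 0.1, 0.3])
theta_uncorr = rng.multivariate_normal(mu, sigma_diag)

# Correlated exploration: full covariance matrix, built here as
# A A^T + small ridge so it is symmetric positive definite.
A = rng.normal(size=(dim, dim))
sigma_full = A @ A.T + 1e-3 * np.eye(dim)
theta_corr = rng.multivariate_normal(mu, sigma_full)
```

The full matrix couples the noise across dimensions, so exploration samples follow the correlations the policy update has learned, at the cost of estimating O(dim²) covariance entries instead of O(dim).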