Trust Region Policy Optimization
https://arxiv.org/abs/1502.05477
(Submitted on 19 Feb 2015 (v1), last revised 20 Apr 2017 (this version, v5))
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
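The abstract does not spell out the update rule. As a rough illustration only, the sketch below assumes the commonly described form of a trust-region step: a candidate policy is scored by an importance-weighted advantage estimate and is accepted only if its average KL divergence from the old policy stays under a small bound. The function names, the `max_kl` default, and the toy two-action policies are hypothetical and not taken from this page; the practical algorithm in the paper also involves a conjugate-gradient step that is not shown here.

```python
import math

def surrogate(new_probs, old_probs, actions, advantages):
    """Mean of pi_new(a|s) / pi_old(a|s) * advantage over sampled (s, a) pairs."""
    total = 0.0
    for p_new, p_old, a, adv in zip(new_probs, old_probs, actions, advantages):
        total += p_new[a] / p_old[a] * adv
    return total / len(actions)

def mean_kl(old_probs, new_probs):
    """Average KL(pi_old || pi_new) over the sampled states."""
    total = 0.0
    for p_old, p_new in zip(old_probs, new_probs):
        total += sum(po * math.log(po / pn)
                     for po, pn in zip(p_old, p_new) if po > 0)
    return total / len(old_probs)

def accept_step(new_probs, old_probs, actions, advantages, max_kl=0.01):
    """Accept a candidate only if it stays inside the trust region
    (mean KL <= max_kl, an assumed bound) and improves the surrogate."""
    baseline = surrogate(old_probs, old_probs, actions, advantages)
    inside = mean_kl(old_probs, new_probs) <= max_kl
    better = surrogate(new_probs, old_probs, actions, advantages) > baseline
    return inside and better

# Toy data: two sampled states, two actions, uniform old policy.
old = [[0.5, 0.5], [0.5, 0.5]]
actions = [0, 1]
advantages = [1.0, -1.0]
small_step = [[0.52, 0.48], [0.52, 0.48]]  # modest shift toward action 0
big_step = [[0.9, 0.1], [0.9, 0.1]]        # large shift toward action 0

print(accept_step(small_step, old, actions, advantages))
print(accept_step(big_step, old, actions, advantages))
```

In this toy setup the large step scores higher on the surrogate but moves far from the old policy, so the KL check rejects it, while the small step passes both checks. That constraint is one way to read the abstract's statement that the method tends to give monotonic improvement.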
Submission history
From: John Schulman
[v1] Thu, 19 Feb 2015 06:44:25 GMT (547kb,D)
[v2] Mon, 18 May 2015 14:56:50 GMT (540kb,D)
[v3] Mon, 8 Jun 2015 10:47:03 GMT (540kb,D)
[v4] Mon, 6 Jun 2016 01:00:57 GMT (541kb,D)
[v5] Thu, 20 Apr 2017 18:04:12 GMT (541kb,D)