The Role of Each Reference in the NFQ Paper

Source: Internet · Editor: 程序博客网 · Posted: 2024/06/06 02:05

[BM95] Boyan and Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.

Problems that arise when a multilayer perceptron is used to represent the value function.

[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

NFQ is a special realisation of the 'Fitted Q Iteration' framework proposed here.
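The fitted Q iteration loop can be sketched roughly as follows. This is only an illustrative sketch: the toy 1-D task, the feature map, and the least-squares regressor (standing in for the Rprop-trained multilayer perceptron that NFQ uses) are all assumptions, not taken from the paper.

```python
import numpy as np

# Toy 1-D chain task: two actions move the state left or right,
# and the reward encourages staying near the origin. Purely illustrative.
rng = np.random.default_rng(0)
n = 500
S = rng.uniform(-1.0, 1.0, n)                      # sampled states
A = rng.integers(0, 2, n).astype(float)            # actions: 0 = left, 1 = right
S2 = np.clip(S + np.where(A == 1.0, 0.1, -0.1), -1.0, 1.0)
R = -np.abs(S2)                                    # immediate rewards
gamma = 0.95

def features(s, a):
    # Polynomial features per (state, action) pair; stands in for the MLP input.
    return np.column_stack([np.ones_like(s), s, s**2, a, a * s])

def fit(X, y):
    # Batch supervised regression step (least squares in place of an MLP).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

w = fit(features(S, A), R)                         # initialise on immediate rewards
for _ in range(20):                                # fitted Q iteration
    # Bellman targets: r + gamma * max over actions of Q(s', a')
    q_next = np.maximum(features(S2, np.zeros(n)) @ w,
                        features(S2, np.ones(n)) @ w)
    w = fit(features(S, A), R + gamma * q_next)    # re-fit Q from scratch on all data
```

The key point, shared by NFQ, is that each iteration re-solves a batch supervised regression problem on the entire stored sample set rather than making incremental online updates.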

[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the ICML, San Francisco, CA, 1995.

The fitted value iteration algorithm, on which NFQ is based.

[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.

A successful case of using a multilayer perceptron to represent the value function; source of the 'experience replay' technique.
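A minimal sketch of how experience replay in the style of [Lin92] is commonly organized: store every transition, then repeatedly re-present random past experiences to the learner. The buffer capacity, batch size, and dummy trajectory below are arbitrary illustrative choices.

```python
import random
from collections import deque

# Fixed-capacity buffer of past transitions; the capacity is an arbitrary choice.
buffer = deque(maxlen=10_000)

def store(s, a, r, s_next):
    # Record one interaction step for later re-presentation to the learner.
    buffer.append((s, a, r, s_next))

def sample_batch(batch_size=32):
    # Replay a random batch of stored transitions for one training update.
    return random.sample(list(buffer), min(batch_size, len(buffer)))

# Usage: store transitions from a dummy trajectory, then draw a replay batch
# that the value function would be trained on instead of only the newest step.
for t in range(100):
    store(t, t % 2, -1.0, t + 1)
batch = sample_batch(8)
```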

[LP03] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.

The samples, system equations, and parameters needed for the inverted pendulum task (Section 5.1); the LSPI method and its results.

[RB93] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In H. Ruspini, editor, Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, San Francisco, 1993.

The Rprop algorithm, a supervised batch learning method, used to train the Q function.

[Rie00] M. Riedmiller. Concepts and facilities of a neural reinforcement learning control architecture for technical process control. Journal of Neural Computing and Application, 8:323–338, 2000.

A successful case of using a multilayer perceptron to represent the value function.

[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

The mountain car model; the cart-pole model.

[Tes92] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.

A successful case of using a multilayer perceptron to represent the value function.
