
CS231n Study Notes -- 14. Reinforcement Learning

Source: Internet | Editor: 程序博客网 | Time: 2024/06/02 02:03

1. What is Reinforcement Learning

Overview:

For example:

Another example:

2. Markov Decision Process

Mathematical formulation of the RL problem
Markov property: Current state completely characterises the state of the world
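The MDP and the Markov property are written below in standard notation. This is an assumption: the symbols in the original notes may differ.

```latex
\begin{aligned}
&\text{MDP: } (\mathcal{S}, \mathcal{A}, \mathcal{R}, \mathbb{P}, \gamma) \\
&\mathcal{S}: \text{set of possible states}, \quad \mathcal{A}: \text{set of possible actions} \\
&\mathcal{R}: \text{distribution of reward given a (state, action) pair} \\
&\mathbb{P}: \text{transition probability, i.e. distribution over the next state given a (state, action) pair} \\
&\gamma: \text{discount factor} \\
&\text{Markov property: } P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_0, a_0, \dots, s_t, a_t)
\end{aligned}
```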

**Process flow:**
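Here is a minimal Python sketch of the agent-environment loop. At each step the agent picks a_t, the environment returns r_t and s_{t+1}, and rewards accumulate as a discounted sum. Several pieces are hypothetical and exist only for illustration:

- the 4-state chain;
- the reward of 1 for reaching the last state;
- the names `env_step`, `run_episode` and `always_right`.

The environment is also deterministic here, whereas a general MDP samples r_t and s_{t+1}.

```python
from typing import Callable, Tuple

# Hypothetical toy environment (not from the notes): states 0..3 on a line,
# actions -1 (left) and +1 (right); entering state 3 ends the episode with reward 1.
GOAL = 3


def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Environment side: given (s_t, a_t), return (s_{t+1}, r_t, done). Deterministic here."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def run_episode(policy: Callable[[int], int], gamma: float = 0.9, max_steps: int = 100) -> float:
    """Agent-environment loop; returns the discounted return sum_t gamma^t * r_t."""
    state = 0  # s_0 is fixed here instead of sampled from p(s_0)
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                         # agent selects a_t = pi(s_t)
        state, reward, done = env_step(state, action)  # environment returns r_t, s_{t+1}
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def always_right(state: int) -> int:
    return +1


print(run_episode(always_right))  # reward 1 arrives at t = 2, so the return is gamma**2 (about 0.81)
```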

The optimal policy π*
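The optimal policy is the one that maximizes the expected sum of discounted rewards. In standard notation (assumed rather than copied from the notes):

```latex
\pi^* = \arg\max_{\pi} \, \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_t \,\Big|\, \pi\Big]
\quad \text{with } s_0 \sim p(s_0),\; a_t \sim \pi(\cdot \mid s_t),\; s_{t+1} \sim p(\cdot \mid s_t, a_t)
```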

3. Q-learning

Definitions: value function and Q-value function:
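Following a policy produces sample trajectories (s_0, a_0, r_0, s_1, a_1, r_1, ...). The value function measures how good a state is, and the Q-value function measures how good a state-action pair is. In standard notation (assumed):

```latex
\begin{aligned}
V^{\pi}(s) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_t \,\Big|\, s_0 = s, \pi\Big] \\
Q^{\pi}(s, a) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_t \,\Big|\, s_0 = s, a_0 = a, \pi\Big]
\end{aligned}
```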

Bellman equation:
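The optimal Q-value function Q* satisfies the Bellman equation. If the optimal values Q*(s', a') at the next time step are known, the best strategy is to take the action that maximizes r + γ Q*(s', a'). In standard notation (assumed):

```latex
\begin{aligned}
Q^*(s, a) &= \max_{\pi} \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_t \,\Big|\, s_0 = s, a_0 = a, \pi\Big] \\
Q^*(s, a) &= \mathbb{E}_{s' \sim \mathcal{E}}\Big[r + \gamma \max_{a'} Q^*(s', a') \,\Big|\, s, a\Big] \\
\pi^*(s) &= \arg\max_{a} Q^*(s, a)
\end{aligned}
```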

Optimizing the policy:
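Value iteration uses the Bellman equation as an update rule, Q_{i+1}(s, a) = E[r + γ max_{a'} Q_i(s', a') | s, a], so that Q_i converges to Q* as i → ∞. Below is a sketch on a hypothetical deterministic 4-state chain, where moving into state 3 gives reward 1; the environment and all names are assumptions for illustration. The tabular approach must store and update Q for every (state, action) pair, which is why it does not scale to large state spaces such as raw game pixels.

```python
from typing import Dict, List, Tuple

# Hypothetical deterministic chain MDP (illustration only): states 0..3, state 3 is terminal.
N_STATES, GOAL = 4, 3
ACTIONS = (-1, +1)  # left, right


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: clamp to the chain, reward 1 for entering the goal."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)


def q_value_iteration(gamma: float = 0.9, iters: int = 100) -> List[Dict[int, float]]:
    """Repeatedly apply Q_{i+1}(s, a) = r + gamma * max_a' Q_i(s', a') to every pair."""
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(iters):
        new_q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: no future reward
            for a in ACTIONS:
                s2, r = step(s, a)
                future = 0.0 if s2 == GOAL else max(q[s2].values())
                new_q[s][a] = r + gamma * future
        q = new_q
    return q


q_star = q_value_iteration()
# Greedy policy pi*(s) = argmax_a Q*(s, a) for the non-terminal states.
greedy = [max(ACTIONS, key=lambda a: q_star[s][a]) for s in range(GOAL)]
print(greedy)  # [1, 1, 1]: always move right
```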

**Solving for the optimal policy: Q-learning**
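Q-learning replaces the table with a function approximator Q(s, a; θ) ≈ Q*(s, a) and trains it to satisfy the Bellman equation. The formulation below is in standard notation and is assumed rather than taken from the notes:

- ρ(·) is the behaviour distribution over states and actions;
- θ_{i-1} is held fixed while computing the target y_i;
- the last line is the descent direction, i.e. the gradient without its constant factor −2.

```latex
\begin{aligned}
&y_i = \mathbb{E}_{s' \sim \mathcal{E}}\Big[r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\Big|\, s, a\Big] \\
&L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)}\Big[\big(y_i - Q(s, a; \theta_i)\big)^2\Big] \\
&-\tfrac{1}{2}\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot);\, s' \sim \mathcal{E}}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)\big)\, \nabla_{\theta_i} Q(s, a; \theta_i)\Big]
\end{aligned}
```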

Example: Playing Atari Games

**Q-network Architecture**
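The sketch below traces activation shapes through a DQN-style Q-network as a sanity check on the architecture. The layer sizes are assumptions taken from the commonly cited Atari DQN setup:

- input: the last 4 grayscale 84x84 frames;
- 16 filters of 8x8, stride 4;
- 32 filters of 4x4, stride 2;
- FC-256;
- one output Q-value per action.

The helper names are hypothetical. The key design choice is that the last layer outputs Q-values for all actions at once, so a single forward pass scores every action for the current state.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1


def q_network_shapes(num_actions: int = 4) -> List[Tuple[str, Tuple[int, ...]]]:
    """Trace activation shapes through the assumed DQN-style Q-network."""
    h = w = 84
    shapes = [("input: last 4 grayscale frames", (h, w, 4))]
    h, w = conv_out(h, 8, 4), conv_out(w, 8, 4)
    shapes.append(("conv, 16 filters 8x8, stride 4", (h, w, 16)))
    h, w = conv_out(h, 4, 2), conv_out(w, 4, 2)
    shapes.append(("conv, 32 filters 4x4, stride 2", (h, w, 32)))
    shapes.append(("flatten", (h * w * 32,)))
    shapes.append(("FC-256", (256,)))
    shapes.append((f"FC-{num_actions} (one Q-value per action)", (num_actions,)))
    return shapes


for name, shape in q_network_shapes():
    print(f"{name:40s} {shape}")
```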

**Training the Q-network: Experience Replay**
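Experience replay keeps a table of transitions (s_t, a_t, r_t, s_{t+1}) and trains on random minibatches rather than consecutive samples. This breaks the correlation between successive samples and lets each transition contribute to several updates. Below is a minimal sketch of such a replay memory. Integer states are assumed for readability (real Atari states would be frame stacks), and the `ReplayMemory` and `Transition` names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One step of experience (s_t, a_t, r_t, s_{t+1}) plus an end-of-episode flag."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ReplayMemory:
    """Fixed-capacity replay table; once full, the oldest transitions are dropped."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform minibatch without replacement, so training does not see
        # consecutive, strongly correlated transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(Transition(state=t, action=1, reward=0.0, next_state=t + 1, done=False))
batch = memory.sample(2)
print(len(memory), sorted(tr.state for tr in batch))  # length is 3; states 0 and 1 were evicted
```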

Deep Q-Learning with Experience Replay
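Deep Q-learning with experience replay has two core steps:

1. ε-greedy action selection.
2. Regressing Q(s_j, a_j) toward a target y_j. For terminal transitions y_j = r_j; otherwise y_j = r_j + γ max_{a'} Q(s_{j+1}, a').

The sketch below is library-free and hedged. `q_fn` is a hypothetical stand-in for the Q-network (a lookup table in the usage example). The gradient step itself is omitted because it needs an autodiff framework.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (s_j, a_j, r_j, s_{j+1}, terminal); integer states are an assumption for readability.
Transition = Tuple[int, int, float, int, bool]
# Hypothetical stand-in for the Q-network: maps a state to one Q-value per action.
QFn = Callable[[int], Sequence[float]]


def epsilon_greedy(q_fn: QFn, state: int, epsilon: float, rng: random.Random) -> int:
    """With probability epsilon take a random action, otherwise argmax_a Q(s, a)."""
    q_values = q_fn(state)
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch: List[Transition], q_fn: QFn, gamma: float) -> List[float]:
    """y_j = r_j for terminal transitions, else r_j + gamma * max_a' Q(s_{j+1}, a')."""
    return [r if done else r + gamma * max(q_fn(s_next)) for (_, _, r, s_next, done) in batch]


def squared_td_loss(batch: List[Transition], q_fn: QFn, gamma: float) -> float:
    """Mean of (y_j - Q(s_j, a_j))^2 over the minibatch, the quantity a gradient step reduces."""
    targets = td_targets(batch, q_fn, gamma)
    errors = [(y - q_fn(s)[a]) ** 2 for y, (s, a, _, _, _) in zip(targets, batch)]
    return sum(errors) / len(errors)


table = {0: [0.0, 1.0], 1: [2.0, 0.5]}  # toy Q-values for two states and two actions


def q_fn(state: int) -> Sequence[float]:
    return table[state]


batch = [(0, 1, 1.0, 1, False), (1, 0, 5.0, 0, True)]
print(td_targets(batch, q_fn, gamma=0.5))       # [2.0, 5.0]
print(squared_td_loss(batch, q_fn, gamma=0.5))  # 5.0
```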

4. Policy Gradients

Intuition:
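The REINFORCE gradient estimator is ∇_θ J(θ) ≈ Σ_{t≥0} r(τ) ∇_θ log π_θ(a_t | s_t). If the trajectory reward r(τ) is high, push up the probabilities of the actions taken; if it is low, push them down.

Below is a minimal sketch on a toy two-armed bandit. Episodes are one step long, so r(τ) is just the reward of the chosen arm. The bandit, its rewards, the softmax policy and all names are assumptions for illustration. For a softmax policy, ∂ log π(a)/∂θ_k = 1[k = a] − π(k).

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards: List[float], steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    """REINFORCE on one-step episodes: theta += lr * r(tau) * grad log pi_theta(a)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(probs)), weights=probs)[0]  # sample a ~ pi_theta
        r = rewards[a]                                          # r(tau) of this one-step episode
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[a] is 1[k == a] - probs[k]
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)


print(reinforce_bandit([0.0, 1.0]))  # probability mass concentrates on arm 1
```

With the assumed rewards [0, 1], picking arm 0 produces a zero update, so all of the learning comes from arm 1. This is one motivation for subtracting a baseline.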

Variance reduction:
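Two common variance-reduction ideas change how each ∇_θ log π_θ(a_t | s_t) term is weighted:

- use only the reward collected from time t onward, Σ_{t'≥t} r_{t'};
- additionally discount that sum by γ to downweight delayed effects, giving Σ_{t'≥t} γ^{t'−t} r_{t'}.

A small helper for this discounted reward-to-go follows; the function name is hypothetical.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """G_t = sum_{t' >= t} gamma^(t' - t) * r_t', computed backwards in O(T)."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```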

Variance reduction: Baseline

How to choose the baseline?

A better baseline: we want to push up the probability of an action from a state if this action was better than the **expected value of what we should get from that state**.
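A baseline b(s_t) that depends only on the state is subtracted from the reward-to-go. The simplest choice is a moving average of the rewards experienced so far. The better choice described above uses V^π(s_t) as the baseline and Q^π(s_t, a_t) in place of the sampled reward-to-go. In standard notation (assumed):

```latex
\begin{aligned}
\nabla_\theta J(\theta) &\approx \sum_{t \ge 0} \Big(\sum_{t' \ge t} \gamma^{t'-t} r_{t'} - b(s_t)\Big) \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
\nabla_\theta J(\theta) &\approx \sum_{t \ge 0} \big(Q^{\pi_\theta}(s_t, a_t) - V^{\pi_\theta}(s_t)\big) \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{aligned}
```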

**Actor-Critic Algorithm**
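In the actor-critic setup, the actor is the policy π_θ and the critic V_φ estimates how good a state is. The advantage A_t = Σ_{t'≥t} γ^{t'−t} r_{t'} − V_φ(s_t) approximates Q^π(s_t, a_t) − V^π(s_t). It plays two roles:

- it weights the actor's ∇_θ log π_θ(a_t | s_t);
- the critic is trained to shrink Σ_t A_t².

Below is a per-trajectory sketch of these two quantities. It assumes the critic's values are already given; the function name and numbers are hypothetical, and the parameter updates themselves are left out.

```python
from typing import List, Tuple


def actor_critic_signals(rewards: List[float], values: List[float], gamma: float) -> Tuple[List[float], float]:
    """Return the advantages A_t = G_t - V_phi(s_t) that weight the actor's
    grad log pi(a_t | s_t), and the critic's squared-error loss sum_t A_t^2."""
    if len(rewards) != len(values):
        raise ValueError("need one critic value V_phi(s_t) per time step")
    returns, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # discounted reward-to-go G_t
        returns[t] = running
    advantages = [g - v for g, v in zip(returns, values)]
    critic_loss = sum(a * a for a in advantages)
    return advantages, critic_loss


adv, loss = actor_critic_signals([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5)
print(adv, loss)  # [0.75, 0.5, 0.0] 0.8125
```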

5. Applications of REINFORCE

5.1 Recurrent Attention Model (RAM)
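The Recurrent Attention Model can be framed as an RL problem:

- state: the sequence of glimpses seen so far;
- action: the (x, y) location of the next glimpse;
- reward: 1 at the final time step if the image is classified correctly, 0 otherwise.

Cropping a glimpse is not differentiable with respect to its location, so the location policy is trained with REINFORCE. Below is a toy sketch of the glimpse and reward pieces, using plain nested lists as the image. The names, the square patch and the zero padding are assumptions.

```python
from typing import List

Image = List[List[float]]


def glimpse(image: Image, row: int, col: int, size: int) -> Image:
    """Crop a size x size patch centred at (row, col), zero-padding outside the image.
    Integer indexing like this has no gradient with respect to (row, col)."""
    half = size // 2
    h, w = len(image), len(image[0])
    patch = []
    for r in range(row - half, row - half + size):
        patch.append([image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
                      for c in range(col - half, col - half + size)])
    return patch


def episode_reward(predicted_label: int, true_label: int) -> float:
    """Reward 1 at the final time step if the image is classified correctly, else 0."""
    return 1.0 if predicted_label == true_label else 0.0


img = [[float(3 * r + c) for c in range(3)] for r in range(3)]
print(glimpse(img, 0, 0, 3))  # top-left glimpse, padded with zeros outside the image
```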

Illustration of results:

**5.2 AlphaGo**

6. Summary

Policy gradients: very general, but suffer from high variance, so they require a lot of samples. Challenge: sample efficiency.
Q-learning: does not always work, but when it works it is usually more sample-efficient. Challenge: exploration.
Guarantees:
Policy gradients: converge to a local optimum (a local maximum) of J(θ), which is often good enough!
Q-learning: zero guarantees, since you are approximating the Bellman equation with a complicated function approximator.
