A Brief Note about Boltzmann/Softmax Exploration Strategy


One method that is often used in combination with RL algorithms is the Boltzmann, or softmax, exploration strategy.
Action selection is still random, but each action's selection probability is weighted by its Q-value relative to the Q-values of the other actions in the same state. This makes the agent more likely to choose good actions, while two actions with similar Q-values have almost the same probability of being selected. Its general form is

$$P(a) = \frac{e^{Q(s,a)/T}}{\sum_i e^{Q(s,a_i)/T}}$$

in which P(a) is the probability of selecting action a and T is the temperature parameter. Higher values of T move the selection towards a purely random strategy, and lower values move it towards a fully greedy strategy.
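
As an illustration of the formula, here is a minimal Python sketch of Boltzmann action selection for a single state. The function names `boltzmann_probabilities` and `select_action` are hypothetical, chosen only for this example, and the Q-values are assumed to be given as a plain list with one entry per action. Subtracting the largest Q-value before exponentiating is an added numerical-stability step that is not part of the formula; it cancels between numerator and denominator, so the resulting probabilities are unchanged.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return P(a) for every action, given the Q-values of one state."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Assumption: shift by max(Q) to avoid overflow in exp(); the shift
    # cancels out in the ratio, so the probabilities stay the same.
    max_q = max(q_values)
    weights = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # made-up Q-values, for illustration only
    for t in (0.1, 1.0, 100.0):
        print(t, boltzmann_probabilities(q, t))
```

Running the script with a small T should put almost all probability on the action with the highest Q-value, while a large T should give nearly equal probabilities, matching the description of the temperature parameter. In practice T is often decreased over the course of training so that the agent explores early and exploits later, but that schedule is a design choice outside the scope of this note.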

0 0

A Brief Note about Boltzmann/Softmax Exploration Strategy
brief note on Makefile
A brief explanation about the resampling wheel in CS373 PROGRAMMING A ROBOTIC CAR
Reinforcement Learning in Continuous State and Action Spaces: A Brief Note
Continuous Multi-Step TD, Eligibility Traces and TD(λ): A brief note
Note 07/08/01 a mistake about ref
A note about some errors which cause by android.R
A note about how to write test plan
浅谈熵和打升级 (A brief talk about entropy and Sheng ji)
A brief advice
UVA10558- A Brief Gerrymander
some note about js
[note]others about java
Study Note: About CNN
Describe in brief about oracle database tuning.
A Tutorial on Restricted Boltzmann Machines
A FIRST EXPLORATION OF SOLRCLOUD
Caffe Knowledge about Softmax and fine tuning
lua学习笔记15：table数组逆序
MeanShift运动目标跟踪 matlab程序
POJ 2774 Long Long Message <后缀数组（DC3）>
值得推荐的C/C++框架和库 (真的很强大)
网络建设中，这些方法可以讨好你的客户
A Brief Note about Boltzmann/Softmax Exploration Strategy
Myeclipse 2016 Mac版破解
JavaScript学习笔记18-switch语句
IDbConnection 正确的链接关闭与打开
Mysql统计同一字段不同值的个数
架构漫谈（七）：不要空设架构师这个职位，给他实权
Java的内存机制
linux basename命令学习
Rviz可视化交互之Maker（一）