A Brief Note about Boltzmann/Softmax Exploration Strategy

Source: Internet. Editor: 程序博客网. Time: 2024/06/04 19:06

One method that is often used in combination with RL algorithms is the Boltzmann, or softmax, exploration strategy.
Action selection is still random, but each action's selection probability is weighted by its Q-value relative to the others. The agent is therefore more likely to choose good actions, while two actions with similar Q-values have almost the same probability of being selected. Its general form is

$$P(a) = \frac{e^{Q(s,a)/T}}{\sum_i e^{Q(s,a_i)/T}}$$

in which P(a) is the probability of selecting action a in state s, the sum runs over all actions a_i available in s, and T > 0 is the temperature parameter. Higher values of T move the selection towards a purely random (uniform) strategy, and lower values move it towards a fully greedy strategy.
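
The rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `boltzmann_probabilities` and `select_action` are made up for this note, and the Q-values for the current state are assumed to be available as a plain list indexed by action. Subtracting the largest Q-value before exponentiating is an added numerical safeguard; it does not change the resulting probabilities.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return softmax selection probabilities for the Q-values of one state.

    q_values: list of Q(s, a_i) for every action a_i (assumed input format).
    temperature: the parameter T, which must be strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the maximum so the largest exponent is 0; this avoids overflow
    # for small T and leaves the normalised probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

For example, with Q-values `[1.0, 2.0, 3.0]` and T = 1 the probabilities are roughly 0.09, 0.24 and 0.67. Raising T to 1000 makes them almost equal, while lowering it to 0.01 puts nearly all of the probability on the third action.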
