N-armed bandit problem

Source: Internet | Published: 伍聚网络股票 | Editor: 程序博客网 | Date: 2024/05/19 19:12

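estimated value Q_k(a) of action a (the sample average of the k_a rewards R_1, ..., R_{k_a} received so far from arm a, used as an estimate of that arm's expected reward):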

$$Q_k(a) = \frac{R_1 + R_2 + \cdots + R_{k_a}}{k_a}$$
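A minimal sketch of the sample-average estimate for a single arm, assuming its rewards are kept in a plain list. The second function computes the same value one reward at a time with the standard incremental rewrite Q <- Q + (R - Q)/n, so the reward history does not need to be stored. The function names, the example rewards, and the default of 0.0 for an arm that has never been tried are all hypothetical choices made for illustration.

```python
from typing import Sequence


def sample_average(rewards: Sequence[float]) -> float:
    """Q_k(a): mean of the k_a rewards observed so far for one arm."""
    k_a = len(rewards)
    if k_a == 0:
        return 0.0  # assumption: default estimate before the arm is ever chosen
    return sum(rewards) / k_a


def incremental_average(rewards: Sequence[float]) -> float:
    """Same estimate, updated one reward at a time: Q <- Q + (R - Q) / n."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n  # move the estimate a 1/n fraction toward the new reward
    return q


rewards_for_arm = [1.0, 0.0, 2.0, 1.0]  # hypothetical rewards R_1..R_4 from one arm
print(sample_average(rewards_for_arm))       # 1.0
print(incremental_average(rewards_for_arm))  # 1.0
```

Both functions return the same number; the incremental form only needs the current estimate and the count k_a, which is why it is the usual choice inside a bandit loop.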

stationary problem: the reward probability distribution underlying each arm does not change over time.
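Because the distributions in a stationary problem are fixed, the sample average Q_k(a) converges to the arm's true mean reward as k_a grows. The sketch below illustrates this under several stated assumptions that are not from the original note: rewards are Gaussian with unit variance around hypothetical true means, and arms are chosen uniformly at random so that the example isolates estimation from any particular exploration strategy.

```python
import random


def simulate_stationary_bandit(true_means, pulls, seed=0):
    """Pull arms uniformly at random from a stationary bandit and return
    the sample-average estimate Q(a) and pull count k_a for each arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms      # running sample-average estimates Q(a)
    counts = [0] * n_arms   # k_a: number of times each arm was pulled
    for _ in range(pulls):
        a = rng.randrange(n_arms)          # assumption: uniform random arm choice
        # Stationary: arm a draws from the same distribution on every pull.
        r = rng.gauss(true_means[a], 1.0)  # assumption: unit-variance Gaussian rewards
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]     # incremental sample average
    return q, counts


true_means = [0.2, -0.5, 1.0]  # hypothetical fixed mean rewards for three arms
estimates, counts = simulate_stationary_bandit(true_means, pulls=30_000)
for a, (mu, est) in enumerate(zip(true_means, estimates)):
    print(f"arm {a}: true mean {mu:+.2f}, estimate {est:+.3f}, pulls {counts[a]}")
```

With roughly 10,000 pulls per arm, each estimate should land within a few hundredths of its true mean; this convergence relies on the distributions staying fixed over time.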
