Example 6.7 An Access-Control Queuing Task

Source: Internet  Time: 2024/06/05 03:48

This is a decision task involving access control to a set of n servers. Customers of four different priorities arrive at a single queue. If given access to a server, the customers pay a reward of 1, 2, 4, or 8, depending on their priority, with higher priority customers paying more. In each time step, the customer at the head of the queue is either accepted (assigned to one of the servers) or rejected (removed from the queue). In either case, on the next time step the next customer in the queue is considered. The queue never empties, and the proportion of (randomly distributed) high priority customers in the queue is h. Of course a customer can be served only if there is a free server. Each busy server becomes free with probability p on each time step.
Although we have just described them for definiteness, let us assume the statistics of arrivals and departures are unknown. The task is to decide on each step whether to accept or reject the next customer, on the basis of his priority and the number of free servers, so as to maximize long-term reward without discounting.
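This is a sequential decision task. There are n=10 servers, and customers of four priority levels wait in a queue, paying 1, 2, 4, or 8 respectively. At each time step, the customer at the head of the queue is either accepted or rejected. The queue never empties; someone is always waiting. h is the proportion of high-priority customers in the queue; h=0.5 here means customers of every priority are equally likely to appear. Each busy server becomes free with probability p=0.06 on each time step.
The task is to decide, based on the number of free servers and the customer's priority, whether to accept the customer, so as to maximize long-term reward without discounting.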
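The dynamics described above can be sketched as a one-step simulator. This is only a sketch under assumptions that the text does not pin down: priorities are drawn uniformly, an accept with no free server is treated as a rejection, and a server assigned in the current step cannot become free until a later step. The names `step`, `REWARDS`, `N_SERVERS`, and `P_FREE` are made up for illustration.

```python
import random

REWARDS = (1, 2, 4, 8)   # payment for each of the four priorities
N_SERVERS = 10           # n
P_FREE = 0.06            # p: chance that a busy server frees up each step
REJECT, ACCEPT = 0, 1


def step(free, priority, action, rng):
    """Advance one time step.

    free:     number of free servers, 0..N_SERVERS
    priority: index into REWARDS for the customer at the head of the queue
    Returns (reward, next_free, next_priority).
    """
    busy = N_SERVERS - free
    # Each server that was busy at the start of the step finishes with
    # probability P_FREE (assumption: a newly assigned server is excluded).
    freed = sum(rng.random() < P_FREE for _ in range(busy))
    reward = 0
    if action == ACCEPT and free > 0:
        reward = REWARDS[priority]
        free -= 1
    # Assumption: the next customer's priority is uniform over the four levels.
    next_priority = rng.randrange(len(REWARDS))
    return reward, free + freed, next_priority
```

For example, `step(10, 3, ACCEPT, random.Random(0))` pays 8 and leaves 9 free servers, since no server was busy and so none could be released.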
Solve it with R-Learning.
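For reference, the R-learning updates as they are usually stated for the undiscounted, continuing case, with step sizes α and β and average-reward estimate ρ. After taking action a in state s and observing reward r and next state s':

```latex
\begin{align*}
Q(s,a) &\leftarrow Q(s,a) + \alpha \bigl[\, r - \rho + \max_{a'} Q(s',a') - Q(s,a) \,\bigr] \\
\rho &\leftarrow \rho + \beta \bigl[\, r - \rho + \max_{a'} Q(s',a') - \max_{a} Q(s,a) \,\bigr]
\qquad \text{only if } Q(s,a) = \max_{a} Q(s,a)
\end{align*}
```

Actions are chosen ε-greedily with respect to Q, and ρ is adjusted only on steps where the action taken is greedy.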

The R-Learning parameters are set to α=0.1, β=0.01, ϵ=0.1, with Q(s,a) and ρ initialized to 0.
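A minimal tabular sketch of R-learning with these parameter values as defaults might look like the following. It is not the code linked in this post; the function name `r_learning` and the `step_fn(state, action, rng) -> (reward, next_state)` simulator interface are assumptions made for illustration, with states and actions as small integers.

```python
import random


def r_learning(step_fn, n_states, n_actions, start_state, n_steps,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning for a continuing, undiscounted task.

    step_fn(state, action, rng) -> (reward, next_state) is a hypothetical
    simulator interface. Returns the Q table and the average-reward
    estimate rho.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # Q(s, a) starts at 0
    rho = 0.0                                         # rho starts at 0
    s = start_state
    for _ in range(n_steps):
        # Epsilon-greedy action selection (ties go to the lowest index).
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: q[s][b])
        r, s_next = step_fn(s, a, rng)
        # Relative-value update: reward is measured against rho.
        q[s][a] += alpha * (r - rho + max(q[s_next]) - q[s][a])
        # rho is adjusted only when the action taken is greedy.
        if q[s][a] == max(q[s]):
            rho += beta * (r - rho + max(q[s_next]) - max(q[s]))
        s = s_next
    return q, rho
```

For the access-control task, one possible encoding (again an assumption) is `state = free_servers * 4 + priority`, giving 11 * 4 states, with two actions for reject and accept. The greedy policy is then read off by comparing the two Q-values in each state.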

Oh god, I gave up..

I took a look at code someone else wrote:
https://github.com/rickyHong/reinforce-L-CASE-05/blob/master/chapter10/AccessControl.py

Why does it use tile coding, though..
Why?
Why?

When I store the Q-values directly, the policy I get is clearly wrong..
