Simulate Rate limit process based on webserver logs
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 03:25
Problem
You are working as a backend engineer for alice.com. Recently, various users started abusing the company’s API server with a flood of requests, and your team has decided to limit the rate at which a user can access certain API endpoints.
The domain api.alice.com is served by an application server. To study the effectiveness of various rate-limiting algorithms with realistic traffic, you have collected server logfiles, in which each request to the API server is recorded. The structure of the access logfile is:
id user_ip timestamp http_request_url http_response_code user_client
Although many rate-limiting algorithms exist, you are interested in a sliding-window strategy, which limits the number of API calls to a maximum of 10 requests per second, per IP address. This means that for an arbitrary window of one-second duration, the API server responds to 10 or fewer API requests, per IP address.
Write a function rateLimiter that simulates the sliding-window limiter described above. Specifically, the function accepts an input ArrayList<ArrayList<String>> logs, in which each ArrayList in the first dimension corresponds to a line of the log file and each String in the second dimension corresponds to a field of that line. The function should return a list of integers representing the request_id of the requests that are rejected by the rate-limiter, e.g. [23, 30, 55].
Additional requirements
- The duration of the sliding-window is exactly one second, and includes both extremes of the interval.
- Requests that are rejected also count towards the limit of 10 per second.
- Requests from the IP address 11.22.33.44 should never be rate-limited, because this is an IP used by the internal Acme crawler that indexes the site.
- Requests to any URL starting with /admin/ should never be rate-limited, because they correspond to the administrator pages of the Acme site.
Assumptions
- All entries in the input log will be chronologically ordered.
- There are no simultaneous requests.
- All fields described in the above log structure are guaranteed to be present, and have a valid value.
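The two exemption rules and the "both extremes included" window rule can be written as small predicates. The following is only a sketch: the class name LimiterRules and its method names are hypothetical, and it assumes the timestamp field is a decimal number of seconds and that the URL field is a path such as /admin/users.

```java
import java.math.BigDecimal;

final class LimiterRules {
    // Values taken from the problem statement.
    static final String EXEMPT_IP = "11.22.33.44";
    static final String EXEMPT_URL_PREFIX = "/admin/";
    static final BigDecimal WINDOW = BigDecimal.ONE; // one second (assumed unit)

    // True if the request must never be rate-limited.
    static boolean isExempt(String ip, String url) {
        return EXEMPT_IP.equals(ip) || url.startsWith(EXEMPT_URL_PREFIX);
    }

    // True if a request at `earlier` still falls inside the closed
    // one-second window ending at `current` (both ends inclusive).
    static boolean inWindow(BigDecimal earlier, BigDecimal current) {
        return current.subtract(earlier).compareTo(WINDOW) <= 0;
    }

    public static void main(String[] args) {
        // Exactly one second apart: still in the same window.
        System.out.println(inWindow(new BigDecimal("100.000"), new BigDecimal("101.000"))); // true
        // One millisecond more: outside the window.
        System.out.println(inWindow(new BigDecimal("100.000"), new BigDecimal("101.001"))); // false
        System.out.println(isExempt("1.2.3.4", "/admin/users")); // true
    }
}
```

Using BigDecimal rather than double keeps the boundary comparison exact for decimal timestamps.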
Analysis
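At first glance the problem seems to carry many requirements that are hard to handle. Once abstracted, though, it reduces to the following: given a two-dimensional ArrayList, find the rows that share an IP and appear more than ten times within one second, and return the ids of the rows after the tenth. Because the input rows are already sorted by time, we only need to examine the rows of each IP in first-in, first-out order, which a Queue handles. Since there may be many distinct IPs, a HashMap holding {ip, queue} pairs solves the problem conveniently.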
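The algorithm is as follows:
- For each row of the two-dimensional ArrayList, create the Queue for its IP in the HashMap if it does not exist yet (rows from the exempt IP 11.22.33.44 or to a URL starting with /admin/ are skipped).
- For each row, read its timestamp. Check the corresponding Queue and discard every request that is more than one second older. Then count the requests that remain. If there are already 10 or more, the current request is rejected: save its id, and still append the request to the end of the Queue (the problem requires rejected requests to count as well). If there are fewer than 10, only append the request to the Queue.
- Repeat this process until all input has been traversed.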
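The steps above can be sketched in a compact form that stores only timestamps per IP instead of whole log lines. This is a simplified variant, not the implementation given in this post: the class SlidingWindowSketch, its record method, and the rejectedIndexes helper are hypothetical names, and timestamps are assumed to be decimal seconds.

```java
import java.math.BigDecimal;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SlidingWindowSketch {
    static final int LIMIT = 10;

    // One FIFO queue of recent timestamps per IP.
    private final Map<String, Deque<BigDecimal>> windows = new HashMap<>();

    // Records one request and returns true if it would be rejected.
    boolean record(String ip, BigDecimal timestamp) {
        Deque<BigDecimal> window = windows.computeIfAbsent(ip, k -> new ArrayDeque<>());
        // Drop requests that are strictly more than one second older.
        while (!window.isEmpty()
                && timestamp.subtract(window.peekFirst()).compareTo(BigDecimal.ONE) > 0) {
            window.pollFirst();
        }
        boolean rejected = window.size() >= LIMIT;
        window.addLast(timestamp); // rejected requests still count towards the limit
        return rejected;
    }

    // Hypothetical helper: sends `count` requests from one IP, spaced
    // `stepMillis` apart, and returns the indexes of the rejected ones.
    static List<Integer> rejectedIndexes(int count, long stepMillis) {
        SlidingWindowSketch limiter = new SlidingWindowSketch();
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            BigDecimal ts = BigDecimal.valueOf(i * stepMillis, 3); // milliseconds -> seconds
            if (limiter.record("1.2.3.4", ts)) {
                rejected.add(i);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejectedIndexes(12, 50));  // prints [10, 11]
        System.out.println(rejectedIndexes(12, 200)); // prints []
    }
}
```

With 50 ms spacing all twelve requests fall inside one second, so the eleventh and twelfth are rejected; with 200 ms spacing no window ever holds ten earlier requests.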
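Time/space complexity
Let logs contain n requests in total.
We use a HashMap<String, Queue<ArrayList<String>>> to store the requests. In the worst case (every request is stored) the space complexity is O(n).
Each request is added to and removed from its Queue at most once, so the time complexity is O(n).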
The Java implementation is as follows:
import java.math.BigDecimal;
import java.util.*;

public class RateLimiter {
    public static List<Integer> rateLimiter(ArrayList<ArrayList<String>> logs) {
        List<Integer> rejectedRequestIds = new ArrayList<>();
        if (logs == null || logs.size() == 0) {
            return rejectedRequestIds;
        }
        if (logs.get(0) == null || logs.get(0).size() == 0) {
            return rejectedRequestIds;
        }
        rejectedRequestIds = getInvalidRequest(logs);
        return rejectedRequestIds;
    }

    private static List<Integer> getInvalidRequest(ArrayList<ArrayList<String>> logs) {
        List<Integer> res = new ArrayList<>();
        // Maps each IP to the FIFO queue of its recent log lines.
        Map<String, Queue<ArrayList<String>>> map = new HashMap<>();
        for (ArrayList<String> logLine : logs) {
            // Skip lines that are not subject to rate limiting.
            if (logLine.get(1).equals("11.22.33.44") || logLine.get(3).startsWith("/admin/")) {
                continue;
            }
            if (map.containsKey(logLine.get(1))) {
                Queue<ArrayList<String>> queue = map.get(logLine.get(1));
                // Remove all requests that are more than 1 second older than the current one.
                // The isEmpty() check avoids a NullPointerException when the queue drains.
                BigDecimal currTime = new BigDecimal(logLine.get(2));
                while (!queue.isEmpty()
                        && currTime.compareTo(
                                new BigDecimal(queue.peek().get(2)).add(new BigDecimal("1"))) > 0) {
                    queue.poll();
                }
                // If 10 or more requests from this IP remain within the last second,
                // the current request is rejected.
                if (queue.size() >= 10) {
                    res.add(Integer.parseInt(logLine.get(0)));
                }
                // Rejected requests also count towards the limit, so always enqueue.
                queue.offer(logLine);
            } else {
                // First request seen from this IP.
                Queue<ArrayList<String>> queue = new LinkedList<ArrayList<String>>();
                queue.offer(logLine);
                map.put(logLine.get(1), queue);
            }
        }
        return res;
    }
}