Simulating a rate limiter from web server logs


Problem

You are working as a backend engineer for alice.com. Recently, various users started abusing the company’s API server with a flood of requests, and your team has decided to limit the rate at which a user can access certain API endpoints.
The domain api.alice.com is served by an application server. To study the effectiveness of various rate-limiting algorithms with realistic traffic, you have collected server logfiles, in which each request to the API server is recorded. The structure of the access logfile is:

id user_ip timestamp http_request_url http_response_code user_client
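
To make the field order concrete, here is a minimal sketch of turning one raw log line into the list of fields that rateLimiter expects. The problem statement does not give the delimiter, so this assumes whitespace-separated fields in which only the last one (user_client) may contain spaces. LogLineParser and parse are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper, not part of the original problem statement.
final class LogLineParser {
    // Assumption: the first five fields contain no whitespace, so splitting
    // with a limit of 6 keeps a user_client value with spaces in one piece.
    static ArrayList<String> parse(String line) {
        return new ArrayList<>(Arrays.asList(line.trim().split("\\s+", 6)));
    }
}
```

With this layout, index 0 holds the id, index 1 the user_ip, index 2 the timestamp, and index 3 the http_request_url, which are the indexes the solution reads.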

Although many rate-limiting algorithms exist, you are interested in a sliding-window strategy, which limits the number of API calls to a maximum of 10 requests per second, per IP address. This means that for an arbitrary window of one-second duration, the API server responds to 10 or fewer API requests, per IP address.
Write a function rateLimiter that simulates the sliding-window limiter described above. Specifically, the function accepts an input ArrayList<ArrayList<String>> logs, in which each ArrayList in the first dimension corresponds to a line of the log file and each String in the second dimension corresponds to a field of that line. The function should return a list of integers representing the request_id of each request rejected by the rate limiter, e.g. [23, 30, 55].

Additional requirements
- The duration of the sliding-window is exactly one second, and includes both extremes of the interval.
- Requests that are rejected also count towards the limit of 10 per second.
- Requests from the IP address 11.22.33.44 should never be rate-limited, because this is an IP used by the internal Acme crawler that indexes the site.
- Requests to any URL starting with /admin/ should never be rate-limited, because they correspond to the administrator pages of the Acme site.
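
The exemption rules and the inclusive window boundary can be written as two small predicates. This is only a sketch: RateLimitRules and its method names are hypothetical, and it assumes the timestamp field is a decimal number of seconds.

```java
import java.math.BigDecimal;

// Hypothetical helper names; the constants come from the requirements above.
final class RateLimitRules {
    static final String CRAWLER_IP = "11.22.33.44";
    static final String ADMIN_PREFIX = "/admin/";
    static final BigDecimal WINDOW_SECONDS = BigDecimal.ONE;

    // True when a request must never be rate-limited.
    static boolean isExempt(String ip, String url) {
        return CRAWLER_IP.equals(ip) || url.startsWith(ADMIN_PREFIX);
    }

    // True when an earlier request still falls inside the one-second window
    // ending at `now`. Both ends are included, so a gap of exactly 1 counts.
    static boolean inWindow(BigDecimal earlier, BigDecimal now) {
        return now.subtract(earlier).compareTo(WINDOW_SECONDS) <= 0;
    }
}
```

Comparing with compareTo rather than equals matters for BigDecimal, because compareTo ignores scale (1.0 and 1 compare as equal).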

Assumptions
- All entries in the input log will be chronologically ordered.
- There are no simultaneous requests.
- All fields described in the above log structure are guaranteed to be present, and have a valid value.

Analysis

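At first glance this problem seems to carry many awkward requirements. Once abstracted, though, it reduces to the following: given a two-dimensional ArrayList, find the rows that arrive after the same IP has already made ten requests within one second, and return the ids of those rows (the eleventh request and later). Because the input rows are already sorted by time, we only need to examine each IP's rows in first-in, first-out order, which a Queue handles. Since many different IPs may appear, a HashMap holding {ip, queue} pairs solves the problem conveniently.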

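The algorithm in detail:
- For each row of the two-dimensional ArrayList, create a Queue for its IP in the HashMap if one does not exist yet.
- For each row, read its timestamp. Look up the Queue for its IP and discard every request that is more than one second old. Then count the requests that remain. If 10 or more remain, the current request is rejected: save its id, and still append the current request to the end of the Queue (the problem requires rejected requests to count as well). If fewer than 10 remain, only append the request to the Queue.
- Repeat this process until all of the input has been traversed.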
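
The steps above can be sketched for a single IP, keeping only timestamps in the queue. This is an illustrative reduction rather than the full solution: SlidingWindowSketch and rejectedIndexes are hypothetical names, the input is assumed to be decimal-second timestamps in ascending order, and the result holds list indexes instead of request ids.

```java
import java.math.BigDecimal;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-IP sketch: returns the indexes of rejected requests.
final class SlidingWindowSketch {
    static final int LIMIT = 10;

    // Assumption: timestamps are decimal seconds in ascending order.
    static List<Integer> rejectedIndexes(List<String> timestamps) {
        List<Integer> rejected = new ArrayList<>();
        Deque<BigDecimal> window = new ArrayDeque<>();
        for (int i = 0; i < timestamps.size(); i++) {
            BigDecimal now = new BigDecimal(timestamps.get(i));
            // Evict requests strictly more than one second older than `now`.
            while (!window.isEmpty()
                    && now.subtract(window.peekFirst()).compareTo(BigDecimal.ONE) > 0) {
                window.pollFirst();
            }
            // Ten earlier requests already fill the window: reject this one.
            if (window.size() >= LIMIT) {
                rejected.add(i);
            }
            // Rejected requests still count toward the limit.
            window.addLast(now);
        }
        return rejected;
    }
}
```

For the timestamps 0.0, 0.1, ..., 1.1 followed by 3.0, this returns [10, 11]: the request at 1.0 is the eleventh inside the inclusive window [0.0, 1.0], the request at 1.1 still sees ten earlier requests (0.1 through 1.0, including the rejected one), and by 3.0 the window is empty again.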

Time/space complexity

Let n be the number of requests in logs.

We use a HashMap<String, Queue<ArrayList<String>>> to store the requests. In the worst case (every request is stored) the space complexity is O(n).

Each request is put into and removed from its queue at most once, so the time complexity is O(n).

Java implementation:

import java.math.BigDecimal;
import java.util.*;

public class RateLimiter {
    public static List<Integer> rateLimiter(ArrayList<ArrayList<String>> logs) {
        List<Integer> rejectedRequestIds = new ArrayList<>();
        if (logs == null || logs.size() == 0) {
            return rejectedRequestIds;
        }
        if (logs.get(0) == null || logs.get(0).size() == 0) {
            return rejectedRequestIds;
        }
        rejectedRequestIds = getInvalidRequest(logs);
        return rejectedRequestIds;
    }

    private static List<Integer> getInvalidRequest(ArrayList<ArrayList<String>> logs) {
        List<Integer> res = new ArrayList<>();
        // Maps each IP to the queue of its requests inside the current window.
        Map<String, Queue<ArrayList<String>>> map = new HashMap<>();
        for (ArrayList<String> logLine : logs) {
            // Skip requests that are exempt from rate limiting.
            if (logLine.get(1).equals("11.22.33.44") || logLine.get(3).startsWith("/admin/")) {
                continue;
            }
            if (map.containsKey(logLine.get(1))) {
                Queue<ArrayList<String>> queue = map.get(logLine.get(1));
                // Remove all requests that are more than 1 second older than the
                // current one. The isEmpty() check avoids a NullPointerException
                // from peek() when every queued request has expired.
                BigDecimal currTime = new BigDecimal(logLine.get(2));
                while (!queue.isEmpty()
                        && currTime.compareTo(
                            new BigDecimal(queue.peek().get(2)).add(BigDecimal.ONE)
                        ) > 0) {
                    queue.poll();
                }
                // If 10 requests from this IP are still within the last second,
                // the current request is rejected.
                if (queue.size() >= 10) {
                    res.add(Integer.parseInt(logLine.get(0)));
                }
                // Rejected requests also count towards the limit.
                queue.offer(logLine);
            } else {
                Queue<ArrayList<String>> queue = new LinkedList<ArrayList<String>>();
                queue.offer(logLine);
                map.put(logLine.get(1), queue);
            }
        }
        return res;
    }
}