c4 Internet Addresses - Some Useful Programs

Source: Internet  Editor: 程序博客网  Date: 2024/06/16 11:08

SpamCheck

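Many services keep track of spammers and tell clients whether a given host is a known spammer. These real-time blackhole lists have to answer as quickly as possible and under very heavy load, potentially millions of queries.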

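Solving this problem means responding as fast as possible, ideally with caching, and spreading the load across distributed servers. It could be done with a web server, SOAP, UDP, a custom protocol, and so on. In practice, this kind of service can be implemented with nothing more than DNS.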
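The DNS approach works by turning the address being checked into a hostname: the octets are reversed and the blackhole zone is appended, and the resulting name resolves only if the address is listed. Below is a minimal sketch of just the name-building step. The class name `DnsblQuery`, the method `queryName`, and the address 207.34.56.23 are illustrative assumptions, not part of the original program.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DnsblQuery {

    // Hypothetical helper: builds d.c.b.a.<zone> from the IPv4 address a.b.c.d.
    static String queryName(byte[] ipv4, String zone) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("this sketch handles IPv4 only");
        }
        StringBuilder name = new StringBuilder();
        for (int i = ipv4.length - 1; i >= 0; i--) {
            // Java bytes are signed; mask to get the unsigned octet value.
            name.append(ipv4[i] & 0xFF).append('.');
        }
        return name.append(zone).toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal address is only parsed here; no DNS query is sent for this step.
        byte[] quad = InetAddress.getByName("207.34.56.23").getAddress();
        System.out.println(queryName(quad, "sbl.spamhaus.org"));
        // prints 23.56.34.207.sbl.spamhaus.org
    }
}
```

Because the answer is an ordinary DNS record, resolvers along the way can cache it, which is what gives this design its speed and scalability.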

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class SpamCheck {

        public static final String BLACKHOLE = "sbl.spamhaus.org";

        public static void main(String[] args) {
            for (String arg : args) {
                if (isSpammer(arg)) {
                    System.out.println(arg + " is a known spammer.");
                } else {
                    System.out.println(arg + " appears legitimate.");
                }
            }
        }

        private static boolean isSpammer(String arg) {
            try {
                InetAddress address = InetAddress.getByName(arg);
                byte[] quad = address.getAddress();
                // Prepend the octets in reverse order: a.b.c.d becomes d.c.b.a.sbl.spamhaus.org
                String query = BLACKHOLE;
                for (byte octet : quad) {
                    int unsignedByte = octet < 0 ? octet + 256 : octet;
                    query = unsignedByte + "." + query;
                }
                // The name resolves only if the address is on the list
                InetAddress listing = InetAddress.getByName(query);
                System.out.println(listing.getHostName());
                System.out.println(listing.getHostAddress());
                return true;
            } catch (UnknownHostException e) {
                // No record for the query name: the address is not listed
                return false;
            }
        }
    }

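If you use this technique, keep up with changes to the blackhole list and its addresses. You also need to plan for failure cases, such as the server coming under attack or refusing to answer any request at all.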
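One failure mode worth guarding against is a zone that suddenly resolves every name (for example after a change of address or ownership), which would make a check based only on "the lookup succeeded" flag every host. The sketch below assumes the common DNSBL convention that listings are answered with addresses in the loopback range, typically 127.0.0.2, and treats only 127.0.0.x answers as listings. The class `DnsblAnswer` and its methods are hypothetical, and the exact return codes should be confirmed against the documentation of the list you use.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DnsblAnswer {

    // Assumption: only answers of the form 127.0.0.x are genuine listings.
    static boolean isListing(byte[] answer) {
        return answer.length == 4
            && (answer[0] & 0xFF) == 127
            && answer[1] == 0
            && answer[2] == 0;
    }

    // Resolves an already-built query name and validates the answer.
    // A failed lookup is treated as "not listed", so an unreachable
    // blackhole server never causes legitimate hosts to be rejected.
    static boolean isListed(String queryName) {
        try {
            return isListing(InetAddress.getByName(queryName).getAddress());
        } catch (UnknownHostException ex) {
            return false;
        }
    }
}
```

Failing open like this trades some missed spam during an outage for not blocking all traffic when the list misbehaves.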


Processing Web Server Logfiles

205.160.186.76 unknown - [17/Jun/2013:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP 1.0" 200 50

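The entry above says that a browser at 205.160.186.76 requested the resource /bgs/greenbg.gif, that the request succeeded (status 200), and that the resource was 50 bytes long.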
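To make the field layout concrete, here is a small sketch that splits an entry like the one above into host, timestamp, request, status, and size using a regular expression. The class `LogLine` and its field names are assumptions made for illustration; real logfiles may use other formats that this pattern does not cover.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLine {

    // Layout: host ident user [timestamp] "request" status bytes
    private static final Pattern ENTRY = Pattern.compile(
        "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+\\[([^\\]]+)\\]\\s+\"([^\"]*)\"\\s+(\\d{3})\\s+(\\d+|-)\\s*$");

    final String host;
    final String timestamp;
    final String request;
    final int status;
    final long bytes;

    private LogLine(String host, String timestamp, String request, int status, long bytes) {
        this.host = host;
        this.timestamp = timestamp;
        this.request = request;
        this.status = status;
        this.bytes = bytes;
    }

    // Returns null when the entry does not match the expected layout.
    static LogLine parse(String entry) {
        Matcher m = ENTRY.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        // Some servers log "-" when no body was sent; treat that as 0 bytes.
        long size = "-".equals(m.group(7)) ? 0 : Long.parseLong(m.group(7));
        return new LogLine(m.group(1), m.group(4), m.group(5),
                           Integer.parseInt(m.group(6)), size);
    }
}
```

For the hostname-resolving program in this section only the first field matters, which is why it can get away with cutting the line at the first space.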

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class Weblog {

        public static void main(String[] args) {
            try (FileInputStream fin = new FileInputStream(args[0]);
                 Reader in = new InputStreamReader(fin);
                 BufferedReader bin = new BufferedReader(in)) {
                for (String entry = bin.readLine();
                     entry != null;
                     entry = bin.readLine()) {
                    // separate out the IP address
                    int index = entry.indexOf(' ');
                    if (index < 0) {
                        // no space in the line, so it is not a log entry; pass it through
                        System.err.println(entry);
                        continue;
                    }
                    String ip = entry.substring(0, index);
                    String theRest = entry.substring(index);
                    // Ask DNS for the hostname and print it out
                    try {
                        InetAddress address = InetAddress.getByName(ip);
                        System.out.println(address.getHostName() + theRest);
                    } catch (UnknownHostException ex) {
                        System.err.println(entry);
                    }
                }
            } catch (IOException ex) {
                System.out.println("Exception: " + ex);
            }
        }
    }

InetAddress caches lookup results, so a repeated IP address normally does not trigger another DNS query.
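The cache does not keep entries forever by default. In the standard JDK its lifetime is controlled by the security properties `networkaddress.cache.ttl` (successful lookups) and `networkaddress.cache.negative.ttl` (failed lookups), both in seconds. The sketch below shows one way to set them; the class `DnsCacheConfig` is hypothetical, the values are whatever the caller chooses, and the call should happen before the first lookup to take effect reliably.

```java
import java.security.Security;

class DnsCacheConfig {

    // Hypothetical helper: sets how long InetAddress keeps lookup results.
    // A positive TTL of -1 means "cache forever".
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                             Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                             Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Example values only: keep successes for an hour, failures for ten seconds.
        configure(3600, 10);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

For a one-off pass over a large logfile, a longer positive TTL can reduce repeated queries for frequently seen addresses.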

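The program above can be made much faster, though, because it spends most of its time waiting for DNS to respond. This is exactly the kind of problem multithreading solves: one thread reads the log entries and hands each one to other threads for processing. Keep in mind that a logfile can contain a huge number of entries. Starting one thread per entry would bring the VM to its knees in no time, so a thread pool is used here.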

    import java.net.InetAddress;
    import java.util.concurrent.Callable;

    public class LookupTask implements Callable<String> {

        private final String line;

        public LookupTask(String line) {
            this.line = line;
        }

        @Override
        public String call() {
            try {
                // separate out the IP address
                int index = line.indexOf(' ');
                String address = line.substring(0, index);
                // theRest keeps its leading space, so no extra separator is needed
                String theRest = line.substring(index);
                String hostname = InetAddress.getByName(address).getHostName();
                return hostname + theRest;
            } catch (Exception ex) {
                // malformed line or failed lookup: return the entry unchanged
                return line;
            }
        }
    }

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.LinkedList;
    import java.util.Queue;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Requires Java 7 for try-with-resources and multi-catch
    public class PooledWeblog {

        private final static int NUM_THREADS = 4;

        public static void main(String[] args) throws IOException {
            ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
            Queue<LogEntry> results = new LinkedList<LogEntry>();

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(args[0]), "UTF-8"))) {
                for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                    LookupTask task = new LookupTask(entry);
                    Future<String> future = executor.submit(task);
                    LogEntry result = new LogEntry(entry, future);
                    results.add(result);
                }
            }

            // Start printing the results. This blocks each time a result isn't ready.
            for (LogEntry result : results) {
                try {
                    System.out.println(result.future.get());
                } catch (InterruptedException | ExecutionException ex) {
                    System.out.println(result.original);
                }
            }

            executor.shutdown();
        }

        private static class LogEntry {
            String original;
            Future<String> future;

            LogEntry(String original, Future<String> future) {
                this.original = original;
                this.future = future;
            }
        }
    }

In not entirely scientific measurements, this approach is 10 to 50 times faster than the first one!

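The program above still has one design downside. A logfile can be very large, so the queue can grow very large too, and the program ends up consuming a great deal of memory. One way to avoid this is to move the output work into a separate thread that shares a queue with the input thread, so entries processed earlier can be printed right away instead of waiting until every entry has been queued. This raises another problem, however: you need a separate signal to tell you that output is complete, because an empty queue no longer guarantees that the job is done. The simplest approach is to count the input entries and the output entries and check that the two counts match.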
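The shared-queue design with a count as the completion signal could be sketched roughly as follows. This is an illustration under assumptions, not the original author's code: the class `StreamingWeblog`, the method `process`, and its parameters are hypothetical, and a bounded `ArrayBlockingQueue` is used here so that the reader blocks instead of letting the queue grow without limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

class StreamingWeblog {

    // Hypothetical helper: transforms each line on a pool and emits results in input order.
    static void process(Iterable<String> lines, Function<String, String> lookup,
                        Consumer<String> out, int threads, int capacity) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        // Bounded queue: put() blocks the reader once `capacity` results are pending.
        BlockingQueue<Future<String>> pending = new ArrayBlockingQueue<>(capacity);
        AtomicLong totalIn = new AtomicLong(-1); // -1 means input is still being read

        Thread writer = new Thread(() -> {
            long written = 0;
            try {
                // Done only when the input count is known and matches the output count.
                while (totalIn.get() < 0 || written < totalIn.get()) {
                    Future<String> next = pending.poll(100, TimeUnit.MILLISECONDS);
                    if (next == null) {
                        continue; // queue empty for now; re-check the counts
                    }
                    try {
                        out.accept(next.get());
                    } catch (ExecutionException ex) {
                        out.accept("lookup failed: " + ex.getCause());
                    }
                    written++;
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        long submitted = 0;
        try {
            for (String line : lines) {
                Callable<String> task = () -> lookup.apply(line);
                pending.put(executor.submit(task));
                submitted++;
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            totalIn.set(submitted); // publish the input count: the completion signal
            executor.shutdown();
        }
        try {
            writer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the logfile program, `lookup` would wrap the same logic as `LookupTask.call()` and `out` would be `System.out::println`. Memory use is then bounded by the queue capacity rather than by the size of the logfile.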













