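  • Project background: distributed message middleware
  • Requirements analysis: business system upgrade plan
  • Architecture design: building a ZooKeeper-based distributed coordination platform
  • Program development: programming with ZooKeeper
  • Running the program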

1. Project background: distributed message middleware




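ZooKeeper can serve as distributed message middleware to meet the business requirements described in this article. ZooKeeper is a high-performance distributed coordination product from the Hadoop family: an open-source, distributed coordination service designed for distributed applications. It mainly addresses the data management problems that distributed applications commonly run into, reduces the difficulty of coordinating and managing such applications, and provides high-performance distributed services. For installing and using ZooKeeper, see the article "ZooKeeper伪分布式集群安装及使用" (installing and using a pseudo-distributed ZooKeeper cluster).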

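2. Requirements analysis: business system upgrade plan

2.1 Case introduction

A large software company works in supply chain management. Its main business covers purchasing management, accounts payable management, accounts receivable management, repeat supplier management, returns management, sales management, inventory management, e-commerce, and system integration.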


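As the business grew, customers demanded ever faster response times, and sharing data through a database could no longer keep up with the information exchange requirements. The system went through its first upgrade: an enterprise service bus (ESB) centrally manages all internal business, services are published through WebServices, and business functions are scheduled through a Message Queue.

The company kept expanding and acquired several companies across borders. The business system changed from a centralized deployment in one data center to a distributed deployment across data centers worldwide. At that point the Message Queue could no longer meet the functional requirements of a multi-data-center, cross-region business system, and a distributed message middleware solution was needed to replace the existing message middleware.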


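2.2 Functional requirements
For example, computing the income statement (please do not dwell on the accuracy of the formula):
monthly profit = monthly sales amount - monthly purchase amount - monthly other expenses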
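
As a quick check of the formula, here is a minimal Java sketch that applies it to the January 2013 totals printed by the program run in section 5.3. The class and method names (ProfitFormula, monthlyProfit) are hypothetical and exist only for this illustration.

```java
// Worked example of: monthly profit = sales - purchases - other expenses.
class ProfitFormula {

    /** Applies the article's formula to three monthly totals. */
    static int monthlyProfit(int sales, int purchases, int otherExpenses) {
        return sales - purchases - otherExpenses;
    }

    public static void main(String[] args) {
        // January 2013 totals taken from the run output in section 5.3.
        int sales = 2950315;
        int purchases = 9609887;
        int other = 34193;
        System.out.println(monthlyProfit(sales, purchases, other)); // prints -6693765
    }
}
```

The result is negative because, in the sample data, the January purchase total exceeds the January sales total.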


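3. Architecture design: building a ZooKeeper-based distributed coordination platform
  • Purchase data: massive volume, stored and analyzed on Hadoop.
  • Sales data: massive volume, stored and analyzed on Hadoop.
  • Other expenses: small volume, stored and analyzed with files or a database.

  • 2 independent Hadoop clusters
  • 2 independent Java applications
  • 3 ZooKeeper cluster nodes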

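  • Hadoop App1 and Hadoop App2 are 2 independent Hadoop cluster applications.
  • Java App3 and Java App4 are 2 independent Java applications.
  • zk1, zk2, and zk3 are the 3 connection points of the ZooKeeper cluster.
  • /queue is the znode queue directory; the queue length is assumed to be 3.
  • /queue/purchase is entry no. 1 in the znode queue, submitted by Hadoop App1 to compute the purchase amount.
  • /queue/sell is entry no. 2 in the znode queue, submitted by Hadoop App2 to compute the sales amount.
  • /queue/other is entry no. 3 in the znode queue, submitted by Java App3 to compute the other expenses.
  • /queue/profit: when the znode queue is full, creation of the profit node is triggered.
  • Once /queue/profit is created, App4 starts, every ZooKeeper connection notifies its synchronized program (red lines) that the queue is complete, and all programs finish.
  • /queue/purchase, /queue/sell, and /queue/other can be created in any order; after a program submits, the corresponding child node appears under /queue.
  • App1 may submit through zk2, and App2 may submit through zk3. In principle, each application submits through the ZooKeeper node closest to it by network route.
  • An application cannot submit twice; the profit calculation runs only after all 3 tasks have been submitted.
  • After /queue/profit is created, the applications watching ZooKeeper receive the event and are notified that the queue is complete.
For a more detailed design of this synchronous queue, see the article "ZooKeeper实现分布式队列Queue" (implementing a distributed queue with ZooKeeper).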
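
The rules in the list above can be sketched without a ZooKeeper cluster. The following is a minimal in-memory simulation, not a ZooKeeper client: the class SyncQueueSketch and its methods are hypothetical stand-ins for the /queue znode, its children, and the watch on /queue/profit. It only illustrates the control flow: submissions arrive in any order, duplicates are rejected, and the third submission replaces the children with a profit node and notifies listeners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for the /queue znode (illustration only, no ZooKeeper involved).
class SyncQueueSketch {
    private final int size;                                   // queue length, 3 in the article
    private final Set<String> children = new TreeSet<String>();
    private final List<Runnable> listeners = new ArrayList<Runnable>();

    SyncQueueSketch(int size) {
        this.size = size;
    }

    /** Registers a callback that runs when the "profit" child is created. */
    void onProfitCreated(Runnable listener) {
        listeners.add(listener);
    }

    /** Submits a task; returns false if the same task was already submitted. */
    boolean submit(String task) {
        if (!children.add(task)) {
            return false;                                     // no duplicate submissions
        }
        if (children.size() >= size) {                        // queue is full
            children.clear();                                 // clear the submitted tasks
            children.add("profit");                           // create the profit node
            for (Runnable listener : listeners) {
                listener.run();                               // notify every watcher
            }
        }
        return true;
    }

    Set<String> children() {
        return children;
    }

    public static void main(String[] args) {
        SyncQueueSketch queue = new SyncQueueSketch(3);
        queue.onProfitCreated(new Runnable() {
            public void run() {
                System.out.println("Queue has Completed!!!");
            }
        });
        queue.submit("sell");                                 // order does not matter
        queue.submit("purchase");
        queue.submit("other");                                // third task fills the queue
        System.out.println(queue.children());                 // prints [profit]
    }
}
```

In the real system the set of children lives in ZooKeeper and is shared by all clients, so each application sees the submissions of the others; the simulation keeps everything in one process purely to show the counting rule.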

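4. Program development: programming with ZooKeeper
4.1 Experiment environment
  • The fully distributed ZooKeeper deployment, with 3 cluster nodes on 3 servers, is replaced by 3 cluster nodes on a single server.
  • The 2 independent Hadoop clusters are replaced by 2 independent MapReduce jobs on one cluster.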
  • Win7 64bit
  • JDK 1.6
  • Maven3
  • Juno Service Release 2
  • IP:
  • Linux Ubuntu 12.04 LTS 64bit
  • Java 1.6.0_29
  • Zookeeper: 3.4.5
  • IP:
  • 3 cluster nodes
  • Linux Ubuntu 12.04 LTS 64bit
  • Java 1.6.0_29
  • Hadoop: 1.0.3
  • IP:

4.2 Experiment data
  • Purchase data: purchase.csv
  • Sales data: sell.csv
  • Other expense data: other.csv

4.2.1 Purchase data set
4 columns: product ID, quantity, unit price, purchase date.

  1,26,1168,2013-01-08
  2,49,779,2013-02-12
  3,80,850,2013-02-05
  4,69,1585,2013-01-26
  5,88,1052,2013-01-13
  6,84,2363,2013-01-19
  7,64,1410,2013-01-12
  8,53,910,2013-01-11
  9,21,1661,2013-01-19
  10,53,2426,2013-02-18
  11,64,2022,2013-01-07
  12,36,2941,2013-01-28
  13,99,3819,2013-01-19
  14,64,2563,2013-02-16
  15,91,752,2013-02-05
  16,65,750,2013-02-04
  17,19,2426,2013-02-23
  18,19,724,2013-02-05
  19,87,137,2013-01-25
  20,86,2939,2013-01-14
  21,92,159,2013-01-23
  22,81,2331,2013-03-01
  23,88,998,2013-01-20
  24,38,102,2013-02-22
  25,32,4813,2013-01-13
  26,36,1671,2013-01-19

  // remaining data omitted

4.2.2 Sales data set
4 columns: product ID, quantity sold, unit sale price, sale date.

  1,14,1236,2013-01-14
  2,19,808,2013-03-06
  3,26,886,2013-02-23
  4,23,1793,2013-02-09
  5,27,1206,2013-01-21
  6,27,2648,2013-01-30
  7,22,1502,2013-01-19
  8,20,1050,2013-01-18
  9,13,1778,2013-01-30
  10,20,2718,2013-03-14
  11,22,2175,2013-01-12
  12,16,3284,2013-02-12
  13,30,4152,2013-01-30
  14,22,2770,2013-03-11
  15,28,778,2013-02-23
  16,22,874,2013-02-22
  17,12,2718,2013-03-22
  18,12,747,2013-02-23
  19,27,172,2013-02-07
  20,27,3282,2013-01-22
  21,28,224,2013-02-05
  22,26,2613,2013-03-30
  23,27,1147,2013-01-31
  24,16,141,2013-03-20
  25,15,5343,2013-01-21
  26,16,1887,2013-01-30
  27,12,2535,2013-01-12
  28,16,469,2013-01-07
  29,29,2395,2013-03-30
  30,17,1549,2013-01-30
  31,25,4173,2013-03-17

  // remaining data omitted

4.2.3 Other expense data set
2 columns: date incurred, amount incurred.

  2013-01-02,552
  2013-01-03,1092
  2013-01-04,1794
  2013-01-05,435
  2013-01-06,960
  2013-01-07,1066
  2013-01-08,1354
  2013-01-09,880
  2013-01-10,1992
  2013-01-11,931
  2013-01-12,1209
  2013-01-13,1491
  2013-01-14,804
  2013-01-15,480
  2013-01-16,1891
  2013-01-17,156
  2013-01-18,1439
  2013-01-19,1018
  2013-01-20,1506
  2013-01-21,1216
  2013-01-22,2045
  2013-01-23,400
  2013-01-24,1795
  2013-01-25,1977
  2013-01-26,1002
  2013-01-27,226
  2013-01-28,1239
  2013-01-29,702
  2013-01-30,1396

  // remaining data omitted

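4.3 Program design
  • Compute the purchase amount: Purchase.java
  • Compute the sales amount: Sell.java
  • Compute the other expenses: Other.java
  • Compute the profit: Profit.java
  • ZooKeeper scheduling: ZooKeeperJob.java

4.3.1 Computing the purchase amount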

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.regex.Pattern;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  // HdfsDAO is the author's HDFS helper class; it is not listed in this article.
  public class Purchase {

      public static final String HDFS = "hdfs://";
      public static final Pattern DELIMITER = Pattern.compile("[\t,]");

      public static class PurchaseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

          private String month = "2013-01";
          private Text k = new Text(month);
          private IntWritable v = new IntWritable();

          @Override
          public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
              System.out.println(values.toString());
              String[] tokens = DELIMITER.split(values.toString());
              if (tokens[3].startsWith(month)) { // January data only
                  int money = Integer.parseInt(tokens[1]) * Integer.parseInt(tokens[2]); // quantity * unit price
                  v.set(money);
                  context.write(k, v);
              }
          }
      }

      public static class PurchaseReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          private IntWritable v = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
              int money = 0; // local, so the total does not carry over between keys
              for (IntWritable line : values) {
                  money += line.get();
              }
              v.set(money);
              context.write(null, v); // null key: only the value is written to the output file
              System.out.println("Output:" + key + "," + money);
          }
      }

      public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
          JobConf conf = config();
          String local_data = path.get("purchase");
          String input = path.get("input");
          String output = path.get("output");

          // Initialize the purchase input directory on HDFS
          HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
          hdfs.rmr(input);
          hdfs.mkdirs(input);
          hdfs.copyFile(local_data, input);

          Job job = new Job(conf);
          job.setJarByClass(Purchase.class);

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);

          job.setMapperClass(PurchaseMapper.class);
          job.setReducerClass(PurchaseReducer.class);

          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);

          FileInputFormat.setInputPaths(job, new Path(input));
          FileOutputFormat.setOutputPath(job, new Path(output));

          job.waitForCompletion(true);
      }

      public static JobConf config() { // remote configuration of the Hadoop cluster
          JobConf conf = new JobConf(Purchase.class);
          conf.setJobName("purchase");
          conf.addResource("classpath:/hadoop/core-site.xml");
          conf.addResource("classpath:/hadoop/hdfs-site.xml");
          conf.addResource("classpath:/hadoop/mapred-site.xml");
          return conf;
      }

      public static Map<String, String> path() {
          Map<String, String> path = new HashMap<String, String>();
          path.put("purchase", "logfile/biz/purchase.csv"); // local data file
          path.put("input", HDFS + "/user/hdfs/biz/purchase"); // HDFS directory
          path.put("output", HDFS + "/user/hdfs/biz/purchase/output"); // output directory
          return path;
      }

      public static void main(String[] args) throws Exception {
          run(path());
      }

  }
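
Stripped of the Hadoop plumbing, the job filters the rows of one month, multiplies quantity by unit price, and sums the amounts. The sketch below shows that logic in plain Java so it can be tried without a cluster. The class name PurchaseTotalSketch and the method total are hypothetical; the three sample rows are taken from the purchase data set in section 4.2.1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration of the filter / multiply / sum logic (no Hadoop involved).
class PurchaseTotalSketch {
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    /** Sums quantity * unit price over the rows whose date starts with the given month. */
    static int total(List<String> rows, String month) {
        int sum = 0;
        for (String row : rows) {
            String[] tokens = DELIMITER.split(row);
            if (tokens[3].startsWith(month)) {                 // "map" step: keep one month
                int amount = Integer.parseInt(tokens[1]) * Integer.parseInt(tokens[2]);
                sum += amount;                                 // "reduce" step: accumulate
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1,26,1168,2013-01-08",    // 26 * 1168 = 30368
                "2,49,779,2013-02-12",     // February, skipped
                "4,69,1585,2013-01-26");   // 69 * 1585 = 109365
        System.out.println(total(rows, "2013-01")); // prints 139733
    }
}
```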

4.3.2 Computing the sales amount

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.regex.Pattern;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

  // HdfsDAO is the author's HDFS helper class; it is not listed in this article.
  public class Sell {

      public static final String HDFS = "hdfs://";
      public static final Pattern DELIMITER = Pattern.compile("[\t,]");

      public static class SellMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

          private String month = "2013-01";
          private Text k = new Text(month);
          private IntWritable v = new IntWritable();

          @Override
          public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
              System.out.println(values.toString());
              String[] tokens = DELIMITER.split(values.toString());
              if (tokens[3].startsWith(month)) { // January data only
                  int money = Integer.parseInt(tokens[1]) * Integer.parseInt(tokens[2]); // quantity * unit price
                  v.set(money);
                  context.write(k, v);
              }
          }
      }

      public static class SellReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          private IntWritable v = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
              int money = 0; // local, so the total does not carry over between keys
              for (IntWritable line : values) {
                  money += line.get();
              }
              v.set(money);
              context.write(null, v); // null key: only the value is written to the output file
              System.out.println("Output:" + key + "," + money);
          }
      }

      public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
          JobConf conf = config();
          String local_data = path.get("sell");
          String input = path.get("input");
          String output = path.get("output");

          // Initialize the sell input directory on HDFS
          HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
          hdfs.rmr(input);
          hdfs.mkdirs(input);
          hdfs.copyFile(local_data, input);

          Job job = new Job(conf);
          job.setJarByClass(Sell.class);

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);

          job.setMapperClass(SellMapper.class);
          job.setReducerClass(SellReducer.class);

          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);

          FileInputFormat.setInputPaths(job, new Path(input));
          FileOutputFormat.setOutputPath(job, new Path(output));

          job.waitForCompletion(true);
      }

      public static JobConf config() { // remote configuration of the Hadoop cluster
          JobConf conf = new JobConf(Sell.class); // was Purchase.class, a copy-paste slip
          conf.setJobName("sell");
          conf.addResource("classpath:/hadoop/core-site.xml");
          conf.addResource("classpath:/hadoop/hdfs-site.xml");
          conf.addResource("classpath:/hadoop/mapred-site.xml");
          return conf;
      }

      public static Map<String, String> path() {
          Map<String, String> path = new HashMap<String, String>();
          path.put("sell", "logfile/biz/sell.csv"); // local data file
          path.put("input", HDFS + "/user/hdfs/biz/sell"); // HDFS directory
          path.put("output", HDFS + "/user/hdfs/biz/sell/output"); // output directory
          return path;
      }

      public static void main(String[] args) throws Exception {
          run(path());
      }

  }

4.3.3 Computing the other expenses

  import java.io.BufferedReader;
  import java.io.File;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.regex.Pattern;

  public class Other {

      public static String file = "logfile/biz/other.csv";
      public static final Pattern DELIMITER = Pattern.compile("[\t,]");
      private static String month = "2013-01";

      public static void main(String[] args) throws IOException {
          calcOther(file);
      }

      public static int calcOther(String file) throws IOException {
          int money = 0;
          BufferedReader br = new BufferedReader(new FileReader(new File(file)));
          try {
              String s;
              while ((s = br.readLine()) != null) {
                  String[] tokens = DELIMITER.split(s);
                  if (tokens[0].startsWith(month)) { // January data only
                      money += Integer.parseInt(tokens[1]);
                  }
              }
          } finally {
              br.close(); // close the file even if parsing fails
          }

          System.out.println("Output:" + month + "," + money);
          return money;
      }
  }

4.3.4 Computing the profit

  import java.io.IOException;

  // HdfsDAO is the author's HDFS helper class; it is not listed in this article.
  public class Profit {

      public static void main(String[] args) throws Exception {
          profit();
      }

      public static void profit() throws Exception {
          int sell = getSell();
          int purchase = getPurchase();
          int other = getOther();
          int profit = sell - purchase - other;
          System.out.printf("profit = sell - purchase - other = %d - %d - %d = %d\n", sell, purchase, other, profit);
      }

      // Reads the purchase total written by the Purchase job
      public static int getPurchase() throws Exception {
          HdfsDAO hdfs = new HdfsDAO(Purchase.HDFS, Purchase.config());
          return Integer.parseInt(hdfs.cat(Purchase.path().get("output") + "/part-r-00000").trim());
      }

      // Reads the sales total written by the Sell job
      public static int getSell() throws Exception {
          HdfsDAO hdfs = new HdfsDAO(Sell.HDFS, Sell.config());
          return Integer.parseInt(hdfs.cat(Sell.path().get("output") + "/part-r-00000").trim());
      }

      // Recomputes the other expenses from the local file
      public static int getOther() throws IOException {
          return Other.calcOther(Other.file);
      }

  }
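
Because the reducers call context.write with a null key, each line of part-r-00000 holds only the value, so the file is expected to contain the total followed by a line break; that is why the result is trimmed before parsing. A minimal sketch of that parsing step, assuming the helper's cat() returns the file content as a String (HdfsDAO is not shown here, and the class name ReducerOutputParse is hypothetical):

```java
// Illustration of parsing a reducer output file that holds a single total.
class ReducerOutputParse {

    /** Parses the single integer contained in the given file content. */
    static int parseTotal(String fileContent) {
        // trim() removes the trailing line break that would make parseInt fail
        return Integer.parseInt(fileContent.trim());
    }

    public static void main(String[] args) {
        // Assumed file content: the value, a line break, and no key column.
        System.out.println(parseTotal("9609887\n")); // prints 9609887
    }
}
```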

4.3.5 ZooKeeper scheduling

  import java.io.IOException;
  import java.util.List;

  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.KeeperException;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooDefs.Ids;
  import org.apache.zookeeper.ZooKeeper;

  public class ZooKeeperJob {

      final public static String QUEUE = "/queue";
      final public static String PROFIT = "/queue/profit";
      final public static String PURCHASE = "/queue/purchase";
      final public static String SELL = "/queue/sell";
      final public static String OTHER = "/queue/other";

      public static void main(String[] args) throws Exception {
          if (args.length == 0) {
              System.out.println("Please start a task:");
          } else {
              doAction(Integer.parseInt(args[0]));
          }
      }

      public static void doAction(int client) throws Exception {
          String host1 = "";
          String host2 = "";
          String host3 = "";

          ZooKeeper zk = null;
          switch (client) {
          case 1: // purchase task, connects through the first node
              zk = connection(host1);
              initQueue(zk);
              doPurchase(zk);
              break;
          case 2: // sales task, connects through the second node
              zk = connection(host2);
              initQueue(zk);
              doSell(zk);
              break;
          case 3: // other-expenses task, connects through the third node
              zk = connection(host3);
              initQueue(zk);
              doOther(zk);
              break;
          default:
              System.out.println("Unknown task: " + client);
          }
      }

      // Open a connection to the server
      public static ZooKeeper connection(String host) throws IOException {
          ZooKeeper zk = new ZooKeeper(host, 60000, new Watcher() {
              // Receives every triggered event
              public void process(WatchedEvent event) {
                  // Connection events carry a null path, so compare from the constant
                  if (event.getType() == Event.EventType.NodeCreated && PROFIT.equals(event.getPath())) {
                      System.out.println("Queue has Completed!!!");
                  }
              }
          });
          return zk;
      }

      public static void initQueue(ZooKeeper zk) throws KeeperException, InterruptedException {
          System.out.println("WATCH => " + PROFIT);
          zk.exists(PROFIT, true); // register a watch for the creation of /queue/profit

          if (zk.exists(QUEUE, false) == null) {
              System.out.println("create " + QUEUE);
              zk.create(QUEUE, QUEUE.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } else {
              System.out.println(QUEUE + " is exist!");
          }
      }

      public static void doPurchase(ZooKeeper zk) throws Exception {
          if (zk.exists(PURCHASE, false) == null) {
              Purchase.run(Purchase.path());

              System.out.println("create " + PURCHASE);
              zk.create(PURCHASE, PURCHASE.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } else {
              System.out.println(PURCHASE + " is exist!");
          }
          isCompleted(zk);
      }

      public static void doSell(ZooKeeper zk) throws Exception {
          if (zk.exists(SELL, false) == null) {
              Sell.run(Sell.path());

              System.out.println("create " + SELL);
              zk.create(SELL, SELL.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } else {
              System.out.println(SELL + " is exist!");
          }
          isCompleted(zk);
      }

      public static void doOther(ZooKeeper zk) throws Exception {
          if (zk.exists(OTHER, false) == null) {
              Other.calcOther(Other.file);

              System.out.println("create " + OTHER);
              zk.create(OTHER, OTHER.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } else {
              System.out.println(OTHER + " is exist!");
          }
          isCompleted(zk);
      }

      public static void isCompleted(ZooKeeper zk) throws Exception {
          int size = 3;
          List<String> children = zk.getChildren(QUEUE, true);
          int length = children.size();

          System.out.println("Queue Complete:" + length + "/" + size);
          if (length >= size) {
              System.out.println("create " + PROFIT);
              Profit.profit();
              zk.create(PROFIT, PROFIT.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

              for (String child : children) { // clear the submitted task nodes
                  zk.delete(QUEUE + "/" + child, -1);
              }
          }
      }
  }

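5. Running the program
Finally, we run the whole program, which consists of 3 parts.
  • ZooKeeper servers
  • Hadoop server
  • Distributed queue application
5.1 Start the ZooKeeper service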

  ~ cd toolkit/zookeeper345

  # Start the 3 nodes of the zk cluster
  ~ bin/zkServer.sh start conf/zk1.cfg
  ~ bin/zkServer.sh start conf/zk2.cfg
  ~ bin/zkServer.sh start conf/zk3.cfg

  ~ jps
  4234 QuorumPeerMain
  5002 Jps
  4275 QuorumPeerMain
  4207 QuorumPeerMain


  # Check the status of node zk1
  ~ bin/zkServer.sh status conf/zk1.cfg
  JMX enabled by default
  Using config: conf/zk1.cfg
  Mode: follower

  # Check the status of node zk2; zk2 is the leader
  ~ bin/zkServer.sh status conf/zk2.cfg
  JMX enabled by default
  Using config: conf/zk2.cfg
  Mode: leader

  # Check the status of node zk3
  ~ bin/zkServer.sh status conf/zk3.cfg
  JMX enabled by default
  Using config: conf/zk3.cfg
  Mode: follower


  ~ bin/zkCli.sh -server

  # List the root of zk
  [zk: 0] ls /
  [queue, queue-fifo, zookeeper]

  # The /queue path has no children
  [zk: 1] ls /queue
  []

5.2 Start the Hadoop service

  ~ cd hadoop/hadoop-1.0.3
  ~ bin/start-all.sh

  ~ jps
  25979 JobTracker
  26257 TaskTracker
  25576 DataNode
  25300 NameNode
  12116 Jps
  25875 SecondaryNameNode

5.3 Start the distributed queue application ZooKeeperJob
5.3.1 Start the purchase statistics program with startup argument 1

  WATCH => /queue/profit
  /queue is exist!
  Delete: hdfs://
  Create: hdfs://
  copy from: logfile/biz/purchase.csv to hdfs://
  Output:2013-01,9609887
  create /queue/purchase
  Queue Complete:1/3


  [zk: 3] ls /queue
  [purchase]

5.3.2 Start the sales statistics program with startup argument 2

  WATCH => /queue/profit
  /queue is exist!
  Delete: hdfs://
  Create: hdfs://
  copy from: logfile/biz/sell.csv to hdfs://
  Output:2013-01,2950315
  create /queue/sell
  Queue Complete:2/3

  [zk: 5] ls /queue
  [purchase, sell]

5.3.3 Start the other-expenses statistics program with startup argument 3

  WATCH => /queue/profit
  /queue is exist!
  Output:2013-01,34193
  create /queue/other
  Queue Complete:3/3
  create /queue/profit
  cat: hdfs://
  2950315

  cat: hdfs://
  9609887

  Output:2013-01,34193
  profit = sell - purchase - other = 2950315 - 9609887 - 34193 = -6693765
  Queue has Completed!!!


  [zk: 6] ls /queue
  [profit]


