Building a scrapy-cluster: installing Kafka


The environment is the same as for the ZooKeeper installation last time, so I won't repeat it. Let's download Kafka; here I'm using the Scala 2.11 build, kafka_2.11-0.10.2.0.tgz (asc, md5):

[root@shulaibao4 ~]# wget https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
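The release page also publishes asc/md5 checksums. As a quick sanity check you can compute the archive's md5 and compare it with the published value (not reproduced here):

[root@shulaibao4 ~]# md5sum kafka_2.11-0.10.2.0.tgz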

完成后解压:

[root@shulaibao4 ~]# tar zxf kafka_2.11-0.10.2.0.tgz
[root@shulaibao4 ~]# mv kafka_2.11-0.10.2.0 /usr/lib
[root@shulaibao4 ~]# cd /usr/lib/kafka_2.11-0.10.2.0
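Kafka needs a Java runtime. If you followed the ZooKeeper article it should already be installed, but it costs nothing to confirm:

[root@shulaibao4 kafka_2.11-0.10.2.0]# java -version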

At this point a single-node "cluster" could already be started, but our goal is a three-node cluster, so let's configure it. Open server.properties under the config directory:

[root@shulaibao4 kafka_2.11-0.10.2.0]# vim config/server.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
# (Note: .properties values do not support trailing # comments, so keep
# annotations on their own lines.)
broker.id=2

# The IP this broker binds to
host.name=172.*.*.13

# Default port
port=9092

# Switch to enable topic deletion or not, default value is false
delete.topic.enable=true

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same.
# See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files.
# Kafka's data directory; topics, offsets, etc. live here.
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers. (The stock default is 1.)
num.partitions=3

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
#zookeeper.connect=localhost:2181
#zookeeper.connect=workstation1:2181
# Connect to the ZooKeeper ensemble from the previous article
zookeeper.connect=172.*.*.12:2181,172.*.*.13:2181,172.*.*.14:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
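To review just the active settings without the comment noise, a plain grep over the file does the job:

[root@shulaibao4 kafka_2.11-0.10.2.0]# grep -vE '^(#|$)' config/server.properties
broker.id=2
host.name=172.*.*.13
port=9092
... ...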

Once this configuration file is set up, scp the Kafka directory to the other machines:

[root@shulaibao4 kafka_2.11-0.10.2.0]# cd ..
[root@shulaibao4 lib]# scp -r kafka_2.11-0.10.2.0/ root@172.*.*.14:/usr/lib
......
[root@shulaibao4 lib]# scp -r kafka_2.11-0.10.2.0/ root@172.*.*.12:/usr/lib
......

On those two machines, only config/server.properties needs to be edited:
First, change broker.id; no two brokers may share the same id.
Second, change host.name to the IP that machine binds to.
Both edits can be applied in one pass, as the sketch below shows.
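A minimal sketch for, say, the machine that gets broker id 3 and the .14 address (the id and IP here follow this article's numbering, so match them to your own hosts):

[root@shulaibao5 kafka_2.11-0.10.2.0]# sed -i \
    -e 's/^broker.id=.*/broker.id=3/' \
    -e 's/^host.name=.*/host.name=172.*.*.14/' \
    config/server.properties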
Once both steps are done on each machine, this little cluster is assembled. Next, let's bring it up.
First start ZooKeeper on every machine (covered in the previous article, so not repeated here),
then start the Kafka brokers with the following command:

[root@shulaibao4 kafka_2.11-0.10.2.0]# bin/kafka-server-start.sh config/server.properties &
[1] 12608
[root@shulaibao4 kafka_2.11-0.10.2.0]# [2017-08-10 11:56:48,766] INFO KafkaConfig values:
... ...
... ...
... ...

When you see output like the following, the broker on this machine has started successfully:

... ...
[2017-08-10 11:56:49,930] INFO Kafka version : 0.10.2.0 (org.apache.kafka.common.utils.AppInfoParser)
[2017-08-10 11:56:49,931] INFO Kafka commitId : 576d93a8dc0cf421 (org.apache.kafka.common.utils.AppInfoParser)
[2017-08-10 11:56:49,931] INFO [Kafka Server 2], started (kafka.server.KafkaServer)
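Backgrounding with & ties the broker to your login shell, so it may die when the session closes. The same start script can also run detached via its -daemon flag:

[root@shulaibao4 kafka_2.11-0.10.2.0]# bin/kafka-server-start.sh -daemon config/server.properties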

Next, let's verify that the cluster is usable. First, create a topic named "test":

[root@shulaibao4 kafka_2.11-0.10.2.0]# bin/kafka-topics.sh --create --zookeeper shulaibao3:2181 --replication-factor 1 --partitions 1 --topic test
Created topic "test".
[root@shulaibao4 kafka_2.11-0.10.2.0]#

Then, from another machine, look at the details of the topic we just created:

[root@shulaibao4 kafka_2.11-0.10.2.0]# bin/kafka-topics.sh --describe --zookeeper shulaibao4:2181 --topic test
Topic:test  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: test Partition: 0    Leader: 3   Replicas: 3 Isr: 3
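The describe output shows that, with --replication-factor 1, the single partition lives on just one broker (id 3 above). To actually exercise all three brokers, you could create a fully replicated topic instead; test-replicated below is a hypothetical name:

[root@shulaibao4 kafka_2.11-0.10.2.0]# bin/kafka-topics.sh --create --zookeeper shulaibao3:2181 --replication-factor 3 --partitions 3 --topic test-replicated

Describing it the same way should then list all three broker ids under Replicas and Isr.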

List all topics in the cluster:

[root@shulaibao5 kafka_2.11-0.10.2.0]# bin/kafka-topics.sh --list --zookeeper shulaibao4:2181
test
[root@shulaibao5 kafka_2.11-0.10.2.0]#

The topic is now visible across the cluster, so let's use it to produce and consume. In the console producer below, type a message and press Enter to send it; you could also read content from a file or a database and send that instead (see the sketch after this command):

[root@shulaibao5 kafka_2.11-0.10.2.0]# bin/kafka-console-producer.sh --broker-list shulaibao4:9092 --topic test
This is a scrapy cluster
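As a minimal sketch of the file case: the console producer reads stdin line by line, so redirecting a file sends one message per line (messages.txt is a hypothetical file):

[root@shulaibao5 kafka_2.11-0.10.2.0]# bin/kafka-console-producer.sh --broker-list shulaibao4:9092 --topic test < messages.txt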

Now let's consume the message we just sent:

[root@shulaibao3 kafka_2.11-0.10.2.0]# bin/kafka-console-consumer.sh --zookeeper shulaibao4:2181 --topic test --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
This is a scrapy cluster
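As the warning notes, the ZooKeeper-based consumer is deprecated in this release; the equivalent command with the new consumer points at a broker rather than ZooKeeper:

[root@shulaibao3 kafka_2.11-0.10.2.0]# bin/kafka-console-consumer.sh --bootstrap-server shulaibao4:9092 --topic test --from-beginning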

As you can see, the message was consumed. As long as this consumer window stays open, any message produced to test will show up here.
With that, our Kafka cluster is built and working normally.
Next, we'll look at setting up the scrapy-cluster itself.

If you have questions, join QQ group 526855734.
