Storm processing flow and basic parameter configuration
Source: Internet · Published under: python csv · Site: 程序博客网 · Time: 2024/06/05 12:33
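| Configuration option | Purpose |
|---|---|
| topology.max.task.parallelism | Maximum number of executors (threads) that any single component of a topology may run; mainly used to limit threads in local mode |
| topology.workers | Default number of worker processes for each topology; overridden if the value is set in code |
| storm.zookeeper.servers | List of nodes in the ZooKeeper cluster |
| storm.local.dir | Local directory where Storm stores jars and temporary files |
| storm.zookeeper.root | Storm's root path in the ZooKeeper cluster; default "/storm" |
| ui.port | Port of the Storm UI; default 8080 |
| nimbus.host | Host of the Nimbus node |
| supervisor.slots.ports | Worker slots on a Supervisor node, one port per slot. All topologies in the cluster share these slots. Even if a topology is submitted with a larger number of workers, the system only allocates the slots that are actually still free in the cluster. Once every slot is taken, newly submitted topologies must wait; the system keeps checking for freed slots and assigns them to the waiting topologies when they become available |
| supervisor.worker.timeout.secs | Worker timeout in seconds. If no heartbeat arrives within this time, Storm considers the worker process dead and the supervisor tries to restart it |
| drpc.servers | List of DRPC servers, when the DRPC service is used |
| drpc.port | Port of the DRPC servers, when the DRPC service is used |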
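The supervisor.slots.ports description amounts to a simple rule: a topology receives at most as many workers as there are free slots, and receives none (and waits) when the pool is empty. The class below is a hypothetical model of that rule under those assumptions. It is not Storm's scheduler, the names SlotPoolSketch, submit and release are invented, and re-checking the waiting queue when slots free up is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the cluster-wide worker-slot pool; not Storm's real scheduler.
public class SlotPoolSketch {
    private int freeSlots;
    // Topologies submitted while no slot was free; they wait for slots to be released.
    private final List<String> waiting = new ArrayList<>();

    public SlotPoolSketch(int totalSlots) {
        this.freeSlots = totalSlots;
    }

    // Grants min(requested, free) workers; a topology that gets none is queued.
    public int submit(String topology, int requestedWorkers) {
        int granted = Math.min(requestedWorkers, freeSlots);
        freeSlots -= granted;
        if (granted == 0) {
            waiting.add(topology);
        }
        return granted;
    }

    // Killing a topology returns its slots to the shared pool.
    public void release(int workers) {
        freeSlots += workers;
    }

    public int freeSlots() {
        return freeSlots;
    }

    public List<String> waiting() {
        return waiting;
    }

    public static void main(String[] args) {
        // One supervisor with four ports (6700-6703), as in the default storm.yaml.
        SlotPoolSketch pool = new SlotPoolSketch(4);
        System.out.println(pool.submit("topo-a", 3)); // 3
        System.out.println(pool.submit("topo-b", 5)); // 1: only one slot left
        System.out.println(pool.submit("topo-c", 2)); // 0: topo-c has to wait
        System.out.println(pool.waiting());           // [topo-c]
    }
}
```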
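Basic parallelism control in local mode
conf.setMaxTaskParallelism(5); The maximum number of threads (executors) that a single component can run in local mode.
builder.setSpout("spout", new RandomSentenceSpout(), 10); The last argument, parallelism_hint, is the number of executors. Each executor runs as one thread inside a worker. If this number exceeds the cap set with setMaxTaskParallelism, the TOPOLOGY_MAX_TASK_PARALLELISM value is used instead.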
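The capping rule in the two lines above can be written as a one-line helper. This is a hypothetical sketch that mirrors the text, not Storm code. A null cap stands for the default `topology.max.task.parallelism: null`.

```java
// Hypothetical helper mirroring the rule above: the parallelism hint is used as-is,
// unless topology.max.task.parallelism is set, in which case it acts as an upper bound.
public class ParallelismCapSketch {
    // maxTaskParallelism == null models the default (no cap configured).
    public static int effectiveExecutors(int parallelismHint, Integer maxTaskParallelism) {
        if (maxTaskParallelism == null) {
            return parallelismHint;
        }
        return Math.min(parallelismHint, maxTaskParallelism);
    }

    public static void main(String[] args) {
        System.out.println(effectiveExecutors(10, 5));    // 5, as with conf.setMaxTaskParallelism(5)
        System.out.println(effectiveExecutors(10, null)); // 10, no cap configured
    }
}
```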
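builder.setSpout("spout", new RandomSentenceSpout(), 5).setNumTasks(2); setNumTasks sets the number of tasks. By default, tasks and executors have a 1:1 relationship, meaning each task runs on its own thread.
Here numTasks is 2 and the executor count is 5. RandomSentenceSpout is instantiated twice, and only 2 executors actually run, because the number of executors cannot exceed the number of tasks.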
builder.setSpout("spout", new RandomSentenceSpout(), 2).setNumTasks(5);
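Here numTasks is 5 and the executor count is 2. RandomSentenceSpout is instantiated 5 times, and those instances run on 2 executor threads, each executor handling roughly half of the tasks (3 and 2).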
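The two setSpout cases above can be reproduced with a small model that splits task ids across executor threads. The sketch assumes contiguous groups, with any remainder going to the first executors. Treat that exact layout as an assumption. TaskAssignmentSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a component's tasks are spread over its executor threads.
public class TaskAssignmentSketch {
    // Splits task ids 1..numTasks into contiguous groups, one group per executor thread.
    // Executors beyond numTasks would have nothing to run, so at most numTasks groups are made.
    public static List<List<Integer>> assign(int numExecutors, int numTasks) {
        List<List<Integer>> result = new ArrayList<>();
        if (numExecutors <= 0 || numTasks <= 0) {
            return result;
        }
        int threads = Math.min(numExecutors, numTasks);
        int base = numTasks / threads;  // every executor gets at least this many tasks
        int extra = numTasks % threads; // the first `extra` executors get one more
        int nextTask = 1;
        for (int e = 0; e < threads; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(nextTask++);
            }
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        // setSpout(..., 2).setNumTasks(5): five spout instances on two threads
        System.out.println(assign(2, 5)); // [[1, 2, 3], [4, 5]]
        // setSpout(..., 5).setNumTasks(2): only two threads have work
        System.out.println(assign(5, 2)); // [[1], [2]]
    }
}
```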
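Written this way, the setting may not look very useful. I only tried it to understand Storm's executor and task concepts. Since 0.8, several tasks can share one executor thread, as the test above shows.
It then occurred to me that this does have a benefit. In Storm the number of tasks is static, while the number of executors is dynamic. For example, with 5 tasks and 2 executors the component runs on two threads; if we change the executor count to 3, it runs on 3 threads, which increases concurrency. The number of threads should therefore be Min(executors, numTasks). I have not seen this stated in any documentation; it is my own inference.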
Adjusting parameters dynamically
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
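The Min(executors, numTasks) inference above can be applied to the `-e component=N` arguments of the rebalance command. Task counts are fixed at submission, so the following sketch caps each requested executor count at the component's task count. This follows the author's inference. RebalanceSketch and the task counts in main are hypothetical, and some Storm versions may reject such a request rather than cap it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of `storm rebalance ... -e component=N` when task counts are fixed:
// the number of threads that actually run is min(N, tasks), per the inference above.
public class RebalanceSketch {
    public static Map<String, Integer> rebalance(Map<String, Integer> tasksPerComponent,
                                                 String... executorSpecs) {
        Map<String, Integer> threads = new LinkedHashMap<>();
        for (String spec : executorSpecs) {
            // Each spec has the same shape as the CLI argument, e.g. "blue-spout=3".
            String[] parts = spec.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected name=count: " + spec);
            }
            String component = parts[0];
            int requested = Integer.parseInt(parts[1]);
            Integer tasks = tasksPerComponent.get(component);
            if (tasks == null) {
                throw new IllegalArgumentException("unknown component: " + component);
            }
            threads.put(component, Math.min(requested, tasks));
        }
        return threads;
    }

    public static void main(String[] args) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("blue-spout", 5);  // assumed: submitted with setNumTasks(5)
        tasks.put("yellow-bolt", 8); // assumed: submitted with setNumTasks(8)
        System.out.println(rebalance(tasks, "blue-spout=3", "yellow-bolt=10"));
        // {blue-spout=3, yellow-bolt=8}
    }
}
```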
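builder.setBolt("split", new SplitSentence(), 8).setNumTasks(1).shuffleGrouping("spout"); Same as above: there is only one task, so the bolt runs on a single thread even though 8 executors were requested, and shuffle grouping sends every tuple to that one task.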
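The following is a simplified model of shuffle grouping, assumed here to work as follows: walk a shuffled list of target task ids and reshuffle after each full pass. Every task then gets the same number of tuples per pass, in random order. This is one common way to get an even random spread, not a copy of Storm's implementation, and ShuffleGroupingSketch is a hypothetical name. With a single target task, as in setNumTasks(1), every tuple lands on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified, hypothetical model of shuffle grouping (not Storm's actual class).
public class ShuffleGroupingSketch {
    private final List<Integer> targets;
    private final Random random;
    private int index = 0;

    public ShuffleGroupingSketch(List<Integer> targetTasks, long seed) {
        if (targetTasks.isEmpty()) {
            throw new IllegalArgumentException("need at least one target task");
        }
        this.targets = new ArrayList<>(targetTasks);
        this.random = new Random(seed);
        Collections.shuffle(this.targets, this.random);
    }

    // Returns the task id that receives the next tuple; reshuffles after each full pass.
    public int chooseTask() {
        if (index == targets.size()) {
            Collections.shuffle(targets, random);
            index = 0;
        }
        return targets.get(index++);
    }

    public static void main(String[] args) {
        // setNumTasks(1): a single target, so every tuple goes to task 7.
        ShuffleGroupingSketch single = new ShuffleGroupingSketch(Collections.singletonList(7), 42L);
        for (int i = 0; i < 3; i++) {
            System.out.println(single.chooseTask()); // 7 every time
        }
        // Four target tasks: 400 tuples are 100 full passes, so each task gets exactly 100.
        ShuffleGroupingSketch four = new ShuffleGroupingSketch(Arrays.asList(0, 1, 2, 3), 42L);
        int[] counts = new int[4];
        for (int i = 0; i < 400; i++) {
            counts[four.chooseTask()]++;
        }
        System.out.println(Arrays.toString(counts)); // [100, 100, 100, 100]
    }
}
```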
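conf.setDebug(true); // log every tuple the topology emits (topology.debug)
conf.setMaxSpoutPending(2); // maximum number of tuples a single spout task may have pending (not yet acked or failed), which keeps the pending queue from growing too large; only takes effect for reliable processing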
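The setMaxSpoutPending comment above describes a gate on one spout task: no new tuple is emitted while the pending count is at the limit, and an ack or a fail frees a slot. The sketch below is a hypothetical model of that gate, not Storm code. It only tracks tuples emitted with a message id, since only reliable spouts are affected by the setting.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of topology.max.spout.pending for a single spout task.
public class SpoutPendingSketch {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>(); // message ids awaiting ack/fail
    private long nextId = 1;

    public SpoutPendingSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    // Returns the message id of the emitted tuple, or -1 if emission is throttled.
    public long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1;
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    // An ack or a fail both complete the tuple and free one pending slot.
    public void complete(long msgId) {
        pending.remove(msgId);
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        SpoutPendingSketch spout = new SpoutPendingSketch(2); // conf.setMaxSpoutPending(2)
        long first = spout.tryEmit();
        System.out.println(first);           // 1
        System.out.println(spout.tryEmit()); // 2
        System.out.println(spout.tryEmit()); // -1: two tuples already pending
        spout.complete(first);
        System.out.println(spout.tryEmit()); // 3
    }
}
```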
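conf.setMessageTimeoutSecs(1); // message processing timeout
If a tuple is not fully processed within this time, the spout that emitted it treats it as failed; Storm's default is 30 seconds. If the bolt implements IRichBolt, both a missing ack and an ack that arrives too late trigger the failure. If the timeout is too short and you implement your own replay, bolts may process some tuples more than once: a tuple can still be in flight, not yet delivered to the bolt, when the timeout expires, so the spout treats it as failed even though the bolt still receives it. This is why Storm introduced transactional topologies, which since 0.8 are called Trident. If the bolt implements IBasicBolt, the failure is triggered only by the timeout: ack is called automatically, and any tuple the bolt emits is automatically anchored to the input tuple.
Choose the bolt interface that fits your specific business requirements.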
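The over-short timeout problem described above can be seen in a small model. The spout records when each message id was emitted and fails anything older than the timeout. A completion that arrives afterwards is ignored, even though the bolts did the work. This is a hypothetical sketch with caller-supplied times in milliseconds and an invented name (MessageTimeoutSketch). Storm's real timeout bookkeeping differs in detail.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of topology.message.timeout.secs from the spout's point of view.
public class MessageTimeoutSketch {
    private final long timeoutMs;
    private final Map<Long, Long> emittedAt = new HashMap<>(); // message id -> emit time
    private int acked = 0;
    private int failed = 0;

    public MessageTimeoutSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    public void emit(long msgId, long nowMs) {
        emittedAt.put(msgId, nowMs);
    }

    // Fails every tuple that has been pending for longer than the timeout.
    public void expire(long nowMs) {
        Iterator<Map.Entry<Long, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMs - entry.getValue() > timeoutMs) {
                it.remove();
                failed++;
            }
        }
    }

    // Reports that the tuple tree finished at nowMs. Returns false if the tuple had already
    // timed out: the spout has counted it as failed, even though the bolts did the work.
    public boolean ack(long msgId, long nowMs) {
        expire(nowMs);
        if (emittedAt.remove(msgId) == null) {
            return false;
        }
        acked++;
        return true;
    }

    public int ackedCount() { return acked; }

    public int failedCount() { return failed; }

    public static void main(String[] args) {
        MessageTimeoutSketch spout = new MessageTimeoutSketch(1000); // setMessageTimeoutSecs(1)
        spout.emit(1L, 0L);
        System.out.println(spout.ack(1L, 1500L)); // false: finished 1.5 s after emit
        spout.emit(2L, 2000L);
        System.out.println(spout.ack(2L, 2500L)); // true
        System.out.println(spout.failedCount() + " failed, " + spout.ackedCount() + " acked");
        // 1 failed, 1 acked
    }
}
```

If the spout re-emits message 1 when it fails, the bolts end up processing it twice, which is the duplicate processing described above.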
conf.setNumAckers(2); // number of acker executors that track message processing; the default is 1 and it can be raised to match the actual load
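Acker executors track each spout tuple's tree with the XOR scheme from Storm's "guaranteeing message processing" design. Every tuple gets a random 64-bit id, and each id is XORed into a per-root value once when the tuple is created and once when it is acked, so the value returns to 0 when the whole tree is done. Each root is tracked by one acker, and adding ackers spreads this bookkeeping. The class below is a simplified model of that scheme with a hypothetical name (AckerXorSketch), not Storm's acker bolt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Simplified model of the XOR bookkeeping an acker keeps for each spout tuple (root).
public class AckerXorSketch {
    private final Map<Long, Long> ackVals = new HashMap<>(); // root id -> running XOR
    private final Random random;

    public AckerXorSketch(long seed) {
        this.random = new Random(seed);
    }

    // Random 64-bit tuple id; a collision that cancels out early is astronomically unlikely.
    public long newTupleId() {
        return random.nextLong();
    }

    // XORs a value into the root's entry; returns true when the tree is fully processed.
    public boolean update(long rootId, long xorValue) {
        long val = ackVals.getOrDefault(rootId, 0L) ^ xorValue;
        if (val == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        ackVals.put(rootId, val);
        return false;
    }

    public int pendingRoots() {
        return ackVals.size();
    }

    public static void main(String[] args) {
        AckerXorSketch acker = new AckerXorSketch(7L);
        long root = 1L;
        long a = acker.newTupleId(); // tuple emitted by the spout
        long b = acker.newTupleId(); // two children emitted by a bolt, anchored to a
        long c = acker.newTupleId();
        System.out.println(acker.update(root, a));         // spout emits a
        System.out.println(acker.update(root, a ^ b ^ c)); // bolt acks a, creating b and c
        System.out.println(acker.update(root, b));         // b acked
        System.out.println(acker.update(root, c));         // c acked: tree complete
    }
}
```

Under the stated assumptions, the first three calls print false and the last prints true.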
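Real cluster environment
conf.setNumWorkers(5); // number of worker processes; if you do not add ports, only 4 workers can run per supervisor, because the default storm.yaml lists just 4 slot ports
Add the ports to storm.yaml: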
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
Each worker uses one port.
In the Storm UI, the executor count shown is the sum of the spout, bolt, and acker executors.
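Two quick checks follow from the notes above: how many of the requested workers can start given the configured ports, and what executor total the UI should show. The helper below is hypothetical (DeploymentSketch and its method names are invented). It assumes one slot per port and ignores any system components the UI may list separately.

```java
// Hypothetical back-of-the-envelope checks for a cluster deployment.
public class DeploymentSketch {
    // Each port in supervisor.slots.ports is one worker slot on that supervisor.
    public static int grantedWorkers(int requestedWorkers, int supervisors, int portsPerSupervisor) {
        return Math.min(requestedWorkers, supervisors * portsPerSupervisor);
    }

    // Per the note above, the UI total is the sum of spout, bolt and acker executors.
    public static int uiExecutorTotal(int[] spoutExecutors, int[] boltExecutors, int ackers) {
        int total = ackers;
        for (int n : spoutExecutors) {
            total += n;
        }
        for (int n : boltExecutors) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // conf.setNumWorkers(5) on one supervisor: default 4 ports vs. the 5 ports added above
        System.out.println(grantedWorkers(5, 1, 4)); // 4
        System.out.println(grantedWorkers(5, 1, 5)); // 5
        // e.g. a spout with 2 executors, a bolt with 4 executors, conf.setNumAckers(2)
        System.out.println(uiExecutorTotal(new int[]{2}, new int[]{4}, 2)); // 8
    }
}
```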
storm.yaml parameter reference
java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"

### storm.* configs are general configurations
# the local dir is where jars are kept
storm.local.dir: "storm-local"
storm.zookeeper.servers:
    - "localhost"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/storm"
storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 1000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.cluster.mode: "distributed" # can be distributed or local
storm.local.mode.zmq: false
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.meta.serialization.delegate: "backtype.storm.serialization.DefaultSerializationDelegate"

### nimbus.* configs are for the master
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.thrift.max_buffer_size: 1048576
nimbus.childopts: "-Xmx1024m"
nimbus.task.timeout.secs: 30
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
nimbus.inbox.jar.expiration.secs: 3600
nimbus.task.launch.secs: 120
nimbus.reassign: true
nimbus.file.copy.expiration.secs: 600
nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"

### ui.* configs are for the master
ui.port: 8080
ui.childopts: "-Xmx768m"

logviewer.port: 8000
logviewer.childopts: "-Xmx128m"
logviewer.appender.name: "A1"

drpc.port: 3772
drpc.worker.threads: 64
drpc.queue.size: 128
drpc.invocations.port: 3773
drpc.request.timeout.secs: 600
drpc.childopts: "-Xmx768m"

transactional.zookeeper.root: "/transactional"
transactional.zookeeper.servers: null
transactional.zookeeper.port: null

### supervisor.* configs are for node supervisors
# Define the amount of workers that can be run on this machine. Each worker is assigned a port to use for communication
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.childopts: "-Xmx256m"
#how long supervisor will wait to ensure that a worker process is started
supervisor.worker.start.timeout.secs: 120
#how long between heartbeats until supervisor considers that worker dead and tries to restart it
supervisor.worker.timeout.secs: 30
#how frequently the supervisor checks on the status of the processes it's monitoring and restarts if necessary
supervisor.monitor.frequency.secs: 3
#how frequently the supervisor heartbeats to the cluster state (for nimbus)
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true

### worker.* configs are for task workers
worker.childopts: "-Xmx768m"
worker.heartbeat.frequency.secs: 1

# control how many worker receiver threads we need per worker
topology.worker.receiver.thread.count: 1

task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10

zmq.threads: 1
zmq.linger.millis: 5000
zmq.hwm: 0

storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 #5MB buffer
# Since nimbus.task.launch.secs and supervisor.worker.start.timeout.secs are 120, other workers should also wait at least that long before giving up on connecting to the other worker.
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

# If the Netty messaging layer is busy(netty internal buffer not writable), the Netty client will try to batch message as more as possible up to the size of storm.messaging.netty.transfer.batch.size bytes, otherwise it will try to flush message as soon as possible to reduce latency.
storm.messaging.netty.transfer.batch.size: 262144

# We check with this interval that whether the Netty channel is writable and try to write pending messages if it is.
storm.messaging.netty.flush.check.interval.ms: 10

### topology.* configs are for specific executing storms
topology.enable.message.timeouts: true
topology.debug: false
topology.workers: 1
topology.acker.executors: null
topology.tasks: null
# maximum amount of time a message has to complete before it's considered failed
topology.message.timeout.secs: 30
topology.multilang.serializer: "backtype.storm.multilang.JsonSerializer"
topology.skip.missing.kryo.registrations: false
topology.max.task.parallelism: null
topology.max.spout.pending: null
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.builtin.metrics.bucket.size.secs: 60
topology.fall.back.on.java.serialization: true
topology.worker.childopts: null
topology.executor.receive.buffer.size: 1024 #batched
topology.executor.send.buffer.size: 1024 #individual messages
topology.receiver.buffer.size: 8 # setting it too high causes a lot of problems (heartbeat thread gets starved, throughput plummets)
topology.transfer.buffer.size: 1024 # batched
topology.tick.tuple.freq.secs: null
topology.worker.shared.thread.pool.size: 4
topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
topology.sleep.spout.wait.strategy.time.ms: 1
topology.error.throttle.interval.secs: 10
topology.max.error.report.per.interval: 5
topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
topology.trident.batch.emit.interval.millis: 500
topology.classpath: null
topology.environment: null

dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
The values above are Storm's built-in default parameters.
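Storm's built-in defaults are overridden by storm.yaml, and topology-level settings made in code override both. This is why topology.workers in the table is "overridden if set in code". The layering can be sketched as a map merge where later layers win. This is a simplification with a hypothetical class name (ConfigLayersSketch): it ignores the fact that, in a real cluster, only topology.* settings are meaningful at the topology level.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of configuration layering: later layers override earlier ones.
public class ConfigLayersSketch {
    @SafeVarargs
    public static Map<String, Object> merge(Map<String, Object>... layers) {
        Map<String, Object> result = new HashMap<>();
        for (Map<String, Object> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>(); // a few values from the list above
        defaults.put("topology.workers", 1);
        defaults.put("topology.message.timeout.secs", 30);
        defaults.put("supervisor.slots.ports", Arrays.asList(6700, 6701, 6702, 6703));

        Map<String, Object> stormYaml = new HashMap<>(); // cluster file with a fifth port added
        stormYaml.put("supervisor.slots.ports", Arrays.asList(6700, 6701, 6702, 6703, 6704));

        Map<String, Object> topologyConf = new HashMap<>(); // values set in code
        topologyConf.put("topology.workers", 5);              // conf.setNumWorkers(5)
        topologyConf.put("topology.message.timeout.secs", 1); // conf.setMessageTimeoutSecs(1)

        Map<String, Object> effective = merge(defaults, stormYaml, topologyConf);
        System.out.println(effective.get("topology.workers"));              // 5
        System.out.println(effective.get("topology.message.timeout.secs")); // 1
        System.out.println(effective.get("supervisor.slots.ports"));        // [6700, 6701, 6702, 6703, 6704]
    }
}
```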