Error on initialization of server mk-worker (stormconf.ser is missing)

Source: Internet | Published by: 儿歌大全软件 | Editor: 程序博客网 | Time: 2024/06/01 08:42
13 posts by 2 authors
  
Moshe Bixenshpaner
12-8-10
Hi guys,

I have a simple topology that puts values on a Redis server.
When I deploy it to the cluster, no client manages to connect to Redis, although it works perfectly fine when I run it via LocalCluster.
I've attached the logs and configuration for nimbus and the supervisors.

The cluster is configured as follows:
  • 1GB RAM for the nimbus (192.168.1.22)
  • 1GB RAM for the zookeeper1 (192.168.1.31)
  • 2GB RAM for the supervisor1 (192.168.1.16; 4 workers)
  • 2GB RAM for the supervisor2 (192.168.1.19; 2 workers)

All machines are virtual and have JDK 6u33 x64 installed.
nimbus, supervisor1 & supervisor2 have Storm 0.8.0, ZeroMQ 2.1.7 and the latest JZMQ installed.
zookeeper1 has Python 2.6.6 (with default configuration) and Zookeeper 3.3.6 installed.


I'm not sure this is the entire problem, but I'm getting the following exception on some of my supervisors (in this case, supervisor2):
2012-08-10 08:21:27 worker [ERROR] Error on initialization of server mk-worker
java.io.FileNotFoundException: File '/opt/storm/local/supervisor/stormdist/DistributedSystem-1-1344586762/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:138)
at backtype.storm.daemon.worker$worker_data.invoke(worker.clj:146)
at backtype.storm.daemon.worker$fn__4316$exec_fn__1206__auto____4317.invoke(worker.clj:331)
at clojure.lang.AFn.applyToHelper(AFn.java:185)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:601)
at backtype.storm.daemon.worker$fn__4316$mk_worker__4372.doInvoke(worker.clj:322)
at clojure.lang.RestFn.invoke(RestFn.java:512)
at backtype.storm.daemon.worker$_main.invoke(worker.clj:432)
at clojure.lang.AFn.applyToHelper(AFn.java:172)
at clojure.lang.AFn.applyTo(AFn.java:151)
at backtype.storm.daemon.worker.main(Unknown Source)
2012-08-10 08:21:27 util [INFO] Halting process: ("Error on initialization")


The topology I'm trying to run requires 4 workers altogether.
So even if supervisor2 malfunctions, the other supervisor should be able to run the entire topology on its own.
Am I doing something wrong here?
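The slot arithmetic behind this expectation can be sketched in Python. This is a minimal illustration, assuming the slot counts listed above (4 worker slots on supervisor1, 2 on supervisor2) and a topology that needs 4 workers; it only counts free worker slots and does not model how Nimbus actually schedules or reassigns workers. `can_run_without` is a hypothetical helper name, not part of Storm.

```python
# Hypothetical helper: slot counts taken from the cluster description above
# (supervisor1: 4 worker slots, supervisor2: 2 worker slots).
SLOTS = {"supervisor1": 4, "supervisor2": 2}
REQUIRED_WORKERS = 4  # the topology in this thread asks for 4 workers


def can_run_without(failed, slots=SLOTS, required=REQUIRED_WORKERS):
    """Return True if the supervisors that are still up offer enough worker slots.

    Only counts slots; real Storm scheduling and reassignment are not modelled.
    """
    available = sum(n for name, n in slots.items() if name not in failed)
    return available >= required


if __name__ == "__main__":
    print(can_run_without(set()))            # all supervisors up
    print(can_run_without({"supervisor2"}))  # supervisor2 down
    print(can_run_without({"supervisor1"}))  # supervisor1 down
```

By slot count alone, supervisor1 can host all 4 workers, which matches the reasoning in the post; supervisor2 on its own cannot.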


Thanks,
Moshe.
Attachments (1)
logs.rar (38 KB)
nathanmarz
12-8-13
Are your supervisors sharing a directory over a network mount, by any chance? What happens if you turn off supervisor2 completely? Do topologies launch successfully on the other supervisor?
--
Twitter: @nathanmarz
http://nathanmarz.com

Moshe Bixenshpaner
12-8-14
No, supervisors don't share directories.
They are virtual machines created with KVM, though (I'm not sure whether that has anything to do with the problem).

If I have enough workers on a single supervisor, everything works perfectly fine.
It seems the coordination between the supervisors is the cause of the problem.

Thanks,
Moshe.
nathanmarz
12-8-14
The error you're facing indicates that the supervisor failed to download the configuration file from Nimbus. Can you show me the results of doing an ls -R on the supervisor local dir for the node that's getting that error? (do it while the topology is active and causing the error – that is, don't shut it down and then do the ls -R). 
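A rough Python equivalent of this check is sketched below. It walks a supervisor's local directory and reports, for each topology under `supervisor/stormdist/`, which expected files are absent. The expected layout (`stormjar.jar`, `stormcode.ser` and `stormconf.ser` in each topology directory) is an assumption based on Storm 0.8-era behaviour, and `missing_topology_files` is a hypothetical helper, not part of Storm.

```python
import os
import sys

# Assumption: each topology directory under <local dir>/supervisor/stormdist/
# should contain these files once the supervisor has downloaded the topology
# from Nimbus. Verify against your Storm version.
EXPECTED_FILES = ("stormjar.jar", "stormcode.ser", "stormconf.ser")


def missing_topology_files(local_dir):
    """Map each topology id under stormdist/ to the expected files it lacks."""
    stormdist = os.path.join(local_dir, "supervisor", "stormdist")
    report = {}
    if not os.path.isdir(stormdist):
        return report
    for topology_id in sorted(os.listdir(stormdist)):
        topo_dir = os.path.join(stormdist, topology_id)
        if not os.path.isdir(topo_dir):
            continue  # skip stray files next to the topology directories
        report[topology_id] = [
            f for f in EXPECTED_FILES if not os.path.exists(os.path.join(topo_dir, f))
        ]
    return report


if __name__ == "__main__":
    # /opt/storm/local is the local dir that appears in the stack trace above.
    local_dir = sys.argv[1] if len(sys.argv) > 1 else "/opt/storm/local"
    for topology_id, missing in missing_topology_files(local_dir).items():
        print(topology_id, "OK" if not missing else "missing: " + ", ".join(missing))
```

As with `ls -R`, it only tells you something useful if it is run on the failing supervisor while the topology is still active, passing that supervisor's local dir as the first argument.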
This post has been deleted.
Moshe Bixenshpaner
12-8-14
Hi,

I attached the local directory and log files for nimbus and each of the supervisors.
sv2 is the supervisor that fails to load.

Thanks,
Moshe.
Attachments (6)
nb-local.tar (366 KB)
nb-logs.tar (4 KB)
sv1-local.tar (187 KB)
sv1-logs.tar (27 KB)
sv2-local.tar (187 KB)
sv2-logs.tar (22 KB)
nathanmarz
12-8-15
I would need you to do the ls -R while the error is happening and the topology is still active.
Moshe Bixenshpaner
12-8-15
This is exactly what I did (except that I attached a tar file of the entire local directory instead of just the output of ls -R).
nathanmarz
12-8-17
I don't quite understand – you said you did the ls -R a few days after the exception happened.
Moshe Bixenshpaner
12-8-17
I deleted that post. The one I eventually posted was made after I reset everything, reproduced the whole problem, and attached the logs and the contents of the local directories.
nathanmarz
12-8-17
The sv2 logs don't show any exceptions.
Moshe Bixenshpaner
12-8-26
Hi Nathan,

The log files of both SV2 workers show java.io.FileNotFoundException: File '/opt/storm/local/supervisor/stormdist/DistributedSystem-1-1344956702/stormconf.ser' does not exist, followed by Halting process: ("Error on initialization").
On another note, the ZK1 log shows that clients are disconnecting every few seconds.
Moshe Bixenshpaner
12-8-26
Hey guys,

Problem is solved.
There were actually two problems:
1. The documentation specifies which versions of ZeroMQ, JZMQ, Python and the JDK to use, but says nothing about Zookeeper, so I assumed I could use the newest version (3.3.6). That turned out to be a bad move. After a week of poor performance, I checked the jars bundled with Storm 0.8.0 and saw that it targets Zookeeper 3.3.3.

2. I'm not sure how it works on physical clusters, but on a virtual cluster you need to list every node in the /etc/hosts file of every other node. Pay attention to the following form:
ip_address host_name.defaultdomain

Notice the .defaultdomain suffix on each host name; this is what actually solved the problem of getting several supervisors to work together simultaneously.
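Both fixes can be turned into a small preflight check. The Python sketch below is illustrative only: it assumes the Zookeeper client jar in Storm's `lib/` directory is named `zookeeper-<version>.jar`, treats the node names from the cluster description (nimbus, zookeeper1, supervisor1, supervisor2) as the machines' host names, and uses a hypothetical default path `/opt/storm/lib`. It prints the bundled Zookeeper version so the server can be matched to it, and generates /etc/hosts lines in the `ip_address host_name.defaultdomain` form described above.

```python
import os
import re
import sys

# Assumption: Storm's lib/ directory ships the Zookeeper client as
# zookeeper-<version>.jar; the regex below is illustrative.
ZK_JAR = re.compile(r"^zookeeper-(\d+(?:\.\d+)*)\.jar$")

# Node addresses from the cluster description in this thread; the host
# names are assumed to match the node names used there.
NODES = {
    "nimbus": "192.168.1.22",
    "zookeeper1": "192.168.1.31",
    "supervisor1": "192.168.1.16",
    "supervisor2": "192.168.1.19",
}


def bundled_zookeeper_version(jar_names):
    """Return the Zookeeper version embedded in a jar file name, or None."""
    for name in jar_names:
        match = ZK_JAR.match(name)
        if match:
            return match.group(1)
    return None


def hosts_lines(nodes, domain="defaultdomain"):
    """Build '<ip_address> <host_name>.<domain>' lines, sorted by host name."""
    return [f"{ip} {host}.{domain}" for host, ip in sorted(nodes.items())]


if __name__ == "__main__":
    # Hypothetical default path; pass your Storm lib directory as the first argument.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "/opt/storm/lib"
    jars = os.listdir(lib_dir) if os.path.isdir(lib_dir) else []
    print("Zookeeper client bundled with Storm:", bundled_zookeeper_version(jars))
    print("Lines to add to /etc/hosts on every node:")
    for line in hosts_lines(NODES):
        print(line)
```

Reading the version from the jar that Storm actually ships avoids guessing which Zookeeper server release to install.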