The "unknown error 107" problem in a Gluster deployment

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 00:11

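Today I took two old machines (Pentium D) to set up a distributed file system to play with, and to see how it differs from HDFS in day-to-day use.

Installation should be easy: the operating system is 32-bit Fedora 17, with a large chunk of disk left unpartitioned (51 GB out of 73 GB).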


Then I installed it with yum.

Then I found that the glusterd daemon has to be started first...

Fine, on to peer probe. I was stuck for several hours on unknown error 107.


[root@gluster0 sbin]# ./gluster peer probe gluster1
Probe unsuccessful
Probe returned with unknown errno 107
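For reference, on Linux errno 107 is ENOTCONN, which the C library describes as "Transport endpoint is not connected". The following is a minimal sketch for looking up what an errno number means; describe_errno is a hypothetical helper name, not part of GlusterFS.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Copy the C library's description of an errno value into buf
 * and return buf, so the result can be printed or compared. */
const char *describe_errno(int err, char *buf, size_t len)
{
        assert(buf != NULL && len > 0);
        snprintf(buf, len, "%s", strerror(err));
        return buf;
}
```

Calling describe_errno(107, buf, sizeof buf) on a Linux machine should give the ENOTCONN text; the numeric value differs on other operating systems, so treat 107 as Linux-specific.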

* The two machines are named gluster0 and gluster1 in /etc/hosts.


I checked netstat: port 24007 is open. This makes no sense. I am not using DNS, but both hosts are registered in /etc/hosts...
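Whether a name from /etc/hosts actually resolves can be checked with the same getaddrinfo call that glusterd uses. This is only a sketch under assumptions: first_ipv4 is a hypothetical helper, and it looks at IPv4 results only.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host and write its first IPv4 address into out.
 * Returns 0 on success, otherwise a getaddrinfo() error code
 * that can be passed to gai_strerror(). */
int first_ipv4(const char *host, char *out, size_t len)
{
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        struct sockaddr_in *s4 = NULL;
        int ret;

        assert(host != NULL && out != NULL);
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;        /* IPv4 only, by assumption */
        hints.ai_socktype = SOCK_STREAM;

        ret = getaddrinfo(host, NULL, &hints, &res);
        if (ret != 0)
                return ret;

        s4 = (struct sockaddr_in *) res->ai_addr;
        if (inet_ntop(AF_INET, &s4->sin_addr, out, (socklen_t) len) == NULL)
                ret = EAI_SYSTEM;
        freeaddrinfo(res);
        return ret;
}
```

On gluster0, first_ipv4("gluster1", buf, sizeof buf) returning 0 would suggest that name resolution itself is fine and that the problem lies elsewhere.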


The log says:

[2013-05-08 17:34:32.369306] I [glusterd-handler.c:685:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gluster1 24007
[2013-05-08 17:34:32.371086] I [glusterd-handler.c:428:glusterd_friend_find] 0-glusterd: Unable to find hostname: gluster1
[2013-05-08 17:34:32.371129] I [glusterd-handler.c:2245:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gluster1 (24007)
[2013-05-08 17:34:32.371776] I [rpc-clnt.c:968:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-08 17:34:32.380750] I [glusterd-handler.c:2227:glusterd_friend_add] 0-management: connect returned 0
[2013-05-08 17:34:32.380917] E [socket.c:1715:socket_connect_finish] 0-management: connection to  failed (No route to host)
[2013-05-08 17:34:32.381070] I [glusterd-handler.c:2423:glusterd_xfer_cli_probe_resp] 0-glusterd: Responded to CLI, ret: 0

The key line is:

0-glusterd: Unable to find hostname: gluster1
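The same log also records an E-level line, "connection to  failed (No route to host)". A port that netstat shows as listening locally says nothing about whether the peer can be reached, so it may be worth testing the TCP connection to the peer on its own. A minimal sketch, where try_connect is a hypothetical helper that takes a numeric IPv4 address:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try a blocking TCP connect to ip:port.
 * Returns 0 on success, otherwise the errno of the failing step
 * (EINVAL if ip is not a numeric IPv4 address). */
int try_connect(const char *ip, unsigned short port)
{
        struct sockaddr_in sa;
        int fd;
        int err = 0;

        assert(ip != NULL);
        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port = htons(port);
        if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
                return EINVAL;

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
                return errno;
        if (connect(fd, (struct sockaddr *) &sa, sizeof sa) != 0)
                err = errno;      /* save before close() can change it */
        close(fd);
        return err;
}
```

Run from gluster0 against gluster1's address and port 24007, a return value of EHOSTUNREACH corresponds to the "No route to host" text in the log, while 0 means the port is reachable from this machine.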

-------------------

Damn. Time to go to the source: compile it, debug it, and attach gdb to the glusterd process.

int
glusterd_friend_find_by_hostname (const char *hoststr,
                                  glusterd_peerinfo_t  **peerinfo)
{
        int                     ret = -1;
        glusterd_conf_t         *priv = NULL;
        glusterd_peerinfo_t     *entry = NULL;
        struct addrinfo         *addr = NULL;
        struct addrinfo         *p = NULL;
        char                    *host = NULL;
        struct sockaddr_in6     *s6 = NULL;
        struct sockaddr_in      *s4 = NULL;
        struct in_addr          *in_addr = NULL;
        char                    hname[1024] = {0,};
        xlator_t                *this  = NULL;

        this = THIS;
        GF_ASSERT (hoststr);
        GF_ASSERT (peerinfo);

        *peerinfo = NULL;
        priv    = this->private;

        GF_ASSERT (priv);

        list_for_each_entry (entry, &priv->peers, uuid_list) {
                if (!strncasecmp (entry->hostname, hoststr,
                                  1024)) {
                        gf_log (this->name, GF_LOG_DEBUG,
                                 "Friend %s found.. state: %d", hoststr,
                                  entry->state.state);
                        *peerinfo = entry;
                        return 0;
                }
        }

        ret = getaddrinfo (hoststr, NULL, NULL, &addr);
        if (ret != 0) {
                gf_log (this->name, GF_LOG_ERROR,
                        "error in getaddrinfo: %s\n",
                        gai_strerror(ret));
                goto out;
        }

        for (p = addr; p != NULL; p = p->ai_next) {
                switch (p->ai_family) {
                        case AF_INET:
                                s4 = (struct sockaddr_in *) p->ai_addr;
                                in_addr = &s4->sin_addr;
                                break;
                        case AF_INET6:
                                s6 = (struct sockaddr_in6 *) p->ai_addr;
                                in_addr =(struct in_addr *) &s6->sin6_addr;
                                break;
                       default: ret = -1;
                                goto out;
                }
                host = inet_ntoa(*in_addr);

                ret = getnameinfo (p->ai_addr, p->ai_addrlen, hname,
                                   1024, NULL, 0, 0);
                if (ret)
                        goto out;

                list_for_each_entry (entry, &priv->peers, uuid_list) {
                        if (!strncasecmp (entry->hostname, host,
                            1024) || !strncasecmp (entry->hostname,hname,
                            1024)) {
                                gf_log (this->name, GF_LOG_DEBUG,
                                        "Friend %s found.. state: %d",
                                        hoststr, entry->state.state);
                                *peerinfo = entry;
                                freeaddrinfo (addr);
                                return 0;
                        }
                }
        }

out:
        gf_log (this->name, GF_LOG_DEBUG, "Unable to find friend: %s", hoststr);
        if (addr)
                freeaddrinfo (addr);
        return -1;
}

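Stepping through it, I found something odd: where does the local pointer variable entry get assigned?

entry is NULL, and the first list_for_each_entry() loop body is never entered, not even once.

But after stepping past

ret = getaddrinfo (hoststr, NULL, NULL, &addr);

entry mysteriously has a value, and that value is bogus.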



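My guess: the assignment to entry may have been left out, and entry should have been set to the head element of the peerinfo passed in. Or is there a memory overrun somewhere?

To check the guess, let's look at how this loop is defined.

A quick Google search turns up: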

http://lxr.free-electrons.com/source/include/linux/list.h#L418

/**
 * list_for_each_entry  -       iterate over list of given type
 * @pos:        the type * to use as a loop cursor.
 * @head:       the head for your list.
 * @member:     the name of the list_struct within the struct.
 */
#define list_for_each_entry(pos, head, member)                          \
        for (pos = list_entry((head)->next, typeof(*pos), member);      \
             &pos->member != (head);    \
             pos = list_entry(pos->member.next, typeof(*pos), member))

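So it is just a macro; in essence it is a for loop over the list members. The first clause of that for is where entry gets assigned, so the function does not need a separate assignment.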
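To see what the macro does to its cursor when the list is empty, as the peer list was on the first pass here, the following is a minimal userspace sketch. The local copies of list_head and list_entry, struct peer, and walk_peers are assumptions for illustration and are not GlusterFS code; __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
        struct list_head *next, *prev;
};

/* Recover the containing struct from a pointer to its list member. */
#define list_entry(ptr, type, member) \
        ((type *) ((char *) (ptr) - offsetof (type, member)))

#define list_for_each_entry(pos, head, member)                          \
        for (pos = list_entry ((head)->next, __typeof__ (*pos), member); \
             &pos->member != (head);                                    \
             pos = list_entry (pos->member.next, __typeof__ (*pos), member))

/* Hypothetical stand-in for glusterd_peerinfo_t. */
struct peer {
        int              id;
        struct list_head uuid_list;
};

/* Walk the list, count how often the body runs, and report where
 * the cursor ended up once the loop has finished. */
int walk_peers (struct list_head *head, struct peer **cursor_out)
{
        struct peer *entry = NULL;
        int          visits = 0;

        assert (head != NULL && cursor_out != NULL);
        list_for_each_entry (entry, head, uuid_list) {
                visits++;
        }
        *cursor_out = entry;
        return visits;
}
```

With an empty list (head.next == &head) the body runs zero times, yet the cursor comes back non-NULL: the init clause has already set it to a pointer computed from the list head itself, which is not a real peer. That may be the bogus value seen in gdb; the exact moment at which gdb shows the change can also depend on compiler optimization, so this is one possible reading rather than a confirmed diagnosis.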