OpenStack Neutron Source Code Analysis: Parsing the ovs-neutron-agent Startup Code

Source: Internet  Editor: 程序博客网  Time: 2024/05/30 23:02

Notice:

Reposting this blog is welcome, but please keep the original author information!

Author: Lin Kai (林凯), Cloud Computing Engineer, Huawei

Team: OpenStack Community Team, Huawei Hangzhou R&D Center


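This article was compiled and summarized during my own study. Because my time and ability are limited, errors are inevitable, and corrections are welcome!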


OpenStack Neutron is the project that provides networking services for OpenStack. For an introduction to the individual Neutron components, see this blog post: http://www.openstack.cn/p1745.html.

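Here is how that post describes the L2 Agent component. The L2 Agent usually runs on the hypervisor and communicates with neutron-server over RPC. It watches for device changes and reports them, creates new devices so that network segments are wired correctly, and applies security group rules. The OVS Agent, for example, uses Open vSwitch to isolate networks with VLAN, GRE and VxLAN, and it also controls how network traffic is forwarded.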

This post walks through the startup source code of the OVS Agent component in Neutron.

The rough startup flow of the OVS Agent is shown in the figure below:



Now let's start analyzing the OVS Agent startup source code.

(1) main() in /neutron/plugins/openvswitch/agent/ovs_neutron_agent.py

```python
def main():
    cfg.CONF.register_opts(ip_lib.OPTS)
    common_config.init(sys.argv[1:])
    common_config.setup_logging(cfg.CONF)
    q_utils.log_opt_values(LOG)

    try:
        agent_config = create_agent_config_map(cfg.CONF)
    except ValueError as e:
        LOG.error(_('%s Agent terminated!'), e)
        sys.exit(1)

    is_xen_compute_host = 'rootwrap-xen-dom0' in agent_config['root_helper']
    if is_xen_compute_host:
        # Force ip_lib to always use the root helper to ensure that ip
        # commands target xen dom0 rather than domU.
        cfg.CONF.set_default('ip_lib_force_root', True)

    agent = OVSNeutronAgent(**agent_config)     # (1)
    signal.signal(signal.SIGTERM, agent._handle_sigterm)

    # Start everything.
    LOG.info(_("Agent initialized successfully, now running... "))
    agent.daemon_loop()                         # (2)
```

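The two most important calls in the code above are (1) and (2). Call (1) instantiates an OVSNeutronAgent and runs the agent's initialization steps. Call (2) loops continuously: it checks a number of states and performs the corresponding operations whenever one of them changes.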
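The sketch below shows how main() ties together the constructor, the run_daemon_loop flag (step (6) of the constructor) and the SIGTERM handler. It is a stripped-down, self-contained illustration. MiniAgent and stop_after are hypothetical names and not Neutron code. The sketch assumes that the agent's _handle_sigterm only clears run_daemon_loop, so the loop finishes its current iteration and then exits cleanly.

```python
import signal
import threading
import time


class MiniAgent(object):
    """Hypothetical stand-in for OVSNeutronAgent; not Neutron code."""

    def __init__(self, polling_interval=0.01):
        self.polling_interval = polling_interval
        self.iter_num = 0
        self.run_daemon_loop = True  # mirrors step (6) of the constructor

    def _handle_sigterm(self, signum, frame):
        # Only clear the flag; the loop exits at its next check.
        self.run_daemon_loop = False

    def daemon_loop(self):
        while self.run_daemon_loop:
            # The real agent scans and processes ports here (rpc_loop).
            self.iter_num += 1
            time.sleep(self.polling_interval)


def stop_after(agent, seconds):
    """Simulate a SIGTERM arriving `seconds` from now (for demos/tests)."""
    timer = threading.Timer(seconds, agent._handle_sigterm,
                            args=(signal.SIGTERM, None))
    timer.start()
    return timer


def main():
    agent = MiniAgent()                                   # cf. (1)
    signal.signal(signal.SIGTERM, agent._handle_sigterm)  # clean shutdown
    agent.daemon_loop()                                   # cf. (2)
```

With this design, shutting down never interrupts an iteration halfway through. The cost is that the process may take up to one polling interval to exit.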

First, let's look closely at call (1), which instantiates the OVS Agent, and see what initialization work it performs.

```python
def __init__(self, integ_br, tun_br, local_ip,
             bridge_mappings, root_helper,
             polling_interval, tunnel_types=None,
             veth_mtu=None, l2_population=False,
             enable_distributed_routing=False,
             minimize_polling=False,
             ovsdb_monitor_respawn_interval=(
                 constants.DEFAULT_OVSDBMON_RESPAWN),
             arp_responder=False,
             use_veth_interconnection=False):
    super(OVSNeutronAgent, self).__init__()
    self.use_veth_interconnection = use_veth_interconnection
    self.veth_mtu = veth_mtu
    self.root_helper = root_helper
    self.available_local_vlans = set(moves.xrange(q_const.MIN_VLAN_TAG,
                                                  q_const.MAX_VLAN_TAG))
    self.tunnel_types = tunnel_types or []
    self.l2_pop = l2_population
    # TODO(ethuleau): Change ARP responder so it's not dependent on the
    #                 ML2 l2 population mechanism driver.
    # enable_distributed_routing: whether distributed routing (DVR) is enabled
    self.enable_distributed_routing = enable_distributed_routing
    self.arp_responder_enabled = arp_responder and self.l2_pop
    self.agent_state = {
        'binary': 'neutron-openvswitch-agent',
        'host': cfg.CONF.host,
        'topic': q_const.L2_AGENT_TOPIC,
        'configurations': {'bridge_mappings': bridge_mappings,
                           'tunnel_types': self.tunnel_types,
                           'tunneling_ip': local_ip,
                           'l2_population': self.l2_pop,
                           'arp_responder_enabled':
                           self.arp_responder_enabled,
                           'enable_distributed_routing':
                           self.enable_distributed_routing},
        'agent_type': q_const.AGENT_TYPE_OVS,
        'start_flag': True}

    # Keep track of int_br's device count for use by _report_state()
    self.int_br_device_count = 0

    self.int_br = ovs_lib.OVSBridge(integ_br, self.root_helper)
    # setup_integration_br: set up the integration bridge br-int,
    # delete its old patch port, remove all existing flows
    # and add the basic flows
    self.setup_integration_br()                              # (1)
    # Stores port update notifications for processing in main rpc loop
    self.updated_ports = set()
    # setup_rpc does the following:
    #   sets up plugin_rpc, used to communicate with neutron-server
    #   sets up state_rpc, used to report the agent state
    #   sets up connection, used to receive messages from neutron-server
    #   starts the periodic state report
    self.setup_rpc()                                         # (2)
    self.bridge_mappings = bridge_mappings
    # Set up the physical network bridges and connect them to br-int
    # (with veth pairs or patch ports)
    self.setup_physical_bridges(self.bridge_mappings)        # (3)
    self.local_vlan_map = {}
    self.tun_br_ofports = {p_const.TYPE_GRE: {},
                           p_const.TYPE_VXLAN: {}}

    self.polling_interval = polling_interval
    self.minimize_polling = minimize_polling
    self.ovsdb_monitor_respawn_interval = ovsdb_monitor_respawn_interval

    if tunnel_types:
        self.enable_tunneling = True
    else:
        self.enable_tunneling = False
    self.local_ip = local_ip
    self.tunnel_count = 0
    self.vxlan_udp_port = cfg.CONF.AGENT.vxlan_udp_port
    self.dont_fragment = cfg.CONF.AGENT.dont_fragment
    self.tun_br = None
    self.patch_int_ofport = constants.OFPORT_INVALID
    self.patch_tun_ofport = constants.OFPORT_INVALID
    if self.enable_tunneling:
        # The patch_int_ofport and patch_tun_ofport are updated
        # here inside the call to setup_tunnel_br
        self.setup_tunnel_br(tun_br)

    self.dvr_agent = ovs_dvr_neutron_agent.OVSDVRNeutronAgent(
        self.context,
        self.plugin_rpc,
        self.int_br,
        self.tun_br,
        self.patch_int_ofport,
        self.patch_tun_ofport,
        cfg.CONF.host,
        self.enable_tunneling,
        self.enable_distributed_routing)                     # (4)

    self.dvr_agent.setup_dvr_flows_on_integ_tun_br()

    # Collect additional bridges to monitor
    self.ancillary_brs = self.setup_ancillary_bridges(integ_br, tun_br)

    # Security group agent support
    self.sg_agent = OVSSecurityGroupAgent(self.context,
                                          self.plugin_rpc,
                                          root_helper)       # (5)
    # Initialize iteration counter
    self.iter_num = 0
    self.run_daemon_loop = True                              # (6)
```

In the constructor, calls (1) through (6) do the important initialization work. First, let's look at call (1), self.setup_integration_br():

```python
def setup_integration_br(self):
    """Set up the integration bridge.

    Delete the old patch port, remove all existing flows,
    then add the basic flows.
    """
    # Ensure the integration bridge is created.
    # ovs_lib.OVSBridge.create() will run
    #   ovs-vsctl -- --may-exist add-br BRIDGE_NAME
    # which does nothing if bridge already exists.
    self.int_br.create()
    self.int_br.set_secure_mode()

    # Delete the patch port towards br-tun (ovs-vsctl del-port)
    self.int_br.delete_port(cfg.CONF.OVS.int_peer_patch_port)
    # Remove all flows (via ovs-ofctl)
    self.int_br.remove_all_flows()
    # switch all traffic using L2 learning
    # (priority-1 flow with actions=normal)
    self.int_br.add_flow(priority=1, actions="normal")
    # Add a canary flow to int_br to track OVS restarts
    # (priority 0, actions=drop, in the canary table)
    self.int_br.add_flow(table=constants.CANARY_TABLE, priority=0,
                         actions="drop")
```

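The function's job is clear: it sets up the integration bridge br-int, and the comments in the code give the details. Once br-int exists, its existing flows are removed and two basic flows are added. What are these two flows for? The first has priority 1 and actions=normal. It makes br-int forward traffic between the devices attached to it like an ordinary MAC-learning L2 switch. The second has priority 0 and actions=drop and is installed in the canary table (CANARY_TABLE). It is used to detect OVS restarts. The agent's flows are held by ovs-vswitchd rather than stored in OVSDB, so an OVS restart typically wipes them. If this flow disappears, the agent therefore concludes that OVS has restarted. We will come back to this when analyzing the loop.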
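The two add_flow calls roughly correspond to the ovs-ofctl commands built below, and the restart check amounts to asking whether the canary table is empty. This is a sketch, not the ovs_lib implementation. The table number 23 for CANARY_TABLE is an assumption about this Neutron release. The helpers int_br_base_flow_cmds and canary_flow_missing are hypothetical: they only build command lines and parse dump-flows text, and they execute nothing.

```python
# Sketch only: builds command lines and parses text; it runs nothing.
CANARY_TABLE = 23  # assumption: value of constants.CANARY_TABLE in this release


def int_br_base_flow_cmds(bridge="br-int", canary_table=CANARY_TABLE):
    """ovs-ofctl argv lists roughly equivalent to the two add_flow calls."""
    return [
        # priority 1, NORMAL: behave like a MAC-learning L2 switch
        ["ovs-ofctl", "add-flow", bridge, "priority=1,actions=normal"],
        # priority 0 drop flow in the canary table, used as a restart marker
        ["ovs-ofctl", "add-flow", bridge,
         "table=%d,priority=0,actions=drop" % canary_table],
    ]


def canary_flow_missing(dump_flows_output):
    """Return True if the canary table has no flows, i.e. OVS restarted.

    dump_flows_output is the text printed by
    `ovs-ofctl dump-flows br-int table=<CANARY_TABLE>`. Header lines such as
    'NXST_FLOW reply (xid=0x4):' are ignored; any line containing 'actions='
    is counted as a flow entry (a simplification).
    """
    flows = [line for line in dump_flows_output.splitlines()
             if "actions=" in line]
    return not flows
```

In the agent, this check runs once per rpc_loop iteration. A positive result makes the loop redo the bridge setup and treat every port as new.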

Next, let's look at the second call, self.setup_rpc().

```python
def setup_rpc(self):
    self.agent_id = 'ovs-agent-%s' % cfg.CONF.host
    self.topic = topics.AGENT
    # plugin_rpc: used to communicate with neutron-server
    self.plugin_rpc = OVSPluginApi(topics.PLUGIN)
    # state_rpc: used to report the agent state
    self.state_rpc = agent_rpc.PluginReportStateAPI(topics.PLUGIN)

    # Set up the connection and add consumers to receive
    # messages from neutron-server
    # RPC network init
    self.context = context.get_admin_context_without_session()
    # Handle updates from service
    self.endpoints = [self]
    # Define the listening consumers for the agent
    consumers = [[topics.PORT, topics.UPDATE],
                 [topics.NETWORK, topics.DELETE],
                 [constants.TUNNEL, topics.UPDATE],
                 [topics.SECURITY_GROUP, topics.UPDATE],
                 [topics.DVR, topics.UPDATE]]
    if self.l2_pop:
        consumers.append([topics.L2POPULATION,
                          topics.UPDATE, cfg.CONF.host])
    self.connection = agent_rpc.create_consumers(self.endpoints,
                                                 self.topic,
                                                 consumers)

    # Start the periodic heartbeat report
    report_interval = cfg.CONF.AGENT.report_interval
    if report_interval:
        heartbeat = loopingcall.FixedIntervalLoopingCall(
            self._report_state)
        heartbeat.start(interval=report_interval)
```

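The code shows that this function does four things:

- It sets up plugin_rpc, which communicates with neutron-server.
- It sets up state_rpc, which reports the agent's state.
- It sets up connection, which receives messages from neutron-server.
- It starts the periodic heartbeat report. The interval is report_interval, 30 s by default.

On the other side, neutron-server starts rpc_listeners that listen for messages from the agents. Each time it receives a heartbeat, it updates the agent's timestamp in the database. If the timestamp stops being updated and the current time minus the last heartbeat exceeds agent_down_time (75 s by default), the agent is considered down.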
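The liveness rule just described reduces to a single time comparison. The sketch below is a simplified, hypothetical version of that check; the real server-side code in Neutron does more bookkeeping. It uses the default values of 30 s and 75 s mentioned above, and the function names are illustrative.

```python
from datetime import datetime, timedelta

REPORT_INTERVAL = 30   # seconds, default report_interval on the agent
AGENT_DOWN_TIME = 75   # seconds, default agent_down_time on neutron-server


def is_agent_down(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """True if the last heartbeat is more than agent_down_time seconds old."""
    return now - last_heartbeat > timedelta(seconds=agent_down_time)


def intervals_before_down(report_interval=REPORT_INTERVAL,
                          agent_down_time=AGENT_DOWN_TIME):
    """Whole report intervals that fit into agent_down_time.

    With the defaults this is 2: an agent is only reported down after it
    has missed at least two consecutive heartbeats.
    """
    return agent_down_time // report_interval


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, 0)
    print(is_agent_down(last, last + timedelta(seconds=60)))  # False
    print(is_agent_down(last, last + timedelta(seconds=80)))  # True
```

Setting agent_down_time to well over twice report_interval means a single delayed or lost heartbeat does not mark the agent as down.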

Next comes call (3), self.setup_physical_bridges(self.bridge_mappings):

```python
def setup_physical_bridges(self, bridge_mappings):
    '''Setup the physical network bridges.

    Creates physical network bridges and links them to the
    integration bridge using veths.

    :param bridge_mappings: map physical network names to bridge names.
    '''
    # Set up the physical network bridges and connect them to br-int
    # with veth pairs or patch ports
    self.phys_brs = {}
    self.int_ofports = {}
    self.phys_ofports = {}
    ip_wrapper = ip_lib.IPWrapper(self.root_helper)
    ovs_bridges = ovs_lib.get_bridges(self.root_helper)
    for physical_network, bridge in bridge_mappings.iteritems():
        LOG.info(_("Mapping physical network %(physical_network)s to "
                   "bridge %(bridge)s"),
                 {'physical_network': physical_network,
                  'bridge': bridge})
        # setup physical bridge
        if bridge not in ovs_bridges:
            LOG.error(_("Bridge %(bridge)s for physical network "
                        "%(physical_network)s does not exist. Agent "
                        "terminated!"),
                      {'physical_network': physical_network,
                       'bridge': bridge})
            sys.exit(1)
        br = ovs_lib.OVSBridge(bridge, self.root_helper)
        br.remove_all_flows()
        br.add_flow(priority=1, actions="normal")
        self.phys_brs[physical_network] = br

        # Interconnect br-eth1 and br-int with veths/patch ports:
        # delete any old ports, then create int-br-eth1 and phy-br-eth1
        # (the result can be inspected with ovs-vsctl show)
        # interconnect physical and integration bridges using veth/patchs
        int_if_name = self.get_peer_name(constants.PEER_INTEGRATION_PREFIX,
                                         bridge)
        phys_if_name = self.get_peer_name(constants.PEER_PHYSICAL_PREFIX,
                                          bridge)
        self.int_br.delete_port(int_if_name)
        br.delete_port(phys_if_name)
        if self.use_veth_interconnection:
            if ip_lib.device_exists(int_if_name, self.root_helper):
                ip_lib.IPDevice(int_if_name,
                                self.root_helper).link.delete()
                # Give udev a chance to process its rules here, to avoid
                # race conditions between commands launched by udev rules
                # and the subsequent call to ip_wrapper.add_veth
                utils.execute(['/sbin/udevadm', 'settle', '--timeout=10'])
            # Create the veth pair with `ip link add` (wrapped in
            # `ip netns exec <namespace>` when a namespace is set)
            int_veth, phys_veth = ip_wrapper.add_veth(int_if_name,
                                                      phys_if_name)
            int_ofport = self.int_br.add_port(int_veth)
            phys_ofport = br.add_port(phys_veth)
        else:
            # Create patch ports without associating them in order to block
            # untranslated traffic before association
            int_ofport = self.int_br.add_patch_port(
                int_if_name, constants.NONEXISTENT_PEER)
            phys_ofport = br.add_patch_port(
                phys_if_name, constants.NONEXISTENT_PEER)

        self.int_ofports[physical_network] = int_ofport
        self.phys_ofports[physical_network] = phys_ofport

        # block all untranslated traffic between bridges
        self.int_br.add_flow(priority=2, in_port=int_ofport,
                             actions="drop")
        br.add_flow(priority=2, in_port=phys_ofport, actions="drop")

        if self.use_veth_interconnection:
            # enable veth to pass traffic
            int_veth.link.set_up()
            phys_veth.link.set_up()
            if self.veth_mtu:
                # set up mtu size for veth interfaces
                int_veth.link.set_mtu(self.veth_mtu)
                phys_veth.link.set_mtu(self.veth_mtu)
        else:
            # associate patch ports to pass traffic
            self.int_br.set_db_attribute('Interface', int_if_name,
                                         'options:peer', phys_if_name)
            br.set_db_attribute('Interface', phys_if_name,
                                'options:peer', int_if_name)
```

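setup_physical_bridges prepares the physical bridges br-eth*. Despite its docstring, it does not create them. Every bridge listed in bridge_mappings must already exist; otherwise the agent logs an error and exits.

For each bridge, the function works in these steps:

1. It removes all existing flows and adds the same actions=normal flow as on br-int, so the bridge forwards traffic.
2. Unlike br-int, the bridge is then connected to br-int, either with a veth pair or with patch ports depending on use_veth_interconnection.
3. It adds priority-2 drop flows on both ends to block untranslated traffic between the two bridges.
4. Finally, it brings the veth pair up (setting its MTU if veth_mtu is configured), or associates the two patch ports through options:peer, so that traffic can pass.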
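In the patch-port branch, the order of operations is what matters. Both ports are first created pointing at a peer that does not exist, then the drop flows are installed, and only then are the ports associated with each other. The sketch below lists roughly equivalent ovs-vsctl/ovs-ofctl commands for one bridge mapping. It is an illustration, not what ovs_lib literally executes. The helper name, the placeholder peer name nonexistent-peer and the ofport numbers are assumptions, and the sketch ignores interface-name length limits.

```python
# Sketch only: returns argv lists, executes nothing.
PEER_INTEGRATION_PREFIX = "int-"        # as in int-br-eth1
PEER_PHYSICAL_PREFIX = "phy-"           # as in phy-br-eth1
NONEXISTENT_PEER = "nonexistent-peer"   # assumption: placeholder peer name


def patch_interconnect_cmds(int_br, phys_br, int_ofport, phys_ofport):
    """Hypothetical helper: ordered commands linking phys_br to int_br."""
    int_if = PEER_INTEGRATION_PREFIX + phys_br
    phys_if = PEER_PHYSICAL_PREFIX + phys_br
    return [
        # 1. Create both patch ports with a dangling peer: nothing flows yet.
        ["ovs-vsctl", "--may-exist", "add-port", int_br, int_if, "--",
         "set", "Interface", int_if, "type=patch",
         "options:peer=%s" % NONEXISTENT_PEER],
        ["ovs-vsctl", "--may-exist", "add-port", phys_br, phys_if, "--",
         "set", "Interface", phys_if, "type=patch",
         "options:peer=%s" % NONEXISTENT_PEER],
        # 2. Drop everything entering from the peer port (priority 2).
        ["ovs-ofctl", "add-flow", int_br,
         "priority=2,in_port=%d,actions=drop" % int_ofport],
        ["ovs-ofctl", "add-flow", phys_br,
         "priority=2,in_port=%d,actions=drop" % phys_ofport],
        # 3. Only now associate the two ports with each other.
        ["ovs-vsctl", "set", "Interface", int_if,
         "options:peer=%s" % phys_if],
        ["ovs-vsctl", "set", "Interface", phys_if,
         "options:peer=%s" % int_if],
    ]
```

Installing the drop flows before the peers are associated guarantees there is no window in which traffic leaks between the bridges. The assumption here is that higher-priority flows, added later when a network is bound to a local VLAN, let properly translated traffic through.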

Calls (4) and (5) initialize the DVR Agent (distributed virtual routing agent) and the Security Group Agent, which handle DVR and security groups respectively. They will be covered in later posts.

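Finally, the constructor sets run_daemon_loop to True. Setting the flag does not start anything by itself. main() calls daemon_loop(), which in turn calls rpc_loop(), and rpc_loop keeps iterating for as long as run_daemon_loop stays True. The SIGTERM handler registered in main() clears the flag to stop the loop. Let's look at what rpc_loop does.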

```python
def rpc_loop(self, polling_manager=None):
    if not polling_manager:
        polling_manager = polling.AlwaysPoll()

    # Initial settings
    sync = True
    ports = set()
    updated_ports_copy = set()
    ancillary_ports = set()
    tunnel_sync = True
    ovs_restarted = False
    # Enter the loop
    while self.run_daemon_loop:
        start = time.time()
        port_stats = {'regular': {'added': 0,
                                  'updated': 0,
                                  'removed': 0},
                      'ancillary': {'added': 0,
                                    'removed': 0}}
        LOG.debug(_("Agent rpc_loop - iteration:%d started"),
                  self.iter_num)
        if sync:
            LOG.info(_("Agent out of sync with plugin!"))
            ports.clear()
            ancillary_ports.clear()
            sync = False
            polling_manager.force_polling()
        # Use the canary flow installed on br-int earlier to detect
        # whether OVS has restarted
        ovs_restarted = self.check_ovs_restart()
        if ovs_restarted:
            ......
        # Notify the plugin of tunnel IP
        if self.enable_tunneling and tunnel_sync:
            ......
        if self._agent_has_updates(polling_manager) or ovs_restarted:
            try:
                LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d - "
                            "starting polling. Elapsed:%(elapsed).3f"),
                          {'iter_num': self.iter_num,
                           'elapsed': time.time() - start})
                updated_ports_copy = self.updated_ports
                self.updated_ports = set()
                reg_ports = (set() if ovs_restarted else ports)
                port_info = self.scan_ports(reg_ports, updated_ports_copy)  # (1)
                LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d - "
                            "port information retrieved. "
                            "Elapsed:%(elapsed).3f"),
                          {'iter_num': self.iter_num,
                           'elapsed': time.time() - start})
                # Secure and wire/unwire VIFs and update their status
                # on Neutron server
                if (self._port_info_has_changes(port_info) or
                    self.sg_agent.firewall_refresh_needed() or
                    ovs_restarted):
                    LOG.debug(_("Starting to process devices in:%s"),
                              port_info)
                    # If treat devices fails - must resync with plugin
                    sync = self.process_network_ports(port_info,
                                                      ovs_restarted)  # (2)
                    LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d -"
                                "ports processed. Elapsed:%(elapsed).3f"),
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    port_stats['regular']['added'] = (
                        len(port_info.get('added', [])))
                    port_stats['regular']['updated'] = (
                        len(port_info.get('updated', [])))
                    port_stats['regular']['removed'] = (
                        len(port_info.get('removed', [])))
                ports = port_info['current']
                # Treat ancillary devices if they exist
                if self.ancillary_brs:
                    port_info = self.update_ancillary_ports(
                        ancillary_ports)
                    LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d -"
                                "ancillary port info retrieved. "
                                "Elapsed:%(elapsed).3f"),
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})

                    if port_info:
                        rc = self.process_ancillary_network_ports(
                            port_info)
                        LOG.debug(_("Agent rpc_loop - iteration:"
                                    "%(iter_num)d - ancillary ports "
                                    "processed. Elapsed:%(elapsed).3f"),
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                        ancillary_ports = port_info['current']
                        port_stats['ancillary']['added'] = (
                            len(port_info.get('added', [])))
                        port_stats['ancillary']['removed'] = (
                            len(port_info.get('removed', [])))
                        sync = sync | rc

                polling_manager.polling_completed()
            except Exception:
                LOG.exception(_("Error while processing VIF ports"))
                # Put the ports back in self.updated_port
                self.updated_ports |= updated_ports_copy
                sync = True

        # sleep till end of polling interval
        elapsed = (time.time() - start)
        LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d "
                    "completed. Processed ports statistics: "
                    "%(port_stats)s. Elapsed:%(elapsed).3f"),
                  {'iter_num': self.iter_num,
                   'port_stats': port_stats,
                   'elapsed': elapsed})
        if (elapsed < self.polling_interval):
            time.sleep(self.polling_interval - elapsed)
        else:
            LOG.debug(_("Loop iteration exceeded interval "
                        "(%(polling_interval)s vs. %(elapsed)s)!"),
                      {'polling_interval': self.polling_interval,
                       'elapsed': elapsed})
        self.iter_num = self.iter_num + 1
```
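rpc_loop repeatedly queries a set of states and acts on them. Its most important job is to scan the port information that the OVS database (OVSDB) holds for br-int and then process it. So let's look at call (1) to see how this port information is collected.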

```python
def scan_ports(self, registered_ports, updated_ports=None):
    # Get the set of VIF ports on br-int from OVSDB (via ovs-vsctl)
    cur_ports = self.int_br.get_vif_port_set()
    self.int_br_device_count = len(cur_ports)
    port_info = {'current': cur_ports}
    if updated_ports is None:
        updated_ports = set()
    # Also treat registered ports whose VLAN tag has changed as updated
    updated_ports.update(self.check_changed_vlans(registered_ports))
    if updated_ports:
        # Some updated ports might have been removed in the
        # meanwhile, and therefore should not be processed.
        # In this case the updated port won't be found among
        # current ports.
        updated_ports &= cur_ports
        # Record the ports that still need to be updated
        if updated_ports:
            port_info['updated'] = updated_ports

    # FIXME(salv-orlando): It's not really necessary to return early
    # if nothing has changed.
    if cur_ports == registered_ports:
        # No added or removed ports to set, just return here
        return port_info

    # Ports on br-int that were not registered before are "added"
    port_info['added'] = cur_ports - registered_ports
    # Remove all the known ports not found on the integration bridge
    port_info['removed'] = registered_ports - cur_ports
    return port_info
```
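Apart from the OVSDB query, scan_ports is plain set arithmetic, so its classification logic can be tried in isolation. The sketch below takes the current br-int ports as an argument instead of querying OVSDB. It also takes the VLAN-changed ports as a precomputed set, a hypothetical stand-in for check_changed_vlans(). Unlike the original, it copies updated_ports instead of modifying the caller's set in place.

```python
def scan_ports_sketch(cur_ports, registered_ports, updated_ports=None,
                      vlan_changed=()):
    """Classify port IDs the way scan_ports does (pure-set version).

    cur_ports:        port IDs currently found on br-int
    registered_ports: port IDs handled in the previous iteration
    updated_ports:    IDs from port_update RPC notifications
    vlan_changed:     registered IDs whose VLAN tag changed (hypothetical
                      stand-in for check_changed_vlans())
    """
    cur_ports = set(cur_ports)
    registered_ports = set(registered_ports)
    port_info = {"current": cur_ports}

    updated = set(updated_ports or ()) | set(vlan_changed)
    # Ports updated but already gone from br-int are not processed.
    updated &= cur_ports
    if updated:
        port_info["updated"] = updated

    if cur_ports == registered_ports:
        return port_info  # nothing added or removed

    port_info["added"] = cur_ports - registered_ports
    port_info["removed"] = registered_ports - cur_ports
    return port_info
```

For example, suppose br-int currently holds a, b and d, the previous iteration registered a, b and c, and update notifications arrived for b and x. The sketch then reports b as updated (x is no longer on the bridge), d as added, and c as removed.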
Once port_info has been collected, the ports are processed according to it. This actual processing happens in call (2), process_network_ports.

```python
def process_network_ports(self, port_info, ovs_restarted):
    resync_a = False
    resync_b = False
    # Collect the added and updated ports
    devices_added_updated = (port_info.get('added', set()) |
                             port_info.get('updated', set()))
    if devices_added_updated:
        start = time.time()
        try:
            # treat_devices_added_or_updated fetches the port details from
            # neutron-server over RPC. Ports still present on br-int are
            # handled by treat_vif_port, which binds them to net_uuid/lsw_id
            # (the network's local VLAN) and installs the flows; ports that
            # are no longer found on br-int are returned in skipped_devices.
            skipped_devices = self.treat_devices_added_or_updated(
                devices_added_updated, ovs_restarted)
            LOG.debug(_("process_network_ports - iteration:%(iter_num)d -"
                        "treat_devices_added_or_updated completed. "
                        "Skipped %(num_skipped)d devices of "
                        "%(num_current)d devices currently available. "
                        "Time elapsed: %(elapsed).3f"),
                      {'iter_num': self.iter_num,
                       'num_skipped': len(skipped_devices),
                       'num_current': len(port_info['current']),
                       'elapsed': time.time() - start})
            # Update the list of current ports storing only those which
            # have been actually processed.
            port_info['current'] = (port_info['current'] -
                                    set(skipped_devices))
        except DeviceListRetrievalError:
            # Need to resync as there was an error with server
            # communication.
            LOG.exception(_("process_network_ports - iteration:%d - "
                            "failure while retrieving port details "
                            "from server"), self.iter_num)
            resync_a = True
    if 'removed' in port_info:
        start = time.time()
        # Handle removed ports; neutron-server is notified over RPC
        resync_b = self.treat_devices_removed(port_info['removed'])
        LOG.debug(_("process_network_ports - iteration:%(iter_num)d -"
                    "treat_devices_removed completed in %(elapsed).3f"),
                  {'iter_num': self.iter_num,
                   'elapsed': time.time() - start})
    # If one of the above operations fails => resync with plugin
    return (resync_a | resync_b)
```

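The code shows that process_network_ports handles adding, updating and removing ports. It returns True if any of these steps failed, which makes the next rpc_loop iteration resynchronize with the plugin. After processing, rpc_loop checks how long the iteration took. If the iteration finished before polling_interval elapsed, the loop sleeps for the remaining time and then starts the next iteration.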
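The end of each iteration combines two pieces of logic. The resync decision of process_network_ports ORs together the failure flags of its two halves, and the loop then sleeps out the rest of the polling interval. The sketch below reproduces both in isolation. The two treat_* callables and the `now`/`sleep` parameters are hypothetical injection points that make the logic testable. DeviceListRetrievalError is redefined locally as a stand-in.

```python
import time


class DeviceListRetrievalError(Exception):
    """Stand-in for the agent exception of the same name."""


def process_network_ports_sketch(port_info, treat_added_or_updated,
                                 treat_removed):
    """Return True if the next rpc_loop iteration must resync.

    treat_added_or_updated(devices) -> list of skipped device IDs, may raise
    DeviceListRetrievalError; treat_removed(devices) -> bool (resync needed).
    Both callables are hypothetical stand-ins for the agent's methods.
    """
    resync_a = False
    resync_b = False
    devices = port_info.get("added", set()) | port_info.get("updated", set())
    if devices:
        try:
            skipped = treat_added_or_updated(devices)
            # Keep only ports that were actually processed.
            port_info["current"] = port_info["current"] - set(skipped)
        except DeviceListRetrievalError:
            resync_a = True  # could not fetch port details from the server
    if "removed" in port_info:
        resync_b = treat_removed(port_info["removed"])
    return resync_a | resync_b


def sleep_remaining(start, polling_interval, now=None, sleep=time.sleep):
    """Sleep for whatever is left of polling_interval; return that amount."""
    elapsed = (time.time() if now is None else now) - start
    if elapsed < polling_interval:
        remaining = polling_interval - elapsed
        sleep(remaining)
        return remaining
    return 0.0  # iteration overran the interval: start the next one at once
```

Skipped ports are dropped from port_info["current"], so the next scan_ports call sees them as unregistered and tries them again. An overrunning iteration does not sleep at all, which matches the "Loop iteration exceeded interval" branch of rpc_loop.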

 

This completes our walkthrough of the OVS Agent startup source code.






