Neutron-server initialization: Neutron L2 agent service initialization

Source: Internet  Editor: 程序博客网  Time: 2024/04/30 14:38

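Open vSwitch (OVS) is virtual switching software used mainly in virtual machine (VM) environments. As a virtual switch, it supports several virtualization technologies, including Xen/XenServer, KVM, and VirtualBox. In a virtualized environment on a single host, a virtual switch (vswitch) has two main jobs: 1. forwarding traffic between VMs; 2. connecting VMs to the outside network.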

In OpenStack, the most widely used L2 agent at present is probably the openvswitch agent. This article gives a rough analysis of what the openvswitch agent does.

OVS agent initialization

Taking the commonly used openvswitch agent as an example, the agent service can be started with the following command:
CLI:

service neutron-openvswitch-agent start

The following section of the setup.cfg configuration file shows that the method actually executed is:
neutron.plugins.openvswitch.agent.ovs_neutron_agent:main

[entry_points]
console_scripts =
    ...
    neutron-openvswitch-agent = neutron.plugins.openvswitch.agent.ovs_neutron_agent:main
    ...
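To make the "module:function" notation concrete, here is a minimal sketch of how such a string can be resolved to a callable. The helper name load_entry_point is hypothetical and this is not how setuptools itself is implemented; it only illustrates the idea, using a standard-library target so that it runs without Neutron installed.

```python
import importlib


def load_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError("expected 'module:attribute', got %r" % spec)
    obj = importlib.import_module(module_name)
    # The attribute part may itself be dotted, e.g. "Class.method".
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


# Demonstrated with a standard-library target (an assumption for the demo);
# the agent's real target is the ovs_neutron_agent:main string shown above.
join = load_entry_point("os.path:join")
print(join("etc", "neutron"))
```

The generated neutron-openvswitch-agent script does the equivalent lookup and then calls the resulting main function.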

a. Startup process

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:main

def main(bridge_classes):
    try:
        # Read the agent's settings from the configuration file, mainly the
        # network mappings and the names of the bridges
        agent_config = create_agent_config_map(cfg.CONF)
    except ValueError:
        LOG.exception(_LE("Agent failed to create agent config map"))
        raise SystemExit(1)
    prepare_xen_compute()
    validate_local_ip(agent_config['local_ip'])
    try:
        # Create the agent instance
        agent = OVSNeutronAgent(bridge_classes, **agent_config)
    except (RuntimeError, ValueError) as e:
        LOG.error(_LE("%s Agent terminated!"), e)
        sys.exit(1)
    # Agent initialized successfully
    agent.daemon_loop()
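The ValueError branch in main exists because building the config map can fail on malformed settings. As a rough sketch of the kind of parsing involved, the function below turns "physnet:bridge" strings into a dict and raises ValueError on bad input. The name parse_bridge_mappings and its exact validation rules are assumptions made for illustration, not Neutron's actual helper.

```python
def parse_bridge_mappings(entries):
    """Turn ['physnet1:br-eth1', ...] into {'physnet1': 'br-eth1', ...}."""
    mappings = {}
    for entry in entries:
        physnet, sep, bridge = entry.partition(":")
        physnet, bridge = physnet.strip(), bridge.strip()
        # Reject entries with no colon or with an empty side.
        if not sep or not physnet or not bridge:
            raise ValueError("Invalid mapping: %r" % entry)
        # Each physical network may be bound to only one bridge.
        if physnet in mappings:
            raise ValueError("Physical network %r listed twice" % physnet)
        mappings[physnet] = bridge
    return mappings


print(parse_bridge_mappings(["physnet1:br-eth1", "physnet2:br-eth2"]))
```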

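At startup the agent does the following:
1. Sets up plugin_rpc, which is used to communicate with neutron-server.
2. Sets up state_rpc, which is used to report the agent's state.
3. Sets up connection, which is used to receive messages from neutron-server.
4. Starts periodic state reporting.
5. Sets up br-int.
6. Sets up the bridges named in bridge_mappings.
7. Initializes sg_agent, which handles security groups.
8. Periodically checks br-int for port changes and calls process_network_ports to handle added and removed ports.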
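Steps 2 and 4 can be pictured with the following sketch of a state reporter. The class StateReporter and the send callable are hypothetical stand-ins for the agent's state_rpc plumbing, and dropping start_flag after the first report is an assumption about the real agent's behavior that is not shown in the excerpts in this article.

```python
import copy


class StateReporter:
    """Builds the periodic heartbeat payload (simplified, hypothetical)."""

    def __init__(self, send):
        self._send = send  # stand-in for the state_rpc report call
        self.agent_state = {
            "binary": "neutron-openvswitch-agent",
            "configurations": {"devices": 0},
            "start_flag": True,
        }

    def report(self, device_count):
        self.agent_state["configurations"]["devices"] = device_count
        # Send a snapshot so later mutations do not alter earlier reports.
        self._send(copy.deepcopy(self.agent_state))
        # Assumption: start_flag only accompanies the first report.
        self.agent_state.pop("start_flag", None)


sent = []
reporter = StateReporter(sent.append)
reporter.report(3)
reporter.report(5)
print(len(sent))
```

In the real agent a looping call invokes the report method every report_interval seconds; here it is called by hand to keep the sketch deterministic.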

b. How neutron-server and nova interact with the OVS agent

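  1. The message queues between neutron-server and neutron-openvswitch-agent are as follows:
    [Figure not available: message queues between neutron-server and neutron-openvswitch-agent]
    neutron-server may broadcast the four kinds of messages shown in the figure to neutron-openvswitch-agent. The openvswitch agent first checks whether the port concerned is local and, if it is, performs the corresponding action.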
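The "is this port local?" check can be sketched as below. This is a deliberately simplified, hypothetical model (the class PortUpdateFilter is not part of Neutron): the real agent queues update notifications in self.updated_ports and filters them against the ports found on br-int during its polling loop.

```python
class PortUpdateFilter:
    """Keeps only notifications about ports present on this host."""

    def __init__(self, local_ports):
        self.local_ports = set(local_ports)   # port ids seen on br-int
        self.updated_ports = set()            # pending work for the loop

    def port_update(self, port_id):
        """Handle a broadcast; return True if the port is ours."""
        if port_id not in self.local_ports:
            return False  # another host's port: ignore the broadcast
        self.updated_ports.add(port_id)
        return True


f = PortUpdateFilter(local_ports={"port-a", "port-b"})
print(f.port_update("port-a"), f.port_update("port-z"))
```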

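  2. Interaction between nova and neutron-openvswitch-agent. The figure comes from GongYongSheng's slides at the Hong Kong summit:
    [Figure not available: interaction between nova and neutron-openvswitch-agent]
    When a VM is booted, nova-compute first sends a message to neutron-server asking it to create a port. The driver then creates the port on br-int. neutron-openvswitch-agent, which polls br-int in a loop, detects the new port, installs the appropriate OpenFlow rules and local VLAN for it, and finally sets the port status to ACTIVE.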

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:__init__

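c. The OVSNeutronAgent class

The docstring of OVSNeutronAgent summarizes how the agent implements virtual networking. The main points are:
1) It creates br-int, br-tun, and one bridge for each physical network interface.
2) Every VM's virtual NIC is plugged into br-int. Virtual NICs on the same virtual network share one local VLAN (unrelated to the VLANs of the external network, so the VLAN ids may overlap). The id of this local VLAN is mapped to the physical network details, for example an external VLAN id, that realize the virtual network.
3) For networks whose network_type is VLAN or FLAT, a virtual link (a veth or a pair of patch ports) is created between br-int and each physical network bridge, with flow rules that add, modify, or strip VLAN ids as needed.
4) For networks whose network_type is GRE, each tenant's traffic between hypervisors is distinguished by a Logical Switch identifier, and a mesh of tunnels is created that connects the br-tun of every hypervisor. Port patching connects the local VLANs on br-int to the inter-hypervisor tunnels on br-tun.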
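Point 2 is easier to follow with a small model of local VLAN bookkeeping. The class below is a sketch under assumptions: LocalVlanAllocator and its method names are hypothetical, and the constant values 1 and 4094 are assumed to match Neutron's MIN_VLAN_TAG and MAX_VLAN_TAG. It shows only the idea that each virtual network on a host gets one local VLAN id taken from a pool, shared by all of that network's ports.

```python
MIN_VLAN_TAG = 1      # assumed value of p_const.MIN_VLAN_TAG
MAX_VLAN_TAG = 4094   # assumed value of p_const.MAX_VLAN_TAG


class LocalVlanAllocator:
    """Maps each virtual network to one host-local VLAN id."""

    def __init__(self):
        # Same construction as the agent: the upper bound is excluded.
        self.available = set(range(MIN_VLAN_TAG, MAX_VLAN_TAG))
        self.by_network = {}

    def provision(self, network_id):
        """Return the network's local VLAN, allocating one if needed."""
        if network_id in self.by_network:
            return self.by_network[network_id]
        if not self.available:
            raise RuntimeError("No local VLAN available")
        vlan = self.available.pop()
        self.by_network[network_id] = vlan
        return vlan

    def reclaim(self, network_id):
        """Give the network's local VLAN back to the pool."""
        vlan = self.by_network.pop(network_id, None)
        if vlan is not None:
            self.available.add(vlan)


alloc = LocalVlanAllocator()
net1_first_port = alloc.provision("net-1")
net1_second_port = alloc.provision("net-1")
net2_port = alloc.provision("net-2")
print(net1_first_port == net1_second_port, net1_first_port != net2_port)
```

Because the ids are local to one host, the same virtual network can end up with different local VLAN ids on different hypervisors.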

neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:OVSNeutronAgent

class OVSNeutronAgent(sg_rpc.SecurityGroupAgentRpcCallbackMixin,
                      l2population_rpc.L2populationRpcCallBackTunnelMixin,
                      dvr_rpc.DVRAgentRpcCallbackMixin):
    '''Implements OVS-based tunneling, VLANs and flat networks.

    Two local bridges are created: an integration bridge (defaults to
    'br-int') and a tunneling bridge (defaults to 'br-tun'). An
    additional bridge is created for each physical network interface
    used for VLANs and/or flat networks.

    All VM VIFs are plugged into the integration bridge. VM VIFs on a
    given virtual network share a common "local" VLAN (i.e. not
    propagated externally). The VLAN id of this local VLAN is mapped
    to the physical networking details realizing that virtual network.

    For virtual networks realized as GRE tunnels, a Logical Switch
    (LS) identifier is used to differentiate tenant traffic on
    inter-HV tunnels. A mesh of tunnels is created to other
    Hypervisors in the cloud. These tunnels originate and terminate on
    the tunneling bridge of each hypervisor. Port patching is done to
    connect local VLANs on the integration bridge to inter-hypervisor
    tunnels on the tunnel bridge.

    For each virtual network realized as a VLAN or flat network, a
    veth or a pair of patch ports is used to connect the local VLAN on
    the integration bridge with the physical network bridge, with flow
    rules adding, modifying, or stripping VLAN tags as necessary.
    '''

    # history
    #   1.0 Initial version
    #   1.1 Support Security Group RPC
    #   1.2 Support DVR (Distributed Virtual Router) RPC
    #   1.3 Added param devices_to_update to security_groups_provider_updated
    #   1.4 Added support for network_update
    target = oslo_messaging.Target(version='1.4')

    def __init__(self, bridge_classes, integ_br, tun_br, local_ip,
                 bridge_mappings, polling_interval, tunnel_types=None,
                 veth_mtu=None, l2_population=False,
                 enable_distributed_routing=False,
                 minimize_polling=False,
                 ovsdb_monitor_respawn_interval=(
                     constants.DEFAULT_OVSDBMON_RESPAWN),
                 arp_responder=False,
                 prevent_arp_spoofing=True,
                 use_veth_interconnection=False,
                 quitting_rpc_timeout=None,
                 conf=None):
        '''Constructor.

        :param bridge_classes: a dict for bridge classes.
        :param integ_br: name of the integration bridge.
        :param tun_br: name of the tunnel bridge.
        :param local_ip: local IP address of this hypervisor.
        :param bridge_mappings: mappings from physical network name to bridge.
        :param polling_interval: interval (secs) to poll DB.
        :param tunnel_types: A list of tunnel types to enable support for in
               the agent. If set, will automatically set enable_tunneling to
               True.
        :param veth_mtu: MTU size for veth interfaces.
        :param l2_population: Optional, whether L2 population is turned on
        :param minimize_polling: Optional, whether to minimize polling by
               monitoring ovsdb for interface changes.
        :param ovsdb_monitor_respawn_interval: Optional, when using polling
               minimization, the number of seconds to wait before respawning
               the ovsdb monitor.
        :param arp_responder: Optional, enable local ARP responder if it is
               supported.
        :param prevent_arp_spoofing: Optional, enable suppression of any ARP
               responses from ports that don't match an IP address that belongs
               to the ports. Spoofing rules will not be added to ports that
               have port security disabled.
        :param use_veth_interconnection: use veths instead of patch ports to
               interconnect the integration bridge to physical bridges.
        :param quitting_rpc_timeout: timeout in seconds for rpc calls after
               SIGTERM is received
        :param conf: an instance of ConfigOpts
        '''
        super(OVSNeutronAgent, self).__init__()
        self.conf = conf or cfg.CONF

        self.fullsync = True
        # init bridge classes with configured datapath type.
        self.br_int_cls, self.br_phys_cls, self.br_tun_cls = (
            functools.partial(bridge_classes[b],
                              datapath_type=self.conf.OVS.datapath_type)
            for b in ('br_int', 'br_phys', 'br_tun'))

        self.use_veth_interconnection = use_veth_interconnection
        self.veth_mtu = veth_mtu
        # Pool of local VLAN ids: from MIN_VLAN_TAG (1) up to, but not
        # including, MAX_VLAN_TAG (4094)
        self.available_local_vlans = set(moves.range(p_const.MIN_VLAN_TAG,
                                                     p_const.MAX_VLAN_TAG))
        self.tunnel_types = tunnel_types or []
        self.l2_pop = l2_population
        # TODO(ethuleau): Change ARP responder so it's not dependent on the
        #                 ML2 l2 population mechanism driver.
        self.enable_distributed_routing = enable_distributed_routing
        self.arp_responder_enabled = arp_responder and self.l2_pop
        self.prevent_arp_spoofing = prevent_arp_spoofing

        if tunnel_types:
            self.enable_tunneling = True
        else:
            self.enable_tunneling = False

        # Validate agent configurations
        self._check_agent_configurations()

        # Keep track of int_br's device count for use by _report_state()
        self.int_br_device_count = 0

        self.agent_uuid_stamp = uuid.uuid4().int & UINT64_BITMASK

        # Create br-int and reset its flow table; this is done by invoking
        # commands such as brctl, ovs-vsctl, and ip
        self.int_br = self.br_int_cls(integ_br)
        self.setup_integration_br()
        # Stores port update notifications for processing in main rpc loop
        self.updated_ports = set()
        # Stores port delete notifications
        self.deleted_ports = set()

        self.network_ports = collections.defaultdict(set)
        # keeps association between ports and ofports to detect ofport change
        self.vifname_to_ofport_map = {}
        # Set up the plugin RPC API connection (topic='q-plugin', interface
        # neutron.agent.rpc.py:PluginApi) and listen for RPC calls made to
        # the agent by other services (topic='q-agent-notifier')
        self.setup_rpc()
        self.init_extension_manager(self.connection)
        # Parameter passed in from the configuration file
        self.bridge_mappings = bridge_mappings
        # Create a bridge for each mapping and connect it to br-int
        self.setup_physical_bridges(self.bridge_mappings)
        self.local_vlan_map = {}

        self._reset_tunnel_ofports()

        self.polling_interval = polling_interval
        self.minimize_polling = minimize_polling
        self.ovsdb_monitor_respawn_interval = ovsdb_monitor_respawn_interval
        self.local_ip = local_ip
        self.tunnel_count = 0
        self.vxlan_udp_port = self.conf.AGENT.vxlan_udp_port
        self.dont_fragment = self.conf.AGENT.dont_fragment
        self.tunnel_csum = cfg.CONF.AGENT.tunnel_csum
        self.tun_br = None
        self.patch_int_ofport = constants.OFPORT_INVALID
        self.patch_tun_ofport = constants.OFPORT_INVALID
        if self.enable_tunneling:
            # The patch_int_ofport and patch_tun_ofport are updated
            # here inside the call to setup_tunnel_br()
            self.setup_tunnel_br(tun_br)

        self.dvr_agent = ovs_dvr_neutron_agent.OVSDVRNeutronAgent(
            self.context,
            self.dvr_plugin_rpc,
            self.int_br,
            self.tun_br,
            self.bridge_mappings,
            self.phys_brs,
            self.int_ofports,
            self.phys_ofports,
            self.patch_int_ofport,
            self.patch_tun_ofport,
            self.conf.host,
            self.enable_tunneling,
            self.enable_distributed_routing)

        self.agent_state = {
            'binary': 'neutron-openvswitch-agent',
            'host': self.conf.host,
            'topic': n_const.L2_AGENT_TOPIC,
            'configurations': {'bridge_mappings': bridge_mappings,
                               'tunnel_types': self.tunnel_types,
                               'tunneling_ip': local_ip,
                               'l2_population': self.l2_pop,
                               'arp_responder_enabled':
                               self.arp_responder_enabled,
                               'enable_distributed_routing':
                               self.enable_distributed_routing,
                               'log_agent_heartbeats':
                               self.conf.AGENT.log_agent_heartbeats,
                               'extensions': self.ext_manager.names()},
            'agent_type': self.conf.AGENT.agent_type,
            'start_flag': True}

        report_interval = self.conf.AGENT.report_interval
        if report_interval:
            heartbeat = loopingcall.FixedIntervalLoopingCall(
                self._report_state)
            heartbeat.start(interval=report_interval)

        if self.enable_tunneling:
            self.setup_tunnel_br_flows()

        self.dvr_agent.setup_dvr_flows()

        # Collect additional bridges to monitor
        self.ancillary_brs = self.setup_ancillary_bridges(integ_br, tun_br)

        # In order to keep existed device's local vlan unchanged,
        # restore local vlan mapping at start
        self._restore_local_vlan_map()

        # Security group agent support
        self.sg_agent = sg_rpc.SecurityGroupAgentRpc(self.context,
                self.sg_plugin_rpc, self.local_vlan_map,
                defer_refresh_firewall=True)

        # Initialize iteration counter
        self.iter_num = 0
        self.run_daemon_loop = True

        self.catch_sigterm = False
        self.catch_sighup = False

        # The initialization is complete; we can start receiving messages
        self.connection.consume_in_threads()

        self.quitting_rpc_timeout = quitting_rpc_timeout
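Two small idioms in the constructor are worth isolating: binding datapath_type into the bridge classes with functools.partial, and deriving agent_uuid_stamp by masking a UUID down to 64 bits. The sketch below reproduces both outside Neutron. FakeBridge is a hypothetical stand-in for the real bridge classes, and the UINT64_BITMASK value is an assumption consistent with its name.

```python
import functools
import uuid

UINT64_BITMASK = (1 << 64) - 1  # assumed definition: the low 64 bits set


class FakeBridge:
    """Hypothetical stand-in for an OVS bridge class."""

    def __init__(self, name, datapath_type="system"):
        self.name = name
        self.datapath_type = datapath_type


bridge_classes = {
    "br_int": FakeBridge,
    "br_phys": FakeBridge,
    "br_tun": FakeBridge,
}
datapath_type = "netdev"  # would come from conf.OVS.datapath_type

# Pre-bind datapath_type so later code only has to pass the bridge name.
br_int_cls, br_phys_cls, br_tun_cls = (
    functools.partial(bridge_classes[b], datapath_type=datapath_type)
    for b in ("br_int", "br_phys", "br_tun"))

int_br = br_int_cls("br-int")
print(int_br.name, int_br.datapath_type)

# A 128-bit UUID truncated to 64 bits, so it fits in a 64-bit field.
agent_uuid_stamp = uuid.uuid4().int & UINT64_BITMASK
print(agent_uuid_stamp.bit_length() <= 64)
```

The partial objects behave like constructors with one argument already filled in, which is why the constructor can later write self.br_int_cls(integ_br) without repeating the datapath type.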

d. Starting agent.daemon_loop()

Once OVSNeutronAgent has been initialized, agent.daemon_loop() is started.

1) daemon_loop
    def daemon_loop(self):
        # Start everything.
        LOG.info(_LI("Agent initialized successfully, now running... "))
        signal.signal(signal.SIGTERM, self._handle_sigterm)
        if hasattr(signal, 'SIGHUP'):
            signal.signal(signal.SIGHUP, self._handle_sighup)
        with polling.get_polling_manager(
            self.minimize_polling,
            self.ovsdb_monitor_respawn_interval) as pm:

            self.rpc_loop(polling_manager=pm)
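daemon_loop registers signal handlers, and the loop condition in rpc_loop later consults the flags they set. The sketch below models that flag-based pattern in isolation. The class SignalFlags, its method names, and the reloads counter are hypothetical; the real agent's handlers also deal with RPC timeouts and configuration reloading, which are left out here.

```python
import signal


class SignalFlags:
    """Handlers only set flags; the main loop acts on them later."""

    def __init__(self):
        self.run_daemon_loop = True
        self.catch_sigterm = False
        self.catch_sighup = False
        self.reloads = 0  # hypothetical counter standing in for a reload

    def handle_sigterm(self, signum, frame):
        self.catch_sigterm = True

    def handle_sighup(self, signum, frame):
        self.catch_sighup = True

    def install(self):
        """Register the handlers (must run in the main thread)."""
        signal.signal(signal.SIGTERM, self.handle_sigterm)
        if hasattr(signal, "SIGHUP"):  # SIGHUP does not exist on Windows
            signal.signal(signal.SIGHUP, self.handle_sighup)

    def check_and_handle_signal(self):
        """Called once per loop iteration; False means stop looping."""
        if self.catch_sigterm:
            self.run_daemon_loop = False
            self.catch_sigterm = False
        if self.catch_sighup:
            self.reloads += 1  # the real agent would reload its config
            self.catch_sighup = False
        return self.run_daemon_loop


flags = SignalFlags()
flags.handle_sighup(None, None)    # simulate a delivered SIGHUP
print(flags.check_and_handle_signal())
flags.handle_sigterm(None, None)   # simulate a delivered SIGTERM
print(flags.check_and_handle_signal())
```

Keeping the handlers this small means the current iteration always finishes before the agent reacts to a signal.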
2) rpc_loop

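The two most important functions called in rpc_loop() are tunnel_sync (which queries and establishes tunnels) and process_network_ports (which handles port and security group changes).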
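One detail of rpc_loop that is easy to miss is how pending port updates survive a failed iteration: the set is swapped out before processing and merged back if anything raises. The sketch below isolates that pattern. The function run_iteration and the dict-based state are hypothetical simplifications of the agent's attributes.

```python
def run_iteration(state, process):
    """One simplified pass; returns True when a resync is needed."""
    # Take the pending updates and start collecting new ones.
    updated_copy = state["updated_ports"]
    state["updated_ports"] = set()
    try:
        return process(updated_copy)
    except Exception:
        # Put the ports back so the next iteration retries them.
        state["updated_ports"] |= updated_copy
        return True


def failing_process(ports):
    raise RuntimeError("simulated failure while wiring ports")


state = {"updated_ports": {"port-a", "port-b"}}
needs_resync = run_iteration(state, failing_process)
print(needs_resync, sorted(state["updated_ports"]))
```

Notifications that arrive while process is running land in the fresh set, and the union on failure keeps both the old and the new ones.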

    def rpc_loop(self, polling_manager=None):
        if not polling_manager:
            polling_manager = polling.get_polling_manager(
                minimize_polling=False)

        sync = True
        ports = set()
        updated_ports_copy = set()
        ancillary_ports = set()
        tunnel_sync = True
        ovs_restarted = False
        consecutive_resyncs = 0
        need_clean_stale_flow = True
        while self._check_and_handle_signal():
            if self.fullsync:
                LOG.info(_LI("rpc_loop doing a full sync."))
                sync = True
                self.fullsync = False
            port_info = {}
            ancillary_port_info = {}
            start = time.time()
            LOG.debug("Agent rpc_loop - iteration:%d started",
                      self.iter_num)
            if sync:
                LOG.info(_LI("Agent out of sync with plugin!"))
                polling_manager.force_polling()
                consecutive_resyncs = consecutive_resyncs + 1
                if consecutive_resyncs >= constants.MAX_DEVICE_RETRIES:
                    LOG.warn(_LW("Clearing cache of registered ports, retrials"
                                 " to resync were > %s"),
                             constants.MAX_DEVICE_RETRIES)
                    ports.clear()
                    ancillary_ports.clear()
                    sync = False
                    consecutive_resyncs = 0
            else:
                consecutive_resyncs = 0
            ovs_status = self.check_ovs_status()
            if ovs_status == constants.OVS_RESTARTED:
                self.setup_integration_br()
                self.setup_physical_bridges(self.bridge_mappings)
                if self.enable_tunneling:
                    self._reset_tunnel_ofports()
                    self.setup_tunnel_br()
                    self.setup_tunnel_br_flows()
                    tunnel_sync = True
                if self.enable_distributed_routing:
                    self.dvr_agent.reset_ovs_parameters(self.int_br,
                                                 self.tun_br,
                                                 self.patch_int_ofport,
                                                 self.patch_tun_ofport)
                    self.dvr_agent.reset_dvr_parameters()
                    self.dvr_agent.setup_dvr_flows()
            elif ovs_status == constants.OVS_DEAD:
                # Agent doesn't apply any operations when ovs is dead, to
                # prevent unexpected failure or crash. Sleep and continue
                # loop in which ovs status will be checked periodically.
                port_stats = self.get_port_stats({}, {})
                self.loop_count_and_wait(start, port_stats)
                continue
            # Notify the plugin of tunnel IP
            if self.enable_tunneling and tunnel_sync:
                LOG.info(_LI("Agent tunnel out of sync with plugin!"))
                try:
                    tunnel_sync = self.tunnel_sync()
                except Exception:
                    LOG.exception(_LE("Error while synchronizing tunnels"))
                    tunnel_sync = True
            ovs_restarted |= (ovs_status == constants.OVS_RESTARTED)
            if self._agent_has_updates(polling_manager) or ovs_restarted:
                try:
                    LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                              "starting polling. Elapsed:%(elapsed).3f",
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    # Save updated ports dict to perform rollback in
                    # case resync would be needed, and then clear
                    # self.updated_ports. As the greenthread should not yield
                    # between these two statements, this will be thread-safe
                    updated_ports_copy = self.updated_ports
                    self.updated_ports = set()
                    reg_ports = (set() if ovs_restarted else ports)
                    # Work out from br-int which ports were added, updated,
                    # or removed
                    port_info = self.scan_ports(reg_ports, sync,
                                                updated_ports_copy)
                    self.process_deleted_ports(port_info)
                    ofport_changed_ports = self.update_stale_ofport_rules()
                    if ofport_changed_ports:
                        port_info.setdefault('updated', set()).update(
                            ofport_changed_ports)
                    LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                              "port information retrieved. "
                              "Elapsed:%(elapsed).3f",
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    # Treat ancillary devices if they exist
                    if self.ancillary_brs:
                        ancillary_port_info = self.scan_ancillary_ports(
                            ancillary_ports, sync)
                        LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                                  "ancillary port info retrieved. "
                                  "Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                    sync = False
                    # Secure and wire/unwire VIFs and update their status
                    # on Neutron server
                    if (self._port_info_has_changes(port_info) or
                        self.sg_agent.firewall_refresh_needed() or
                        ovs_restarted):
                        LOG.debug("Starting to process devices in:%s",
                                  port_info)
                        # If treat devices fails - must resync with plugin
                        # process_network_ports fetches each port's details
                        # from the plugin and, depending on the port's
                        # admin_state_up, calls self.port_bound() or
                        # self.port_dead(). It then calls the plugin RPC
                        # methods update_device_up or update_device_down to
                        # update the port status.
                        sync = self.process_network_ports(port_info,
                                                          ovs_restarted)
                        if not sync and need_clean_stale_flow:
                            self.cleanup_stale_flows()
                            need_clean_stale_flow = False
                        LOG.debug("Agent rpc_loop - iteration:%(iter_num)d - "
                                  "ports processed. Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})

                    ports = port_info['current']

                    if self.ancillary_brs:
                        sync |= self.process_ancillary_network_ports(
                            ancillary_port_info)
                        LOG.debug("Agent rpc_loop - iteration: "
                                  "%(iter_num)d - ancillary ports "
                                  "processed. Elapsed:%(elapsed).3f",
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                        ancillary_ports = ancillary_port_info['current']

                    polling_manager.polling_completed()
                    # Keep this flag in the last line of "try" block,
                    # so we can sure that no other Exception occurred.
                    if not sync:
                        ovs_restarted = False
                        self._dispose_local_vlan_hints()
                except Exception:
                    LOG.exception(_LE("Error while processing VIF ports"))
                    # Put the ports back in self.updated_port
                    self.updated_ports |= updated_ports_copy
                    sync = True
            port_stats = self.get_port_stats(port_info, ancillary_port_info)
            self.loop_count_and_wait(start, port_stats)
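The heart of each iteration is comparing the ports currently on br-int with the ports seen last time. The following sketch shows that set arithmetic. It is a simplified, hypothetical version of scan_ports: the current port set is passed in directly instead of being read from OVS, and the sync argument is omitted.

```python
def scan_ports(registered, current, updated=None):
    """Classify ports by comparing the last-seen set with the current set."""
    registered, current = set(registered), set(current)
    port_info = {"current": current}
    if updated:
        # A port that has disappeared cannot be updated any more.
        still_present = set(updated) & current
        if still_present:
            port_info["updated"] = still_present
    if current == registered:
        return port_info  # nothing was plugged or unplugged
    port_info["added"] = current - registered
    port_info["removed"] = registered - current
    return port_info


info = scan_ports(registered={"p1", "p2"},
                  current={"p2", "p3"},
                  updated={"p2", "p9"})
print(sorted(info["added"]), sorted(info["removed"]), sorted(info["updated"]))
```

The 'current' entry is what the loop stores as ports for the next iteration, which is why clearing ports after repeated resync failures forces every port to be treated as newly added.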

References:
aboutyun: http://www.aboutyun.com/thread-10306-1-1.html
csdn blog: http://blog.csdn.net/u013920085/article/details/50099147
