[Nova] nova-compute Code Study 1: Preparation at Startup


This post walks through the preparation work nova-compute performs every time it starts, before it begins serving requests.

nova-compute supports multiple virtualization drivers, including LibvirtDriver, XenAPIDriver, FakeDriver, BareMetalDriver, VMwareESXDriver, VMwareVCDriver, and HyperVDriver. The driver names indicate the underlying virtualization technology. On Linux servers we typically use QEMU and KVM (strictly speaking, KVM only adds hardware CPU acceleration on top of QEMU), so the driver to choose is LibvirtDriver, and the walkthrough below uses it. Libvirt is a virtualization API library that abstracts different virtualization backends behind a single programming interface; it currently supports KVM/QEMU/Xen/Virtuozzo/VMware ESX/LXC/bhyve and others. nova-compute uses Libvirt's Python binding.
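For reference, talking to Libvirt through its Python binding looks roughly like this (a minimal sketch, assuming python-libvirt is installed and a local QEMU/KVM libvirtd is reachable at qemu:///system):

import libvirt

# Open a read-only connection to the local QEMU/KVM hypervisor.
conn = libvirt.openReadOnly('qemu:///system')
print(conn.getLibVersion())  # e.g. 1002002 for libvirt 1.2.2

# Running domains are listed by numeric ID; each maps to one OpenStack instance.
for dom_id in conn.listDomainsID():
    dom = conn.lookupByID(dom_id)
    print(dom.UUIDString(), dom.name())

conn.close()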

Analyzing the startup flow, it can be roughly divided into the following steps:

1. Initialize the Libvirt driver: establish the Libvirt connection, create the event queue and the event dispatch loop, and run Libvirt's own event loop in a native thread. Whenever an event occurs on an instance, Libvirt's event loop calls back into our registered handler, which puts the event on the queue and then wakes the dispatch loop to process it (a standalone sketch of this mechanism follows this list). Tracking instance state is essential for a management platform: when a guest shuts down from inside, OpenStack learns of it through Libvirt's event loop and quickly updates the instance's state to shutoff.

2. Clean up instances that have been evacuated, deleted, or left behind on this host, and initialize the instances that remain, e.g. restoring instances to their previous state after a server reboot. While initializing instances, nova-compute also ensures that the virtual network's bridges, VLAN devices, and so on exist. This confirms that a multi-node deployment without multi_host is feasible, because nova-compute also takes responsibility for creating bridges.

3. Depending on the "defer iptables apply" option, the application of iptables rules can be postponed until this final step, in which case the iptables operations performed in the steps above do not take effect immediately. The default is False, i.e. no deferral.
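To make step 1's wakeup mechanism concrete, here is a standalone, runnable sketch (assumptions: eventlet installed, Python 2 as in this Nova release; all names are illustrative, not Nova's). A native thread queues events and writes one byte into a pipe; a greenthread blocks on the pipe, wakes up, and drains the queue:

import os
import threading
import Queue  # Python 2, matching the Nova code of this era

import eventlet
from eventlet import greenio

events = Queue.Queue()                            # thread-safe native queue
rpipe, wpipe = os.pipe()
notify_recv = greenio.GreenPipe(rpipe, 'rb', 0)   # green (non-blocking) end
notify_send = os.fdopen(wpipe, 'wb', 0)           # written from the native thread

def producer():
    # Stands in for libvirt's native event loop thread: queue an event,
    # then write one byte to wake the green dispatcher.
    for i in range(3):
        events.put('event-%d' % i)
        notify_send.write(' ')
        notify_send.flush()

def dispatcher():
    # Stands in for _dispatch_events: block (greenly) on the pipe, then
    # drain the queue without blocking.
    while True:
        notify_recv.read(1)
        while not events.empty():
            print('handled %s' % events.get(block=False))

eventlet.spawn(dispatcher)
t = threading.Thread(target=producer)
t.daemon = True
t.start()
eventlet.sleep(0.5)   # give the dispatcher a chance to run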

# nova-compute uses eventlet's greenpool for cooperative concurrency, but
# monkey_patch cannot patch calls into C libraries, so invoking them from a
# greenthread would block the whole process. eventlet's tpool module solves
# this by running such calls in a native thread pool.
from eventlet import tpool

DISABLE_PREFIX = 'AUTO: '
DISABLE_REASON_UNDEFINED = 'None'


# tpool.Proxy's built-in __str__ and __repr__ methods are broken, so patch them
def patch_tpool_proxy():
    def str_method(self):
        return str(self._obj)

    def repr_method(self):
        return repr(self._obj)

    tpool.Proxy.__str__ = str_method
    tpool.Proxy.__repr__ = repr_method

patch_tpool_proxy()


def libvirt_error_handler(context, err):
    pass


class ComputeDriver(object):
    # Register the compute service's event callback
    def register_event_listener(self, callback):
        self._compute_event_callback = callback

    # Dispatch an event
    def emit_event(self, event):
        if not self._compute_event_callback:
            LOG.debug(_("Discarding event %s") % str(event))
            return
        if not isinstance(event, virtevent.Event):
            raise ValueError(
                _("Event must be an instance of nova.virt.event.Event"))

        try:
            LOG.debug(_("Emitting event %s") % str(event))
            # Handle the event with the registered callback
            self._compute_event_callback(event)
        except Exception as ex:
            LOG.error(_("Exception dispatching event %(event)s: %(ex)s"),
                      {'event': event, 'ex': ex})


class LibvirtDriver(driver.ComputeDriver):
    def __init__(self, virtapi, read_only=False):
        ...
        self._wrapped_conn = None
        self._wrapped_conn_lock = threading.Lock()
        self.read_only = read_only
        self._event_queue = None

        # Libvirt VIF driver, default libvirt.vif.LibvirtGenericVIFDriver
        vif_class = importutils.import_class(CONF.libvirt.vif_driver)
        self.vif_driver = vif_class(self._get_connection)

        # Firewall driver; here Libvirt's iptables firewall driver is used
        self.firewall_driver = firewall.load_driver(
            DEFAULT_FIREWALL_DRIVER,
            self.virtapi,
            get_connection=self._get_connection)

    # Test whether the Libvirt connection is still usable
    @staticmethod
    def _test_connection(conn):
        try:
            conn.getLibVersion()
            return True
        except libvirt.libvirtError as e:
            if (e.get_error_code() in (libvirt.VIR_ERR_SYSTEM_ERROR,
                                       libvirt.VIR_ERR_INTERNAL_ERROR) and
                e.get_error_domain() in (libvirt.VIR_FROM_REMOTE,
                                         libvirt.VIR_FROM_RPC)):
                LOG.debug(_('Connection to libvirt broke'))
                return False
            raise

    # Put the event on the queue and notify the dispatch loop through the pipe
    def _queue_event(self, event):
        if self._event_queue is None:
            return
        self._event_queue.put(event)
        c = ' '.encode()
        self._event_notify_send.write(c)
        self._event_notify_send.flush()

    # Enable/disable the compute service on this host
    def _set_host_enabled(self, enabled,
                          disable_reason=DISABLE_REASON_UNDEFINED):
        status_name = {True: 'disabled',
                       False: 'enabled'}
        disable_service = not enabled
        ctx = nova_context.get_admin_context()
        try:
            service = service_obj.Service.get_by_compute_host(ctx, CONF.host)
            # Only act if the service's current state differs from the target
            if service.disabled != disable_service:
                # If the service is currently enabled, mark it disabled in the
                # database and record the reason; if it is currently disabled
                # and the reason starts with $DISABLE_PREFIX, mark it enabled
                # and clear the reason. nova-compute never re-enables itself
                # when an operator disabled it manually.
                if not service.disabled or (
                        service.disabled_reason and
                        service.disabled_reason.startswith(DISABLE_PREFIX)):
                    service.disabled = disable_service
                    service.disabled_reason = (
                       DISABLE_PREFIX + disable_reason
                       if disable_service else DISABLE_REASON_UNDEFINED)
                    service.save()
                    LOG.debug(_('Updating compute service status to %s'),
                              status_name[disable_service])
                else:
                    LOG.debug(_('Not overriding manual compute service '
                                'status with: %s'),
                              status_name[disable_service])
        except exception.ComputeHostNotFound:
            LOG.warn(_('Cannot update service status on host: %s,'
                       'since it is not registered.') % CONF.host)
        except Exception:
            LOG.warn(_('Cannot update service status on host: %s,'
                       'due to an unexpected exception.') % CONF.host,
                     exc_info=True)

    # Called whenever the Libvirt connection is closed
    def _close_callback(self, conn, reason, opaque):
        # Put the connection and the close reason on the event queue
        close_info = {'conn': conn, 'reason': reason}
        self._queue_event(close_info)

    # Called whenever an event occurs on a domain (i.e. an instance)
    @staticmethod
    def _event_lifecycle_callback(conn, dom, event, detail, opaque):
        self = opaque
        # The domain UUID is also the instance UUID in OpenStack
        uuid = dom.UUIDString()
        # Translate the Libvirt domain event into a nova-compute virtevent;
        # only the stop, start, suspend and resume events are of interest
        transition = None
        if event == libvirt.VIR_DOMAIN_EVENT_STOPPED:
            transition = virtevent.EVENT_LIFECYCLE_STOPPED
        elif event == libvirt.VIR_DOMAIN_EVENT_STARTED:
            transition = virtevent.EVENT_LIFECYCLE_STARTED
        elif event == libvirt.VIR_DOMAIN_EVENT_SUSPENDED:
            transition = virtevent.EVENT_LIFECYCLE_PAUSED
        elif event == libvirt.VIR_DOMAIN_EVENT_RESUMED:
            transition = virtevent.EVENT_LIFECYCLE_RESUMED
        if transition is not None:
            # If the event is one we care about, put it on the queue
            self._queue_event(virtevent.LifecycleEvent(uuid, transition))

    # Get the Libvirt URI for the configured virtualization type
    @staticmethod
    def uri():
        if CONF.libvirt.virt_type == 'uml':
            uri = CONF.libvirt.connection_uri or 'uml:///system'
        elif CONF.libvirt.virt_type == 'xen':
            uri = CONF.libvirt.connection_uri or 'xen:///'
        elif CONF.libvirt.virt_type == 'lxc':
            uri = CONF.libvirt.connection_uri or 'lxc:///'
        else:
            uri = CONF.libvirt.connection_uri or 'qemu:///system'
        return uri
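To make the tpool pattern above concrete, here is a minimal illustration (not Nova code; SlowLib is a made-up stand-in for a blocking C binding) of wrapping an object in tpool.Proxy so that its method calls run in native worker threads instead of blocking the eventlet hub. It also shows why patch_tpool_proxy matters: with the patch, str(proxied) describes the wrapped object rather than the Proxy itself.

from eventlet import tpool

class SlowLib(object):
    # Stand-in for a C binding whose calls would block the eventlet hub.
    def version(self):
        return 42

proxied = tpool.Proxy(SlowLib())
# Attribute access is forwarded, and each method call executes in one of
# tpool's native worker threads, so other greenthreads keep running.
print(proxied.version())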
    # Open a Libvirt connection and return it
    @staticmethod
    def _connect(uri, read_only):
        def _connect_auth_cb(creds, opaque):
            if len(creds) == 0:
                return 0
            LOG.warning(
                _("Can not handle authentication request for %d credentials")
                % len(creds))
            raise exception.NovaException(
                _("Can not handle authentication request for %d credentials")
                % len(creds))

        auth = [[libvirt.VIR_CRED_AUTHNAME,
                 libvirt.VIR_CRED_ECHOPROMPT,
                 libvirt.VIR_CRED_REALM,
                 libvirt.VIR_CRED_PASSPHRASE,
                 libvirt.VIR_CRED_NOECHOPROMPT,
                 libvirt.VIR_CRED_EXTERNAL],
                _connect_auth_cb,
                None]

        try:
            flags = 0
            # Open the connection read-only if requested
            if read_only:
                flags = libvirt.VIR_CONNECT_RO
            # Use tpool to open the Libvirt connection without blocking.
            # The plain call would be:
            #     conn = libvirt.openAuth(uri, auth, flags)
            # where conn is a libvirt.virConnect instance. The idea is to hand
            # the connect call to the thread pool, block only this greenthread,
            # and yield control back to the hub; otherwise the whole process
            # would block here. The return value is a Proxy around conn:
            # calling conn's methods directly could block the entire process,
            # but calling them through the Proxy keeps using the thread pool,
            # so the process as a whole stays responsive.
            return tpool.proxy_call(
                (libvirt.virDomain, libvirt.virConnect),
                libvirt.openAuth, uri, auth, flags)
        except libvirt.libvirtError as ex:
            LOG.exception(_("Connection to libvirt failed: %s"), ex)
            payload = dict(ip=LibvirtDriver.get_host_ip_addr(),
                           method='_connect',
                           reason=ex)
            rpc.get_notifier('compute').error(nova_context.get_admin_context(),
                                              'compute.libvirt.error',
                                              payload)
            raise exception.HypervisorUnavailable(host=CONF.host)

    # Get a new Libvirt connection and register the callbacks
    def _get_new_connection(self):
        LOG.debug(_('Connecting to libvirt: %s'), self.uri())
        wrapped_conn = None
        try:
            # Connect to Libvirt; returns a wrapped connection
            wrapped_conn = self._connect(self.uri(), self.read_only)
        finally:
            # If wrapped_conn is empty the connection failed, so disable the
            # service on this host; otherwise it succeeded, so enable it
            disable_reason = DISABLE_REASON_UNDEFINED
            if not wrapped_conn:
                disable_reason = 'Failed to connect to libvirt'
            self._set_host_enabled(bool(wrapped_conn), disable_reason)

        self._wrapped_conn = wrapped_conn

        try:
            LOG.debug(_("Registering for lifecycle events %s"), self)
            # Shouldn't wrapped_conn be checked for None before this call?
            # Register a callback covering the domain's (i.e. the instance's)
            # whole lifecycle of events
            wrapped_conn.domainEventRegisterAny(
                None,
                libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                self._event_lifecycle_callback,
                self)
        except Exception as e:
            LOG.warn(_("URI %(uri)s does not support events: %(error)s"),
                     {'uri': self.uri(), 'error': e})

        try:
            LOG.debug(_("Registering for connection events: %s") %
                      str(self))
            # Register a callback for close events on the Libvirt connection
            wrapped_conn.registerCloseCallback(self._close_callback, None)
        except (TypeError, AttributeError) as e:
            LOG.debug(_("The version of python-libvirt does not support "
                        "registerCloseCallback or is too old: %s"), e)
        except libvirt.libvirtError as e:
            LOG.warn(_("URI %(uri)s does not support connection"
                       " events: %(error)s"),
                     {'uri': self.uri(), 'error': e})

        # Return the wrapped connection or None
        return wrapped_conn

    # Return the existing Libvirt connection, initializing it only when needed
    def _get_connection(self):
        # Share a single Libvirt connection across threads, synchronized with
        # a lock; only instantiate when the connection is None or broken
        with self._wrapped_conn_lock:
            wrapped_conn = self._wrapped_conn
            if not wrapped_conn or not self._test_connection(wrapped_conn):
                wrapped_conn = self._get_new_connection()

        return wrapped_conn

    # Expose _get_connection as the _conn attribute; at first glance
    # self._conn looks like a member variable or a method, and searching for
    # "self._conn" or "def _conn" leads nowhere
    _conn = property(_get_connection)

    # Get this host's virtualization capabilities
    def get_host_capabilities(self):
        if not self._caps:
            # Returns this host's capabilities as XML, including host
            # capabilities such as CPU architecture and features, plus guest
            # capabilities for the given hypervisor
            xmlstr = self._conn.getCapabilities()
            # The result is XML, so parse it
            self._caps = vconfig.LibvirtConfigCaps()
            self._caps.parse_str(xmlstr)
            if hasattr(libvirt, 'VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES'):
                try:
                    # Compute the richest CPU definition compatible with all
                    # of the given CPUs (i.e. the intersection of features?)
                    features = self._conn.baselineCPU(
                        # the CPU node string from the host capabilities
                        [self._caps.host.cpu.to_xml()],
                        # flag: expand and list all features
                        libvirt.VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES)
                    if features and features != -1:
                        # On success, re-parse the cpu node under the host node
                        self._caps.host.cpu.parse_str(features)
                except libvirt.libvirtError as ex:
                    error_code = ex.get_error_code()
                    if error_code == libvirt.VIR_ERR_NO_SUPPORT:
                        LOG.warn(_LW("URI %(uri)s does not support full set"
                                     " of host capabilities: " "%(error)s"),
                                 {'uri': self.uri(), 'error': ex})
                    else:
                        raise
        return self._caps
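A side note on the `_conn = property(_get_connection)` line above, which is easy to misread: `property` can take a plain function reference, making `self._conn` attribute-style access that calls the getter each time. A minimal illustration (names are made up):

class C(object):
    def _get_x(self):
        return 'computed'
    x = property(_get_x)

c = C()
print(c.x)   # attribute access invokes _get_x on every read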
    # Emit quality warnings where appropriate
    def _do_quality_warnings(self):
        caps = self.get_host_capabilities()
        arch = caps.host.cpu.arch
        # If the configured virt type is neither qemu nor kvm, or the host
        # CPU architecture is neither i686 nor x86_64, warn that this
        # combination is untested
        if (CONF.libvirt.virt_type not in ('qemu', 'kvm') or
                arch not in ('i686', 'x86_64')):
            LOG.warning(_('The libvirt driver is not tested on '
                          '%(type)s/%(arch)s by the OpenStack project and '
                          'thus its quality can not be ensured. For more '
                          'information, see: https://wiki.openstack.org/wiki/'
                          'HypervisorSupportMatrix'),
                        {'type': CONF.libvirt.virt_type, 'arch': arch})

    # Initialize the event pipe
    def _init_events_pipe(self):
        # Create a native queue
        self._event_queue = native_Queue.Queue()
        try:
            # Create the event notification pipe
            rpipe, wpipe = os.pipe()
            self._event_notify_send = greenio.GreenPipe(wpipe, 'wb', 0)
            self._event_notify_recv = greenio.GreenPipe(rpipe, 'rb', 0)
        except (ImportError, NotImplementedError):
            # Fallback for Windows, which has no real pipes: use a socket
            # pair instead
            sock = eventlet_util.__original_socket__(socket.AF_INET,
                                                     socket.SOCK_STREAM)
            sock.bind(('localhost', 0))
            sock.listen(50)
            csock = eventlet_util.__original_socket__(socket.AF_INET,
                                                      socket.SOCK_STREAM)
            csock.connect(('localhost', sock.getsockname()[1]))
            nsock, addr = sock.accept()
            self._event_notify_send = nsock.makefile('wb', 0)
            gsock = greenio.GreenSocket(csock)
            self._event_notify_recv = gsock.makefile('rb', 0)

    # Libvirt's event loop
    def _native_thread(self):
        while True:
            libvirt.virEventRunDefaultImpl()

    def _dispatch_events(self):
        try:
            # Wait for a dispatch notification via the pipe
            _c = self._event_notify_recv.read(1)
            assert _c
        except ValueError:
            return

        last_close_event = None
        # While the event queue is not empty
        while not self._event_queue.empty():
            try:
                # Fetch one event; we are in a greenthread, so do not block
                event = self._event_queue.get(block=False)
                if isinstance(event, virtevent.LifecycleEvent):
                    # Handle the event
                    self.emit_event(event)
                elif 'conn' in event and 'reason' in event:
                    # Record Libvirt connection-close events
                    last_close_event = event
            except native_Queue.Empty:
                pass
        # If the queue is drained and no close event occurred, return
        if last_close_event is None:
            return
        # The connection was closed: log it and update the service status
        conn = last_close_event['conn']
        with self._wrapped_conn_lock:
            if conn == self._wrapped_conn:
                reason = last_close_event['reason']
                _error = _("Connection to libvirt lost: %s") % reason
                LOG.warn(_error)
                self._wrapped_conn = None
                self._set_host_enabled(False, disable_reason=_error)

    # The event dispatch loop
    def _dispatch_thread(self):
        while True:
            self._dispatch_events()

    def _init_events(self):
        # Initialize the event pipe
        self._init_events_pipe()

        LOG.debug(_("Starting native event thread"))
        # Run Libvirt's event loop in a native thread
        event_thread = native_threading.Thread(target=self._native_thread)
        event_thread.setDaemon(True)
        event_thread.start()

        LOG.debug(_("Starting green dispatch thread"))
        # Run the event dispatch loop in a greenthread
        eventlet.spawn(self._dispatch_thread)

    # List the IDs of the domains running on the hypervisor
    def list_instance_ids(self):
        if self._conn.numOfDomains() == 0:
            return []
        return self._conn.listDomainsID()

    # Look up a domain by its domain ID
    def _lookup_by_id(self, instance_id):
        try:
            return self._conn.lookupByID(instance_id)
        except libvirt.libvirtError as ex:
            error_code = ex.get_error_code()
            if error_code == libvirt.VIR_ERR_NO_DOMAIN:
                raise exception.InstanceNotFound(instance_id=instance_id)

            msg = (_("Error from libvirt while looking up %(instance_id)s: "
                     "[Error Code %(error_code)s] %(ex)s")
                   % {'instance_id': instance_id,
                      'error_code': error_code,
                      'ex': ex})
            raise exception.NovaException(msg)

    # Look up a domain by its name
    def _lookup_by_name(self, instance_name):
        try:
            return self._conn.lookupByName(instance_name)
        except libvirt.libvirtError as ex:
            error_code = ex.get_error_code()
            if error_code == libvirt.VIR_ERR_NO_DOMAIN:
                raise exception.InstanceNotFound(instance_id=instance_name)

            msg = (_('Error from libvirt while looking up %(instance_name)s: '
                     '[Error Code %(error_code)s] %(ex)s') %
                   {'instance_name': instance_name,
                    'error_code': error_code,
                    'ex': ex})
            raise exception.NovaException(msg)

    # List the UUIDs of all domains on the hypervisor; these are also the
    # UUIDs of the corresponding instances in OpenStack
    def list_instance_uuids(self):
        uuids = set()
        # Iterate over all running domains on the hypervisor
        for domain_id in self.list_instance_ids():
            try:
                # Domain ID 0 is the hypervisor itself
                if domain_id != 0:
                    # Look up the domain by ID
                    domain = self._lookup_by_id(domain_id)
                    # Record the domain's UUID
                    uuids.add(domain.UUIDString())
            except exception.InstanceNotFound:
                continue
        # Iterate over all defined but not running domains on the hypervisor
        for domain_name in self._conn.listDefinedDomains():
            try:
                # Look up the domain by name and record its UUID
                uuids.add(self._lookup_by_name(domain_name).UUIDString())
            except exception.InstanceNotFound:
                continue

        return list(uuids)

    @property
    def need_legacy_block_device_info(self):
        return False

    # Power off the domain; note this does not delete it
    def _destroy(self, instance):
        try:
            # Look up the domain by instance name. Note: the OpenStack
            # instances table has no name column, only display_name;
            # display_name is the instance name shown in OpenStack, while
            # name is the instance name on the hypervisor, generated from the
            # instance id and a configurable name template.
            virt_dom = self._lookup_by_name(instance['name'])
        except exception.InstanceNotFound:
            virt_dom = None

        old_domid = -1
        if virt_dom is not None:
            try:
                old_domid = virt_dom.ID()
                # Call the domain's destroy method to power it off
                virt_dom.destroy()
                # With LXC container virtualization, also tear down the
                # container to avoid leaking resources
                if CONF.libvirt.virt_type == 'lxc':
                    self._teardown_container(instance)
            except libvirt.libvirtError as e:
                is_okay = False
                errcode = e.get_error_code()
                if errcode == libvirt.VIR_ERR_OPERATION_INVALID:
                    # Destroying a domain that is already shut off raises an
                    # error, but that is harmless
                    (state, _max_mem, _mem, _cpus, _t) = virt_dom.info()
                    state = LIBVIRT_POWER_STATE[state]
                    if state == power_state.SHUTDOWN:
                        is_okay = True
                elif errcode == libvirt.VIR_ERR_OPERATION_TIMEOUT:
                    # On timeout, log and raise InstancePowerOffFailure
                    LOG.warn(_("Cannot destroy instance, operation time out"),
                             instance=instance)
                    reason = _("operation time out")
                    raise exception.InstancePowerOffFailure(reason=reason)

                if not is_okay:
                    # The shutdown failed with an unexpected error: log and
                    # re-raise
                    with excutils.save_and_reraise_exception():
                        LOG.error(_('Error from libvirt during destroy. '
                                    'Code=%(errcode)s Error=%(e)s'),
                                  {'errcode': errcode, 'e': e},
                                  instance=instance)
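A quick worked example of the hypervisor-side name mentioned in _destroy (illustrative values): with the default instance_name_template of 'instance-%08x', the name is derived from the instance's database id:

print('instance-%08x' % 42)   # -> instance-0000002a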
    # Undefine the domain, i.e. remove its definition from Libvirt's records;
    # Libvirt keeps a directory of XML files defining domains
    def _undefine_domain(self, instance):
        try:
            # Look up the domain by instance name
            virt_dom = self._lookup_by_name(instance['name'])
        except exception.InstanceNotFound:
            virt_dom = None
        if virt_dom:
            try:
                try:
                    # Call the flags variant of undefine;
                    # VIR_DOMAIN_UNDEFINE_MANAGED_SAVE also removes the
                    # managed save file
                    virt_dom.undefineFlags(
                        libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE)
                except libvirt.libvirtError:
                    LOG.debug(_("Error from libvirt during undefineFlags."
                        " Retrying with undefine"), instance=instance)
                    # On libvirtError, retry with the plain undefine call
                    virt_dom.undefine()
                except AttributeError:
                    # Older versions may lack VIR_DOMAIN_UNDEFINE_MANAGED_SAVE
                    try:
                        # Check for a managed save image; remove it if present
                        if virt_dom.hasManagedSaveImage(0):
                            virt_dom.managedSaveRemove(0)
                    except AttributeError:
                        pass
                    # Then undefine the domain
                    virt_dom.undefine()
            except libvirt.libvirtError as e:
                with excutils.save_and_reraise_exception():
                    errcode = e.get_error_code()
                    LOG.error(_('Error from libvirt during undefine. '
                                'Code=%(errcode)s Error=%(e)s') %
                              {'errcode': errcode, 'e': e}, instance=instance)

    # Unplug the virtual NICs from the virtual network
    def unplug_vifs(self, instance, network_info, ignore_errors=False):
        for vif in network_info:
            try:
                # This actually does nothing here: when using a bridge,
                # Libvirt creates the VIFs for us, and after the domain is
                # destroyed and deleted Libvirt removes them as well
                self.vif_driver.unplug(instance, vif)
            except exception.NovaException:
                if not ignore_errors:
                    raise

    # Delete the instance's files and return the result
    def delete_instance_files(self, instance):
        # The directory holding the instance's files on this host,
        # i.e. $instances_path/$instance['uuid']
        target = libvirt_utils.get_instance_path(instance)
        # If the directory exists
        if os.path.exists(target):
            LOG.info(_('Deleting instance files %s'), target,
                     instance=instance)
            try:
                # Remove the directory recursively, like rm -r
                shutil.rmtree(target)
            except OSError as e:
                LOG.error(_('Failed to cleanup directory %(target)s: '
                            '%(e)s'), {'target': target, 'e': e},
                          instance=instance)

        # The deletion may have failed
        if os.path.exists(target):
            LOG.info(_('Deletion of %s failed'), target, instance=instance)
            return False
        LOG.info(_('Deletion of %s complete'), target, instance=instance)
        return True

    def _delete_instance_files(self, instance):
        context = nova_context.get_admin_context(read_deleted='yes')
        inst_obj = instance_obj.Instance.get_by_uuid(context, instance['uuid'])
        # Instance metadata is split into system and user metadata; read
        # clean_attempts, the number of cleanup attempts so far, from the
        # system metadata
        attempts = int(inst_obj.system_metadata.get('clean_attempts', '0'))
        # Delete the stored instance files
        success = self.delete_instance_files(inst_obj)
        # Increment the number of cleanup attempts
        inst_obj.system_metadata['clean_attempts'] = str(attempts + 1)
        if success:
            # On success, set the instance's cleaned database field to True
            inst_obj.cleaned = True
        # Persist to the database
        inst_obj.save(context)

    # Get the instance's logical volumes
    def _lvm_disks(self, instance):
        if CONF.libvirt.images_volume_group:
            vg = os.path.join('/dev', CONF.libvirt.images_volume_group)
            if not os.path.exists(vg):
                return []
            pattern = '%s_' % instance['uuid']

            def belongs_to_instance_legacy(disk):
                pattern = '%s_' % instance['name']
                if disk.startswith(pattern):
                    if CONF.instance_name_template == 'instance-%08x':
                        return True
                    else:
                        LOG.warning(_('Volume %(disk)s possibly unsafe to '
                                      'remove, please clean up manually'),
                                    {'disk': disk})
                return False

            def belongs_to_instance(disk):
                return disk.startswith(pattern)

            def fullpath(name):
                return os.path.join(vg, name)

            logical_volumes = libvirt_utils.list_logical_volumes(vg)

            disk_names = filter(belongs_to_instance, logical_volumes)
            disk_names.extend(
                filter(belongs_to_instance_legacy, logical_volumes)
            )
            disks = map(fullpath, disk_names)
            return disks
        return []

    # Clean up the instance's logical volumes
    def _cleanup_lvm(self, instance):
        # Get the instance's logical volumes
        disks = self._lvm_disks(instance)
        if disks:
            # Remove the instance's logical volumes
            libvirt_utils.remove_logical_volumes(*disks)
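To see what _lvm_disks is matching on (illustrative values only; the UUID and LV suffixes are made up), the instance's logical volumes in the configured volume group are those whose names start with "<uuid>_" (or "<name>_" for legacy volumes):

uuid_ = '3d5f4049-5a85-4331-a1f6-fd7d1d9b7e28'   # made-up instance UUID
lvs = [uuid_ + '_disk', uuid_ + '_disk.swap', 'some_other_lv']
print([lv for lv in lvs if lv.startswith(uuid_ + '_')])
# -> the first two entries, i.e. this instance's disks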
    # Clean up the instance's RBD volumes
    def _cleanup_rbd(self, instance):
        pool = CONF.libvirt.images_rbd_pool
        volumes = libvirt_utils.list_rbd_volumes(pool)
        pattern = instance['uuid']

        def belongs_to_instance(disk):
            return disk.startswith(pattern)

        volumes = filter(belongs_to_instance, volumes)
        if volumes:
            libvirt_utils.remove_rbd_volumes(pool, *volumes)

    def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True):
        # Undefine the domain corresponding to the instance
        self._undefine_domain(instance)
        # Remove the instance's virtual NICs from the virtual network
        self.unplug_vifs(instance, network_info, ignore_errors=True)
        retry = True
        while retry:
            try:
                # Remove the instance's rules from iptables
                self.firewall_driver.unfilter_instance(instance,
                                                    network_info=network_info)
            except libvirt.libvirtError as e:
                try:
                    # Get the instance's state on the hypervisor
                    state = self.get_info(instance)['state']
                except exception.InstanceNotFound:
                    state = power_state.SHUTDOWN

                if state != power_state.SHUTDOWN:
                    LOG.warn(_("Instance may be still running, destroy "
                               "it again."), instance=instance)
                    self._destroy(instance)
                else:
                    retry = False
                    errcode = e.get_error_code()
                    LOG.exception(_('Error from libvirt during unfilter. '
                                    'Code=%(errcode)s Error=%(e)s') %
                                  {'errcode': errcode, 'e': e},
                                  instance=instance)
                    reason = "Error unfiltering instance."
                    raise exception.InstanceTerminationFailure(reason=reason)
            except Exception:
                # On unknown errors other than libvirtError, stop retrying
                # and re-raise
                retry = False
                raise
            else:
                retry = False

        # The following handles the instance's volume block devices: first
        # detach them from the instance, then disconnect them
        block_device_mapping = driver.block_device_info_get_mapping(
            block_device_info)
        for vol in block_device_mapping:
            connection_info = vol['connection_info']
            disk_dev = vol['mount_device'].rpartition("/")[2]
            if ('data' in connection_info and
                    'volume_id' in connection_info['data']):
                volume_id = connection_info['data']['volume_id']
                encryption = encryptors.get_encryption_metadata(
                    context, self._volume_api, volume_id, connection_info)
                if encryption:
                    encryptor = self._get_volume_encryptor(connection_info,
                                                           encryption)
                    encryptor.detach_volume(**encryption)
            try:
                self.volume_driver_method('disconnect_volume',
                                          connection_info,
                                          disk_dev)
            except Exception as exc:
                with excutils.save_and_reraise_exception() as ctxt:
                    if destroy_disks:
                        ctxt.reraise = False
                        LOG.warn(_("Ignoring Volume Error on vol %(vol_id)s "
                                   "during delete %(exc)s"),
                                 {'vol_id': vol.get('volume_id'), 'exc': exc},
                                 instance=instance)

        if destroy_disks:
            # If the instance's disk files should be destroyed, do so here
            self._delete_instance_files(instance)

            # Hypervisors like KVM can use logical volumes as instance
            # disks, so clean those up too
            self._cleanup_lvm(instance)
            # If Ceph RBD block devices back the instance's disks, clean
            # those up as well
            if CONF.libvirt.images_type == 'rbd':
                self._cleanup_rbd(instance)

    # Delete the domain completely
    def destroy(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True):
        # First power off the domain
        self._destroy(instance)
        # Then remove the domain and clean up its disk files, virtual NICs,
        # firewall rules, etc.
        self.cleanup(context, instance, network_info, block_device_info,
                     destroy_disks)

    # Plug the instance's virtual NICs into the given virtual networks
    def plug_vifs(self, instance, network_info):
        for vif in network_info:
            # This essentially ensures the virtual network's bridge exists.
            # It also settles an earlier doubt: with flatdhcp or vlan
            # networking and multi_host=False (a single nova-network node),
            # can nova-compute work at all, given that no bridge was built?
            # It turns out nova-compute can create the bridge itself.
            self.vif_driver.plug(instance, vif)
    # Revert a migration
    def finish_revert_migration(self, context, instance, network_info,
                                block_device_info=None, power_on=True):
        LOG.debug(_("Starting finish_revert_migration"),
                  instance=instance)

        inst_base = libvirt_utils.get_instance_path(instance)
        # A resize also migrates the instance; the original instance files
        # are backed up before migrating
        inst_base_resize = inst_base + "_resize"
        if os.path.exists(inst_base_resize):
            # If the backup exists, wipe the current instance files and
            # restore from the backup
            self._cleanup_failed_migration(inst_base)
            utils.execute('mv', inst_base_resize, inst_base)

        disk_info = blockinfo.get_disk_info(CONF.libvirt.virt_type,
                                            instance,
                                            block_device_info)
        # Convert the instance's network, disk and block device info into the
        # Libvirt XML string used to create the domain (covered in detail
        # later)
        xml = self.to_xml(context, instance, network_info, disk_info,
                          block_device_info=block_device_info)
        # Create the domain from the Libvirt XML string
        self._create_domain_and_network(context, xml, instance, network_info,
                                        block_device_info, power_on)

        if power_on:
            # Wait until the domain's power state is RUNNING
            timer = loopingcall.FixedIntervalLoopingCall(
                                                    self._wait_for_running,
                                                    instance)
            timer.start(interval=0.5).wait()

    def init_host(self, host):
        # First register an error handler for Libvirt
        libvirt.registerErrorHandler(libvirt_error_handler, None)
        # Register the default event implementation, based on poll()
        libvirt.virEventRegisterDefaultImpl()
        # Emit quality warnings where appropriate
        self._do_quality_warnings()

        # Check that the installed Libvirt satisfies $MIN_LIBVIRT_VERSION
        if not self.has_min_version(MIN_LIBVIRT_VERSION):
            # If it is too old, log an error
            major = MIN_LIBVIRT_VERSION[0]
            minor = MIN_LIBVIRT_VERSION[1]
            micro = MIN_LIBVIRT_VERSION[2]
            LOG.error(_('Nova requires libvirt version '
                        '%(major)i.%(minor)i.%(micro)i or greater.'),
                      {'major': major, 'minor': minor, 'micro': micro})

        # Initialize the event subsystem
        self._init_events()
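The version check above works because libvirt encodes its version as a single integer. A minimal sketch of the comparison in the spirit of has_min_version (the exact MIN_LIBVIRT_VERSION value is an assumption here):

import libvirt

MIN_LIBVIRT_VERSION = (0, 9, 11)   # assumed minimum for this Nova release

def version_to_int(ver):
    # libvirt encodes versions as major * 1,000,000 + minor * 1,000 + micro
    major, minor, micro = ver
    return major * 1000000 + minor * 1000 + micro

conn = libvirt.openReadOnly('qemu:///system')
print(conn.getLibVersion() >= version_to_int(MIN_LIBVIRT_VERSION))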
class ComputeManager(manager.Manager):
    def __init__(self, compute_driver=None, *args, **kwargs):
        ...
        # Load the virtualization driver; with
        # compute_driver=libvirt.LibvirtDriver configured, LibvirtDriver is
        # loaded here
        self.driver = driver.load_compute_driver(self.virtapi, compute_driver)
        self.use_legacy_block_device_info = \
            self.driver.need_legacy_block_device_info

    # Synchronize an instance's power state
    def _sync_instance_power_state(self, context, db_instance, vm_power_state,
                                   use_slave=False):
        # Refresh the instance's database record
        db_instance.refresh(use_slave=use_slave)
        db_power_state = db_instance.power_state
        vm_state = db_instance.vm_state

        # Only synchronize power state for instances on this host
        if self.host != db_instance.host:
            # If the instance's host is no longer this host at sync time, it
            # may have been migrated elsewhere, so just return
            LOG.info(_("During the sync_power process the "
                       "instance has moved from "
                       "host %(src)s to host %(dst)s") %
                       {'src': self.host,
                        'dst': db_instance.host},
                     instance=db_instance)
            return
        elif db_instance.task_state is not None:
            # If the instance has a pending task at sync time, just return
            LOG.info(_("During sync_power_state the instance has a "
                       "pending task (%(task)s). Skip."),
                     {'task': db_instance.task_state},
                     instance=db_instance)
            return

        # If the recorded power state differs from the actual one, update
        # the database
        if vm_power_state != db_power_state:
            db_instance.power_state = vm_power_state
            db_instance.save()
            db_power_state = vm_power_state

        # Do nothing for instances in the following states
        if vm_state in (vm_states.BUILDING,
                        vm_states.RESCUED,
                        vm_states.RESIZED,
                        vm_states.SUSPENDED,
                        vm_states.ERROR):
            pass
        elif vm_state == vm_states.ACTIVE:
            if vm_power_state in (power_state.SHUTDOWN,
                                  power_state.CRASHED):
                # vm_state is ACTIVE but the power state is SHUTDOWN or
                # CRASHED
                LOG.warn(_("Instance shutdown by itself. Calling "
                           "the stop API."), instance=db_instance)
                try:
                    # Stop the instance
                    self.compute_api.stop(context, db_instance)
                except Exception:
                    LOG.exception(_("error during stop() in "
                                    "sync_power_state."),
                                  instance=db_instance)
            elif vm_power_state == power_state.SUSPENDED:
                # vm_state is ACTIVE but the power state is SUSPENDED
                LOG.warn(_("Instance is suspended unexpectedly. Calling "
                           "the stop API."), instance=db_instance)
                try:
                    self.compute_api.stop(context, db_instance)
                except Exception:
                    LOG.exception(_("error during stop() in "
                                    "sync_power_state."),
                                  instance=db_instance)
            elif vm_power_state == power_state.PAUSED:
                # vm_state is ACTIVE but the power state is PAUSED
                LOG.warn(_("Instance is paused unexpectedly. Ignore."),
                         instance=db_instance)
            elif vm_power_state == power_state.NOSTATE:
                # vm_state is ACTIVE but the power state is NOSTATE
                LOG.warn(_("Instance is unexpectedly not found. Ignore."),
                         instance=db_instance)
        elif vm_state == vm_states.STOPPED:
            if vm_power_state not in (power_state.NOSTATE,
                                      power_state.SHUTDOWN,
                                      power_state.CRASHED):
                # vm_state is STOPPED but the power state is none of
                # NOSTATE, SHUTDOWN or CRASHED
                LOG.warn(_("Instance is not stopped. Calling "
                           "the stop API."), instance=db_instance)
                try:
                    # Force-stop the instance; a normal stop refuses to stop
                    # an instance that is already STOPPED
                    self.compute_api.force_stop(context, db_instance)
                except Exception:
                    LOG.exception(_("error during stop() in "
                                    "sync_power_state."),
                                  instance=db_instance)
        elif vm_state == vm_states.PAUSED:
            if vm_power_state in (power_state.SHUTDOWN,
                                  power_state.CRASHED):
                # vm_state is PAUSED but the power state is SHUTDOWN or
                # CRASHED
                LOG.warn(_("Paused instance shutdown by itself. Calling "
                           "the stop API."), instance=db_instance)
                try:
                    # Force-stop the instance
                    self.compute_api.force_stop(context, db_instance)
                except Exception:
                    LOG.exception(_("error during stop() in "
                                    "sync_power_state."),
                                  instance=db_instance)
        elif vm_state in (vm_states.SOFT_DELETED,
                          vm_states.DELETED):
            if vm_power_state not in (power_state.NOSTATE,
                                      power_state.SHUTDOWN):
                # vm_state is SOFT_DELETED or DELETED but the power state is
                # neither NOSTATE nor SHUTDOWN
                LOG.warn(_("Instance is not (soft-)deleted."),
                         instance=db_instance)

    # Handle events in the instance's lifecycle
    def handle_lifecycle_event(self, event):
        LOG.info(_("VM %(state)s (Lifecycle Event)") %
                  {'state': event.get_name()},
                 instance_uuid=event.get_instance_uuid())
        # The instance may already have been deleted, hence read_deleted
        context = nova.context.get_admin_context(read_deleted='yes')
        # Fetch the instance the event occurred on
        instance = instance_obj.Instance.get_by_uuid(
            context, event.get_instance_uuid())
        # Derive the instance's power state from the event type
        vm_power_state = None
        if event.get_transition() == virtevent.EVENT_LIFECYCLE_STOPPED:
            vm_power_state = power_state.SHUTDOWN
        elif event.get_transition() == virtevent.EVENT_LIFECYCLE_STARTED:
            vm_power_state = power_state.RUNNING
        elif event.get_transition() == virtevent.EVENT_LIFECYCLE_PAUSED:
            vm_power_state = power_state.PAUSED
        elif event.get_transition() == virtevent.EVENT_LIFECYCLE_RESUMED:
            vm_power_state = power_state.RUNNING
        else:
            LOG.warning(_("Unexpected power state %d") %
                        event.get_transition())

        if vm_power_state is not None:
            # If the power state is one we care about, synchronize it
            self._sync_instance_power_state(context,
                                            instance,
                                            vm_power_state)
    def handle_events(self, event):
        # Check whether the event is a LifecycleEvent instance
        if isinstance(event, virtevent.LifecycleEvent):
            try:
                # Handle it with handle_lifecycle_event
                self.handle_lifecycle_event(event)
            except exception.InstanceNotFound:
                LOG.debug(_("Event %s arrived for non-existent instance. The "
                            "instance was probably deleted.") % event)
        else:
            LOG.debug(_("Ignoring event %s") % event)

    def init_virt_events(self):
        # Register handle_events as the event callback
        self.driver.register_event_listener(self.handle_events)

    # Fetch and filter the database records of the hypervisor's instances
    def _get_instances_on_driver(self, context, filters=None):
        if not filters:
            filters = {}
        try:
            # Get the UUIDs of all instances on the hypervisor via the driver
            driver_uuids = self.driver.list_instance_uuids()
            filters['uuid'] = driver_uuids
            # Query the database for the local instances matching the filters
            local_instances = instance_obj.InstanceList.get_by_filters(
                context, filters, use_slave=True)
            return local_instances
        except NotImplementedError:
            pass

        driver_instances = self.driver.list_instances()
        instances = instance_obj.InstanceList.get_by_filters(context, filters,
                                                             use_slave=True)
        name_map = dict((instance.name, instance) for instance in instances)
        local_instances = []
        for driver_instance in driver_instances:
            instance = name_map.get(driver_instance)
            if not instance:
                continue
            local_instances.append(instance)
        return local_instances

    # Convert the instance's block device mappings into the driver's block
    # device format
    def _get_instance_block_device_info(self, context, instance,
                                        refresh_conn_info=False,
                                        bdms=None):
        if not bdms:
            # Fetch the instance's block device mappings from the database
            bdms = (block_device_obj.BlockDeviceMappingList.
                    get_by_instance_uuid(context, instance['uuid']))
        # Convert the mappings into swap block device driver objects
        swap = driver_block_device.convert_swap(bdms)
        # Convert the mappings into ephemeral block device driver objects
        ephemerals = driver_block_device.convert_ephemerals(bdms)
        # Volume block device driver objects (snapshot and image block
        # device driver objects fall into the same category)
        block_device_mapping = (
            driver_block_device.convert_volumes(bdms) +
            driver_block_device.convert_snapshots(bdms) +
            driver_block_device.convert_images(bdms))

        # If the volume objects' connection info does not need refreshing,
        # filter out those with empty connection_info; otherwise refresh each
        # volume object's connection_info. Volume block devices involve the
        # OpenStack Cinder block storage service; we will come back to this
        # when studying Cinder.
        if not refresh_conn_info:
            block_device_mapping = [
                bdm for bdm in block_device_mapping
                if bdm.get('connection_info')]
        else:
            block_device_mapping = driver_block_device.refresh_conn_infos(
                block_device_mapping, context, instance, self.volume_api,
                self.driver)

        # Whether to use the legacy block device format; False here
        if self.use_legacy_block_device_info:
            swap = driver_block_device.legacy_block_devices(swap)
            ephemerals = driver_block_device.legacy_block_devices(ephemerals)
            block_device_mapping = driver_block_device.legacy_block_devices(
                block_device_mapping)

        # swap starts out as a list; we need it to be a single device
        swap = driver_block_device.get_swap(swap)

        # Return the block device info in the driver's format
        return {'swap': swap,
                'ephemerals': ephemerals,
                'block_device_mapping': block_device_mapping}

    # Check whether the instance is stored on shared storage.
    # How it works: first consider the image format in use; a format like
    # Qcow2 does not necessarily imply shared storage, so this alone is not
    # conclusive. Then create a temporary file in this host's instance
    # directory and ask the nova-compute on another host, via RPC, whether
    # that file exists: if it does, the storage is shared; if not, it isn't.
    def _is_instance_storage_shared(self, context, instance):
        shared_storage = True
        data = None
        try:
            data = self.driver.check_instance_shared_storage_local(context,
                                                       instance)
            if data:
                shared_storage = (self.compute_rpcapi.
                                  check_instance_shared_storage(context,
                                  obj_base.obj_to_primitive(instance),
                                  data))
        except NotImplementedError:
            LOG.warning(_('Hypervisor driver does not support '
                          'instance shared storage check, '
                          'assuming it\'s not on shared storage'),
                        instance=instance)
            shared_storage = False
        except Exception:
            LOG.exception(_('Failed to check if instance shared'),
                          instance=instance)
        finally:
            if data:
                self.driver.check_instance_shared_storage_cleanup(context,
                                                                  data)
        return shared_storage

    # Destroy evacuated instances
    def _destroy_evacuated_instances(self, context):
        our_host = self.host
        filters = {'deleted': False}
        # Fetch the database records of the instances on the hypervisor
        local_instances = self._get_instances_on_driver(context, filters)
        for instance in local_instances:
            if instance.host != our_host:
                if instance.task_state in [task_states.MIGRATING]:
                    # The instance's host differs from ours but it is in the
                    # migrating state; this is normal, so do nothing
                    LOG.debug('Will not delete instance as its host ('
                              '%(instance_host)s) is not equal to our '
                              'host (%(our_host)s) but its state is '
                              '(%(task_state)s)',
                              {'instance_host': instance.host,
                               'our_host': our_host,
                               'task_state': instance.task_state},
                              instance=instance)
                    continue
                LOG.info(_('Deleting instance as its host ('
                           '%(instance_host)s) is not equal to our '
                           'host (%(our_host)s).'),
                         {'instance_host': instance.host,
                          'our_host': our_host}, instance=instance)
                # In this case the instance was evacuated, or its record is
                # stale, so remove its local disks
                destroy_disks = False
                try:
                    # Fetch the instance's network info from nova-network
                    # via RPC
                    network_info = self._get_instance_nw_info(context,
                                                              instance)
                    # Fetch the instance's block device info
                    bdi = self._get_instance_block_device_info(context,
                                                               instance)
                    # If the instance files live on shared storage, do not
                    # delete the disk files; otherwise delete them
                    destroy_disks = not (self._is_instance_storage_shared(
                                                            context, instance))
                except exception.InstanceNotFound:
                    network_info = network_model.NetworkInfo()
                    bdi = {}
                    LOG.info(_('Instance has been marked deleted already, '
                               'removing it from the hypervisor.'),
                             instance=instance)
                    # If the instance cannot be found, i.e. it was deleted,
                    # its disk files should be removed
                    destroy_disks = True
                # Destroy the instance completely via the Libvirt driver
                self.driver.destroy(context, instance,
                                    network_info,
                                    bdi, destroy_disks)
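A toy illustration of the shared-storage probe described above (the function names and path are illustrative, not Nova's): the source host drops a token file under its instances directory, and the other host, reached via RPC in Nova, checks whether the token is visible to it too; if so, both hosts mount the same backend.

import os
import uuid

def check_shared_storage_local(instances_path):
    # "Source" side: drop a token file in the candidate shared directory.
    token = os.path.join(instances_path, '.probe-%s' % uuid.uuid4())
    open(token, 'w').close()
    return token

def check_shared_storage_remote(token):
    # "Other host" side: if the token is visible here too, the storage is
    # shared between the two hosts.
    return os.path.exists(token)

token = check_shared_storage_local('/var/lib/nova/instances')
print(check_shared_storage_remote(token))   # True on shared storage
os.remove(token)                             # clean up the probe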
    # Initialize an instance during service initialization
    def _init_instance(self, context, instance):
        # If the instance's vm_state is SOFT_DELETED, or it is ERROR with a
        # task_state other than RESIZE_MIGRATING or DELETING, do nothing
        if (instance.vm_state == vm_states.SOFT_DELETED or
            (instance.vm_state == vm_states.ERROR and
            instance.task_state not in
            (task_states.RESIZE_MIGRATING, task_states.DELETING))):
            LOG.debug(_("Instance is in %s state."),
                      instance.vm_state, instance=instance)
            return

        if instance.vm_state == vm_states.DELETED:
            try:
                # If the vm_state is DELETED, complete the partial deletion,
                # mainly (soft-)deleting the instance's database records and
                # updating the tenant's quota
                self._complete_partial_deletion(context, instance)
            except Exception:
                msg = _('Failed to complete a deletion')
                LOG.exception(msg, instance=instance)
            finally:
                return

        if (instance.vm_state == vm_states.BUILDING or
            instance.task_state in [task_states.SCHEDULING,
                                    task_states.BLOCK_DEVICE_MAPPING,
                                    task_states.NETWORKING,
                                    task_states.SPAWNING]):
            # If the vm_state is BUILDING, or the task_state is SCHEDULING,
            # BLOCK_DEVICE_MAPPING, NETWORKING or SPAWNING, this host's
            # nova-compute restarted while the instance was being created;
            # the safe thing to do is set the instance to ERROR and clear
            # its task_state
            LOG.debug(_("Instance failed to spawn correctly, "
                        "setting to ERROR state"), instance=instance)
            instance.task_state = None
            instance.vm_state = vm_states.ERROR
            instance.save()
            return

        if (instance.vm_state != vm_states.ERROR and
            instance.task_state in [task_states.IMAGE_SNAPSHOT_PENDING,
                                    task_states.IMAGE_PENDING_UPLOAD,
                                    task_states.IMAGE_UPLOADING,
                                    task_states.IMAGE_SNAPSHOT]):
            # If the vm_state is not ERROR and the task_state is
            # IMAGE_SNAPSHOT_PENDING, IMAGE_PENDING_UPLOAD, IMAGE_UPLOADING
            # or IMAGE_SNAPSHOT, clear the instance's task_state
            LOG.debug(_("Instance in transitional state %s at start-up "
                        "clearing task state"),
                      instance['task_state'], instance=instance)
            instance.task_state = None
            instance.save()

        if instance.task_state == task_states.DELETING:
            try:
                # If the task_state is DELETING, redo the deletion
                LOG.info(_('Service started deleting the instance during '
                           'the previous run, but did not finish. Restarting '
                           'the deletion now.'), instance=instance)
                instance.obj_load_attr('metadata')
                instance.obj_load_attr('system_metadata')
                bdms = (block_device_obj.BlockDeviceMappingList.
                        get_by_instance_uuid(context, instance.uuid))
                quotas = quotas_obj.Quotas.from_reservations(
                        context, None, instance=instance)
                self._delete_instance(context, instance, bdms, quotas)
            except Exception:
                msg = _('Failed to complete a deletion')
                LOG.exception(msg, instance=instance)
                self._set_instance_error_state(context, instance['uuid'])
            finally:
                return

        # Decide from the instance's current state and task_state whether a
        # reboot should be retried, and from its current power state whether
        # the reboot is hard or soft
        try_reboot, reboot_type = self._retry_reboot(context, instance)
        current_power_state = self._get_power_state(context, instance)

        if try_reboot:
            LOG.debug(_("Instance in transitional state (%(task_state)s) at "
                        "start-up and power state is (%(power_state)s), "
                        "triggering reboot"),
                      {'task_state': instance['task_state'],
                       'power_state': current_power_state},
                      instance=instance)
            # If a reboot should be retried, call nova-compute's
            # reboot_instance over RPC. The instance lives on this very
            # host, so why RPC? Presumably to avoid blocking service
            # initialization and defer the work until the service is
            # fully up.
            self.compute_rpcapi.reboot_instance(context, instance,
                                                block_device_info=None,
                                                reboot_type=reboot_type)
            return

        elif (current_power_state == power_state.RUNNING and
           instance.task_state in [task_states.REBOOT_STARTED,
                                   task_states.REBOOT_STARTED_HARD]):
            LOG.warning(_("Instance in transitional state "
                          "(%(task_state)s) at start-up and power state "
                          "is (%(power_state)s), clearing task state"),
                        {'task_state': instance['task_state'],
                         'power_state': current_power_state},
                        instance=instance)
            # If the current power state is RUNNING and the task_state is
            # REBOOT_STARTED or REBOOT_STARTED_HARD, set the instance to
            # ACTIVE and clear its task_state
            instance.task_state = None
            instance.vm_state = vm_states.ACTIVE
            instance.save()

        if instance.task_state == task_states.POWERING_OFF:
            try:
                LOG.debug(_("Instance in transitional state %s at start-up "
                            "retrying stop request"),
                          instance['task_state'], instance=instance)
                # Why does stopping the instance not go through RPC?
                self.stop_instance(context, instance)
            except Exception:
                msg = _('Failed to stop instance')
                LOG.exception(msg, instance=instance)
            finally:
                return

        if instance.task_state == task_states.POWERING_ON:
            try:
                LOG.debug(_("Instance in transitional state %s at start-up "
                            "retrying start request"),
                          instance['task_state'], instance=instance)
                # Start the instance
                self.start_instance(context, instance)
            except Exception:
                msg = _('Failed to start instance')
                LOG.exception(msg, instance=instance)
            finally:
                return

        net_info = compute_utils.get_nw_info_for_instance(instance)
        try:
            # Plug the instance's virtual NICs into the virtual networks
            self.driver.plug_vifs(instance, net_info)
        except NotImplementedError as e:
            LOG.debug(e, instance=instance)
        except exception.VirtualInterfacePlugException:
            LOG.exception(_("Vifs plug failed"), instance=instance)
            self._set_instance_error_state(context, instance.uuid)
            return
        if instance.task_state == task_states.RESIZE_MIGRATING:
            try:
                power_on = (instance.system_metadata.get('old_vm_state') !=
                            vm_states.STOPPED)

                block_dev_info = self._get_instance_block_device_info(context,
                                                                      instance)
                # A task_state of RESIZE_MIGRATING means a resize or
                # migration was interrupted, so revert it
                self.driver.finish_revert_migration(context,
                    instance, net_info, block_dev_info, power_on)
            except Exception as e:
                LOG.exception(_('Failed to revert crashed migration'),
                              instance=instance)
            finally:
                LOG.info(_('Instance found in migrating state during '
                           'startup. Resetting task_state'),
                         instance=instance)
                instance.task_state = None
                instance.save()

        db_state = instance.power_state
        drv_state = self._get_power_state(context, instance)
        expect_running = (db_state == power_state.RUNNING and
                          drv_state != db_state)

        LOG.debug(_('Current state is %(drv_state)s, state in DB is '
                    '%(db_state)s.'),
                  {'drv_state': drv_state, 'db_state': db_state},
                  instance=instance)

        if expect_running and CONF.resume_guests_state_on_host_boot:
            LOG.info(_('Rebooting instance after nova-compute restart.'),
                     instance=instance)

            block_device_info = \
                self._get_instance_block_device_info(context, instance)

            try:
                # If the instance's power state in the database is RUNNING
                # but differs from the state on the hypervisor, and the host
                # is configured to resume guest states on boot, restore the
                # instance's state
                self.driver.resume_state_on_host_boot(
                    context, instance, net_info, block_device_info)
            except NotImplementedError:
                LOG.warning(_('Hypervisor driver does not support '
                              'resume guests'), instance=instance)
            except Exception:
                LOG.warning(_('Failed to resume instance'),
                            instance=instance)
                self._set_instance_error_state(context, instance.uuid)

        elif drv_state == power_state.RUNNING:
            try:
                # Make sure the instance's firewall rules exist
                self.driver.ensure_filtering_rules_for_instance(
                                       instance, net_info)
            except NotImplementedError:
                LOG.warning(_('Hypervisor driver does not support '
                              'firewall rules'), instance=instance)

    def init_host(self):
        # Initialize this host via the virtualization driver; here this
        # calls LibvirtDriver's init_host method
        self.driver.init_host(host=self.host)
        context = nova.context.get_admin_context()
        # Fetch all instances associated with this host
        instances = instance_obj.InstanceList.get_by_host(
            context, self.host, expected_attrs=['info_cache'])

        # Whether to defer applying iptables rules; default False
        if CONF.defer_iptables_apply:
            self.driver.filter_defer_apply_on()

        # Initialize the virtualization event subsystem
        self.init_virt_events()

        try:
            # Clean up evacuated or deleted instances on this host
            self._destroy_evacuated_instances(context)
            # Initialize every instance associated with this host
            for instance in instances:
                self._init_instance(context, instance)
        finally:
            if CONF.defer_iptables_apply:
                # If configured to defer iptables application, apply the
                # rules here
                self.driver.filter_defer_apply_off()
