OpenStack Liberty instance migration source code analysis: cold migration, part 1


Virtual machine migration makes resource allocation more flexible; live migration in particular improves the availability and reliability of virtual machines. OpenStack Liberty provides two kinds of migration: cold migration and live migration. In the next few articles I will analyze how both are implemented, starting with cold migration.

For reasons of length, the cold migration analysis is split into two articles:

  • Part 1: the work done by nova-api and nova-conductor during the migration
  • Part 2: the processing done by nova-compute

Here is part 1:

Initiating the migration

A user can trigger an instance migration manually with the nova CLI:

#nova --debug migrate 52e4d485-6ccf-47f3-a754-b62649e7b256

The command above migrates the instance with id=52e4d485-6ccf-47f3-a754-b62649e7b256 to the best available nova-compute node; the --debug option prints the execution log:

......curl -g -i -X POST http://controller:8774/v2/eab72784b36040a186a6b88dac9ac0b2/servers/5a7d302f-f388-4ffb-af37-f1e6964b3a51/action -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}8e294a111a5deaa45f6cb0f3c58a600d2b1b0493" -d '{"migrate": null}......

The log excerpt above shows that novaclient sends the migration request to nova-api over HTTP, invoking the migrate action. From the route mapping that nova-api builds at startup, it is easy to see that the entry point for this action is
nova/api/openstack/compute/migrate_server.py/MigrateServerController._migrate, analyzed in detail below.
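Incidentally, the same action can also be issued from Python without going through the CLI. Here is a minimal sketch using python-novaclient's servers.migrate(); the credentials and endpoint are placeholders, and the client constructor shown is the Liberty-era signature:

# Minimal sketch: triggering the same 'migrate' server action from
# Python with python-novaclient (credential values are placeholders).
from novaclient import client

nova = client.Client('2', 'admin', 'ADMIN_PASS', 'admin',
                     'http://controller:5000/v2.0')

# POSTs {"migrate": null} to /servers/<id>/action, which nova-api
# routes to MigrateServerController._migrate
nova.servers.migrate('52e4d485-6ccf-47f3-a754-b62649e7b256')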

Source code analysis

The nova-api part

As analyzed above, the migration entry point is:

#nova/api/openstack/compute/migrate_server.py/MigrateServerController._migrate
#(decorator definitions omitted)
def _migrate(self, req, id, body):
    """Permit admins to migrate a server to a new host.

    req is the Request object carrying the request information
    id is the id of the instance to migrate, e.g.
        52e4d485-6ccf-47f3-a754-b62649e7b256
    body holds the request parameters: {"migrate": null}
    """
    # Extract the request context from the Request object
    context = req.environ['nova.context']

    """Perform the policy check. By default the rules are read from
    /etc/nova/policy.json on the host; if no matching rule is
    defined, authorization fails and an exception is raised. The
    rule involved here is:
        "os_compute_api:os-migrate-server:migrate": "rule:admin_api"
    """
    authorize(context, action='migrate')

    # Load the instance identified by id from the nova database;
    # returns an InstanceV2 object
    instance = common.get_instance(self.compute_api, context, id)

    """Exception handling omitted: an exception is raised if the
    instance does not exist, no suitable target host can be found,
    the instance is locked, resources are insufficient, or the
    instance is in the wrong state (it must be running or stopped).

    Just like the 'resize' operation, migration is carried out by
    `/nova/compute/api.py/API.resize`; resize decides between
    'resize' and 'migrate' by checking whether the flavor_id
    parameter is given, as analyzed below.
    """
    self.compute_api.resize(req.environ['nova.context'], instance)
---------------------------------------------------------------
#Continued from above: /nova/compute/api.py/API.resize
#(decorator definitions omitted)
def resize(self, context, instance, flavor_id=None,
           clean_shutdown=True, **extra_instance_updates):
    """Resize (ie, migrate) a running instance.

    If flavor_id is None, the process is considered a migration,
    keeping the original flavor_id. If flavor_id is not None, the
    instance should be migrated to a new host and resized to the
    new flavor_id.

    context is the request context
    instance is an InstanceV2 object with the instance's full
        configuration
    flavor_id is the flavor (configuration template) id; None here,
        since this is a migration
    clean_shutdown=True enables shutdown retries during cold
        migration; an exception is raised if the instance cannot
        be shut down cleanly
    """
    # Check that 'auto disk config' is enabled for the system disk,
    # otherwise raise: after migration the instance must be able to
    # configure its system disk automatically
    self._check_auto_disk_config(instance, **extra_instance_updates)

    # Get the instance's current flavor
    current_instance_type = instance.get_flavor()

    # If flavor_id is not provided, only migrate the instance:
    # log a message and reuse the current flavor as the new one
    if not flavor_id:
        LOG.debug("flavor_id is None. Assuming migration.",
                  instance=instance)
        new_instance_type = current_instance_type
    else:
        # Load the flavor specified by flavor_id from the
        # nova.instance_types table; read_deleted="no" filters out
        # deleted flavors
        new_instance_type = flavors.get_flavor_by_flavor_id(
                flavor_id, read_deleted="no")
        # If the instance is image-backed, the current flavor has a
        # non-zero root_gb (root disk size) and the target flavor
        # has root_gb=0, resize is not supported: there is no way
        # to tell how big the system disk should be, so raise
        if (new_instance_type.get('root_gb') == 0 and
            current_instance_type.get('root_gb') != 0 and
            not self.is_volume_backed_instance(context, instance)):
            reason = _('Resize to zero disk flavor is not allowed.')
            raise exception.CannotResizeDisk(reason=reason)

    # Raise if the requested flavor was not found
    if not new_instance_type:
        raise exception.FlavorNotFound(flavor_id=flavor_id)

    # Debug logging
    current_instance_type_name = current_instance_type['name']
    new_instance_type_name = new_instance_type['name']
    LOG.debug("Old instance type %(current_instance_type_name)s, "
              "new instance type %(new_instance_type_name)s",
              {'current_instance_type_name':
                                   current_instance_type_name,
               'new_instance_type_name': new_instance_type_name},
              instance=instance)

    # Are the old and new flavors the same? For a migration they
    # always are
    same_instance_type = (current_instance_type['id'] ==
                          new_instance_type['id'])

    # NOTE(sirp): We don't want to force a customer to change
    # their flavor when Ops is migrating off of a failed host.
    # For a resize, raise if the new flavor has been disabled
    if not same_instance_type and new_instance_type.get('disabled'):
        raise exception.FlavorNotFound(flavor_id=flavor_id)

    # Cells are disabled by default, so cell_type = None.
    # For a resize the old and new flavors must differ, since
    # resizing to the same flavor is pointless
    if (same_instance_type and flavor_id and
            self.cell_type != 'compute'):
        raise exception.CannotResizeToSameFlavor()

    # ensure there is sufficient headroom for upsizes
    # For a resize, reserve the quota delta first
    if flavor_id:
        # Compute the vcpu and memory quota deltas, if any (the
        # difference between the new and old flavors)
        deltas = compute_utils.upsize_quota_delta(context,
                                  new_instance_type,
                                  current_instance_type)
        try:
            # Reserve the (delta) quota for the current user and
            # project and update the database
            quotas = compute_utils.reserve_quota_delta(context,
                                                       deltas,
                                                       instance)
        except exception.OverQuota as exc:
            # Collect details about the quota shortage and log them
            quotas = exc.kwargs['quotas']
            overs = exc.kwargs['overs']
            usages = exc.kwargs['usages']
            headroom = self._get_headroom(quotas, usages, deltas)
            (overs, reqs, total_alloweds,
             useds) = self._get_over_quota_detail(headroom,
                                          overs, quotas, deltas)
            LOG.warning(_LW("%(overs)s quota exceeded for %"
                            "(pid)s, tried to resize instance."),
                        {'overs': overs, 'pid': context.project_id})
            raise exception.TooManyInstances(overs=overs,
                                             req=reqs,
                                             used=useds,
                                             allowed=total_alloweds)
    # Migration: no extra resources to reserve
    else:
        quotas = objects.Quotas(context=context)

    # Update the instance: set the task state to RESIZE_PREP
    # (preparing to resize/migrate) and reset the progress
    instance.task_state = task_states.RESIZE_PREP
    instance.progress = 0
    instance.update(extra_instance_updates)
    instance.save(expected_task_state=[None])

    """Build the filter options for nova-scheduler.
    CONF.allow_resize_to_same_host = True means the target host may
    be the same as the source host; otherwise the source host is
    filtered out
    """
    filter_properties = {'ignore_hosts': []}
    if not CONF.allow_resize_to_same_host:
        filter_properties['ignore_hosts'].append(instance.host)

    # cell_type = None by default
    if self.cell_type == 'api':
        # Commit reservations early and create migration record.
        self._resize_cells_support(context, quotas, instance,
                                   current_instance_type,
                                   new_instance_type)

    # flavor_id = None means migrate, otherwise resize.
    # Record the instance action in the nova.instance_actions
    # table; the record is updated again when the migration
    # finishes, to reflect the result
    if not flavor_id:
        self._record_action_start(context, instance,
                                  instance_actions.MIGRATE)
    else:
        self._record_action_start(context, instance,
                                  instance_actions.RESIZE)

    """Forward the migration request to
    `/nova/conductor/api.py/ComputeTaskAPI.resize_instance`, which
    calls `nova/conductor/rpcapi.py/ComputeTaskAPI.migrate_server`
    directly to handle it, as analyzed below
    """
    scheduler_hint = {'filter_properties': filter_properties}
    self.compute_task_api.resize_instance(context, instance,
            extra_instance_updates,
            scheduler_hint=scheduler_hint,
            flavor=new_instance_type,
            reservations=quotas.reservations or [],
            clean_shutdown=clean_shutdown)
------------------------------------------------------------
#Continued from above:
#`nova/conductor/rpcapi.py/ComputeTaskAPI.migrate_server`
def migrate_server(self, context, instance, scheduler_hint,
                   live, rebuild, flavor, block_migration,
                   disk_over_commit, reservations=None,
                   clean_shutdown=True):
    """The input parameters are:
    live = False, cold migration
    rebuild = False, migrate rather than resize
    block_migration = None, not a block migration
    disk_over_commit = None
    reservations = [], a migration reserves no extra quota
    """
    # Build the request parameter dict
    kw = {'instance': instance, 'scheduler_hint': scheduler_hint,
          'live': live, 'rebuild': rebuild, 'flavor': flavor,
          'block_migration': block_migration,
          'disk_over_commit': disk_over_commit,
          'reservations': reservations,
          'clean_shutdown': clean_shutdown}

    # Pick a client version based on what the RPC client is allowed
    # to send; version compatibility is established when rpc is
    # initialized
    version = '1.11'
    if not self.client.can_send_version(version):
        del kw['clean_shutdown']
        version = '1.10'
    if not self.client.can_send_version(version):
        kw['flavor'] = objects_base.obj_to_primitive(flavor)
        version = '1.6'
    if not self.client.can_send_version(version):
        kw['instance'] = jsonutils.to_primitive(
                objects_base.obj_to_primitive(instance))
        version = '1.4'

    # Send the `migrate_server` message to rabbitmq via a
    # synchronous rpc call; the consumer, `nova-conductor`,
    # will receive it
    cctxt = self.client.prepare(version=version)
    return cctxt.call(context, 'migrate_server', **kw)
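One detail worth pausing on in migrate_server above is the version negotiation: the client starts at the newest RPC interface version and strips or downgrades arguments until it reaches a version the peer accepts. The self-contained sketch below illustrates the pattern; FakeRPCClient and its version cap are illustrative stand-ins, not nova's actual classes:

# Illustrative sketch of the can_send_version downgrade pattern used
# by ComputeTaskAPI.migrate_server. FakeRPCClient is a stand-in for
# oslo.messaging's RPCClient; the version numbers mirror the walkthrough.
def _as_tuple(version):
    return tuple(int(p) for p in version.split('.'))

class FakeRPCClient(object):
    def __init__(self, max_version):
        # The highest version the server side is known to accept
        self.max_version = _as_tuple(max_version)

    def can_send_version(self, version):
        return _as_tuple(version) <= self.max_version

def build_migrate_kwargs(client, flavor, clean_shutdown=True):
    kw = {'flavor': flavor, 'clean_shutdown': clean_shutdown}
    version = '1.11'
    if not client.can_send_version(version):
        # 1.10 predates the clean_shutdown flag, so drop it
        del kw['clean_shutdown']
        version = '1.10'
    if not client.can_send_version(version):
        # 1.6 expects the flavor as a primitive dict, not an object
        kw['flavor'] = dict(flavor)
        version = '1.6'
    return version, kw

# A server capped at 1.10 never sees clean_shutdown:
print(build_migrate_kwargs(FakeRPCClient('1.10'), {'id': 1}))
# -> ('1.10', {'flavor': {'id': 1}})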

Summary: nova-api performs the instance state and precondition checks, then updates the instance state and adds a record to the nova.instance_actions table, and finally forwards the request to nova-conductor through a synchronous rpc call.
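The nova.instance_actions record written by _record_action_start can be inspected afterwards from the CLI; with the Liberty-era novaclient the relevant subcommands are instance-action-list and instance-action (here <request-id> is a placeholder for the request id shown by the first command):

#nova instance-action-list 52e4d485-6ccf-47f3-a754-b62649e7b256
#nova instance-action 52e4d485-6ccf-47f3-a754-b62649e7b256 <request-id>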

The nova-conductor part

From the preceding analysis it is easy to find the entry point where nova-conductor handles the migration request:

#/nova/conductor/manager.py/ComputeTaskManager.migrate_server
def migrate_server(self, context, instance, scheduler_hint,
                   live, rebuild, flavor, block_migration,
                   disk_over_commit, reservations=None,
                   clean_shutdown=True):
    """The input parameters come from `nova-api`:
    scheduler_hint: scheduling options,
        {u'filter_properties': {u'ignore_hosts': []}}
    live = False, cold migration
    rebuild = False, migrate rather than resize
    block_migration = None, not a block migration
    disk_over_commit = None
    reservations = [], a migration reserves no extra quota
    """
    # If the instance argument is not a valid NovaObject, load the
    # instance from the database first and build an InstanceV2
    # object from it
    if instance and not isinstance(instance, nova_object.NovaObject):
        # NOTE(danms): Until v2 of the RPC API, we need to tolerate
        # old-world instance objects here
        attrs = ['metadata', 'system_metadata', 'info_cache',
                 'security_groups']
        instance = objects.Instance._from_db_object(
                context, objects.Instance(), instance,
                expected_attrs=attrs)

    # NOTE: Remove this when we drop support for v1 of the RPC API
    # If the flavor argument is not a valid Flavor object, load the
    # flavor with the given id from the database and build a Flavor
    # object from it
    if flavor and not isinstance(flavor, objects.Flavor):
        # Code downstream may expect extra_specs to be
        # populated since it is receiving an object, so lookup
        # the flavor to ensure this.
        flavor = objects.Flavor.get_by_id(context, flavor['id'])

    # Live migration, covered in detail in a separate article
    if live and not rebuild and not flavor:
        self._live_migrate(context, instance, scheduler_hint,
                           block_migration, disk_over_commit)
    # Cold migration via _cold_migrate, analyzed below
    elif not live and not rebuild and flavor:
        instance_uuid = instance.uuid
        # 'with' statement: record the migration event in the
        # database (nova.instance_actions_events) before the
        # migration and update the record afterwards
        with compute_utils.EventReporter(context, 'cold_migrate',
                                         instance_uuid):
            self._cold_migrate(context, instance, flavor,
                               scheduler_hint['filter_properties'],
                               reservations, clean_shutdown)
    # Unknown combination of arguments
    else:
        raise NotImplementedError()
-------------------------------------------------------------
#Continued from above:
def _cold_migrate(self, context, instance, flavor,
                  filter_properties, reservations, clean_shutdown):
    # Get the image metadata used by the instance, e.g.:
    """
    {u'min_disk': u'20', u'container_format': u'bare',
     u'min_ram': u'0', u'disk_format': u'raw', 'properties':
     {u'base_image_ref': u'e0cc468f-6501-4a85-9b19-
    70e782861387'}}
    """
    image = utils.get_image_from_system_metadata(
            instance.system_metadata)

    # Build the request dict from the image properties, the
    # instance properties and the flavor; the format is:
    """
    request_spec = {
            'image': image,
            'instance_properties': instance,
            'instance_type': flavor,
            'num_instances': 1}
    """
    request_spec = scheduler_utils.build_request_spec(
            context, image, [instance], instance_type=flavor)

    # Build the migration task object
    # `/nova/conductor/tasks/migrate.py/MigrationTask`
    task = self._build_cold_migrate_task(context, instance,
                                         flavor,
                                         filter_properties,
                                         request_spec,
                                         reservations,
                                         clean_shutdown)

    """Exception handling omitted: on errors such as no suitable
    target host or an invalid policy, the method bails out; before
    exiting it updates the instance state in the database, logs the
    error and sends a `compute_task.migrate_server` notification
    """
    # Execute the migration, analyzed below
    task.execute()
---------------------------------------------------------------
#Continued from above:
#`nova/conductor/tasks/migrate.py/MigrationTask._execute`
def _execute(self):
    # Get the image metadata from the request parameters
    image = self.request_spec.get('image')

    # Build a quota object from the self.reservations reserved
    # quota; a migration reserves none, so self.reservations = []
    self.quotas = objects.Quotas.from_reservations(self.context,
                                       self.reservations,
                                       instance=self.instance)

    # Add group (group_hosts) and group policy (group_polices)
    # information to the filter properties, if any
    scheduler_utils.setup_instance_group(self.context,
                                         self.request_spec,
                                         self.filter_properties)

    """Add retry parameters to the filter properties (provided the
    configured maximum CONF.scheduler_max_attempts > 1); the
    modified filter properties look like:
    {'retry': {'num_attempts': 1, 'hosts': []},
     u'ignore_hosts': []}

    If this is a retry request sent by `nova-compute`, the retry
    dict inside the incoming filter_properties carries the
    exception from the previous attempt, and the hosts listed in
    `hosts` are excluded when the target host is chosen again;
    populate_retry logs that exception, and raises if the maximum
    number of retries has been exceeded
    """
    scheduler_utils.populate_retry(self.filter_properties,
                                   self.instance.uuid)

    # Ask `nova-scheduler` to pick suitable target hosts according
    # to the filter rules, retrying on timeout with the retry
    # parameters above. On success, a list of suitable target
    # hosts is returned; if none is found, an exception is raised
    hosts = self.scheduler_client.select_destinations(
            self.context, self.request_spec, self.filter_properties)

    # Take the first one
    host_state = hosts[0]

    # Add the target host to the retry list in the filter
    # properties (hosts in 'hosts' are ignored on retries), e.g.:
    """
    {'retry': {'num_attempts': 1, 'hosts': [[u'devstack',
     u'devstack']]}, 'limits': {u'memory_mb': 11733.0,
     u'disk_gb': 1182.0}, u'ignore_hosts': []}
    """
    scheduler_utils.populate_filter_properties(
                                        self.filter_properties,
                                        host_state)

    # context is not serializable
    self.filter_properties.pop('context', None)

    # Send a `prep_resize` message to the message queue via an
    # asynchronous rpc call; `nova-compute` will handle it
    # (`nova/compute/rpcapi.py/ComputeAPI`)
    (host, node) = (host_state['host'], host_state['nodename'])
    self.compute_rpcapi.prep_resize(
            self.context, image, self.instance, self.flavor, host,
            self.reservations, request_spec=self.request_spec,
            filter_properties=self.filter_properties, node=node,
            clean_shutdown=self.clean_shutdown)
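The EventReporter wrapped around _cold_migrate above is essentially a context manager that writes an event row on entry and finalizes it (recording the traceback on failure) on exit. Below is a simplified, runnable sketch of the pattern, not nova's actual implementation; the _record function stands in for nova's writes to nova.instance_actions_events:

# Simplified sketch of the EventReporter pattern used by the
# conductor: record the start of an action event on entry, record
# its finish (or failure) on exit.
import traceback

def _record(event, **details):
    print(event, details)  # the real code hits the database instead

class EventReporter(object):
    def __init__(self, context, event_name, instance_uuid):
        self.context = context
        self.event_name = event_name
        self.instance_uuid = instance_uuid

    def __enter__(self):
        _record('%s.start' % self.event_name, uuid=self.instance_uuid)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        error = None
        if exc_type is not None:
            error = ''.join(traceback.format_exception(
                    exc_type, exc_val, exc_tb))
        _record('%s.finish' % self.event_name,
                uuid=self.instance_uuid, error=error)
        return False  # let exceptions propagate to the caller

# Usage mirrors the conductor code:
with EventReporter(None, 'cold_migrate', 'fake-uuid'):
    pass  # the real code runs _cold_migrate(...) here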

Summary: nova-conductor mainly relies on nova-scheduler to pick a suitable target host; it also updates the nova.instance_actions_events table, and finally hands the migration request over to nova-compute via an asynchronous rpc call.
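Note the contrast between the two hand-offs: nova-api sends migrate_server to nova-conductor with a blocking call(), while the conductor task fires prep_resize at nova-compute with a fire-and-forget cast(). These map directly onto oslo.messaging's two send primitives; a minimal sketch, with the transport URL and topic as placeholders:

# Minimal sketch of the synchronous vs. asynchronous RPC primitives
# behind the hand-offs in this article.
from oslo_config import cfg
import oslo_messaging as messaging

transport = messaging.get_transport(
        cfg.CONF, url='rabbit://guest:guest@controller:5672/')
target = messaging.Target(topic='conductor', version='1.11')
client = messaging.RPCClient(transport, target)
cctxt = client.prepare(version='1.11')

# call() blocks until the server returns a result -- how
# migrate_server reaches nova-conductor:
#   result = cctxt.call(context, 'migrate_server', **kw)
# cast() returns immediately with no result -- how prep_resize
# reaches nova-compute:
#   cctxt.cast(context, 'prep_resize', **kw)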

This concludes the first half of the cold migration analysis. The process is fairly straightforward: perform a number of precondition checks, update database records, pick a target host through nova-scheduler, and finally hand the request over to nova-compute. Stay tuned for:
OpenStack Liberty instance migration source code analysis: cold migration, part 2
