Swift Source Code Analysis: swift-account-audit (1)


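Thanks to everyone who supports this blog. Discussion is welcome. My ability and time are limited, so mistakes are inevitable, and corrections are appreciated!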

If you repost this article, please keep the author information.
Blog: http://blog.csdn.net/gaoxingnengjisuan
Email: dong.liu@siat.ac.cn

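PS: I have not logged in to the blog recently and missed many of your comments. My apologies! I am also rarely on QQ, so email is the best way to reach me.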

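For various reasons I have not written a post in more than two months, and I find that what I learned from reading the source is gradually slipping away, so I plan to get back into the habit of writing things down and reviewing them. Starting with this post, I will tidy up what I worked out while reading the swift source (until now it lived only in comments inside the source). I do not expect it to be of much help to others; it is mainly a record for my own later review. Misunderstandings are inevitable, so please bear with me!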

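Overview:

This script audits an account, a container, or an object named on the command line.
What it does depends on the arguments:
If an object is given, it audits that object;
If a container is given, it audits that container and recursively audits every object in it;
If an account is given, it audits that account, recursively audits every container in it, and in turn every object in each container;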

Examples:
/usr/bin/swift-account-audit SOSO_88ad0b83-b2c5-4fa1-b2d6-60c597202076
/usr/bin/swift-account-audit SOSO_88ad0b83-b2c5-4fa1-b2d6-60c597202076/container/object
/usr/bin/swift-account-audit -e errors.txt SOSO_88ad0b83-b2c5-4fa1-b2d6-60c597202076/container
/usr/bin/swift-account-audit < errors.txt
/usr/bin/swift-account-audit -c 25 -d < errors.txt
This is not a daemon: the script runs only when /usr/bin/swift-account-audit is invoked from the command line.
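The last two examples read their targets from standard input, typically a file written earlier with -e. Below is a minimal sketch of how command-line arguments and stdin lines end up in one stream of normalized paths. The helper name iter_audit_paths and the account name AUTH_test are made up for illustration, and io.StringIO stands in for a redirected sys.stdin; the real script does this inline in its main block.

```python
import io
from itertools import chain


def iter_audit_paths(args, stdin):
    """Yield normalized paths from command-line args, then from stdin lines.

    Hypothetical helper for illustration only.
    """
    for raw in chain(args, stdin):
        # Drop the trailing newline and force exactly one leading slash.
        yield '/' + raw.rstrip('\r\n').lstrip('/')


# io.StringIO stands in for sys.stdin redirected from errors.txt.
fake_stdin = io.StringIO('/AUTH_test/photos/cat.jpg\nAUTH_test/docs\n')
paths = list(iter_audit_paths(['AUTH_test'], fake_stdin))
print(paths)
```

Because the error file stores one path per line, a failed run can be replayed simply by feeding that file back in.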

Source code walkthrough:

if __name__ == '__main__':
    try:
        optlist, args = getopt.getopt(sys.argv[1:], 'c:r:e:d')
    except getopt.GetoptError as err:
        print str(err)
        print usage
        sys.exit(2)
    if not args and os.isatty(sys.stdin.fileno()):
        print usage
        sys.exit()
    opts = dict(optlist)
    options = {
        'concurrency': int(opts.get('-c', 50)),
        'error_file': opts.get('-e', None),
        'swift_dir': opts.get('-r', '/etc/swift'),
        'deep': '-d' in opts,
    }
    auditor = Auditor(**options)
    # When stdin is not a terminal, paths read from it are appended to args.
    if not os.isatty(sys.stdin.fileno()):
        args = chain(args, sys.stdin)
    # This loop means several targets can be audited in a single invocation.
    for path in args:
        path = '/' + path.rstrip('\r\n').lstrip('/')
        # Depending on the arguments:
        #   object given: audit that object;
        #   container given: audit it and every object in it;
        #   account given: audit it, every container in it,
        #   and every object in each container.
        auditor.audit(*split_path(path, 1, 3, True))
    auditor.wait()
    auditor.print_stats()

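1. Parse the command-line options;
2. Create an instance of the Auditor class;
3. auditor.audit(*split_path(path, 1, 3, True)) calls a different method depending on which of account/container/object appear in the path, and so audits the account, container, or object;
4. Print the audit results;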
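To make step 3 concrete, here is a minimal sketch of how one path turns into the (account, container, object) triple passed to audit. split_audit_path is a simplified stand-in I wrote for illustration, not Swift's split_path, and AUTH_test is a made-up account name; it only tries to mimic the behaviour of the call split_path(path, 1, 3, True) for well-formed input.

```python
def split_audit_path(raw):
    """Return (account, container, obj) for a path from the command line.

    Simplified stand-in for split_path(path, 1, 3, True): missing parts
    come back as None and the object name keeps any further slashes.
    """
    path = '/' + raw.rstrip('\r\n').lstrip('/')
    # maxsplit=3 leaves everything after the container in one piece.
    segs = path.split('/', 3)[1:]
    if not segs[0]:
        raise ValueError('Invalid path: %r' % raw)
    segs.extend([None] * (3 - len(segs)))
    return tuple(segs)


account_only = split_audit_path('AUTH_test')
with_container = split_audit_path('AUTH_test/photos\n')
with_object = split_audit_path('/AUTH_test/photos/2024/cat.jpg')
print(account_only, with_container, with_object)
```

The None placeholders are what let audit tell the three cases apart with plain truth tests.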

Moving on to step 3, here is the audit method:

def audit(self, account, container=None, obj=None):
    """
    Depending on the arguments:
    object given: audit that object;
    container given: audit it and recursively audit every object in it;
    account given: audit it, recursively audit every container in it,
    and in turn every object in each container.
    """
    # Audit a single object.
    if obj and container:
        self.pool.spawn_n(self.audit_object, account, container, obj)
    # Audit a container and, recursively, every object in it.
    elif container:
        self.pool.spawn_n(self.audit_container, account, container, True)
    # Audit an account, every container in it, and every object in each.
    else:
        self.pool.spawn_n(self.audit_account, account, True)

3.1 audit_object audits the given object;
3.2 audit_container audits the given container and recursively audits every object in it;
3.3 audit_account audits the given account, recursively audits every container in it, and in turn every object in each container;
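The dispatch can be tried out in isolation with a toy class. This is only a sketch: SketchAuditor and its return strings are invented for illustration, and a ThreadPoolExecutor from the standard library stands in for the pool behind self.pool.spawn_n, so the concurrency model is not the same as in Swift.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchAuditor:
    """Toy stand-in for Auditor: records which audit each target would get."""

    def __init__(self, concurrency=4):
        self.pool = ThreadPoolExecutor(max_workers=concurrency)
        self.futures = []

    def audit_object(self, account, container, obj):
        return 'object:/%s/%s/%s' % (account, container, obj)

    def audit_container(self, account, container, recurse=False):
        return 'container:/%s/%s' % (account, container)

    def audit_account(self, account, recurse=False):
        return 'account:/%s' % account

    def audit(self, account, container=None, obj=None):
        # Same branching as the quoted audit method.
        if obj and container:
            job = self.pool.submit(self.audit_object, account, container, obj)
        elif container:
            job = self.pool.submit(self.audit_container, account, container, True)
        else:
            job = self.pool.submit(self.audit_account, account, True)
        self.futures.append(job)

    def wait(self):
        # Collect results in submission order, then release the workers.
        results = [job.result() for job in self.futures]
        self.pool.shutdown()
        return results


auditor = SketchAuditor()
auditor.audit('AUTH_test')
auditor.audit('AUTH_test', 'photos')
auditor.audit('AUTH_test', 'photos', 'cat.jpg')
results = auditor.wait()
print(results)
```

As in the real script, audit only schedules work; nothing is guaranteed to have finished until wait is called.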

Moving on to 3.1, here is the implementation of audit_object:

def audit_object(self, account, container, name):
    """
    Audit the given object.
    """
    # Build the full path of the object under its account and container.
    path = '/%s/%s/%s' % (account, container, name)

    # Look up the partition and the nodes that hold replicas of this object.
    # There may be several nodes, since each partition has several replicas
    # that can live on different nodes.
    # Returns a tuple (partition, list of node dicts); each node dict contains
    # at least id, weight, zone, ip, port, device and meta.
    part, nodes = self.object_ring.get_nodes(
        account, container.encode('utf-8'), name.encode('utf-8'))

    # Get the object listing of the container.
    container_listing = self.audit_container(account, container)
    consistent = True
    if name not in container_listing:
        print "  Object %s missing in container listing!" % path
        consistent = False
        hash = None
    else:
        hash = container_listing[name]['hash']

    etags = []

    # Query every node that holds a replica of this partition.
    for node in nodes:
        try:
            if self.deep:
                # Open a connection to the object server on this node.
                conn = http_connect(node['ip'], node['port'],
                                    node['device'], part, 'GET', path, {})
                resp = conn.getresponse()
                # Compute the MD5 of the body in 8 KiB chunks.
                calc_hash = md5()
                chunk = True
                while chunk:
                    chunk = resp.read(8192)
                    calc_hash.update(chunk)
                calc_hash = calc_hash.hexdigest()
                if resp.status // 100 != 2:
                    self.object_not_found += 1
                    consistent = False
                    print '  Bad status GETting object "%s" on %s/%s' % (
                        path, node['ip'], node['device'])
                    continue
                if resp.getheader('ETag').strip('"') != calc_hash:
                    self.object_checksum_mismatch += 1
                    consistent = False
                    print '  MD5 does not match etag for "%s" on %s/%s' % (
                        path, node['ip'], node['device'])
                etags.append(resp.getheader('ETag'))
            else:
                conn = http_connect(node['ip'], node['port'],
                                    node['device'], part, 'HEAD',
                                    path.encode('utf-8'), {})
                resp = conn.getresponse()
                if resp.status // 100 != 2:
                    self.object_not_found += 1
                    consistent = False
                    print '  Bad status HEADing object "%s" on %s/%s' % (
                        path, node['ip'], node['device'])
                    continue
                etags.append(resp.getheader('ETag'))
        except Exception:
            self.object_exceptions += 1
            consistent = False
            print '  Exception fetching object "%s" on %s/%s' % (
                path, node['ip'], node['device'])
            continue
    if not etags:
        consistent = False
        print "  Failed fo fetch object %s at all!" % path
    elif hash:
        # Note: as quoted, the loop body reads the ETag of the last response
        # (resp) and reports the last node, not the loop variable etag.
        for etag in etags:
            if resp.getheader('ETag').strip('"') != hash:
                consistent = False
                self.object_checksum_mismatch += 1
                print '  ETag mismatch for "%s" on %s/%s' % (
                    path, node['ip'], node['device'])
    # Record the path of an inconsistent object in the error file, if any.
    if not consistent and self.error_file:
        print >>open(self.error_file, 'a'), path
    self.objects_checked += 1

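3.1.1 Build the full path of the object under the given account and container;
3.1.2 Look up the partition number and the nodes that hold the replicas of the named object;
3.1.3 Call audit_container to get the object listing of the given account and container, and check whether the object appears in it; if it does, take the object's hash from the listing;
3.1.4 Iterate over all nodes that hold a replica of the object and do the following for each node:
(1) If deep is True, a deep check is performed: the object is fetched from the node with an HTTP GET. The response status shows whether the replica exists on that node, and the ETag header is compared with the MD5 computed over the response body to decide whether the replica is intact;
(2) If deep is False, no deep check is performed: only the response headers are fetched from the node with an HTTP HEAD, and the response status shows whether the replica exists on that node;
3.1.5 Compare the hash recorded in the container listing with the ETag of each remote replica to decide whether the replicas are valid;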
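The two comparisons in 3.1.4 (1) and 3.1.5 can be sketched without any network access. This is only an illustration under my own assumptions: md5_of_stream and check_replica are invented names, io.BytesIO stands in for the HTTP response body, and the second check compares each replica's own ETag with the listing hash, which is what the step describes.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=8192):
    """Hash a file-like object chunk by chunk, as the deep check does."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def check_replica(body_stream, etag_header, listing_hash):
    """Return a list of problems found for one replica (empty if none)."""
    problems = []
    # ETag headers are usually quoted, so strip the quotes first.
    etag = etag_header.strip('"')
    if md5_of_stream(body_stream) != etag:
        problems.append('md5 does not match etag')
    if listing_hash is not None and etag != listing_hash:
        problems.append('etag does not match container listing')
    return problems


body = b'hello swift'
good_hash = hashlib.md5(body).hexdigest()
ok = check_replica(io.BytesIO(body), '"%s"' % good_hash, good_hash)
bad = check_replica(io.BytesIO(b'corrupted'), '"%s"' % good_hash, good_hash)
print(ok, bad)
```

Without -d only the second comparison is possible, since a HEAD request returns the ETag but no body to hash.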

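I originally wanted to put everything in one post, but after several attempts the formatting proved too hard to maintain at that length, so the material is split across several posts!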

0 0