Android源码解析之repo仓库

来源:互联网 发布:js onclick事件传参 编辑:程序博客网 时间:2024/06/08 03:45

    • Summary
    • About Repo
    • Introduction
      • Repo仓库
      • Manifest仓库
      • Projects仓库集
      • 创建分支
    • Reference

1.Summary

  首先说一下为什么会想分享这篇博客。出发点很简单,只是想学习一下Python在AOSP中的应用。repo应用就是一个研究的切入点。其次Python在深度学习、大数据都有一定的支持,后续会研究一下这方面的技术。最后就是个人喜好,无他。

2.About Repo

  repo就是通过Python封装git命令的应用。什么是repo?简单来说就是对AOSP含有git仓库的各个项目的批处理。repo应用包括repo仓库(仓库也可以叫做项目)、manifest仓库、projectsc仓库集这三个核心。repo仓库都是一些Python文件,manifest仓库只有一个存放AOSP各个子项目元数据的xml文件。projects仓库集是AOSP各个子项目对应的git仓库。
下面用一张图片表示一下。

architecture

补充一点,git是允许repository和working directory分布在不同的目录下的。所以就会看到AOSP的working directory在项目根目录而.git目录在.repo/projects目录

3.Introduction

  先来草率的分析一下,拉取一套AOSP代码应该按照如下流程:

mkdir testsource  #创建AOSP目录。用于存放.repo应用和源码cd testsourcerepo init   -u  https://android.googlesource.com/platform/manifest -b android-4.0.1_r1      cmd     #初始化repo仓库和manifest仓库                      repo sync -j 8         cmd     #同步projects仓库集repo start master --all       cmd     #创建并且切换到新分支上repo仓库初始化--->manifest仓库初始化--->project仓库集初始化--->创建并切换到新分支上

  从数据流自上而下看:

repo command line --->optparse--->git command line

  在Python中使用的是optparse模块(后续将被argparse模块取代)解析命令行,所以optparse模块相当于数据转换中心将repo命名行转成git命令行

repo init

Repo仓库

  接下来就看看具体的细节处理,从repo模块的入口函数main开始,执行的命令行如下:

repo init -u https://android.googlesource.com/platform/manifest -b android-4.0.1_r1

def main(orig_args):  cmd, opt, args = _ParseArguments(orig_args)  repo_main, rel_repo_dir = None, None  # Don't use the local repo copy, make sure to switch to the gitc client first.  if cmd != 'gitc-init':    repo_main, rel_repo_dir = _FindRepo()  wrapper_path = os.path.abspath(__file__)  my_main, my_git = _RunSelf(wrapper_path)  cwd = os.getcwd()  ...  if not repo_main:    if opt.help:      _Usage()    if cmd == 'help':      _Help(args)    if not cmd:      _NotInstalled()    if cmd == 'init' or cmd == 'gitc-init':      if my_git:        _SetDefaultsTo(my_git)      try:        _Init(args, gitc_init=(cmd == 'gitc-init'))      except CloneFailure:        ...        sys.exit(1)      repo_main, rel_repo_dir = _FindRepo()    else:      _NoCommands(cmd)  if my_main:    repo_main = my_main  ver_str = '.'.join(map(str, VERSION))  me = [sys.executable, repo_main,        '--repo-dir=%s' % rel_repo_dir,        '--wrapper-version=%s' % ver_str,        '--wrapper-path=%s' % wrapper_path,        '--']  me.extend(orig_args)  me.extend(extra_args)  try:    os.execv(sys.executable, me)  except OSError as e:    ...    sys.exit(148)

  repo模块函数main(sys.argv[1:]) 参数sys.argv[1:]就是由command、options组成,然后由_ParseArguments函数解析。由于main函数流程复杂,我们考虑的是初次初始化。main函数在调用_Init函数之前对环境进行了检查:repo模块的版本号和路径、.repo/repo/路径下的main模块和git仓库。在_Init函数之后就是执行main模块中的入口函数_Main

_ParseArguments函数的代码如下:

def _ParseArguments(args):  cmd = None  opt = _Options()  arg = []  for i in range(len(args)):    a = args[i]    if a == '-h' or a == '--help':      opt.help = True    elif not a.startswith('-'):      cmd = a      arg = args[i + 1:]      break  return cmd, opt, arg

  _ParseArguments函数解析出cmd、opt、args,其中,cmd是init,args是command(init)后面的参数(-u https://android.googlesource.com/platform/manifest -b android-4.0.1_r1),而opt特指-h(–help)这样的用意在于当你输入repo -h,–help时就可以弹出一些帮助文档。

_FindRepo函数的代码如下:

def _FindRepo():  """Look for a repo installation, starting at the current directory.  """  curdir = os.getcwd()  repo = None  olddir = None  while curdir != '/' \          and curdir != olddir \          and not repo:    repo = os.path.join(curdir, repodir, REPO_MAIN)    if not os.path.isfile(repo):      repo = None      olddir = curdir      curdir = os.path.dirname(curdir)  return (repo, os.path.join(curdir, repodir))

  _FindRepo函数查找当前执行repo命令的目录下.repo/repo/main.py和.repo目录两者是否都存在。

_RunSelf函数的代码如下:

def _RunSelf(wrapper_path):  my_dir = os.path.dirname(wrapper_path)  my_main = os.path.join(my_dir, 'main.py')  my_git = os.path.join(my_dir, '.git')  if os.path.isfile(my_main) and os.path.isdir(my_git):    for name in ['git_config.py',                 'project.py',                 'subcmds']:      if not os.path.exists(os.path.join(my_dir, name)):        return None, None    return my_main, my_git  return None, None

  _RunSelf函数检查repo模块的同级目录里是否有三个文件main.py 、git_config.py、project.py 和两个目录subcmds、.git。这次是查找运行中模块repo的同级目录,是否具备三个文件两个目录,如有具备这些,则.repo仓库之前就已经被初始化过了。反之,接下去就会初始化仓库。

  接下来的各种控制流判断,取其中两个关键函数_SetDefaultsTo和_Init来详细讲解

  ...  if not repo_main:    ...    if cmd == 'init' or cmd == 'gitc-init':      if my_git:        _SetDefaultsTo(my_git)      try:        _Init(args, gitc_init=(cmd == 'gitc-init'))      ...

  repo_main,cmd,my_git这三个变量我们前面已经说过了它们的由来,其中的my_git如果存在,调用_SetDefaultsTo函数会设置数据源,反之,就是初次初始化,使用默认的数据源(REPO_URL = ‘https://gerrit.googlesource.com/git-repo’ ),那么就会克隆一个.repo/repo/仓库

_SetDefaultsTo函数

def _SetDefaultsTo(gitdir):  global REPO_URL  global REPO_REV  REPO_URL = gitdir  proc = subprocess.Popen([GIT,                           '--git-dir=%s' % gitdir,                           'symbolic-ref',                           'HEAD'],                          stdout=subprocess.PIPE,                          stderr=subprocess.PIPE)  REPO_REV = proc.stdout.read().strip()  proc.stdout.close()  proc.stderr.read()  proc.stderr.close()  if proc.wait() != 0:    _print('fatal: %s has no current branch' % gitdir, file=sys.stderr)    sys.exit(1)

  –git-dir 指定git仓库的位置,symbolic-ref 指定当前分支为克隆分支这两者的值通过关键词global变成全局变量。

接下来就是核心函数_Init如下:

def _Init(args, gitc_init=False):  """Installs repo by cloning it over the network.  """  ...  opt, args = init_optparse.parse_args(args)  if args:    init_optparse.print_usage()    sys.exit(1)  url = opt.repo_url  if not url:    url = REPO_URL    extra_args.append('--repo-url=%s' % url)  branch = opt.repo_branch  if not branch:    branch = REPO_REV    extra_args.append('--repo-branch=%s' % branch)  if branch.startswith('refs/heads/'):    branch = branch[len('refs/heads/'):]  ...  try:    ...    os.mkdir(repodir)  except OSError as e:    if e.errno != errno.EEXIST:      ...      sys.exit(1)  _CheckGitVersion()  try:    if NeedSetupGnuPG():      can_verify = SetupGnuPG(opt.quiet)    else:      can_verify = True    dst = os.path.abspath(os.path.join(repodir, S_repo))    _Clone(url, dst, opt.quiet, not opt.no_clone_bundle)    if not os.path.isfile('%s/repo' % dst):      _print("warning: '%s' does not look like a git-repo repository, is "             "REPO_URL set correctly?" % url, file=sys.stderr)    if can_verify and not opt.no_repo_verify:      rev = _Verify(dst, branch, opt.quiet)    else:      rev = 'refs/remotes/origin/%s^0' % branch    _Checkout(dst, branch, rev, opt.quiet)  except CloneFailure:    ...

  _Init函数的参数 args=[-u,https://android.googlesource.com/platform/manifest,-b,android-4.0.1_r1] ,使用OptionParse类的成员函数parse_args解析得到opt对象(存储有url地址和分支号)和args列表(其值不为空时,便会停止创建仓库的进程)。既然得到了opt对象,那么接下来就要通过指定url地址和分支号去获取repo仓库、manifest仓库,由于命令行只有manifest仓库的地址,那么是不是就没有办法获取repo仓库了吗?google提供了自家repo仓库的url地址(https://gerrit.googlesource.com/git-repo)供开发者使用,也可以使用–repo-url选项指定自家公司repo仓库的url地址。接下来检查和配置环境

  • 1.需要支持1.7.2以上的git版本
  • 2.没有配置GnuPG的环境下,自动生成GnuPG文件

  当GnuPG的环境配置好了,就会返回一个值can_verify,用于判断克隆完repo仓库后验证最新的tag是否被GunPG签过名,然后将克隆下来的repo仓库使用_Checkout函数切换到这个最新的tag,以便于使用最新的release版本(一般发布一个release版本都会打上一个tag),这就是这个tag的用意。

  接下来我们就来看看函数_Init调用的两个核心函数_Clone和_Checkout

在这之前我们来看看git对远程仓库的操作图

git operation flowchart

_Clone函数的代码如下:

def _Clone(url, local, quiet, clone_bundle):  """Clones a git repository to a new subdirectory of repodir  """  try:    os.mkdir(local)  except OSError as e:    ...    raise CloneFailure()  cmd = [GIT, 'init', '--quiet']  try:    proc = subprocess.Popen(cmd, cwd=local)  except OSError as e:    ...    raise CloneFailure()  if proc.wait() != 0:    ...    raise CloneFailure()  _InitHttp()  _SetConfig(local, 'remote.origin.url', url)  _SetConfig(local,             'remote.origin.fetch',             '+refs/heads/*:refs/remotes/origin/*')  if clone_bundle and _DownloadBundle(url, local, quiet):    _ImportBundle(local)  _Fetch(url, local, 'origin', quiet)

  这里简单说一下_Clone函数的流程图。

创建git仓库(git init)---> 初始化http网络 ----> 配置远程仓库url地址、分支名(git config) ---> fetch记录从remote repository到local repository(git fetch)

  在有网络的条件下可以从远程仓库克隆代码,但是如果离线了怎么办?git给我们提供了一种bundle机制。
_DownloadBundle函数的代码如下:

def _DownloadBundle(url, local, quiet):  if not url.endswith('/'):    url += '/'  url += 'clone.bundle'  proc = subprocess.Popen(      [GIT, 'config', '--get-regexp', 'url.*.insteadof'],      cwd=local,      stdout=subprocess.PIPE)  for line in proc.stdout:    m = re.compile(r'^url\.(.*)\.insteadof (.*)$').match(line)    if m:      new_url = m.group(1)      old_url = m.group(2)      if url.startswith(old_url):        url = new_url + url[len(old_url):]        break  proc.stdout.close()  proc.wait()  if not url.startswith('http:') and not url.startswith('https:'):    return False  dest = open(os.path.join(local, '.git', 'clone.bundle'), 'w+b')  try:    try:      r = urllib.request.urlopen(url)    except urllib.error.HTTPError as e:      if e.code in [401, 403, 404, 501]:        return False      ...      raise CloneFailure()    except urllib.error.URLError as e:      ...      raise CloneFailure()    try:      if not quiet:        _print('Get %s' % url, file=sys.stderr)      while True:        buf = r.read(8192)        if buf == '':          return True        dest.write(buf)    finally:      r.close()  finally:    dest.close()

  在使用git fetch获取remote repository记录到local repository之前,其实代码的来源还可以从bundle获取。生成bundle之前需要在有网络的条件下,将远程仓库的记录存储在bundle中。
  最后会调用_ImportBundle函数导入数据。这种导入方式的应用场景在于环境处于脱机状态,便可以从其他的机器拷贝一份bundle导入到自己的仓库中。_ImportBundle函数是对_Fetch函数进行包装,其中最为重要的就是第三个参数,指定了要导入到local repository的数据来源路径,可以是网络的 url 的仓库名,也可以是本地的bundle路径

_Checkout函数

def _Checkout(cwd, branch, rev, quiet):  """Checkout an upstream branch into the repository and track it.  """  cmd = [GIT, 'update-ref', 'refs/heads/default', rev]  if subprocess.Popen(cmd, cwd=cwd).wait() != 0:    raise CloneFailure()  _SetConfig(cwd, 'branch.default.remote', 'origin')  _SetConfig(cwd, 'branch.default.merge', 'refs/heads/%s' % branch)  cmd = [GIT, 'symbolic-ref', 'HEAD', 'refs/heads/default']  if subprocess.Popen(cmd, cwd=cwd).wait() != 0:    raise CloneFailure()  cmd = [GIT, 'read-tree', '--reset', '-u']  if not quiet:    cmd.append('-v')  cmd.append('HEAD')  if subprocess.Popen(cmd, cwd=cwd).wait() != 0:    raise CloneFailure()

  该函数对git chechout的底层函数进行封装,功能和git checkout切分支是一样的,至此我们的_Init函数就执行完了,并且得到了repo仓库了那么接下来就是要得到manifest仓库了

  ...  ver_str = '.'.join(map(str, VERSION))  me = [sys.executable, repo_main,        '--repo-dir=%s' % rel_repo_dir,        '--wrapper-version=%s' % ver_str,        '--wrapper-path=%s' % wrapper_path,        '--']  me.extend(orig_args)  me.extend(extra_args)  try:    os.execv(sys.executable, me)  ...

Manifest仓库

  接下来就是执行main模块函数_Main,执行时命令行如下:

/home/.../.repo/repo/main.py --repo-dir=/home/.../.repo --wrapper-version=1.0 --wrapper-path=/usr/bin/repo -- init -u xxxx -b xxx

  其参数argv经过repo模块的扩展,添加了三个信息

  • .repo目录的绝对路径
  • repo模块内部定义的版本号
  • repo模块的绝对路径

经过repo模块添加的信息用来检查是否有可用的repo和执行main模块,这是命令行的前部分,而后半部分(init -u xxxx -b xxx)供直接或间接以Command为基类的衍生类的成员函数Execute调用放置在.repo/repo/subcmds/目录下的*.py模块。repo脚本能执行的命令都是放在该目录下的,一个Python文件对应一个repo命令。比如:”repo init”表示要执行的模块在.repo/repo/subcmds/init.py。

_Main函数的代码如下:

def _Main(argv):  result = 0  opt = optparse.OptionParser(usage="repo wrapperinfo -- ...")  opt.add_option("--repo-dir", dest="repodir",                 help="path to .repo/")  opt.add_option("--wrapper-version", dest="wrapper_version",                 help="version of the wrapper script")  opt.add_option("--wrapper-path", dest="wrapper_path",                 help="location of the wrapper script")  _PruneOptions(argv, opt)  opt, argv = opt.parse_args(argv)  _CheckWrapperVersion(opt.wrapper_version, opt.wrapper_path)  _CheckRepoDir(opt.repodir)  Version.wrapper_version = opt.wrapper_version  Version.wrapper_path = opt.wrapper_path  repo = _Repo(opt.repodir)  try:    try:      init_ssh()      init_http()      result = repo._Run(argv) or 0    finally:      close_ssh()  except KeyboardInterrupt:    ...    result = 1  except ManifestParseError as mpe:    ...    result = 1  except RepoChangedException as rce:    # If repo changed, re-exec ourselves.    #    argv = list(sys.argv)    argv.extend(rce.extra_args)    try:      os.execv(__file__, argv)    except OSError as e:      ...      result = 128  sys.exit(result)if __name__ == '__main__':  _Main(sys.argv[1:])

  _Main函数的重点部分在于repo调用_Repo类中的成员函数_Run,而前期也如repo和main两个模块一样做一些必要的检查。修剪命令行的_PruneOptions函数、解析命令的parse_args函数(opt为”–”之前的内容,argv”为–”之后的内容)、检查repo模块版本的_CheckWrapperVersion函数、检查 .repo目录是否存在的_CheckRepoDir函数。

_Repo类的代码如下:

from subcmds import all_commandsclass _Repo(object):  def __init__(self, repodir):    self.repodir = repodir    self.commands = all_commands    # add 'branch' as an alias for 'branches'    all_commands['branch'] = all_commands['branches']  def _Run(self, argv):    result = 0    name = None    glob = []    for i in range(len(argv)):      if not argv[i].startswith('-'):        name = argv[i]        if i > 0:          glob = argv[:i]        argv = argv[i + 1:]        break    if not name:      glob = argv      name = 'help'      argv = []    gopts, _gargs = global_options.parse_args(glob)    ...    try:      cmd = self.commands[name]    except KeyError:      ...      return 1    cmd.repodir = self.repodir    cmd.manifest = XmlManifest(cmd.repodir)    ...    Editor.globalConfig = cmd.manifest.globalConfig    ...    try:      copts, cargs = cmd.OptionParser.parse_args(argv)      copts = cmd.ReadEnvironmentOptions(copts)    except NoManifestException as e:      ...      return 1    ...    start = time.time()    try:      result = cmd.Execute(copts, cargs)    except (DownloadError, ManifestInvalidRevisionError,        NoManifestException) as e:      ...      result = 1    except NoSuchProjectError as e:      ...      result = 1    except InvalidProjectGroupsError as e:      ...      result = 1    finally:      elapsed = time.time() - start      hours, remainder = divmod(elapsed, 3600)      minutes, seconds = divmod(remainder, 60)      if gopts.time:        if hours == 0:          print('real\t%dm%.3fs' % (minutes, seconds), file=sys.stderr)        else:          print('real\t%dh%dm%.3fs' % (hours, minutes, seconds),                file=sys.stderr)    return result

  _Repo类有两个成员变量repodir、commands和一个类变量all_commands,其中all_commands字典的值是一些repo脚本能够执行命令的类名。那这些值是怎么来的呢 ? 在 from subcmds import all_commands 时,就会初始化subcmds包,将subcmds目录下所有模块名的首字母转化为大写其余字母不变,就成了命令的类名。再结合成员函数_Run,可以知道,该类的作用在于,将解析后的cmd分发到包subcmds下所对应的模块里面的类(比如:init指令—>subcmds/init.py里面的Init类)。
  _Repo类的成员函数_Run主要是初始化XmlManifest,获取某个指令独有OptionParse并解析指令,调用Command类的成员函数Execute。

其中XmlManifest类用于管理 .repo,XmlManifest类的代码如下:

class XmlManifest(object):  """manages the repo configuration file"""  def __init__(self, repodir):    self.repodir = os.path.abspath(repodir)    self.topdir = os.path.dirname(self.repodir)    self.manifestFile = os.path.join(self.repodir, MANIFEST_FILE_NAME)    self.globalConfig = GitConfig.ForUser()    self.localManifestWarning = False    self.isGitcClient = False    self.repoProject = MetaProject(self, 'repo',      gitdir   = os.path.join(repodir, 'repo/.git'),      worktree = os.path.join(repodir, 'repo'))    self.manifestProject = MetaProject(self, 'manifests',      gitdir   = os.path.join(repodir, 'manifests.git'),      worktree = os.path.join(repodir, 'manifests'))

XmlManifest类在manifest_xml模块里面,XmlManifest类的主要成员变量有:

  • repodir:.repo目录的绝对路径
  • topdir:AOSP项目的绝对路径(testsource目录绝对路径)
  • manifestFile:.repo目录下的链接文件manifest.xml
  • repoProject: .repo目录下的repo仓库
  • manifestProject:.repo目录下的manifest仓库

类中还提供了对.repo的属性值和对属性值操作的成员函数,比如加载数据到XmlManifest对象(_Load成员函数)和重置数据(_Unload成员函数),创建manifest.xml链接文件(Link成员函数),获取projects目录下的仓库对象(GetProjectsWithName,GetProjectPaths成员函数)。所以不难看出该类就是对.repo目录的管理工具。我们在继续看一下该类中重要的成员变量repoProject、manifestProject,都是MetaProject类的对象.

MetaProject类的代码如下

class MetaProject(Project):  """A special project housed under .repo.  """  def __init__(self, manifest, name, gitdir, worktree):    Project.__init__(self,                     manifest=manifest,                     name=name,                     gitdir=gitdir,                     objdir=gitdir,                     worktree=worktree,                     remote=RemoteSpec('origin'),                     relpath='.repo/%s' % name,                     revisionExpr='refs/heads/master',                     revisionId=None,                     groups=None)

成员变量如下:

  • manifest:是XmlManifest类的对象
  • name:创建新仓库的名字
  • gitdir: .git仓库的绝对路径
  • worktree:工作目录
  • remote:远程仓库
  • relpath:创建新仓库的相对于.repo目录的路径
  • revisionExpr: 分支

MetaProject和Project对于仓库的操作逻辑差不多一样,不过为了体现这两个仓库(repo仓库和manifest仓库)在AOSP项目整个仓库集的重要性,才会有这样的命名。

Project类的代码如下:

class Project(object):  # These objects can be shared between several working trees.  shareable_files = ['description', 'info']  shareable_dirs = ['hooks', 'objects', 'rr-cache', 'svn']  # These objects can only be used by a single working tree.  working_tree_files = ['config', 'packed-refs', 'shallow']  working_tree_dirs = ['logs', 'refs']  def __init__(self,               manifest,               name,               remote,               gitdir,               objdir,               worktree,               relpath,               revisionExpr,               revisionId,               rebase=True,               groups=None,               sync_c=False,               sync_s=False,               clone_depth=None,               upstream=None,               parent=None,               is_derived=False,               dest_branch=None,               optimized_fetch=False,               old_revision=None):    """Init a Project object.    Args:      manifest: The XmlManifest object.      name: The `name` attribute of manifest.xml's project element.      remote: RemoteSpec object specifying its remote's properties.      gitdir: Absolute path of git directory.      objdir: Absolute path of directory to store git objects.      worktree: Absolute path of git working tree.      relpath: Relative path of git working tree to repo's top directory.      revisionExpr: The `revision` attribute of manifest.xml's project element.      revisionId: git commit id for checking out.      rebase: The `rebase` attribute of manifest.xml's project element.      groups: The `groups` attribute of manifest.xml's project element.      sync_c: The `sync-c` attribute of manifest.xml's project element.      sync_s: The `sync-s` attribute of manifest.xml's project element.      upstream: The `upstream` attribute of manifest.xml's project element.      parent: The parent Project object.      is_derived: False if the project was explicitly defined in the manifest;                  True if the project is a discovered submodule.      dest_branch: The branch to which to push changes for review by default.      optimized_fetch: If True, when a project is set to a sha1 revision, only                       fetch from the remote if the sha1 is not present locally.      old_revision: saved git commit id for open GITC projects.    """    self.manifest = manifest    self.name = name    self.remote = remote    self.gitdir = gitdir.replace('\\', '/')    self.objdir = objdir.replace('\\', '/')    if worktree:      self.worktree = worktree.replace('\\', '/')    else:      self.worktree = None    self.relpath = relpath    self.revisionExpr = revisionExpr    if   revisionId is None \     and revisionExpr \     and IsId(revisionExpr):      self.revisionId = revisionExpr    else:      self.revisionId = revisionId    self.rebase = rebase    self.groups = groups    self.sync_c = sync_c    self.sync_s = sync_s    self.clone_depth = clone_depth    self.upstream = upstream    self.parent = parent    self.is_derived = is_derived    self.optimized_fetch = optimized_fetch    self.subprojects = []    self.snapshots = {}    self.copyfiles = []    self.linkfiles = []    self.annotations = []    self.config = GitConfig.ForRepository(                    gitdir=self.gitdir,                    defaults=self.manifest.globalConfig)    if self.worktree:      self.work_git = self._GitGetByExec(self, bare=False, gitdir=gitdir)    else:      self.work_git = None    self.bare_git = self._GitGetByExec(self, bare=True, gitdir=gitdir)    self.bare_ref = GitRefs(gitdir)    self.bare_objdir = self._GitGetByExec(self, bare=True, gitdir=objdir)    self.dest_branch = dest_branch    self.old_revision = old_revision    # This will be filled in if a project is later identified to be the    # project containing repo hooks.    self.enabled_repo_hooks = []

  Project是用来描述AOSP项目某一个仓库(或者说项目),其中有几个重要的值是来源于manifest.xml, name,revisionExpr,rebase,groups,sync_c,sync_s,upstream 这几个值对应到manifest.xml中某个标签的属性值,后续我们在克隆projects仓库集会讲解manifest标签和属性的用途。所以AOSP项目的仓库信息都在manifest.xml,除了repo仓库和manifest仓库,这些信息是在我们使用”repo sync”时会用到。

  现在我们回到成员函数_Run的流程中,XmlManifest类已经构造完了。cmd.OptionParser.parse_args(argv) ,再去获取每个指令独有的OptionParser并且解析指令 init -u xxxx -b xxx

OptionParser属性函数的代码如下:

class Command(object):  """Base class for any command line action in repo.  """  ... @property  def OptionParser(self):    if self._optparse is None:      try:        me = 'repo %s' % self.NAME        usage = self.helpUsage.strip().replace('%prog', me)      except AttributeError:        usage = 'repo %s' % self.NAME      self._optparse = optparse.OptionParser(usage=usage)      self._Options(self._optparse)    return self._optparse  ...  def _Options(self, p):    """Initialize the option parser.    """

Command的衍生类重写了基类的_Options,定义了属于自己的options,先留个坑后面讲到”repo sync”的时候再分析。

创建完XmlManifest类,解析命令行后,接下来就是调用Execute。

  Command类是所有命令(init、sync、start)的基类,其成员函数Execute被其衍生类重写,故调用成员函数Execute就可以执行某个命令对应的成员函数Execute。所以,执行到这一行 result = cmd.Execute(copts, cargs) 的时候,就是整个架构的分水岭了。下面的图片是对前面的总结。

![repo _Repo#_Run flowchart]({{site.baseurl}}/images/2017-04-12/2017-04-12-repo__Repo_Run_flowchart.png){:.white-bg-image}

接下来就是执行init模块中Init类的成员函数Execute:

class Init(InteractiveCommand, MirrorSafeCommand):  ...  def Execute(self, opt, args):    git_require(MIN_GIT_VERSION, fail=True)    if opt.reference:      opt.reference = os.path.expanduser(opt.reference)    # Check this here, else manifest will be tagged "not new" and init won't be    # possible anymore without removing the .repo/manifests directory.    if opt.archive and opt.mirror:      ...      sys.exit(1)    self._SyncManifest(opt)    self._LinkManifest(opt.manifest_name)    if os.isatty(0) and os.isatty(1) and not self.manifest.IsMirror:      if opt.config_name or self._ShouldConfigureUser():        self._ConfigureUser()      self._ConfigureColor()    self._ConfigureDepth(opt)    self._DisplayResult()

  Init类的成员函数Execute的重点在于两个成员函数_SyncManifest和_LinkManifest,前者会克隆出manifest仓库并且切换到可用的分支上,后者会通过os模块symlink函数生成链接文件manifest.xml。

_SyncManifest函数的代码如下:

class Init(InteractiveCommand, MirrorSafeCommand):  ...  def _SyncManifest(self, opt):    m = self.manifest.manifestProject    is_new = not m.Exists    if is_new:      ...      m._InitGitDir(mirror_git=mirrored_manifest_git)      if opt.manifest_branch:        m.revisionExpr = opt.manifest_branch      else:        m.revisionExpr = 'refs/heads/master'    else:      if opt.manifest_branch:        m.revisionExpr = opt.manifest_branch      else:        m.PreSync()    ...    if not m.Sync_NetworkHalf(is_new=is_new, quiet=opt.quiet,        clone_bundle=not opt.no_clone_bundle,        current_branch_only=opt.current_branch_only,        no_tags=opt.no_tags):      r = m.GetRemote(m.remote.name)      print('fatal: cannot obtain manifest %s' % r.url, file=sys.stderr)      # Better delete the manifest git dir if we created it; otherwise next      # time (when user fixes problems) we won't go through the "is_new" logic.      if is_new:        shutil.rmtree(m.gitdir)      sys.exit(1)    if opt.manifest_branch:      m.MetaBranchSwitch()    syncbuf = SyncBuffer(m.config)    m.Sync_LocalHalf(syncbuf)    syncbuf.Finish()    if is_new or m.CurrentBranch is None:      if not m.StartBranch('default'):        print('fatal: cannot create default in manifest', file=sys.stderr)        sys.exit(1)

  Init类的成员函数_SyncManifest会克隆一个仓库,流程一般如下: git init--->git fetch--->git checkout branch_name 。对应的Project类成员函数就是_InitGitDir,Sync_NetworkHalf,Sync_LocalHalf,是不是很熟悉,跟克隆repo仓库的流程是一样的,其实repo仓库、manifest仓库、projects仓库集这些仓库克隆出来的方式是一样的。

class Project(object):   ...  def _InitGitDir(self, mirror_git=None, force_sync=False):    init_git_dir = not os.path.exists(self.gitdir)    init_obj_dir = not os.path.exists(self.objdir)    try:      # Initialize the bare repository, which contains all of the objects.      if init_obj_dir:        os.makedirs(self.objdir)        self.bare_objdir.init()     ...

  InitGitDir,初始化的仓库为manifest.git,manifest目录下的.git仓库是manifest的复制品,通过Project类的成员函数_InitWorkTree创建。接着再说类_GitGetByExec,_GitGetByExec的对象bare_objdir封装了操作仓库的命令。比如git init。但是却找不到成员函数init,原来成员函数init是动态定义的。关键的地方就在于_GitGetByExec类的成员函数_getattr

_GitGetByExec类的成员函数getattr代码如下:

class Project(object):   ...  class _GitGetByExec(object):   ...    def __getattr__(self, name):      ...      name = name.replace('_', '-')      def runner(*args, **kwargs):        cmdv = []        config = kwargs.pop('config', None)        ...        if config is not None:          if not git_require((1, 7, 2)):            ...          for k, v in config.items():            cmdv.append('-c')            cmdv.append('%s=%s' % (k, v))        cmdv.append(name)        cmdv.extend(args)        p = GitCommand(self._project,                       cmdv,                       bare=self._bare,                       gitdir=self._gitdir,                       capture_stdout=True,                       capture_stderr=True)        if p.Wait() != 0:          ...        r = p.stdout        try:          r = r.decode('utf-8')        except AttributeError:          pass        if r.endswith('\n') and r.index('\n') == len(r) - 1:          return r[:-1]        return r      return runner

  runner闭包用来处理调用者提供的参数,比如bare_git.describe(project.GetRevisionId())中的”project.GetRevisionId()”,对应的git命令就是 git describe args
所以_GitGetByExec类通过成员函数getattr可以向工厂一样生产一些执行git命令的成员函数。既然仓库已经初始化好了,那么接下来就是fetch仓库了。

Sync_NetworkHalf成员函数的代码如下:

class Project(object):    ...   def Sync_NetworkHalf(self,                       quiet=False,                       is_new=None,                       current_branch_only=False,                       force_sync=False,                       clone_bundle=True,                       no_tags=False,                       archive=False,                       optimized_fetch=False,                       prune=False):    ...    if (need_to_fetch and        not self._RemoteFetch(initial=is_new, quiet=quiet, alt_dir=alt_dir,                              current_branch_only=current_branch_only,                              no_tags=no_tags, prune=prune, depth=depth)):      return False    if self.worktree:      self._InitMRef()    else:      self._InitMirrorHead()      try:        os.remove(os.path.join(self.gitdir, 'FETCH_HEAD'))      except OSError:        pass    return True

  Project类Sync_NetworkHalf方法调用_RemoteFetch方法实现了从远程仓库fetch记录到本地仓库,_RemoteFetch函数其实是”git fetch”命令的封装。

Sync_LocalHalf成员函数的代码如下:

class Project(object):    ...  def Sync_LocalHalf(self, syncbuf, force_sync=False):    """Perform only the local IO portion of the sync process.       Network access is not required.    """    self._InitWorkTree(force_sync=force_sync)    ...    revid = self.GetRevisionId(all_refs)    def _doff():      self._FastForward(revid)      self._CopyAndLinkFiles()    head = self.work_git.GetHead()    ...    if branch is None or syncbuf.detach_head:      # Currently on a detached HEAD.  The user is assumed to      # not have any local modifications worth worrying about.      #      ...      if head == revid:        # No changes; don't do anything further.        # Except if the head needs to be detached        #        if not syncbuf.detach_head:          # The copy/linkfile config may have changed.          self._CopyAndLinkFiles()          return      else:        lost = self._revlist(not_rev(revid), HEAD)        if lost:          syncbuf.info(self, "discarding %d commits", len(lost))      try:        self._Checkout(revid, quiet=True)      except GitError as e:        syncbuf.fail(self, e)        return      self._CopyAndLinkFiles()      return    if head == revid:      # No changes; don't do anything further.      #      # The copy/linkfile config may have changed.      self._CopyAndLinkFiles()      return    branch = self.GetBranch(branch)    if not branch.LocalMerge:      # The current branch has no tracking configuration.      # Jump off it to a detached HEAD.      #      ...      try:        self._Checkout(revid, quiet=True)      except GitError as e:        syncbuf.fail(self, e)        return      self._CopyAndLinkFiles()      return    upstream_gain = self._revlist(not_rev(HEAD), revid)    pub = self.WasPublished(branch.name, all_refs)    if pub:      not_merged = self._revlist(not_rev(revid), pub)      if not_merged:        ...        return      elif pub == head:        # All published commits are merged, and thus we are a        # strict subset.  We can fast-forward safely.        #        syncbuf.later1(self, _doff)        return    # Examine the local commits not in the remote.  Find the    # last one attributed to this user, if any.    #    local_changes = self._revlist(not_rev(revid), HEAD, format='%H %ce')    last_mine = None    cnt_mine = 0    for commit in local_changes:      commit_id, committer_email = commit.decode('utf-8').split(' ', 1)      if committer_email == self.UserEmail:        last_mine = commit_id        cnt_mine += 1    if not upstream_gain and cnt_mine == len(local_changes):      return    ...    branch.remote = self.GetRemote(self.remote.name)    if not ID_RE.match(self.revisionExpr):      # in case of manifest sync the revisionExpr might be a SHA1      branch.merge = self.revisionExpr      if not branch.merge.startswith('refs/'):        branch.merge = R_HEADS + branch.merge    branch.Save()    if cnt_mine > 0 and self.rebase:      def _dorebase():        self._Rebase(upstream='%s^1' % last_mine, onto=revid)        self._CopyAndLinkFiles()      syncbuf.later2(self, _dorebase)    elif local_changes:      try:        self._ResetHard(revid)        self._CopyAndLinkFiles()      except GitError as e:        syncbuf.fail(self, e)        return    else:      syncbuf.later1(self, _doff)

  Project类的成员函数Sync_LocalHalf内部流程较为复杂,这里我们只讲checkout到一个干净的分支。

  • _InitWorkTree成员函数:初始化manifest工作目录下的.git仓库
  • _Checkout成员函数:通过”git checkout”切换分支。

_LinkManifest成员函数的代码如下:

class Init(InteractiveCommand, MirrorSafeCommand):    ...  def _LinkManifest(self, name):    if not name:      print('fatal: manifest name (-m) is required.', file=sys.stderr)      sys.exit(1)    try:      self.manifest.Link(name)    except ManifestParseError as e:      print("fatal: manifest '%s' not available" % name, file=sys.stderr)      print('fatal: %s' % str(e), file=sys.stderr)      sys.exit(1)

  成员函数_LinkManifest最终会调用os.symlink,创建manifest工作目录下default.xml的链接文件manifest.xml到 .repo目录下,这样方便访问manifest.xml文件

Projects仓库集

  执行完repo init就获取到了repo仓库和manifest仓库了,接下来就要通过manifest.xml链接文件中的AOSP各个项目的元数据,获取projects仓库集。先来看看其内容:

<?xml version="1.0" encoding="UTF-8"?>  <manifest>    <remote  name="aosp"             fetch=".."             review="https://android-review.googlesource.com/" />    <default revision="refs/tags/android-4.2_r1"             remote="aosp"             sync-j="4" />    <project path="build" name="platform/build" >      <copyfile src="core/root.mk" dest="Makefile" />    </project>    <project path="abi/cpp" name="platform/abi/cpp" />    <project path="bionic" name="platform/bionic" />    ......  </manifest> 

  想了解更多的manifest.xml可以查看.repo/repo/docs/manifest-format.txt。这里我们只做简单了解
manifest.xml定义了四种标签:

  • remote:该标签描述的是远程仓库信息。其中的fetch值相当于url路径的前缀。就好比 git@github.com:HawksJamesf/blog.git 中的 git@github.com: ,标明了服务器。属性review用于code review服务器地址。
  • default:属性revision表明了AOSP项目使用的开发分支,属性sync-j表明了拉取代码时使用的cpu核数
  • project: 描述了相对于远程仓库的位置和相对于AOSP根目录的目录名字。比如想要获取build仓库,远程仓库的url为 https://android.googlesource.com/platform,那么该仓库的对应的远程仓库的url就是 https://android.googlesource.com/platform/build
  • copyfile: 属性src为某个文件在远程仓库的位置,属性dest为本地仓库的位置。

  执行 repo sync -j 8 命令行,流程就跟执行repo init是一样的,到了main模块里面的_Repo类的成员函数_Run调用cmd.Execute(copts, cargs)这个风水岭,才会执行属于sync模块的代码。但是关于copts,cargs参数的如何获取,我们还得先看cmd.OptionParser.parse_args(argv)。

_Options成员函数的代码如下:

class Sync(Command, MirrorSafeCommand):  ...  def _Options(self, p, show_smart=True):    try:      self.jobs = self.manifest.default.sync_j    except ManifestParseError:      self.jobs = 1    ...    p.add_option('-l', '--local-only',                 dest='local_only', action='store_true',                 help="only update working tree, don't fetch")    p.add_option('-n', '--network-only',                 dest='network_only', action='store_true',                 help="fetch only, don't update working tree")     ...    p.add_option('-m', '--manifest-name',                 dest='manifest_name',                 help='temporary manifest to use for this sync', metavar='NAME.xml')    ...    p.add_option('-u', '--manifest-server-username', action='store',                 dest='manifest_server_username',                 help='username to authenticate with the manifest server')    p.add_option('-p', '--manifest-server-password', action='store',                 dest='manifest_server_password',                 help='password to authenticate with the manifest server')    p.add_option('--fetch-submodules',                 dest='fetch_submodules', action='store_true',                 help='fetch submodules from server')    ...    if show_smart:      p.add_option('-s', '--smart-sync',                   dest='smart_sync', action='store_true',                   help='smart sync using manifest from the latest known good build')      p.add_option('-t', '--smart-tag',                   dest='smart_tag', action='store',                   help='smart sync using manifest from a known tag')    g = p.add_option_group('repo Version options')    g.add_option('--no-repo-verify',                 dest='no_repo_verify', action='store_true',                 help='do not verify repo source code')    g.add_option('--repo-upgraded',                 dest='repo_upgraded', action='store_true',                 help=SUPPRESS_HELP)

  在分析repo init时已经说用,Command的衍生类override成员函数_Options,才能得到独有的OptionParser。还记得 repo command line --->optparse--->git command line这个流程吗 ? 每种命令都有其对应的OptionParser,这样才能做到各个命令模块有自己处理repo command line的逻辑。紧接着把解析后的值传给成员函数Execute。

Execute成员函数的代码如下:

class Sync(Command, MirrorSafeCommand):  ...  def Execute(self, opt, args):    ...    manifest_name = opt.manifest_name    ...    rp = self.manifest.repoProject    rp.PreSync()    mp = self.manifest.manifestProject    mp.PreSync()    ...    if not opt.local_only:      mp.Sync_NetworkHalf(quiet=opt.quiet,                          current_branch_only=opt.current_branch_only,                          no_tags=opt.no_tags,                          optimized_fetch=opt.optimized_fetch)    if mp.HasChanges:      syncbuf = SyncBuffer(mp.config)      mp.Sync_LocalHalf(syncbuf)      if not syncbuf.Finish():        sys.exit(1)      self._ReloadManifest(manifest_name)      if opt.jobs is None:        self.jobs = self.manifest.default.sync_j    ...    all_projects = self.GetProjects(args,                                    missing_ok=True,                                    submodules_ok=opt.fetch_submodules)    self._fetch_times = _FetchTimes(self.manifest)    if not opt.local_only:      to_fetch = []      now = time.time()      if _ONE_DAY_S <= (now - rp.LastFetch):        to_fetch.append(rp)      to_fetch.extend(all_projects)      to_fetch.sort(key=self._fetch_times.Get, reverse=True)      fetched = self._Fetch(to_fetch, opt)      _PostRepoFetch(rp, opt.no_repo_verify)      if opt.network_only:        # bail out now; the rest touches the working tree        return      # Iteratively fetch missing and/or nested unregistered submodules      previously_missing_set = set()      while True:        ...        all_projects = self.GetProjects(args,                                        missing_ok=True,                                        submodules_ok=opt.fetch_submodules)        missing = []        for project in all_projects:          if project.gitdir not in fetched:            missing.append(project)        if not missing:          break        # Stop us from non-stopped fetching actually-missing repos: If set of        # missing repos has not been changed from last fetch, we break.        missing_set = set(p.name for p in missing)        if previously_missing_set == missing_set:          break        previously_missing_set = missing_set        fetched.update(self._Fetch(missing, opt))    ...    if self.UpdateProjectList():      sys.exit(1)    ...    for project in all_projects:      ...      if project.worktree:        project.Sync_LocalHalf(syncbuf, force_sync=opt.force_sync)    ...    ...

  前期会有一些更新检查repo仓库和manifest仓库的工作,后期就会拉去projects仓库集,那么下面我们就来粗糙的理解执行成员函数Execute的流程:

  • 一开始获取manifest仓库和repo仓库的对象,然后都会调用之前分析过的PreSync,如果opt.local_only不存在,就会调用Sync_NetworkHalf成员函数更新manifest仓库。紧接着如果manifest本地仓库相对于远程仓库有变化,就会调用Sync_LocalHalf做一些merge或者rebase操作,然后调用ReloadManifest成员函数重新从manifest.xml载入数据到对象。
  • 接下来就是_Fetch成员函数,其中除了manifest仓库,其他的仓库都会更新。如果开启的合数大于1的话就会创建新的线程,防止主线程阻塞,其中关于线程同步机制可以查看这里。而获取projects仓库集,主要使用的是Project类的成员函数Sync_NetworkHalf,接着调用_PostRepoFetch函数判断repo仓库是否有变化,如果是,则调用Sync_LocalHalf成员函数做一些merge或者rebase操作。
  • 紧接着通过GetProjects成员函数从 .repo/projects目录下得到AOSP所有仓库对象的列表,不包括repo仓库和manifest仓库,这样就可以调用Sync_LocalHalf成员函数。如果参数args指定了具体的仓库(即repo sync project_name),那么GetProjects成员函数就只能得到指定的仓库。获取仓库的方式有两种:一种_GetProjectByPath,另一种是GetProjectsWithName。

  至此,repo仓库、manifest仓库、projects仓库集连同其对应的工作目录都已经初始化完成。那么是不是就可以开始开发呢 ? 其实接下来还要为AOSP项目的仓库创建一个新的开发分支,只有这样我们才能够在分支上面提交、上传自己的代码。也只有这样当我们发布了一个版本就可以给这个版本打上一个tag。当项目可以量产时,就可以在这个分支基础上创建出一个量产分支,并上传到服务器。

创建分支

repo start master --all 其实挺好理解这个命令行的,就是 git checkout -b name 命令行的批量操作。留个坑,以后填吧。

class Start(Command):  ...  def _Options(self, p):    p.add_option('--all',                 dest='all', action='store_true',                 help='begin branch in all projects')  def Execute(self, opt, args):    if not args:      self.Usage()    nb = args[0]    if not git.check_ref_format('heads/%s' % nb):      print("error: '%s' is not a valid name" % nb, file=sys.stderr)      sys.exit(1)    err = []    projects = []    if not opt.all:      projects = args[1:]      if len(projects) < 1:        projects = ['.',]  # start it in the local project by default    all_projects = self.GetProjects(projects,                                    missing_ok=bool(self.gitc_manifest))    # This must happen after we find all_projects, since GetProjects may need    # the local directory, which will disappear once we save the GITC manifest.    if self.gitc_manifest:      gitc_projects = self.GetProjects(projects, manifest=self.gitc_manifest,                                       missing_ok=True)      for project in gitc_projects:        if project.old_revision:          project.already_synced = True        else:          project.already_synced = False          project.old_revision = project.revisionExpr        project.revisionExpr = None      # Save the GITC manifest.      gitc_utils.save_manifest(self.gitc_manifest)      # Make sure we have a valid CWD      if not os.path.exists(os.getcwd()):        os.chdir(self.manifest.topdir)    pm = Progress('Starting %s' % nb, len(all_projects))    for project in all_projects:      pm.update()      if self.gitc_manifest:        gitc_project = self.gitc_manifest.paths[project.relpath]        # Sync projects that have not been opened.        if not gitc_project.already_synced:          proj_localdir = os.path.join(self.gitc_manifest.gitc_client_dir,                                       project.relpath)          project.worktree = proj_localdir          if not os.path.exists(proj_localdir):            os.makedirs(proj_localdir)          project.Sync_NetworkHalf()          sync_buf = SyncBuffer(self.manifest.manifestProject.config)          project.Sync_LocalHalf(sync_buf)          project.revisionId = gitc_project.old_revision      # If the current revision is a specific SHA1 then we can't push back      # to it; so substitute with dest_branch if defined, or with manifest      # default revision instead.      branch_merge = ''      if IsId(project.revisionExpr):        if project.dest_branch:          branch_merge = project.dest_branch        else:          branch_merge = self.manifest.default.revisionExpr      if not project.StartBranch(nb, branch_merge=branch_merge):        err.append(project)    pm.end()    if err:      for p in err:        print("error: %s/: cannot start %s" % (p.relpath, nb),              file=sys.stderr)      sys.exit(1)

4.Reference

Android Open Source Project
repo仓库源码
清华大学提供的AOSP
Android源代码仓库及其管理工具Repo分析

0 0
原创粉丝点击